
Deletion-Robust Submodular Maximization: Data Summarization with “the Right to be Forgotten”

    Baharan Mirzasoleiman 1 Amin Karbasi 2 Andreas Krause 1

Abstract

How can we summarize a dynamic data stream when elements selected for the summary can be deleted at any time? This is an important challenge in online services, where the users generating the data may decide to exercise their right to restrict the service provider from using (part of) their data due to privacy concerns. Motivated by this challenge, we introduce the dynamic deletion-robust submodular maximization problem. We develop the first resilient streaming algorithm, called ROBUST-STREAMING, with a constant factor approximation guarantee to the optimum solution. We evaluate the effectiveness of our approach on several real-world applications, including summarizing (1) streams of geo-coordinates; (2) streams of images; and (3) click-stream log data, consisting of 45 million feature vectors from a news recommendation task.

1. Introduction

Streams of data of massive and increasing volume are generated every second, and demand fast analysis and efficient storage, including massive clickstreams, stock market data, image and video streams, and sensor data for environmental or health monitoring, to name a few. To make efficient and reliable decisions we usually need to react in real time to the data. However, big and fast data makes it difficult to store, analyze, or make predictions. Therefore, data summarization – mining and extracting useful information from large data sets – has become a central topic in machine learning and information retrieval.

A recent body of research on data summarization relies on utility/scoring functions that are submodular. Intuitively, submodularity (Krause & Golovin, 2013) states that selecting any given data point earlier helps more than selecting it later. Hence, submodular functions can score both diversity and representativeness of a subset w.r.t. the entire dataset. Thus, many problems in data summarization require maximizing submodular set functions subject to cardinality (or more complicated hereditary) constraints. Numerous examples include exemplar-based clustering (Dueck & Frey, 2007), document (Lin & Bilmes, 2011) and corpus summarization (Sipos et al., 2012), recommender systems (El-Arini & Guestrin, 2011), search result diversification (Rakesh Agrawal, 2009), data subset selection (Wei et al., 2015), and social network analysis (Kempe et al., 2003).

1 ETH Zurich, Switzerland. 2 Yale University, New Haven, USA. Correspondence to: Baharan Mirzasoleiman.

Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017. Copyright 2017 by the author(s).

Classical methods, such as the celebrated greedy algorithm (Nemhauser et al., 1978) or its accelerated versions (Mirzasoleiman et al., 2015; Badanidiyuru & Vondrák, 2014), require random access to the entire data, make multiple passes, and select elements sequentially in order to produce near-optimal solutions. Naturally, such solutions cannot scale to large instances. The limitations of centralized methods inspired the design of streaming algorithms that are able to gain insights from data as it is being collected (Badanidiyuru et al., 2014; Chakrabarti & Kale, 2014; Chekuri et al., 2015; Mirzasoleiman et al., 2017).

While extracting useful information from big data in real time promises many benefits, the development of more sophisticated methods for extracting, analyzing and using personal information has made privacy a major public issue. Various web services rely on the collection and combination of data about individuals from a wide variety of sources. At the same time, the ability to control the information an individual can reveal about herself in online applications has become a growing concern.

The “right to be forgotten” (with a specific mandate for protection in the European Data Protection Regulation (2012), and concrete guidelines released in 2014) allows individuals to claim ownership of their personal information and gives them authority over their online activities (videos, photos, tweets, etc.). As an example, consider a road traffic information system that monitors traffic speeds, travel times and incidents in real time. It combines the massive amount of control messages available at the cellular network with their geo-coordinates in order to generate the area-wide traffic information service. Some consumers, while using the service and providing data, may not be willing to share information about specific locations in order to protect their own privacy. With the right to be forgotten, an individual can have certain data deleted from online database records so that third parties (e.g., search engines) can no longer trace them (Weber, 2011). Note that the data could come in many forms, including a) a user's posts to online social media, b) visual data shared by wearable cameras (e.g., Google Glass), c) behavioral patterns or feedback obtained from clicking on advertisements or news.

In this paper, we propose the first framework that offers instantaneous data summarization while preserving the right of an individual to be forgotten. We cast this problem as an instance of robust streaming submodular maximization where the goal is to produce a concise real-time summary in the face of data deletions requested by users. We develop ROBUST-STREAMING, a method that takes any generic streaming algorithm STREAMINGALG with approximation guarantee α and outputs a solution that is robust against any m deletions from the summary at any given time, while preserving the same approximation guarantee. To the best of our knowledge, ROBUST-STREAMING is the first algorithm with such strong theoretical guarantees. Our experimental results also demonstrate the effectiveness of ROBUST-STREAMING on several practical applications.

2. Background and Related Work

Several streaming algorithms for submodular maximization have been recently developed. For monotone functions, Gomes & Krause (2010) first developed a multi-pass algorithm with a 1/2 − ε approximation guarantee subject to a cardinality constraint k, using O(k) memory, under strong assumptions on the way the data is generated. Later, Badanidiyuru et al. (2014) proposed the first single-pass streaming algorithm with a 1/2 − ε approximation under a cardinality constraint. They made no assumptions on the order of receiving data points, and only require O(k log k/ε) memory. Following the same line of inquiry, Chakrabarti & Kale (2014) developed a single-pass algorithm with a 1/(4p) approximation guarantee for handling more general constraints such as intersections of p matroids. The required memory is unbounded and increases polylogarithmically with the size of the data. For general submodular functions, Chekuri et al. (2015) presented a randomized algorithm subject to a broader range of constraints, namely p-matchoids. Their method gives a (2 − o(1))/((8 + e)p) approximation using O(k log k/ε²) memory (k is the size of the largest feasible solution). Very recently, Mirzasoleiman et al. (2017) introduced a (4p − 1)/(4p(8p + 2d − 1))-approximation algorithm under a p-system and d knapsack constraints, using O(pk log²(k)/ε²) memory.

An important requirement, which frequently arises in practice, is robustness. Krause et al. (2008) proposed the problem of robust submodular observation selection, where we want to solve max_{|A|≤k} min_{i∈[ℓ]} f_i(A), for normalized monotone f_i. Submodular maximization of f robust against m deletions can be cast as an instance of the above problem: max_{|A|≤k} min_{|B|≤m} f(A \ B). The running time, however, will be exponential in m. Recently, Orlin et al. (2016) developed an algorithm with an asymptotic guarantee of 0.285 for deletion-robust submodular maximization under up to m = o(√k) deletions. The results can be improved for only 1 or 2 deletions.

The aforementioned approaches aim to construct solutions that are robust against deletions in a batch-mode fashion, without being able to update the solution set after each deletion. To the best of our knowledge, ours is the first method to address the general deletion-robust submodular maximization problem in the streaming setting. We also highlight the fact that our method does not require m, the number of deletions, to be bounded by k, the size of the largest feasible solution.

Very recently, submodular optimization over sliding windows has been considered, where we want to maintain a solution that considers only the last W items (Epasto et al., 2017; Jiecao et al., 2017). This is in contrast to our setting, where the guarantee is with respect to all the elements received from the stream, except those that have been deleted. The sliding window model can be easily incorporated into our solution to get a robust sliding-window streaming algorithm with the possibility of m deletions in the window.

3. Deletion-Robust Model

We review the static submodular data summarization problem. We then formalize a novel dynamic variant, and the constraints on time and memory that algorithms need to obey.

    3.1. Static Submodular Data Summarization

In static data summarization, we have a large but fixed dataset V of size n, and we are interested in finding a summary that best represents the data. The representativeness of a subset is defined based on a utility function f : 2^V → R+, where for any A ⊆ V the function f(A) quantifies how well A represents V. We define the marginal gain of adding an element e ∈ V to a summary A ⊆ V by ∆(e|A) = f(A ∪ {e}) − f(A). In many data summarization applications, the utility function f satisfies submodularity, i.e., for all A ⊆ B ⊆ V and e ∈ V \ B,

    ∆(e|A) ≥ ∆(e|B).
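To make the diminishing-returns inequality concrete, here is a small sanity check (not part of the paper) that evaluates both marginal gains for a toy coverage function; the specific sets and elements are illustrative assumptions.

```python
# Hypothetical sanity check of submodularity: ∆(e|A) ≥ ∆(e|B) for A ⊆ B.
def coverage(sets, A):
    """f(A) = size of the union of the chosen sets."""
    covered = set()
    for i in A:
        covered |= sets[i]
    return len(covered)

def marginal_gain(sets, e, A):
    return coverage(sets, A | {e}) - coverage(sets, A)

sets = {0: {1, 2}, 1: {2, 3}, 2: {3, 4, 5}}
A, B, e = set(), {0}, 2          # A ⊆ B, e ∉ B
gA = marginal_gain(sets, e, A)   # gain of {3, 4, 5} on top of the empty set
gB = marginal_gain(sets, e, B)   # gain of {3, 4, 5} on top of {1, 2}
assert gA >= gB                  # submodularity holds on this instance
```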

Many data summarization applications can be cast as an instance of constrained submodular maximization:

OPT = max_{A∈I} f(A),    (1)


where I ⊆ 2^V is a given family of feasible solutions. We will denote by A* the optimal solution, i.e., A* = arg max_{A∈I} f(A). A common type of constraint is a cardinality constraint, i.e., I = {A ⊆ V s.t. |A| ≤ k}. Finding A* even under a cardinality constraint is NP-hard for many classes of submodular functions (Feige, 1998). However, a seminal result by Nemhauser et al. (1978) states that for a non-negative and monotone submodular function, a simple greedy algorithm that starts with the empty set S_0 = ∅, and at each iteration augments the solution with the element with the highest marginal gain, obtains a (1 − 1/e) approximation to the optimum solution. For small, static data, the centralized greedy algorithm or its accelerated variants produce near-optimal solutions. However, such methods fail to scale to truly large problems.
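The greedy algorithm of Nemhauser et al. (1978) can be sketched in a few lines; the toy coverage objective below is an illustrative assumption, and any monotone submodular `f` could be substituted.

```python
# A minimal sketch of the classical greedy algorithm under a cardinality
# constraint k: repeatedly add the element with the largest marginal gain.
def greedy(f, V, k):
    S = set()
    for _ in range(k):
        e = max((x for x in V if x not in S), key=lambda x: f(S | {x}) - f(S))
        S.add(e)
    return S

# usage on a toy coverage function (illustrative, not from the paper)
sets = {0: {1, 2}, 1: {2, 3}, 2: {4, 5, 6}}
def f_cov(A):
    covered = set()
    for i in A:
        covered |= sets[i]
    return len(covered)

S = greedy(f_cov, set(sets), 2)
```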

    3.2. Dynamic Data: Additions and Deletions

In the dynamic deletion-robust submodular maximization problem, the data V is generated at a fast pace and in real time, such that at any point t in time, a subset V_t ⊆ V of the data has arrived. Naturally, we assume that V_1 ⊆ V_2 ⊆ · · · ⊆ V_n, with no assumption made on the order or the size of the data stream. Importantly, we allow data to be deleted dynamically as well. We use D_t to refer to the data deleted by time t, where again D_1 ⊆ D_2 ⊆ · · · ⊆ D_n. Without loss of generality, below we assume that at every time step t exactly one element e_t ∈ V is either added or deleted, i.e., |D_t \ D_{t−1}| + |V_t \ V_{t−1}| = 1. We now seek to solve a dynamic variant of Problem (1):

OPT_t = max_{A_t∈I_t} f(A_t)  s.t.  I_t = {A : A ∈ I ∧ A ⊆ V_t \ D_t}.    (2)

Note that in general a feasible solution at time t might not be a feasible solution at a later time t′. This is particularly important in practical situations where a subset of the elements D_t should be removed from the solution. We do not make any assumptions on the order or the size of the data stream V, but we assume that the total number of deletions is limited to m, i.e., |D_n| ≤ m.

    3.3. Dealing with Limited Time and Memory

In principle, we could solve Problem (2) by repeatedly – at every time t – solving the static Problem (1) by restricting the ground set V to V_t \ D_t. This is impractical even for moderate problem sizes. For large problems, we may not even be able to fit V_t into the main memory of the computing device (space constraints). Moreover, in real-time applications, one needs to make decisions in a timely manner while the data is continuously arriving (time constraints).

We hence focus on streaming algorithms which may maintain a limited memory M_t ⊆ V_t \ D_t, and must have an updated feasible solution {A_t | A_t ⊆ M_t, A_t ∈ I_t} to output at any given time t. Ideally, the capacity of the memory |M_t| should not depend on t and V_t. Whenever a new element is received, the algorithm can choose 1) to insert it into its memory, provided that the memory does not exceed a pre-specified capacity bound, 2) to replace it with one or a subset of the elements in the memory (in the preemptive model), or otherwise 3) the element gets discarded and cannot be used later by the algorithm. If the algorithm receives a deletion request for a subset D_t ⊆ V_t at time t (in which case I_t will be updated to accommodate this request), it has to drop D_t from M_t in addition to updating A_t to make sure that the current solution is feasible (all subsets A′_t ⊆ V_t that contain an element from D_t are infeasible, i.e., A′_t ∉ I_t). To account for such losses, the streaming algorithm can only use other elements maintained in its memory in order to produce a feasible candidate solution, i.e., A_t ⊆ M_t ⊆ ((V_t \ V_{t−1}) ∪ M_{t−1}) \ D_t. We say that the streaming algorithm is robust against m deletions if it can provide a feasible solution A_t ∈ I_t at any given time t such that f(A_t) ≥ τ·OPT_t for some constant τ > 0. Later, we show how robust streaming algorithms can be obtained by carefully increasing the memory and running multiple instances of existing streaming methods simultaneously.

4. Example Applications

We now discuss three concrete applications, with their submodular objective functions f, where the size of the datasets and the nature of the problem often require a deletion-robust streaming solution.

    4.1. Summarizing Click-stream and Geolocation Data

There exists a tremendous opportunity in harnessing prevalent activity logs and sensing resources. For instance, GPS traces of mobile phones can be used by road traffic information systems (such as Google Traffic, TrafficSense, Navigon) to monitor travel times and incidents in real time. In another example, a stream of user activity logs is recorded while users click on various parts of a webpage, such as ads and news, while browsing the web or using social media. Continuously sharing all collected data is problematic for several reasons. First, memory and communication constraints may limit the amount of data that can be stored and transmitted across the network. Second, reasonable privacy concerns may prohibit continuous tracking of users.

In many such applications, the data can be described in terms of a kernel matrix K which encodes the similarity between different data elements. The goal is to select a small subset (active set) of elements while maintaining a certain diversity. Very often, the utility function boils down to the following monotone submodular function (Krause & Golovin, 2013), where α > 0 and K_{S,S} is the principal submatrix of K indexed by the set S:

f(S) = log det(I + αK_{S,S}).    (3)
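The objective in Eq. (3) is easy to evaluate with NumPy; the RBF kernel, the toy points, and the choice α = 1 below are illustrative assumptions rather than the paper's experimental setup.

```python
# Sketch of evaluating f(S) = log det(I + α K_{S,S}) on a toy kernel matrix.
import numpy as np

def logdet_utility(K, S, alpha=1.0):
    """Informativeness of the active set S under kernel matrix K."""
    idx = np.array(sorted(S))
    K_SS = K[np.ix_(idx, idx)]                    # principal submatrix
    sign, logdet = np.linalg.slogdet(np.eye(len(idx)) + alpha * K_SS)
    return logdet                                 # sign is +1: matrix is PD

# toy data: 5 points on a line, RBF similarity
X = np.linspace(0.0, 1.0, 5)
K = np.exp(-(X[:, None] - X[None, :]) ** 2)
f_all = logdet_utility(K, {0, 1, 2, 3, 4})
f_two = logdet_utility(K, {0, 4})
```

Monotonicity of f means the full set scores at least as high as any subset, consistent with the function being a valid target for greedy-style maximization.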


In light of privacy concerns, it is natural to consider participatory models that empower users to decide what portion of their data may be made available. If a user decides not to share, or to revoke, information about parts of their activity, the monitoring system should be able to update the summary to comply with users' preferences. Therefore, we use ROBUST-STREAMING to identify a robust set of the k most informative data points by maximizing Eq. (3).

    4.2. Summarizing Image Collections

Given a collection of images, one might be interested in finding a subset that best summarizes and represents the collection. This problem has recently been addressed via submodular maximization. More concretely, Tschiatschek et al. (2014) designed several submodular objectives f_1, . . . , f_l, which quantify different characteristics that good summaries should have, e.g., being representative w.r.t. commonly reoccurring motives. Each function either captures coverage (including facility location, sum-coverage, and truncated graph cut), or rewards diversity (such as clustered facility location, and clustered diversity). Then, they optimize a weighted combination of such functions

f_w(A) = Σ_{i=1}^{l} w_i f_i(A),    (4)

where the weights are non-negative, i.e., w_i ≥ 0, and learned via large-margin structured prediction. We use their learned mixtures of submodular functions in our image summarization experiments. Now, consider a situation where a user wants to summarize a large collection of her photos. If she decides to delete some of the selected photos in the summary, she should be able to update the result without processing the whole collection from scratch. ROBUST-STREAMING can be used as an appealing method for this task.
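To illustrate the form of Eq. (4), a toy weighted mixture can be evaluated as follows; the two component functions and the weights are illustrative assumptions, not the learned mixtures of Tschiatschek et al. (2014). A non-negative combination of submodular functions is itself submodular, which is what makes this construction valid.

```python
# Sketch of a weighted mixture f_w(A) = Σ_i w_i f_i(A) of two submodular
# components: a coverage function and a truncated cardinality function.
sets = {0: {1, 2}, 1: {2, 3}, 2: {3, 4}}

def coverage(A):
    covered = set()
    for i in A:
        covered |= sets[i]
    return len(covered)

def trunc_card(A):
    return min(len(A), 2)            # truncated cardinality, submodular

w = [0.7, 0.3]                       # non-negative weights (assumed values)

def f_w(A):
    return w[0] * coverage(A) + w[1] * trunc_card(A)

value = f_w({0, 2})                  # 0.7 * 4 + 0.3 * 2
```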

5. Robust-Streaming Algorithm

In this section, we first elaborate on why naively increasing the solution size does not help. Then, we present our main algorithm, ROBUST-STREAMING, for deletion-robust streaming submodular maximization. Our approach builds on the following key ideas: 1) simultaneously constructing non-overlapping solutions, and 2) appropriately merging solutions upon deleting an element from the memory.

    5.1. Increasing the Solution Size Does Not Help

One of the main challenges in designing streaming solutions is to immediately discover whether an element received from the data stream at time t is good enough to be added to the memory M_t. This decision is usually made based on the added value or marginal gain of the new element, which in turn depends on the previously chosen elements in the memory, i.e., M_{t−1}. Now, let us consider the opposite scenario, when an element e should be deleted from the memory at time t. Since now we have a smaller context, submodularity guarantees that the marginal gains of the elements added to the memory after e was added could have only increased if e was not part of the stream (diminishing returns). Hence, if some elements had large marginal values to be included in the memory before the deletion, they still do after the deletion. Based on this intuition, a natural idea is to keep a solution of a bigger size, say m + k (rather than k), for at most m deletions. However, this idea does not work, as shown by the following example.

Bad Example (Coverage): Consider a collection of n subsets V = {B_1, . . . , B_n}, where B_i ⊆ {1, . . . , n}, and a coverage function f(A) = |∪_{i∈A} B_i|, A ⊆ V. Suppose we receive B_1 = {1, . . . , n}, and then B_i = {i} for 2 ≤ i ≤ n from the stream. Streaming algorithms that select elements according to their marginal gain and are allowed to pick k + m elements will only pick up B_1 upon encounter (as other elements provide no gain), and return A_n = {B_1} after processing the stream. Hence, if B_1 is deleted after the stream is received, these algorithms return the empty set A_n = ∅ (with f(A_n) = 0). An optimal algorithm which knows that element B_1 will be deleted, however, will return the set A_n = {B_2, . . . , B_{k+2}}, with value f(A_n) = k + 1. Hence, standard streaming algorithms fail arbitrarily badly even under a single deletion (i.e., m = 1), even when we allow them to pick sets larger than k.
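The bad example can be reproduced numerically; the instance size n = 6, k = 2, m = 1 and the simple positive-gain streaming rule below are illustrative assumptions.

```python
# Reproducing the bad example: a marginal-gain streaming rule that may keep
# up to k + m elements still collapses to value 0 when B1 is deleted.
n, k, m = 6, 2, 1
B = {1: set(range(1, n + 1))}                 # B1 covers everything
B.update({i: {i} for i in range(2, n + 1)})   # B2..Bn are singletons

def f(A):
    return len(set().union(*(B[i] for i in A))) if A else 0

S = []
for i in range(1, n + 1):                     # stream B1, B2, ..., Bn
    if len(S) < k + m and f(S + [i]) > f(S):
        S.append(i)                           # only B1 ever has positive gain

S_after_deletion = [i for i in S if i != 1]   # delete B1
deletion_aware = [2, 3, 4]                    # k + 1 singletons survive instead
```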

In the following, we show how we can solve the above issue by carefully constructing not one but multiple solutions.

    5.2. Building Multiple Solutions

As stated earlier, the existing one-pass streaming algorithms for submodular maximization work by identifying elements with marginal gains above a carefully chosen threshold. This ensures that any element received from the stream which is fairly similar to the elements of the solution set is discarded by the algorithm. Since elements are chosen to be as diverse as possible, the solution may suffer dramatically in case of a deletion.

One simple idea is to try to find m (near) duplicates for each element e in the memory, i.e., find e′ such that f(e′) = f(e) and ∆(e′|e) = 0 (Orlin et al., 2016). This way, if we face m deletions, we can still find a good solution. The drawback is that even one duplicate may not exist in the data stream (see the bad example above), and we may not be able to recover from the deleted element. Instead, what we will do is to construct non-overlapping solutions such that once we experience a deletion, only one solution gets affected.

In order to be robust against m deletions, we run a cascading chain of r instances of STREAMINGALG as follows. Let M_t = M_t^(1), M_t^(2), . . . , M_t^(r) denote the content of their memories at time t. When we receive a new element e ∈ V_t from the data stream at time t, we pass it to the first instance, STREAMINGALG^(1). If STREAMINGALG^(1) discards e, the discarded element is cascaded in the chain and is passed to its successive algorithm, i.e., STREAMINGALG^(2). If e is discarded by STREAMINGALG^(2), the cascade continues and e is passed to STREAMINGALG^(3). This process continues until either e is accepted by one of the instances or discarded for good. Now, let us consider the case where e is accepted by the i-th instance, STREAMINGALG^(i), in the chain. As discussed in Section 3.3, STREAMINGALG may choose to discard a set of points R_t^(i) ⊆ M_t^(i) from its memory before inserting e, i.e., M_t^(i) ← M_t^(i) ∪ {e} \ R_t^(i). Note that R_t^(i) is empty if e is inserted and no element is discarded from M_t^(i). For every discarded element r ∈ R_t^(i), we start a new cascade from the (i + 1)-th instance, STREAMINGALG^(i+1).

Figure 1: ROBUST-STREAMING uses r instances of a generic STREAMINGALG to construct r non-overlapping memories at any given time t, i.e., M_t^(1), M_t^(2), . . . , M_t^(r). Each instance produces a solution S_t^(i), and the solution returned by ROBUST-STREAMING is the first valid solution S_t = {S_t^(i) | i = min{j ∈ [1 · · · r] : M_t^(j) ≠ null}}.

Note that in the worst case, every element of the stream can go once through the whole chain during the execution of the algorithm, and thus the processing time for each element scales linearly with r. An important observation is that at any given time t, all the memories M_t^(1), M_t^(2), . . . , M_t^(r) contain disjoint sets of elements. Next, we show how this data structure leads to a deletion-robust streaming algorithm.

    5.3. Dealing with Deletions

Equipped with the above data structure shown in Fig. 1, we now demonstrate how deletions can be treated. Assume an element e_d is being deleted from the memory of the j-th instance, STREAMINGALG^(j), at time t, i.e., M_t^(j) ← M_t^(j) \ {e_d}. As discussed in Section 5.1, the solution of the streaming algorithm can suffer dramatically from a deletion, and we may not be able to restore the quality of the solution by substituting similar elements. Since there is no guarantee for the quality of the solution after a deletion, we remove STREAMINGALG^(j) from the chain by setting M_t^(j) = null, and for all the remaining elements in its memory, namely R_t^(j) ← M_t^(j) \ {e_d}, we start a new cascade from the (j + 1)-th instance, STREAMINGALG^(j+1).

The key reason why the above algorithm works is that the guarantee provided by the streaming algorithm is independent of the order of receiving the data elements. Note that at any point in time, the first instance i of the algorithm with M_t^(i) ≠ null has processed all the elements from the stream V_t (not necessarily in the order the stream is originally received) except the ones deleted by time t, i.e., D_t. Therefore, we can guarantee that STREAMINGALG^(i) provides us with its inherent α-approximation guarantee for reading V_t \ D_t. More precisely, f(S_t^(i)) ≥ α·OPT_t, where OPT_t is the optimum solution of the constrained optimization problem (2) when we have m deletions.

In the case of adversarial deletions, each deletion may remove an element from the solution of a distinct instance of STREAMINGALG in the chain, disabling up to m instances. Therefore, having r = m + 1 instances, we will remain with at least one STREAMINGALG that gives us the desired result. However, as shown later in this section, if the deletions are i.i.d. (which is often the case in practice), and we have m deletions in expectation, r can be much smaller than m + 1. Finally, note that we do not need to assume that m ≤ k, where k is the size of the largest feasible solution. The above idea works for arbitrary m ≤ n.

The pseudocode of ROBUST-STREAMING is given in Algorithm 1. It uses r ≤ m + 1 instances of STREAMINGALG as subroutines in order to produce r solutions. We denote by S_t^(1), S_t^(2), . . . , S_t^(r) the solutions of the r STREAMINGALGs at any given time t. We assume that an instance STREAMINGALG^(i) receives an input element and produces a solution S_t^(i) based on the input. It may also change its memory content M_t^(i), and discard a set R_t^(i). Among all the remaining solutions (i.e., the ones that are not null), ROBUST-STREAMING returns the first solution in the chain, i.e., the one with the lowest index.

Theorem 1. Let STREAMINGALG be a 1-pass streaming algorithm that achieves an α-approximation guarantee for the constrained maximization problem (2) with an update time of T and a memory of size M when there is no deletion. Then ROBUST-STREAMING uses r ≤ m + 1 instances of STREAMINGALG to produce a feasible solution S_t ∈ I_t (now I_t encodes deletions in addition to constraints) such that f(S_t) ≥ α·OPT_t, as long as no more than m elements are deleted from the data stream. Moreover, ROBUST-STREAMING uses a memory of size rM, has a worst-case update time of O(r²MT), and an average update time of O(rT).

The proofs can be found in the appendix. In Table 1, we combine the result of Theorem 1 with the existing streaming algorithms that satisfy our requirements.


Algorithm 1 ROBUST-STREAMING
Input: data stream V_t, deletion set D_t, r ≤ m + 1.
Output: solution S_t at any time t.

 1: t = 1, M_t^(i) = ∅, S_t^(i) = ∅ ∀i ∈ [1 · · · r]
 2: while {V_t \ V_{t−1}} ∪ {D_t \ D_{t−1}} ≠ ∅ do
 3:   if {D_t \ D_{t−1}} ≠ ∅ then
 4:     e_d ← {D_t \ D_{t−1}}
 5:     Delete(e_d)
 6:   else
 7:     e_t ← {V_t \ V_{t−1}}
 8:     Add(1, {e_t})
 9:   end if
10:   t = t + 1
11:   S_t = {S_t^(i) | i = min{j ∈ [1 · · · r] : M_t^(j) ≠ null}}
12: end while

13: function Add(j, E_j)
14:   for i = j to r do
15:     E_{i+1} = ∅
16:     for e ∈ E_i do
17:       [R_t^(i), M_t^(i), S_t^(i)] = STREAMINGALG^(i)(e)
18:       E_{i+1} = E_{i+1} ∪ R_t^(i)
19:     end for
20:   end for
21: end function

22: function Delete(e)
23:   for i = 1 to r do
24:     if e ∈ M_t^(i) then
25:       R_t^(i) = M_t^(i) \ {e}
26:       M_t^(i) ← null
27:       Add(i + 1, R_t^(i))
28:       return
29:     end if
30:   end for
31: end function
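For concreteness, the control flow of the cascade can be sketched in Python as below. The class names and the inner `StreamInstance` acceptance rule (keep an element while memory has room and the marginal gain is positive) are our own illustrative stand-ins for a generic STREAMINGALG, not the paper's implementation; the approximation guarantee depends on the actual streaming algorithm plugged in.

```python
# Sketch of the cascading chain: additions cascade down, and a deletion kills
# one instance whose surviving memory is re-fed into the next instance.
class StreamInstance:
    def __init__(self, f, k):
        self.f, self.k, self.M = f, k, set()

    def process(self, e):
        """Return the set of elements this instance discards."""
        if len(self.M) < self.k and self.f(self.M | {e}) > self.f(self.M):
            self.M.add(e)
            return set()
        return {e}                      # rejected: cascade it onward

class RobustStreaming:
    def __init__(self, f, k, r):
        self.instances = [StreamInstance(f, k) for _ in range(r)]
        self.dead = [False] * r

    def add(self, j, elems):
        for i in range(j, len(self.instances)):
            if self.dead[i] or not elems:
                continue
            elems = set().union(*(self.instances[i].process(e) for e in elems))

    def delete(self, e):
        for i, inst in enumerate(self.instances):
            if e in inst.M:
                survivors = inst.M - {e}     # drop the whole instance ...
                self.dead[i] = True
                self.add(i + 1, survivors)   # ... and recycle its survivors
                return

    def solution(self):
        for i, inst in enumerate(self.instances):
            if not self.dead[i]:
                return inst.M                # first valid solution in the chain
        return set()

# usage on the bad example from Section 5.1 (n = 4, k = 2, r = 2)
B = {1: {1, 2, 3, 4}, 2: {2}, 3: {3}, 4: {4}}
f = lambda A: len(set().union(*(B[i] for i in A))) if A else 0
rs = RobustStreaming(f, k=2, r=2)
for e in [1, 2, 3, 4]:
    rs.add(0, {e})
rs.delete(1)                                 # delete B1: instance 0 dies
```

After the deletion, the second instance still holds the singletons it collected from the cascade, so the returned solution has positive value instead of collapsing to zero.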

Theorem 2. Assume each element of the stream is deleted with equal probability p = m/n, i.e., in expectation we have m deletions from the stream. Then, with probability 1 − δ, ROBUST-STREAMING provides an α-approximation as long as

r ≥ (1/(1 − p))^k · log(1/δ).

Theorem 2 shows that for fixed k, δ and p, a constant number r of STREAMINGALGs is sufficient to support m = pn (expected) deletions, independently of n. In contrast, for adversarial deletions, as analyzed in Theorem 1, pn + 1 copies of STREAMINGALG are required, which grows linearly in n. Hence, the required dependence of r on m is much milder for random than for adversarial deletions. This is also verified by our experiments in Section 6.
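As a quick numeric check of the Theorem 2 bound (a minimal sketch; the function name is ours): even with half the stream deleted in expectation, the required number of instances is a constant that never involves n.

```python
import math

def min_instances(p, k, delta):
    """Lower bound on r from Theorem 2: r >= (1/(1-p))^k * log(1/delta)."""
    return math.ceil((1.0 / (1.0 - p)) ** k * math.log(1.0 / delta))

# p = 0.5 means m = n/2 expected deletions, yet r stays constant in n:
r = min_instances(p=0.5, k=5, delta=0.01)   # 2^5 * ln(100) ~ 147.4, so r = 148
```

Compare with the adversarial case, where supporting m = n/2 deletions would need r = n/2 + 1 copies.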

6. Experiments

We address the following questions: 1) How much can ROBUST-STREAMING recover and possibly improve the performance of STREAMINGALG in case of deletions? 2) How much does the time of deletions affect the performance? 3) To what extent does deleting representative vs. random data points affect the performance? To this end, we run ROBUST-STREAMING on the applications we described in Section 4, namely, image collection summarization, summarizing a stream of geolocation sensor data, as well as summarizing a clickstream of size 45 million.

Throughout this section we consider the following streaming algorithms: SIEVE-STREAMING (Badanidiyuru et al., 2014), STREAM-GREEDY (Gomes & Krause, 2010), and STREAMING-GREEDY (Chekuri et al., 2015). We allow all streaming algorithms, including the non-preemptive SIEVE-STREAMING, to update their solution after each deletion. We also consider a stronger variant of SIEVE-STREAMING, called EXTSIEVE, that aims to pick k·r elements to protect against deletions, i.e., it is allowed the same memory as ROBUST-STREAMING. After the deletions, the remaining solution is pruned to k elements.

To compare the effect of deleting representative elements to that of deleting random elements from the stream, we use two stochastic variants of the greedy algorithm, namely STOCHASTIC-GREEDY (Mirzasoleiman et al., 2015) and RANDOM-GREEDY (Buchbinder et al., 2014). This way we introduce randomness into the deletion process in a principled way. Hence, we have:

STOCHASTIC-GREEDY (SG): Similar to the greedy algorithm, STOCHASTIC-GREEDY starts with an empty set and adds one element at each iteration until it obtains a solution of size m. But in each step it first samples a random set R of size (n/m) log(1/ε) and then adds the element from R that maximizes the marginal gain to the solution.

RANDOM-GREEDY (RG): RANDOM-GREEDY iteratively selects a random element from the top m elements with the highest marginal gains, until it finds a solution of size m.
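The two deletion-selection strategies can be sketched as follows; a toy coverage function stands in for the paper's objectives, and the function names and the ε default are our own choices.

```python
import math
import random

def marginal(f, S, e):
    # marginal gain of adding e to the current set S
    return f(S | {e}) - f(S)

def stochastic_greedy(f, V, m, eps=0.1, rng=random):
    """SG: each step scores only a random sample of size (n/m) * log(1/eps)."""
    S = set()
    n = len(V)
    sample_size = max(1, math.ceil(n / m * math.log(1 / eps)))
    while len(S) < m:
        pool = sorted(V - S)
        R = rng.sample(pool, min(sample_size, len(pool)))
        S.add(max(R, key=lambda e: marginal(f, S, e)))
    return S

def random_greedy(f, V, m, rng=random):
    """RG: each step picks uniformly among the m largest marginal gains."""
    S = set()
    while len(S) < m:
        top = sorted(V - S, key=lambda e: -marginal(f, S, e))[:m]
        S.add(rng.choice(top))
    return S

# Toy monotone submodular function: coverage of small integer sets.
universe_sets = {i: set(range(i, i + 3)) for i in range(10)}
cover = lambda S: len(set().union(*(universe_sets[i] for i in S))) if S else 0

to_delete = stochastic_greedy(cover, set(universe_sets), m=4)
```

In the experiments, the m elements returned by SG or RG play the role of the "representative" points that get deleted from the stream.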

For each deletion method, the m data points are deleted either while receiving the data (where the streaming algorithms have the chance to update their solutions by selecting new elements) or after receiving the data (where there is no chance of updating the solution with new elements). Finally, the performance of all algorithms is normalized against the utility obtained by the centralized algorithm that knows the set of deleted elements in advance.

    6.1. Image Collection Summarization

We first apply ROBUST-STREAMING to a collection of 100 images from Tschiatschek et al. (2014). We used


Algorithm | Problem | Constraint | Appr. Factor | Memory
ROBUST + SIEVE-STREAMING [BMKK'14] | Mon. Subm. | cardinality | 1/2 − ε | O(mk log(k)/ε)
ROBUST + f-MSM [CK'14] | Mon. Subm. | p-matroids | 1/(4p) | O(mk (log |V|)^{O(1)})
ROBUST + STREAMING-GREEDY [CGQ'15] | Non-mon. Subm. | p-matchoid | (1 − ε)(2 − o(1)) / ((8 + e)p) | O(mk log(k)/ε²)
ROBUST + STREAMING-LOCAL SEARCH [MJK'17] | Non-mon. Subm. | p-system + d knapsacks | (1 − ε)(4p − 1) / (4p(8p + 2d − 1)) | O(mpk log²(k)/ε²)

Table 1: ROBUST-STREAMING combined with 1-pass streaming algorithms can make them robust against m deletions.

the weighted combination of 594 submodular functions, either capturing coverage or rewarding diversity (cf. Section 4.2). Here, despite the small size of the dataset, computing the weighted combination of 594 functions makes the function evaluation considerably expensive.

Fig. 2a compares the performance of SIEVE-STREAMING with its robust version ROBUST-STREAMING for r = 3 and solution size k = 5. Here, we vary the number m of deletions from 1 to 20 after the whole stream is received. We see that ROBUST-STREAMING maintains its performance by updating the solution after deleting subsets of data points imposed by different deletion strategies. It can be seen that, even for a larger number m of deletions, ROBUST-STREAMING, run with parameter r < m, is able to return a solution competitive with the strong centralized benchmark that knows the deleted elements beforehand. For the image collection, we were not able to compare the performance of STREAM-GREEDY with its robust version due to the prohibitive running time. Fig. 2b shows an example of an updated image summary returned by ROBUST-STREAMING after deleting the first image from the summary.

Figure 2: Performance of ROBUST-STREAMING vs SIEVE-STREAMING for different deletion strategies (SG, RG), at the end of the stream, on a collection of 100 images. Here we fix k = 5 and r = 3. a) Performance of ROBUST-STREAMING and SIEVE-STREAMING normalized by the utility obtained by greedy that knows the deleted elements beforehand. b) Updated solution returned by ROBUST-STREAMING after deleting the first image in the summary.

6.2. Summarizing a stream of geolocation data

Next we apply ROBUST-STREAMING to the active set selection objective described in Section 4.1. Our dataset consists of 3,607 geolocations, collected during a one-hour bike ride around Zurich (Fatio, 2015). For each pair of points i and j we used the corresponding (latitude, longitude) coordinates to calculate their distance in meters d_ij, and chose a Gaussian kernel K_ij = exp(−d_ij²/h²) with h = 1500. Fig. 3e shows the dataset, where red and green triangles show a summary of size 10 found by SIEVE-STREAMING, and the updated summary provided by ROBUST-STREAMING with r = 5 after deleting m = 70% of the data points. Fig. 3a and 3c compare the performance of SIEVE-STREAMING with its robust version when the data is deleted after or during the stream, respectively. As we see, ROBUST-STREAMING provides a solution very close to the hindsight centralized method. Fig. 3b and 3d show similar behavior for STREAM-GREEDY. Note that deleting data points via STOCHASTIC-GREEDY or RANDOM-GREEDY is much more harmful to the quality of the solution provided by STREAM-GREEDY. We repeated the same experiment by dividing the map into grid cells of length 2 km. We then considered a partition matroid by restricting the number of points selected from each cell to be 1. The red and green triangles in Fig. 3f are the summary found by STREAMING-GREEDY and the updated summary provided by ROBUST-STREAMING after deleting the shaded area in the figure.
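A minimal sketch of how such a kernel matrix could be built. The text only states that pairwise distances are computed in meters from (latitude, longitude) pairs; the use of the haversine formula here is our assumption.

```python
import numpy as np

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    R = 6_371_000.0  # mean Earth radius in meters
    lat1, lon1, lat2, lon2 = map(np.radians, (*p, *q))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * R * np.arcsin(np.sqrt(a))

def gaussian_kernel(points, h=1500.0):
    """K[i, j] = exp(-d_ij^2 / h^2), with distances d_ij in meters."""
    n = len(points)
    K = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine_m(points[i], points[j])
            K[i, j] = K[j, i] = np.exp(-d ** 2 / h ** 2)
    return K
```

The resulting K is the similarity matrix fed to the active set selection objective of Section 4.1.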

6.3. Large-scale click-through prediction

For our large-scale experiment we consider again the active set selection objective described in Section 4.1. We used the Yahoo! Webscope dataset containing 45,811,883 user click logs for news articles displayed in the Featured Tab of the Today Module on the Yahoo! Front Page during the first ten days of May 2009 (Yahoo, 2012). For each visit, both the user and the shown article are associated with a feature vector of dimension 6. We take their outer product, resulting in a feature vector of size 36.
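The feature construction above is just a flattened outer product; a minimal NumPy sketch (the variable and function names are ours):

```python
import numpy as np

def interaction_features(user_vec, article_vec):
    """Outer product of a 6-dim user vector and a 6-dim article vector,
    flattened into the 36-dim feature vector used for click prediction."""
    return np.outer(user_vec, article_vec).ravel()

u = np.random.rand(6)            # per-visit user features
a = np.random.rand(6)            # features of the displayed article
x = interaction_features(u, a)   # shape (36,)
```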

The goal was to predict the user behavior for each displayed article based on historical clicks. To do so, we considered the first 80% of the data (for the first 8 days) as our training set, and the last 20% (for the last 2 days) as our test set. We used Vowpal Wabbit (Langford et al., 2007) to train a linear classifier on the full training set. Since only 4% of the data points are clicked, we assign a weight of 10 to each

Figure 3: ROBUST-STREAMING vs SIEVE-STREAMING and STREAM-GREEDY for different deletion strategies (SG, RG) on geolocation data. We fix k = 20 and r = 5. a) and c) show the performance of robustified SIEVE-STREAMING, whereas b) and d) show the performance of robustified STREAM-GREEDY. a) and b) consider the performance after deletions at the end of the stream, while c) and d) consider the average performance while deletions happen during the stream. e) Red and green triangles show a set of size 10 found by SIEVE-STREAMING and the updated solution found by ROBUST-STREAMING when 70% of the points are deleted. f) Set found by STREAMING-GREEDY, constrained to pick at most 1 point per grid cell (matroid constraint). Here r = 5, and we deleted the shaded area.

clicked vector. The AUC score of the trained classifier on the test set was 65%. We then used ROBUST-STREAMING and SIEVE-STREAMING to find a representative subset of size k consisting of k/2 clicked and k/2 not-clicked examples from the training data. Due to the massive size of the dataset, we used Spark on a cluster of 15 quad-core machines with 32 GB of memory each. We partitioned the training data over the machines, keeping its original order. We

ran ROBUST-STREAMING on each machine to find a summary of size k/15, and merged the results to obtain the final summary of size k. We then started deleting data points uniformly at random until we were left with only 1% of the data, and trained another classifier on the remaining elements from the summary.

Fig. 4a compares the performance of ROBUST-STREAMING for a fixed active set of size k = 10,000 and r = 2 with random selection, randomly selecting equal numbers of clicked and not-clicked vectors, and using SIEVE-STREAMING to select equal numbers of clicked and not-clicked data points. The y-axis shows the improvement in AUC score of the classifier trained on a summary obtained by the different algorithms over random guessing (AUC = 0.5), normalized by the AUC score of the classifier trained on the whole training data. To maximize fairness, we let the other baselines select a subset of r·k elements before deletions. Fig. 4b shows the same quantity for r = 5. It can be seen that a slight increase in the amount of memory helps boost the performance of all the algorithms. However, ROBUST-STREAMING benefits from the additional memory the most, and can almost recover the performance of the classifier trained on the full training data, even after 99% deletion.
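The y-axis quantity can be written as follows (a sketch of our reading of the normalization; the exact formula is not spelled out in the text):

```python
def normalized_auc_improvement(auc_summary, auc_full):
    """Improvement over random guessing (AUC = 0.5), normalized by the
    improvement of the classifier trained on the full training data."""
    return (auc_summary - 0.5) / (auc_full - 0.5)

# With the full-data classifier at AUC = 0.65 (as reported above), a
# summary-trained classifier at AUC = 0.64 scores 0.14 / 0.15 ~ 0.93.
score = normalized_auc_improvement(0.64, 0.65)
```

A value of 1 means the summary-trained classifier fully matches the one trained on all 45 million examples.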

Figure 4: ROBUST-STREAMING vs random unbalanced and balanced selection and SIEVE-STREAMING selecting equal numbers of clicked and not-clicked data points, on 45,811,883 feature vectors from the Yahoo! Webscope data. We fix k = 10,000 and delete 99% of the data points. a) r = 2; b) r = 5.

7. Conclusion

We have developed the first deletion-robust streaming algorithm, ROBUST-STREAMING, for constrained submodular maximization. Given any single-pass streaming algorithm STREAMINGALG with an α-approximation guarantee, ROBUST-STREAMING outputs a solution that is robust against m deletions. The returned solution also satisfies an α-approximation guarantee w.r.t. the solution of the optimum centralized algorithm that knows the set of m deletions in advance. We have demonstrated the effectiveness of our approach through an extensive set of experiments.


Acknowledgements

This research was supported by ERC StG 307036, a Microsoft Faculty Fellowship, DARPA Young Faculty Award (D16AP00046), a Simons-Berkeley fellowship and an ETH Fellowship. This work was done in part while Amin Karbasi and Andreas Krause were visiting the Simons Institute for the Theory of Computing.

References

Babaei, Mahmoudreza, Mirzasoleiman, Baharan, Jalili, Mahdi, and Safari, Mohammad Ali. Revenue maximization in social networks through discounting. Social Network Analysis and Mining, 3(4):1249–1262, 2013.

Badanidiyuru, Ashwinkumar and Vondrák, Jan. Fast algorithms for maximizing submodular functions. In SODA, 2014.

Badanidiyuru, Ashwinkumar, Mirzasoleiman, Baharan, Karbasi, Amin, and Krause, Andreas. Streaming submodular maximization: Massive data summarization on the fly. In KDD, 2014.

Buchbinder, Niv, Feldman, Moran, Naor, Joseph Seffi, and Schwartz, Roy. Submodular maximization with cardinality constraints. In SODA, 2014.

Chakrabarti, Amit and Kale, Sagar. Submodular maximization meets streaming: Matchings, matroids, and more. In IPCO, 2014.

Chekuri, Chandra, Gupta, Shalmoli, and Quanrud, Kent. Streaming algorithms for submodular function maximization. In ICALP, 2015.

Dueck, Delbert and Frey, Brendan J. Non-metric affinity propagation for unsupervised image categorization. In ICCV, 2007.

El-Arini, Khalid and Guestrin, Carlos. Beyond keyword search: Discovering relevant scientific literature. In KDD, 2011.

Epasto, Alessandro, Lattanzi, Silvio, Vassilvitskii, Sergei, and Zadimoghaddam, Morteza. Submodular optimization over sliding windows. In WWW, 2017.

Fatio, Philipe. https://refind.com/fphilipe/topics/open-data, 2015.

Feige, Uriel. A threshold of ln n for approximating set cover. Journal of the ACM, 1998.

Gomes, Ryan and Krause, Andreas. Budgeted nonparametric learning from data streams. In ICML, 2010.

Guidelines. http://data.consilium.europa.eu/doc/document/ST-9565-2015-INIT/en/pdf.

Jiecao, Chen, Nguyen, Huy L., and Zhang, Qin. Submodular optimization over sliding windows. Preprint, https://arxiv.org/abs/1611.00129, 2017.

Kempe, David, Kleinberg, Jon, and Tardos, Éva. Maximizing the spread of influence through a social network. In KDD, 2003.

Krause, Andreas and Golovin, Daniel. Submodular function maximization. In Tractability: Practical Approaches to Hard Problems. Cambridge University Press, 2013.

Krause, Andreas, McMahan, H. Brendan, Guestrin, Carlos, and Gupta, Anupam. Robust submodular observation selection. Journal of Machine Learning Research, 2008.

Langford, John, Li, Lihong, and Strehl, Alex. Vowpal Wabbit online learning project, 2007.

Lin, Hui and Bilmes, Jeff. A class of submodular functions for document summarization. In NAACL/HLT, 2011.

Mirzasoleiman, Baharan, Badanidiyuru, Ashwinkumar, Karbasi, Amin, Vondrák, Jan, and Krause, Andreas. Lazier than lazy greedy. In AAAI, 2015.

Mirzasoleiman, Baharan, Jegelka, Stefanie, and Krause, Andreas. Streaming non-monotone submodular maximization: Personalized video summarization on the fly. Preprint, https://arxiv.org/abs/1706.03583, 2017.

Nemhauser, George L., Wolsey, Laurence A., and Fisher, Marshall L. An analysis of approximations for maximizing submodular set functions - I. Mathematical Programming, 1978.

Orlin, James B., Schulz, Andreas S., and Udwani, Rajan. Robust monotone submodular function maximization. In IPCO, 2016.

Agrawal, Rakesh, Gollapudi, Sreenivas, Halverson, Alan, and Ieong, Samuel. Diversifying search results. In WSDM, 2009.

Regulation, European Data Protection. http://ec.europa.eu/justice/data-protection/document/review2012/com_2012_11_en.pdf, 2012.

Sipos, Ruben, Swaminathan, Adith, Shivaswamy, Pannaga, and Joachims, Thorsten. Temporal corpus summarization using submodular word coverage. In CIKM, 2012.

Tschiatschek, Sebastian, Iyer, Rishabh K., Wei, Haochen, and Bilmes, Jeff A. Learning mixtures of submodular functions for image collection summarization. In NIPS, 2014.

Weber, R. The right to be forgotten: More than a Pandora's box? https://www.jipitec.eu/issues/jipitec-2-2-2011/3084, 2011.

Wei, Kai, Iyer, Rishabh, and Bilmes, Jeff. Submodularity in data subset selection and active learning. In ICML, 2015.

Yahoo. Yahoo! Academic Relations. R6A, Yahoo! Front Page Today Module user click log dataset, version 1.0, 2012. URL http://Webscope.sandbox.yahoo.com.