
Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms


Onur Küçüktunç (presenter), Erik Saule, Kamer Kaya, Ümit V. Çatalyürek. WWW'13 presentation, Research Track: Recommender Systems and OSN.


Page 1: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms

Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 1/25

Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms

Onur Küçüktunç¹,²  Erik Saule¹  Kamer Kaya¹  Ümit V. Çatalyürek¹,³

WWW 2013, May 13–17, 2013, Rio de Janeiro, Brazil.

¹Dept. of Biomedical Informatics  ²Dept. of Computer Science and Engineering  ³Dept. of Electrical and Computer Engineering, The Ohio State University

Page 2: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms


Outline

•  Problem definition
–  Motivation
–  Result diversification algorithms

•  How to measure diversity

–  Classical relevance and diversity measures
–  Bicriteria optimization?!
–  Combined measures

•  Best Coverage method

–  Complexity, submodularity
–  A greedy solution, relaxation

•  Experiments

Page 3: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms


Problem definition

G = (V,E)

Q ⊆ V

Online shopping (product co-purchasing network)
•  seeds: one product, previous purchases, page visit history
•  output: product recommendations, “you might also like…” (R ⊂ V)

Academic search (paper-to-paper citation network)
•  seeds: paper/field of interest, set of references, the researcher himself/herself
•  output: references for related work

Collaboration network
•  output: new collaborators

Social friendship network
•  seeds: the user himself/herself, a set of people
•  output: friend recommendations, “you might also know…”

Let G = (V,E) be an undirected graph. Given a set of m seed nodes Q = {q1, …, qm} s.t. Q ⊆ V, and a parameter k, return the top-k items which are relevant to the ones in Q, but diverse among themselves, covering different aspects of the query.

Page 4: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms


Problem definition

Let G = (V,E) be an undirected graph. Given a set of m seed nodes Q = {q1, …, qm} s.t. Q ⊆ V, and a parameter k, return the top-k items which are relevant to the ones in Q, but diverse among themselves, covering different aspects of the query.

•  We assume that the graph itself is the only information we have, and no categories or intents are available

•  no comparisons to intent-aware algorithms [Agrawal09, Welch11, etc.]
•  but we will compare against intent-aware measures

•  Relevance scores are obtained with Personalized PageRank (PPR) [Haveliwala02]

p*(v) = 1/m if v ∈ Q, and 0 otherwise.
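As an illustration of how these relevance scores can be computed, the sketch below runs a plain power iteration for PPR with the restart vector p* defined above, i.e. the walk teleports back to a uniform distribution over the seed set Q. All names are mine, not the authors' code; the graph is a simple adjacency-list dict.

```python
# Hedged sketch of Personalized PageRank (PPR) with restart at the seed set Q.
# `graph` maps each node to its list of neighbors; `seeds` is the set Q.
def personalized_pagerank(graph, seeds, damping=0.85, iters=100, tol=1e-10):
    nodes = list(graph)
    # restart vector p*: uniform over Q, zero elsewhere (as on the slide)
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    rank = dict(restart)
    for _ in range(iters):
        # teleport mass (1 - damping) goes back to the seeds
        nxt = {v: (1.0 - damping) * restart[v] for v in nodes}
        for u in nodes:
            out = graph[u]
            if out:
                # distribute u's current rank evenly over its neighbors
                share = damping * rank[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # dangling node: send its mass back to the restart distribution
                for v in nodes:
                    nxt[v] += damping * rank[u] * restart[v]
        if sum(abs(nxt[v] - rank[v]) for v in nodes) < tol:
            return nxt
        rank = nxt
    return rank
```

On a small path graph seeded at one end, the resulting scores decay with distance from the seed, which is exactly the relevance signal the diversification algorithms start from.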

Page 5: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms


Result diversification algorithms

•  GrassHopper [Zhu07]
–  ranks the graph k times

•  turns the highest-ranked vertex into a sink node at each iteration

[Figure from Zhu et al. (2007):]

Figure 1: (a) A toy data set. (b) The stationary distribution π reflects centrality. The item with the largest probability is selected as the first item g1. (c) The expected number of visits v to each node after g1 becomes an absorbing state. (d) After both g1 and g2 become absorbing states. Note the diversity in g1, g2, g3 as they come from different groups.

Excerpt from the GrassHopper paper shown on the slide:

Items at group centers have higher probabilities, and tighter groups have overall higher probabilities. However, the stationary distribution does not address diversity at all. If we were to rank the items by their stationary distribution, the top list would be dominated by items from the center group in Figure 1(b). Therefore we only use the stationary distribution to find the first item, and use a method described in the next section to rank the remaining items.

Formally, we first define an n × n raw transition matrix P̃ by normalizing the rows of W: P̃_ij = w_ij / Σ_{k=1}^{n} w_ik, so that P̃_ij is the probability that the walker moves to j from i. We then make the walk a teleporting random walk P by interpolating each row with the user-supplied initial distribution r:

P = λP̃ + (1 − λ)1rᵀ,   (1)

where 1 is an all-1 vector, and 1rᵀ is the outer product. If λ < 1 and r does not have zero elements, our teleporting random walk P is irreducible (possible to go to any state from any state by teleporting), aperiodic (the walk can return to a state after any number of steps), and all states are positive recurrent (the expected return time to any state is finite), thus ergodic (Grimmett and Stirzaker, 2001). Therefore P has a unique stationary distribution π = Pᵀπ. We take the state with the largest stationary probability to be the first item g1 in GRASSHOPPER ranking: g1 = argmax_{i=1..n} π_i.

2.3 Ranking the Remaining Items

As mentioned earlier, the key idea of GRASSHOPPER is to turn ranked items into absorbing states. We first turn g1 into an absorbing state. Once the random walk reaches an absorbing state, the walk is absorbed and stays there. It is no longer informative to compute the stationary distribution of an absorbing Markov chain, because the walk will eventually be absorbed. Nonetheless, it is useful to compute the expected number of visits to each node before absorption. Intuitively, those nodes strongly connected to g1 will have many fewer visits by the random walk, because the walk tends to be absorbed soon after visiting them. In contrast, groups of nodes far away from g1 still allow the random walk to linger among them, and thus have more visits. In Figure 1(c), once g1 becomes an absorbing node (represented by a circle ‘on the floor’), the center group is no longer the most prominent: nodes in this group have fewer visits than the left group. Note that the y-axis is now the number of visits instead of probability. GRASSHOPPER selects as the second item g2 the one with the largest expected number of visits in this absorbing Markov chain. This naturally inhibits items similar to g1 and encourages diversity. In Figure 1(c), the item near the center of the left group is selected as g2. Once g2 is selected, it is converted into an absorbing state, too. This is shown in Figure 1(d). The right group now becomes the most prominent, since both the left and center groups contain an absorbing state. The next item g3 in the ranking will come from the right group.


highest-ranked vertex

R = {g1}

R = {g1,g2}

R = {g1,g2,g3}

g1 turned into a sink node

highest-ranked in the next step
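The sink-node loop described above can be sketched with dense linear algebra: pick the top stationary-probability item, then repeatedly make picked items absorbing and select the state with the largest expected number of visits before absorption (the fundamental matrix N = (I − Q)⁻¹). This is my reconstruction from the excerpt, not the authors' implementation; W, r, and lam follow the paper's notation.

```python
import numpy as np

def grasshopper(W, r, lam=0.85, k=3):
    """GrassHopper-style diversified ranking sketch.

    W: symmetric non-negative weight matrix with no all-zero rows,
    r: teleport distribution, lam: the paper's lambda."""
    n = W.shape[0]
    # teleporting random walk P = lam * P_tilde + (1 - lam) * 1 r^T
    P = lam * (W / W.sum(axis=1, keepdims=True)) + (1 - lam) * np.outer(np.ones(n), r)
    # first item: largest stationary probability (eigenvector for eigenvalue 1)
    vals, vecs = np.linalg.eig(P.T)
    pi = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    pi /= pi.sum()
    ranked = [int(np.argmax(pi))]
    while len(ranked) < k:
        rest = [i for i in range(n) if i not in ranked]
        Q = P[np.ix_(rest, rest)]                  # transitions among unranked states
        N = np.linalg.inv(np.eye(len(rest)) - Q)   # fundamental matrix (I - Q)^-1
        v = N.T @ r[rest]                          # expected visits before absorption
        ranked.append(rest[int(np.argmax(v))])
    return ranked
```

On two cliques joined by a single edge, the first two picks land in different cliques, which is the diversity behavior the figure illustrates.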

Page 6: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms


Result diversification algorithms

•  GrassHopper [Zhu07]
–  ranks the graph k times

•  turns the highest-ranked vertex into a sink node at each iteration

•  DivRank [Mei10]
–  based on vertex-reinforced random walks (VRRW)

•  adjusts the transition matrix based on the number of visits to the vertices (rich-gets-richer mechanism)

[Figure from Mei et al. (2010):]

Figure 1: An illustration of diverse ranking in a toy network. (a) An illustrative network. (b) Weighting with PageRank. (c) A diverse weighting.

Excerpt from the DivRank paper shown on the slide:

… diverse information, like in Figure 1(c). Vertices 1, 5, and 4 receive the majority of the weight, representing the three communities. Vertex 1, which represents the largest community, is ranked at the top. Although vertices 2 and 3 have a higher degree than vertex 5, they are ranked lower because vertex 1 has already partially covered their information.

A greedy vertex selection algorithm may achieve diversity by iteratively selecting the most prestigious vertex and then penalizing the vertices “covered” by the already selected ones. An example is the Maximal Marginal Relevance [3]. One may also consider first clustering the nodes (e.g., [17]) and then selecting the centroids of clusters. However, it is difficult to predefine the number of clusters in this task. There lacks a principled objective and a unified process that automatically balances centrality and diversity.

In this paper, we propose a novel and unified process that balances prestige and diversity in ranking, based on a time-variant random walk in the network. The proposed model, called DivRank (abbreviation for Diverse Rank), introduces the rich-gets-richer mechanism to PageRank-style random walks with reinforcements on transition probabilities. In contrast to the greedy vertex selection methods, DivRank provides a unified and intuitive stochastic process, as well as a principled optimization explanation. The process is well connected to a number of classical models in mathematics and network science, such as the vertex-reinforced random walk, Polya’s Urn, and preferential attachment.

DivRank not only has a solid theoretical foundation, but also presents strong empirical performance. The result presented in Figure 1(c) is actually generated using DivRank. We compare DivRank with a number of representative methods in the literature using real-world datasets and tasks. In all these tasks, DivRank outperforms the state-of-the-art methods in generating diverse top-ranked results.

There are many potential applications of DivRank. The tasks presented in our experiments (i.e., ranking actors in social networks, ranking authors and publications, and text summarization) are by no means the only possible tasks. One may expect DivRank to be applied in diversifying search results, snippet generation, keyword selection, mobile search, expert finding, and in various recommender systems.

2.1 Information Networks. Let G = (V,E) be a graph (or a network) where V is a finite set of vertices and E is a finite set of edges. We define an ordered pair (u, v) as an edge from vertex u to vertex v. When G is an undirected graph, we have (u, v) = (v, u); when G is a directed graph, we have (u, v) ≠ (v, u). In a social network, V refers to a set of social actors (people) and E refers to the social ties between actors. In an information network, V and E broadly correspond to any type of information objects and the relationships between objects. We define the weight of an edge using w(u, v). Note that when the edge corresponds to a citation between two documents or a hyperlink between two web pages, w(u, v) takes a binary value. w(u, v) could take any non-negative real value in other scenarios, e.g., when the edge corresponds to the similarity or co-occurrence of two objects.

We then cast the task of ranking the vertices based on their prestige (or centrality in different contexts, which we will use interchangeably with prestige) in a network as finding a prestige function f : V → ℝ⁺. Beyond simple measures such as degree, recent research focuses on a family of centrality measures based on the stationary distribution of a random walk in the network, such as the well-known PageRank [14] and its counterpart in text networks, LexRank [5].

2.2 Prestige Ranking with Random Walks. A family of prestige measures in networks leverages the stationary distribution of a random walk in the network. A random walk defines a Markov chain in the given (either directed or undirected) network, where each vertex represents a state and a walk transits from one state to another based on a transition probability, denoted as p(u, v). In other words, a random walk on G is defined by a transition probability function p : V × V → [0, 1]. Let us use p_T(u) to denote the probability that the walk is at state u at time T. A standard random walk can be defined as

p_T(v) = Σ_{(u,v)∈E} p(u, v) p_{T−1}(u).   (1)

If the Markov chain is ergodic, p_T(v) converges to a stationary distribution π(v), which is commonly used to measure the importance of vertices.


sample graph · weighting with PPR · diverse weighting
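The rich-gets-richer mechanism can be sketched as a cumulative-style reinforcement: each node's incoming transition weight is scaled by how much visit mass it has already accumulated, so frequently visited nodes absorb the rank of their neighborhoods. This is a simplified sketch in the spirit of DivRank's cumulative approximation, not the paper's exact process; all names are mine, and every vertex is assumed to have at least one neighbor.

```python
# Cumulative reinforcement sketch of a DivRank-style walk.
# `graph` maps each node to its neighbor list (no isolated nodes).
def divrank(graph, damping=0.85, iters=50):
    nodes = list(graph)
    n = len(nodes)
    p = {v: 1.0 / n for v in nodes}       # current visit distribution
    visits = {v: 1.0 for v in nodes}      # cumulative visit mass N_T(v)
    for _ in range(iters):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            # reinforce u's outgoing edges by the targets' accumulated visits
            weights = {v: visits[v] for v in graph[u]}
            total = sum(weights.values())
            for v, w in weights.items():
                nxt[v] += damping * p[u] * w / total
        p = nxt
        for v in nodes:
            visits[v] += p[v]             # rich-gets-richer accumulation
    return p
```

Unlike the static PPR transition matrix, the transition probabilities here change over time, which is what concentrates rank on one representative per cluster.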

Page 7: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms


Result diversification algorithms

•  GrassHopper [Zhu07]
–  ranks the graph k times

•  turns the highest-ranked vertex into a sink node at each iteration

•  DivRank [Mei10]
–  based on vertex-reinforced random walks (VRRW)

•  adjusts the transition matrix based on the number of visits to the vertices (rich-gets-richer mechanism)

•  Dragon [Tong11]
–  based on optimizing the goodness measure

•  punishes the score when two neighbors are included in the results

Page 8: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms


Measuring diversity

Relevance measures
•  Normalized relevance:    rel(S) = ( Σ_{v∈S} π_v ) / ( Σ_{i=1}^{k} π_i )
•  Difference ratio:        diff(S, S′) = 1 − |S ∩ S′| / |S|
•  nDCG:                    nDCG_k = ( π_{s_1} + Σ_{i=2}^{k} π_{s_i} / log₂ i ) / ( π_1 + Σ_{i=2}^{k} π_i / log₂ i )

Diversity measures
•  ℓ-step graph density:    dens_ℓ(S) = ( Σ_{u,v∈S, u≠v} d_ℓ(u, v) ) / ( |S| × (|S| − 1) )
•  ℓ-expansion ratio:       σ_ℓ(S) = |N_ℓ(S)| / n,
   where N_ℓ(S) = S ∪ {v ∈ (V − S) : ∃u ∈ S, d(u, v) ≤ ℓ}
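A minimal sketch (helper names are mine, not the paper's) of the ℓ-step expansion set N_ℓ(S) and the expansion ratio σ_ℓ, computed by a breadth-first search truncated at depth ℓ:

```python
from collections import deque

def expansion_set(adj, S, l):
    """N_l(S): S plus every vertex within l hops of some vertex in S.
    adj is an adjacency dict {vertex: [neighbors]}."""
    dist = {v: 0 for v in S}
    q = deque(S)
    while q:
        u = q.popleft()
        if dist[u] == l:          # do not expand past depth l
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return set(dist)

def expansion_ratio(adj, S, l):
    """sigma_l(S) = |N_l(S)| / n."""
    return len(expansion_set(adj, S, l)) / len(adj)
```

For example, on the path 0–1–2–3 with S = {0}, the 1-step expansion set is {0, 1} and σ₂({0}) = 3/4.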

Page 9: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms


Bicriteria optimization measures

•  aggregate a relevance and a diversity measure
•  MMR [Carbonell98]:
       f_MMR(S) = (1 − λ) Σ_{v∈S} π_v − λ Σ_{u∈S} max_{v∈S, v≠u} sim(u, v)
•  [Li11]:
       f_L(S) = Σ_{v∈S} π_v + λ |N(S)| / n
•  [Vieira11]:
       f_MSD(S) = (k − 1)(1 − λ) Σ_{v∈S} π_v + 2λ Σ_{u∈S} Σ_{v∈S, v≠u} div(u, v)
•  max-sum diversification, max-min diversification, k-similar diversification set, etc. [Gollapudi09]
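As an illustration of the first objective (hypothetical `pi` and `sim` inputs, not from the paper), MMR is typically optimized greedily, picking at each step the candidate with the best marginal trade-off:

```python
def greedy_mmr(candidates, pi, sim, k, lam=0.5):
    """Greedy MMR: balance relevance pi[v] against the maximum similarity
    to already selected items, weighted by lam in [0, 1]."""
    S = []
    while len(S) < k and len(S) < len(candidates):
        best, best_score = None, float("-inf")
        for v in candidates:
            if v in S:
                continue
            # largest similarity to anything already picked (0 if S empty)
            penalty = max((sim(v, u) for u in S), default=0.0)
            score = (1 - lam) * pi[v] - lam * penalty
            if score > best_score:
                best, best_score = v, score
        S.append(best)
    return S
```

With lam = 0 this degenerates to plain top-k ranking; with lam close to 1 it mostly avoids near-duplicates.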

Page 10: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms


Bicriteria optimization is not the answer

•  Objective: diversify top-10 results •  Two query-oblivious algorithms:

–  top-% + random

–  top-% + greedy-σ2

Page 11: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms


Bicriteria optimization is not the answer
•  normalized relevance and 2-step graph density
•  this amounts to evaluating result diversification as a bicriteria optimization problem with
   –  a relevance measure that ignores diversity, and
   –  a diversity measure that ignores relevancy

[Scatter plots of rel vs. dens₂ and rel vs. σ₂ for all algorithms (BC variants, DivRank variants, Dragon, GrassHopper, GSparse, k-RLM, PageRank, etc.): the query-oblivious top-%+random and top-%+greedy-σ₂ baselines dominate the other algorithms on these bicriteria objectives]

Page 12: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms


A better measure? Combine both

•  We need a combined measure that tightly integrates both relevance and diversity aspects of the result set

•  goodness [Tong11]

–  downside: highly dominated by relevance

       f_G(S) = 2 Σ_{i∈S} π_i − d Σ_{i,j∈S} A(j, i) π_j − (1 − d) Σ_{j∈S} π_j Σ_{i∈S} p*(i)

   –  the first term is max-sum relevance; the remaining terms penalize the score when two results share an edge
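A sketch (assumed inputs, not Dragon's actual implementation) of evaluating this goodness score for a candidate set, given PPR scores `pi`, an adjacency-probability function `A(j, i)`, and a teleport distribution `p_star`:

```python
def goodness(S, pi, A, p_star, d=0.9):
    """f_G(S) = 2*sum_i pi_i - d*sum_{i,j} A(j,i)*pi_j
              - (1-d)*(sum_j pi_j)*(sum_i p_star_i),
    i.e., max-sum relevance minus penalties for results that share edges."""
    rel_term = 2 * sum(pi[i] for i in S)
    edge_pen = d * sum(A(j, i) * pi[j] for i in S for j in S)
    tel_pen = (1 - d) * sum(pi[j] for j in S) * sum(p_star[i] for i in S)
    return rel_term - edge_pen - tel_pen
```

Because the relevance term is doubled while the penalties are damped by d and (1 − d), the score is easily dominated by relevance, which is the downside noted above.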

Page 13: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms


Proposed measure: ℓ-step expanded relevance
•  a combined measure of
   –  ℓ-step expansion ratio (σ_ℓ)
   –  relevance scores (π)
•  quantifies the relevance of the covered region of the graph
•  used below to sanity-check the algorithms

ℓ-step expanded relevance:

       exprel_ℓ(S) = Σ_{v ∈ N_ℓ(S)} π_v

where N_ℓ(S) is the ℓ-step expansion set of the result set S, and π holds the PPR scores of the items in the graph.

[Sanity-check plots: exprel₂ vs. k, and rel vs. σ₂ scatter, comparing the query-oblivious top-%+random and top-%+greedy-σ₂ baselines against the other algorithms]

Page 14: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms


Correlations of the measures

[Correlation matrix of the relevance and diversity measures]

•  goodness is dominated by the relevance measures
•  exprel has no high correlation with any other relevance or diversity measure

Page 15: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms


Proposed algorithm: Best Coverage
•  Can we use ℓ-step expanded relevance as an objective function?
•  Define the exprel_ℓ-diversified top-k ranking (DTR_ℓ):

       S = argmax_{S′ ⊆ V, |S′| = k} exprel_ℓ(S′)

•  Complexity: a generalization of the weighted maximum coverage problem
   –  NP-hard!
   –  but exprel_ℓ is a submodular function (Lemma 4.2)
   –  a greedy solution (Algorithm 1) that selects the item with the highest marginal utility at each step is the best possible polynomial-time approximation (proof based on [Nemhauser78])
•  Relaxation: computes BestCoverage only on the highest-ranked vertices to improve the runtime

The marginal gain of adding v to S is

       g(v, S) = Σ_{v′ ∈ N_ℓ({v}) − N_ℓ(S)} π_{v′}

ALGORITHM 1: BestCoverage
Input: k, G, π, ℓ
Output: a list of recommendations S
  S ← ∅
  while |S| < k do
      v* ← argmax_v g(v, S)
      S ← S ∪ {v*}
  return S

ALGORITHM 2: BestCoverage (relaxed)
Input: k, G, π, ℓ
Output: a list of recommendations S
  S ← ∅
  Sort(V) w.r.t. π_i, non-increasing
  S1 ← V[1..k′], i.e., the top-k′ vertices, where k′ = k·δ̄^ℓ
  ∀v ∈ S1: g(v) ← g(v, ∅); c(v) ← Uncovered
  while |S| < k do
      v* ← argmax_{v∈S1} g(v)
      S ← S ∪ {v*}
      S2 ← N_ℓ({v*})
      for each v′ ∈ S2 do
          if c(v′) = Uncovered then
              S3 ← N_ℓ({v′})
              ∀u ∈ S3: g(u) ← g(u) − π_{v′}
              c(v′) ← Covered
  return S
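The greedy loop of Algorithm 1 can be sketched compactly in Python (names are mine; this recomputes expansion sets for clarity, whereas the relaxed variant above maintains gains incrementally):

```python
from collections import deque

def n_l(adj, S, l):
    """N_l(S): vertices within l hops of S (including S)."""
    dist = {v: 0 for v in S}
    q = deque(S)
    while q:
        u = q.popleft()
        if dist[u] < l:
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
    return set(dist)

def best_coverage(adj, pi, k, l):
    """Greedy BestCoverage: repeatedly add the vertex with the highest
    marginal gain g(v, S) = sum of pi over N_l({v}) - N_l(S)."""
    S, covered = [], set()
    for _ in range(min(k, len(adj))):
        def gain(v):
            return sum(pi[w] for w in n_l(adj, {v}, l) - covered)
        v_star = max((v for v in adj if v not in S), key=gain)
        S.append(v_star)
        covered |= n_l(adj, {v_star}, l)
    return S
```

By submodularity of exprel_ℓ, this greedy selection achieves the (1 − 1/e) approximation guarantee cited above.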

Page 16: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms


Experiments

•  5 target application areas, 5 graphs from SNAP

•  Queries generated based on 3 scenario types –  one random vertex –  random vertices from one area of interest –  multiple vertices from multiple areas of interest

Dataset            |V|      |E|      δ̄     D    D90%   CC
amazon0601         403.3K   3.3M     16.8   21   7.6    0.42
ca-AstroPh         18.7K    396.1K   42.2   14   5.1    0.63
cit-Patents        3.7M     16.5M    8.7    22   9.4    0.09
soc-LiveJournal1   4.8M     68.9M    28.4   18   6.5    0.31
web-Google         875.7K   5.1M     11.6   22   8.1    0.60

(δ̄: average degree; CC: clustering coefficient)

Page 17: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms


Results – relevance

•  Methods should trade off relevance for better diversity
•  Normalized relevance of the top-k set is always 1
•  Dragon always returns results having about 70% of items in common with the top-k set, with a rel score above 80%

[Plots of rel, diff, and running time vs. k on amazon0601 and soc-LiveJournal1 (combined scenarios) for PPR (top-k), GrassHopper, Dragon, PDivRank, CDivRank, k-RLM, GSparse, and the BC variants]

Figure 2: Normalized relevance (rel) and difference ratio (diff) scores with varying k. Dragon and GSparse return results around 70% similar to the top-k relevant set; this is generally not enough to improve the diversity of the results.

5.3 Results

We experiment with the algorithms given in Section 2.3, the datasets described in Section 5.1, and the queries defined in Section 5.2. For the methods that use the ranking scores of PPR, we fix d = 0.9 and the number of PPR iterations to 20 in order to be consistent between different queries. For the VRRW computation of DivRank methods, we set α = 0.25 and the number of iterations to 50, since VRRW usually takes more iterations to converge. All ranking functions are implemented efficiently with sparse matrix-dense vector multiplication (SpMxV) operations.

On the amazon0601, ca-AstroPh, and soc-LiveJournal1 datasets, we observed that the results of different scenarios are similar. Hence, we combine the scenarios and display the results on all queries⁶. Also note that the results of BC2 and its relaxation are omitted from the plots of the soc-LiveJournal1 dataset because of the impractical runtimes.

Normalized relevance (rel) and difference ratio (diff) plots in Figure 2 show that the Dragon and GSparse methods almost always return results having 70% of their items in common with the top-k relevant set, and more than an 80% rel score. A low rel score is not an indication of being dissimilar to the query (unless rel → 0); on the other hand, since the scores have a power-law distribution, a high rel score usually implies that the algorithm ignored the diversity of the results and did not change many results in order to keep the relevancy high. The actual diff measures are also given in Figure 2.

⁶Due to space limitation we only display one plot per observation highlighted in the text. The complete set of plots for each dataset, scenario, and measure is provided in the supplementary material: http://bmi.osu.edu/hpc/data/Kucuktunc13WWW/results.pdf

[Plots of σ₂ vs. k on amazon0601 and ca-AstroPh (combined scenarios)]

Figure 3: Coverage (σ₂) of the algorithms with varying k. BestCoverage and DivRank variants have the highest coverage on the graphs, while Dragon, GSparse, and k-RLM have coverages similar to the top-k results.

[Plots of exprel₂ vs. k on amazon0601 and soc-LiveJournal1 (combined scenarios)]

Figure 4: Expanded relevance (exprel₂) with varying k. BC1 and BC2 variants mostly score the best; GrassHopper performs well on soc-LiveJournal1. Although PDivRank gives the highest coverage on amazon0601 (Fig. 3), it fails to cover the relevant parts.

Based on the expansion ratios (σ₂) in Figure 3, BestCoverage and DivRank variants, especially PDivRank and BC2, have the highest scores, hence the highest coverage of the graphs with their diversified result sets. Dragon, GSparse, as well as k-RLM, have expansion ratios similar to those of the top-k results, meaning that these algorithms do not improve the coverage of the given graphs enough. GSparse reduces the expansion ratio even more than the top-k set, showing that it is inappropriate for the diversification task. It is important to note that σ₂ scores are meaningless by themselves, since the query-oblivious greedy-σ₂ algorithm would maximize the coverage.

Figure 4 shows the proposed expanded relevance scores (exprel₂) of the result sets. BC1 and BC2 variants are significantly better than the other algorithms, and GrassHopper is able to score close to BestCoverage only on the soc-LiveJournal1 dataset. Although DivRank variants perform the highest based on the expansion ratio (see Figure 3), their results are shown to be unable to cover the relevant parts of the graph, as they score lower than the BestCoverage variants.

For the cit-Patents and web-Google datasets, we report the results on queries of scenarios 1 and 3 separately. Here we omit the results of scenario-2 queries since they are in between scenarios 1 and 3. These plots share the conclusions we have made so far based on the results on the previous three datasets; however, they present different behavior based on the chosen scenario, so we provide a deeper analysis of those.


Page 18: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms


Results – coverage

•  l-step expansion ratio (σ2) gives the graph coverage of the result set: better coverage = better diversity

•  BestCoverage and DivRank variants, especially BC2 and PDivRank, have the highest coverage

[Coverage (σ₂) plots of the algorithms with varying k on amazon0601 and ca-AstroPh]

Page 19: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms


Results – expanded relevance

•  a combined measure of relevance and diversity
•  BestCoverage variants and GrassHopper perform better
•  Although PDivRank gives the highest coverage on the amazon graph, it fails to cover the relevant parts!

[Expanded relevance (exprel₂) plots vs. k on amazon0601 and soc-LiveJournal1, with BC2, BC1, PDivRank, and GrassHopper highlighted]

Page 20: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms


Results – efficiency

•  BC1 always performs better, with a running time lower than DivRank and GrassHopper
•  BC1 (relaxed) offers reasonable diversity with very little overhead on top of the PPR computation

[Figure: expanded relevance (exprel2) vs. k for panels cit-Patents (scenarios 1 and 3) and web-Google (scenarios 1 and 3); curves for PPR (top-k), GrassHopper, Dragon, PDivRank, CDivRank, k-RLM, GSparse, BC1, BC2, BC1 (relaxed), and BC2 (relaxed).]

Figure 5: Expanded relevance (exprel2) with varying k. BestCoverage variants perform higher than usual on the cit-Patents dataset with scenario-1 queries because of the low average degree (8.7) and low clustering coefficient (CC = 0.09) of the graph. The relaxed algorithms perform close to their originals, meaning that they are both efficient and effective on this type of sparsely connected graph.

[Figure: coverage (σ2) vs. k for cit-Patents (scenario 1) and web-Google (scenario 1); same set of algorithms as Figure 5.]

Figure 6: Coverage (σ2) of the algorithms with varying k. DivRank variants appear to implicitly optimize the size of the expansion set, without considering whether those results are still relevant to the query (cf. the corresponding exprel2 in Figure 5).

Figure 5 shows that the exprel2 results on the cit-Patents dataset vary based on the scenario chosen to generate the queries. In fact, the results are higher than normal for scenario-1 queries. This is because of the low average degree (8.7) and low clustering coefficient (CC = 0.09) of the graph. Also note that the relaxations of BC1 and BC2 perform close to BC1 and BC2, meaning that the relaxed algorithms are both efficient and effective on this type of sparsely connected graph.

It is also clearer from the plots in Figure 6 that the DivRank variants implicitly optimize the expansion ratio (σ2) of the

[Figure: running times (sec, log scale) vs. k for panels ca-AstroPh (combined), soc-LiveJournal1 (combined), cit-Patents (scenario 1), and web-Google (scenario 1); same set of algorithms.]

Figure 7: Running times of the algorithms with varying k. The BC1 method always performs better, with a running time less than GrassHopper and the DivRank variants, while the relaxed versions score similarly with a slight overhead on top of the PPR computation.

results, but without considering whether those results are still relevant to the query. As a striking example, for scenario-1 queries on the web-Google dataset, it is quite interesting to see an algorithm perform the best with respect to the size of the expansion set but almost the worst with respect to the relevance of the same set (see Figure 5).

With the runtime experiments shown in Figure 7, we also confirm that the relaxed variants of BestCoverage perform close to their originals (see Figure 4) with an order of magnitude or more gain in efficiency. In all cases, even on soc-LiveJournal1, the largest dataset in our experiments, the BC1 method always performs better, with a running time less than GrassHopper and the DivRank variants, while the relaxed version scores close enough with a running time only slightly higher than the original PPR computation. Therefore, in terms of running times, the efficient algorithms are generally ordered (fastest first) as PPR, k-RLM, BC1 (relaxed), Dragon, BC1. Confirming the observation in [16], DivRank variants are more efficient than GrassHopper for k > 10. The runtime of BC2 depends on the dataset properties, while its relaxed variant has running times comparable to the DivRank variants. Both BC2 and its relaxed variant have a very high runtime on ca-AstroPh, since this dataset has the highest average degree (42.2) and clustering coefficient (CC = 0.63); hence, each exprel2 computation is more costly than on the other datasets.

5.4 Intent-aware results

Among the five datasets we selected for the experiments, cit-Patents has categorical information. One of 426 class labels is assigned to each patent, and these classes hierarchically belong to 36 subtopics and 6 high-


Page 21: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms

Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 21/25

Results – intent-aware experiments

•  evaluation of intent-oblivious algorithms against intent-aware measures

•  two measures
  –  group coverage [Li11]
  –  S-recall [Zhai03]

•  cit-Patents dataset has categorical information
  –  426 class labels, belonging to 36 subtopics

Page 22: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms

Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 22/25

Results – intent-aware experiments

•  group coverage [Li11]
  –  How many different groups are covered by the results?
  –  omits the actual intent of the query

•  top-k results are not diverse enough
•  AllRandom results cover the most groups
•  PDivRank and BC2 follow

[Figure: panels (a) class coverage, (b) subtopic coverage, (c) topic coverage, (d) S-recall on classes, (e) S-recall on subtopics, (f) S-recall on topics vs. k; algorithms PPR (top-k), Dragon, PDivRank, CDivRank, k-RLM, BC1, BC2, BC1 (relaxed), BC2 (relaxed), and AllRandom.]

Figure 8: Intent-aware results on cit-Patents dataset with scenario-3 queries.

level topics7. Here we present an evaluation of the intent-oblivious algorithms against intent-aware measures. This evaluation validates the diversification techniques with external measures such as group coverage [14] and S-recall [23].

The intents of a query set Q are extracted by collecting the classes, subtopics, and topics of each seed node. Since our aim is to evaluate the results based on the coverage of different groups, we only use scenario-3 queries, which represent multiple interests.

One measure we are interested in is group coverage as a diversity measure [14]. It computes the number of groups covered by the result set and is defined on classes, subtopics, and topics based on the intended level of granularity. However, this measure omits the actual intent of a query, assuming that the intent is given by the classes of the seed nodes.
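Group coverage, as described above, is simply the number of distinct groups touched by the result set at the chosen granularity. A minimal sketch follows; `groups_of` is a hypothetical mapping from a node to its group labels, not a data structure from the paper.

```python
def group_coverage(results, groups_of):
    """Count the distinct groups (classes, subtopics, or topics)
    covered by a result set."""
    covered = set()
    for node in results:
        covered.update(groups_of(node))
    return len(covered)

# Toy class assignment for illustration.
toy_classes = {1: ["A"], 2: ["A"], 3: ["B"], 4: ["C"]}
print(group_coverage([1, 2, 3], toy_classes.get))  # 2 (groups A and B)
```

The same function is reused at each granularity level by swapping in the class, subtopic, or topic assignment.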

Subtopic recall (S-recall) has been defined as the percentage of relevant subtopics covered by the result set [23]. It has also been redefined as Intent-Coverage [25] and used in the experiments of [22]. The S-recall of a result set S based on the set of intents I of the query is computed as

S-recall(S, I) = (1/|I|) Σ_{i∈I} B_i(S),   (18)

where B_i(S) is a binary variable indicating whether intent i is found in the results.
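Eq. (18) can be sketched directly in code. Here `intents_of` is a hypothetical mapping from a result node to its intent labels, standing in for the class/subtopic/topic assignments of the dataset.

```python
def s_recall(results, query_intents, intents_of):
    """S-recall per Eq. (18): fraction of query intents covered by
    the result set."""
    covered = set()
    for node in results:
        covered.update(intents_of(node))
    # B_i(S) is 1 iff intent i appears among the results' intents.
    return sum(1 for i in query_intents if i in covered) / len(query_intents)

# Toy intent assignment for illustration.
intents = {10: {"db"}, 11: {"ir"}, 12: {"db", "ml"}}
print(s_recall([10, 12], {"db", "ir", "ml"}, intents.get))  # 2/3 covered
```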

We give the results of group coverage and S-recall on classes, subtopics, and topics in Figure 8. The algorithms GrassHopper and GSparse are not included in the results since they perform worse than PPR. The results of AllRandom are included to give a comparison between the top-k relevant set (PPR) and results chosen randomly.

As the group coverage plots show, the top-k ranked items of PPR do not have the necessary diversity in the result set; hence, the number of groups covered by these items is the lowest of all. On the other hand, a randomized method brings irrelevant items from the search space without considering their relevance to the user query. The results of all of the diversification algorithms lie between those two extremes, where PDivRank covers the most and Dragon covers the fewest groups.

However, the S-recall index measures whether a covered group was actually useful. Obviously, AllRandom scores the lowest, as it dismisses the actual query (the S-recall on topics can be disregarded, since there are only 6 groups at this granularity level). Among the algorithms, the BC2 variants and BC1 score the best, while BC1 (relaxed) and the DivRank variants have similar S-recall scores, even though BC1 (relaxed) is a much faster algorithm than any DivRank variant (see Figure 7).

7Available at: http://data.nber.org/patents/

6. CONCLUSIONS AND FUTURE WORK

In this paper, we address the problem of evaluating result diversification as a bicriteria optimization problem with a relevance measure that ignores diversity and a diversity measure that ignores relevance to the query. We demonstrate this by running query-oblivious algorithms on two commonly used combinations of objectives. Next, we argue that a result diversification algorithm should be evaluated under a measure that tightly integrates the query in its value, and we present a new measure called expanded relevance. Investigating various quality indices by computing their pairwise correlations, we also show that this new measure has no direct correlation with any other measure. In the second part of the paper, we analyze the complexity of maximizing the expanded relevance of the results and, based on the submodularity of the objective, we present a greedy algorithm called BestCoverage and its efficient relaxation. We experimentally show that the relaxation carries no significant harm to the expanded relevance of the solution.

As future work, we plan to investigate the behavior of the exprel_ℓ measure on social networks with ground-truth communities.

Acknowledgments

This work was supported in part by the DOE grant DE-FC02-06ER2775 and by the NSF grants CNS-0643969, OCI-0904809, and OCI-0904802.




Page 23: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms

Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 23/25

[Slide figure: Figure 8 panels — (a) class coverage, (b) subtopic coverage, (c) topic coverage, and S-recall on classes, subtopics, and topics vs. k — for the cit-Patents dataset with scenario-3 queries.]

Results – intent-aware experiments

•  S-recall [Zhai03], Intent-Coverage [Zhu11]
  –  percentage of relevant subtopics covered by the result set
  –  the intent is given by the classes of the seed nodes

•  AllRandom brings irrelevant items from the search space
•  top-k results do not have the necessary diversity
•  BC2 variants and BC1 perform better than DivRank
•  BC1 (relaxed) and DivRank score similarly, but BC1 (relaxed) is much faster


Page 24: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms

Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 24/25

Conclusions

•  Result diversification should not be evaluated as a bicriteria optimization problem with
  –  a relevance measure that ignores diversity, and
  –  a diversity measure that ignores relevance

•  ℓ-step expanded relevance (exprel_ℓ) is a simple measure that combines both relevance and diversity

•  BestCoverage, a greedy solution that maximizes exprel_ℓ, is a (1−1/e)-approximation of the optimal solution

•  BestCoverage variants perform better than the others; the relaxation is extremely efficient

•  goodness in DRAGON is dominated by relevance
•  DivRank variants implicitly optimize the expansion ratio
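The (1−1/e) guarantee in the bullets above comes from the classical greedy scheme for maximizing a monotone submodular set function under a cardinality constraint. A generic sketch of that scheme follows; the toy weighted-coverage objective `f` stands in for exprel_ℓ, which the slides do not define in full.

```python
def greedy_max(candidates, f, k):
    """Greedily pick up to k elements, each time taking the one with
    the largest marginal gain in f. For monotone submodular f this is
    a (1 - 1/e)-approximation of the optimal size-k set."""
    selected = []
    for _ in range(k):
        best = max((c for c in candidates if c not in selected),
                   key=lambda c: f(selected + [c]) - f(selected),
                   default=None)
        if best is None:
            break
        selected.append(best)
    return selected

# Toy submodular objective: number of items covered by the chosen sets.
covers = {"a": {1, 2}, "b": {2}, "c": {3, 4}}
f = lambda S: len(set().union(*(covers[s] for s in S)) if S else set())
print(greedy_max(list(covers), f, 2))  # ['a', 'c'], covering {1, 2, 3, 4}
```

The actual BestCoverage algorithm applies this template with exprel_ℓ as the objective; the relaxed variant speeds up the marginal-gain evaluations.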

Page 25: Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms

Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 25/25

Thank you

•  For more information visit http://bmi.osu.edu/hpc

•  Research at the HPC Lab is funded by