A Novel Target Marketing Approach based on Influence Maximization

A Novel Target Marketing Approach based on Influence

Maximization

Motivation• “Businesses on Facebook and Twitter are reaching only 2% of

their fans and only 0.07% of follower actually interact with their post.” – Forrester Study, Nov. 17, 2014

• Local business owner need to target market people nearby, to increase footfall• Traditional methods of marketing like leafleting are inefficient• “82% people check review online before spending money on

product/service” – Nielsen Study, July 1, 2013

• Local businesses can use online review websites like Yelp, Zomato to target customers effectively.

Problem Statement• “To develop a novel approach for Identification of influential customers for target

marketing through Influence Maximization.”

Objectives

Fig. 1

Influence Maximization• It is problem to find K vertices in the graph such that under the diffusion model, the expected

number of vertices influenced by the K vertices (referred to as influence spread) is the largest possible• The Independent Cascade (IC) model is the simplest diffusion model. If j is a neighbor of i

then the probability of j being activated by i is:

Eq. 5

i j

pij

wij

Existing Work• Kemp et al. were first to study the optimization problem of influence maximization• Proved it to be a NP-hard problem, gave a time inefficient Greedy algorithm• GeneralGreedy repeats k rounds: in the ith round, select a node v that provides the largest

increase in influence spread• In each round influence spread is calculated by Monte-Carlo simulations.

Cont’d• Chen et al. developed NewGreedyIC, an improved Greedy algorithm• NewGreedyIC also runs Monte-Carlo simulations, but in each iteration it generates a random

graph G’ by randomly removing edges from the existing graph G. This makes the size of graph in that iteration smaller and hence is faster than GeneralGreedy method

Cont’d• Chen et al. also proposed a more efficient DegreeDiscount method• DegreeDiscount method doesn’t run Monte-Carlo simulations, it uses degree discount

heuristics where it is assumed that the spread increases with the degree of nodes. • It gives discount in the degree of a node by one if any of its neighbors have already been

selected in the set of active nodes.• It is 6 time faster than NewGreedyIC. It gives influence spread slightly lower than

NewGreedyIC.

[link]

A3

5

6

A

24

5

Inspiration from existing work• DegreeDiscount method eliminates need for Monte-Carlo simulations by using degree

heuristic.• This reduces running time compared to NewGreedy by manifold.

Research Gap• DegreeDiscount doesn’t take into account

the overlapping part of spread of two influential nodes• Due to which the total influence spread will

be lesser than sum of their individual influence spread• Our novel algorithm adds that node as kth

node which maximize difference between spread of already selected k-1 nodes and that of k nodes after addition• C-A has more difference in spread than B-A.

A

B

C

A

BC

Our Approach: ANIM(A Novel Influence Maximization approach)

Fig. 5

dbANIM

Yelp Dataset DescriptionName Attributes

User {user_id, name, review_count, average_stars, friends, yelping_since, elite}

Business {business_id, name, review_count, stars, address, latitude, longitude, categories}

Review {user_id, business_id, text, stars, votes}

Users 252,898

User-user edges(friendship) 955,999

Businesses 42,153

Reviews 1,125,458

Data and Preprocessing• The semi-structured data obtained from Yelp is stored in a Document Oriented database.• Preprocessing is done to clean the data.• Social network is formed from users who have reviewed similar nearby businesses.• Users are represented as nodes in the network, and two nodes are joined by an edge only if

they are friends.

Edge weight calculation in network • The weight of an edge between two users X and Y is calculated by the formula:

• w1 is the normalized count of mutual friends between X and Y

where nx and ny are the list of friends of user X and user Y.• w2 signifies the similarity in opinion of user X and user Y

where and • Xpos is the set of businesses that X rated positively; Xneg is the set of businesses that X rated

negatively. • We have considered a rating of 3 or below as negative review, and 4 or above as positive

review. [old]

Eq. 9

Eq. 7

Eq. 8

Propagation probability calculation in network • Propagation probability of an edge going from u to v was calculated by:

• Strength of an edge between u and v is the average of influence of u and v• Where• For popularity we used two attributes of the user, reviewCount and averageStars

• The clustering value is defined as the closeness of a node to a cluster of highly interconnected nodes.• C(v) is clustering value of a node given by:

• Quartiles were used for normalization.[link]

Eq. 17

Eq. 16

Eq. 15

Eq. 10

Eq. 11

wsANIM

Fig. 6

Our novel approach: spreadHeuristicIC Algorithm • Proposed algorithm is a greedy algorithm.• It iteratively finds a node and add it to the set S of top-K influential nodes.• While adding kth node to set S, it finds the node that maximize the difference between

spread of already selected k-1 nodes and spread of set S after adding that kth node.

A

BC

Pseudo code for the algorithm

Complexity Analysis• The algorithm take O(V) steps in line 3 and line 4 take O(T) time, where T is the time to

compute the coverage of a node in the graph G, and it takes O(IE) time (where I is the number of simulations for the Independent Cascade model, and E is the number of edges in graph G).• From lines 7-9, complexity of each line is O(VlgV) when we use sorting for union operation.• So, overall complexity of the algorithm is O(K(VIE + VlgV)).

Experiments and Results • We have conducted experiments for our algorithm and various other algorithms (i.e.- Degree

Discount algorithm, Single Discount algorithm, Degree Discount algorithm, General Greedy algorithm etc.) on Yelp’s network.• We find that the Spread Heuristic based algorithm has more influence spread compared to

the other algorithms. The ranking based on influence spread comes out to be:spreadHeuristicIC > newGreedyIC > degreeDiscountIC > random

Cont’dInfluence spread for G with n=1617, E=2058

0 10 20 30 40 50 60 70 800

50

100

150

200

250

300

degreeDiscountIC degreeDiscountIC2 degreeDiscountStardegreeHeuristic degreeHeuristic2 singleDiscounthighestDegree newGreedyIC randomHeuristicspreedHeuristic

K

Influ

ence

Spr

ead

Influence spread for G with n=4292, E=8147

0 10 20 30 40 50 60 70 800

50

100

150

200

250

degreeDiscountIC degreeDiscountIC2 degreeDiscountStardegreeHeuristic degreeHeuristic2 singleDiscounthighestDegree newGreedyIC randomHeuristicspreadHeuristic

K

Influ

ence

Spr

ead

Fig. 9Fig. 7

Cont’dRun time for G with n=1617, E=2058 Run time for G with n=4292, E=8147

0 10 20 30 40 50 60 70 800

10

20

30

40

50

60

70

80

degreeDiscountIC degreeDiscountIC2 degreeDiscountStardegreeHeuristic degreeHeuristic2 singleDiscounthighestDegree newGreedyIC randomHeuristicspreadHeuristic

K

Runn

ing

Tim

e (s

ec)

0 10 20 30 40 50 60 70 800

50

100

150

200

250

300

350

degreeDiscountIC degreeDiscountIC2 degreeDiscountStardegreeHeuristic degreeHeuristic2 singleDiscounthighestDegree newGreedyIC randomHeuristicspreedHeuristic

K

Runn

ing

Tim

e (s

ec)

Fig. 10Fig. 8

Conclusion• With respect to initial aims and objectives of this project, the final outcome is fairly

successful.• After series of experiments, we concluded that our algorithm outperforms existing influence

maximization algorithms.• We developed a dashboard for the businesses to visualize the influential users and their

spread among the people nearby.

References [1] M.E.J. Newman, M. Girvan, Finding and evaluating community structure in networks, Phys. Rev. E 69 (2) (2004) 026113. [2] Blondel, Vincent D., et al. "Fast unfolding of communities in large networks. "Journal of Statistical Mechanics: Theory and Experiment 2008.10 (2008):

P10008. [3]. J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. S. Glance. Cost-effective outbreak detection in networks. In Proceedings of the 13th

ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 420–429, 2007. [4] “Yelp Dataset,” https://www.yelp.com/dataset challenge/dataset. [5]. D. Kempe, J. Kleinberg, E. Tardos. Maximizing the Spread of Influence through a Social Network. Proc. 9th ACM SIGKDD Intl. Conf. on Knowledge Discovery

and Data Mining, 2003. [6]. M. Richardson, P. Domingos. Mining Knowledge-Sharing Sites for Viral Marketing. Eighth Intl. Conf. on Knowledge Discovery and Data Mining, 2002. [7] J. Goldenberg, B. Libai, E. Muller. Talk of the Network: A Complex Systems Look at the Underlying Process of Word-of-Mouth. Marketing Letters 12:3(2001),

211-223 [8] M. Granovetter, Threshold models of collective behavior, the American Journal of sociology, vol. 83, no. 6, pp.1420-1443, May 1978 [9] Chen, Wei, Yajun Wang, and Siyu Yang. "Efficient influence maximization in social networks." Proceedings of the 15th ACM SIGKDD international conference

on Knowledge discovery and data mining. ACM, 2009. [10] Kempe, David, Jon Kleinberg, and Éva Tardos. "Maximizing the spread of influence through a social network." Proceedings of the ninth ACM SIGKDD

international conference on Knowledge discovery and data mining. ACM, 2003. [11] Wang, Yu, et al. "Community-based greedy algorithm for mining top-k influential nodes in mobile social networks." Proceedings of the 16th ACM SIGKDD

international conference on Knowledge discovery and data mining. ACM, 2010. [12] Saito, Kazumi, Ryohei Nakano, and Masahiro Kimura. "Prediction of information diffusion probabilities for independent cascade model." Knowledge-Based

Intelligent Information and Engineering Systems. Springer Berlin Heidelberg, 2008. [13] Newman, Mark EJ. "Analysis of weighted networks." Physical Review E 70.5 (2004): 056131.

Documents

A Novel Target Marketing Approach based on Influence Maximization