Upload
surendra-gadwal
View
115
Download
0
Embed Size (px)
Citation preview
A Novel Target Marketing Approach based on Influence
Maximization
Motivation• “Businesses on Facebook and Twitter are reaching only 2% of
their fans and only 0.07% of follower actually interact with their post.” – Forrester Study, Nov. 17, 2014
• Local business owner need to target market people nearby, to increase footfall• Traditional methods of marketing like leafleting are inefficient• “82% people check review online before spending money on
product/service” – Nielsen Study, July 1, 2013
• Local businesses can use online review websites like Yelp, Zomato to target customers effectively.
Problem Statement• “To develop a novel approach for Identification of influential customers for target
marketing through Influence Maximization.”
Objectives
Fig. 1
Influence Maximization• It is problem to find K vertices in the graph such that under the diffusion model, the expected
number of vertices influenced by the K vertices (referred to as influence spread) is the largest possible• The Independent Cascade (IC) model is the simplest diffusion model. If j is a neighbor of i
then the probability of j being activated by i is:
Eq. 5
i j
pij
wij
Existing Work• Kemp et al. were first to study the optimization problem of influence maximization• Proved it to be a NP-hard problem, gave a time inefficient Greedy algorithm• GeneralGreedy repeats k rounds: in the ith round, select a node v that provides the largest
increase in influence spread• In each round influence spread is calculated by Monte-Carlo simulations.
Cont’d• Chen et al. developed NewGreedyIC, an improved Greedy algorithm• NewGreedyIC also runs Monte-Carlo simulations, but in each iteration it generates a random
graph G’ by randomly removing edges from the existing graph G. This makes the size of graph in that iteration smaller and hence is faster than GeneralGreedy method
Cont’d• Chen et al. also proposed a more efficient DegreeDiscount method• DegreeDiscount method doesn’t run Monte-Carlo simulations, it uses degree discount
heuristics where it is assumed that the spread increases with the degree of nodes. • It gives discount in the degree of a node by one if any of its neighbors have already been
selected in the set of active nodes.• It is 6 time faster than NewGreedyIC. It gives influence spread slightly lower than
NewGreedyIC.
[link]
A3
5
6
A
24
5
Inspiration from existing work• DegreeDiscount method eliminates need for Monte-Carlo simulations by using degree
heuristic.• This reduces running time compared to NewGreedy by manifold.
Research Gap• DegreeDiscount doesn’t take into account
the overlapping part of spread of two influential nodes• Due to which the total influence spread will
be lesser than sum of their individual influence spread• Our novel algorithm adds that node as kth
node which maximize difference between spread of already selected k-1 nodes and that of k nodes after addition• C-A has more difference in spread than B-A.
A
B
C
A
BC
Our Approach: ANIM(A Novel Influence Maximization approach)
Fig. 5
dbANIM
Yelp Dataset DescriptionName Attributes
User {user_id, name, review_count, average_stars, friends, yelping_since, elite}
Business {business_id, name, review_count, stars, address, latitude, longitude, categories}
Review {user_id, business_id, text, stars, votes}
Users 252,898
User-user edges(friendship) 955,999
Businesses 42,153
Reviews 1,125,458
Data and Preprocessing• The semi-structured data obtained from Yelp is stored in a Document Oriented database.• Preprocessing is done to clean the data.• Social network is formed from users who have reviewed similar nearby businesses.• Users are represented as nodes in the network, and two nodes are joined by an edge only if
they are friends.
Edge weight calculation in network • The weight of an edge between two users X and Y is calculated by the formula:
• w1 is the normalized count of mutual friends between X and Y
where nx and ny are the list of friends of user X and user Y.• w2 signifies the similarity in opinion of user X and user Y
where and • Xpos is the set of businesses that X rated positively; Xneg is the set of businesses that X rated
negatively. • We have considered a rating of 3 or below as negative review, and 4 or above as positive
review. [old]
Eq. 9
Eq. 7
Eq. 8
Propagation probability calculation in network • Propagation probability of an edge going from u to v was calculated by:
• Strength of an edge between u and v is the average of influence of u and v• Where• For popularity we used two attributes of the user, reviewCount and averageStars
• The clustering value is defined as the closeness of a node to a cluster of highly interconnected nodes.• C(v) is clustering value of a node given by:
• Quartiles were used for normalization.[link]
Eq. 17
Eq. 16
Eq. 15
Eq. 10
Eq. 11
wsANIM
Fig. 6
Our novel approach: spreadHeuristicIC Algorithm • Proposed algorithm is a greedy algorithm.• It iteratively finds a node and add it to the set S of top-K influential nodes.• While adding kth node to set S, it finds the node that maximize the difference between
spread of already selected k-1 nodes and spread of set S after adding that kth node.
A
BC
Pseudo code for the algorithm
Complexity Analysis• The algorithm take O(V) steps in line 3 and line 4 take O(T) time, where T is the time to
compute the coverage of a node in the graph G, and it takes O(IE) time (where I is the number of simulations for the Independent Cascade model, and E is the number of edges in graph G).• From lines 7-9, complexity of each line is O(VlgV) when we use sorting for union operation.• So, overall complexity of the algorithm is O(K(VIE + VlgV)).
Experiments and Results • We have conducted experiments for our algorithm and various other algorithms (i.e.- Degree
Discount algorithm, Single Discount algorithm, Degree Discount algorithm, General Greedy algorithm etc.) on Yelp’s network.• We find that the Spread Heuristic based algorithm has more influence spread compared to
the other algorithms. The ranking based on influence spread comes out to be:spreadHeuristicIC > newGreedyIC > degreeDiscountIC > random
Cont’dInfluence spread for G with n=1617, E=2058
0 10 20 30 40 50 60 70 800
50
100
150
200
250
300
degreeDiscountIC degreeDiscountIC2 degreeDiscountStardegreeHeuristic degreeHeuristic2 singleDiscounthighestDegree newGreedyIC randomHeuristicspreedHeuristic
K
Influ
ence
Spr
ead
Influence spread for G with n=4292, E=8147
0 10 20 30 40 50 60 70 800
50
100
150
200
250
degreeDiscountIC degreeDiscountIC2 degreeDiscountStardegreeHeuristic degreeHeuristic2 singleDiscounthighestDegree newGreedyIC randomHeuristicspreadHeuristic
K
Influ
ence
Spr
ead
Fig. 9Fig. 7
Cont’dRun time for G with n=1617, E=2058 Run time for G with n=4292, E=8147
0 10 20 30 40 50 60 70 800
10
20
30
40
50
60
70
80
degreeDiscountIC degreeDiscountIC2 degreeDiscountStardegreeHeuristic degreeHeuristic2 singleDiscounthighestDegree newGreedyIC randomHeuristicspreadHeuristic
K
Runn
ing
Tim
e (s
ec)
0 10 20 30 40 50 60 70 800
50
100
150
200
250
300
350
degreeDiscountIC degreeDiscountIC2 degreeDiscountStardegreeHeuristic degreeHeuristic2 singleDiscounthighestDegree newGreedyIC randomHeuristicspreedHeuristic
K
Runn
ing
Tim
e (s
ec)
Fig. 10Fig. 8
Conclusion• With respect to initial aims and objectives of this project, the final outcome is fairly
successful.• After series of experiments, we concluded that our algorithm outperforms existing influence
maximization algorithms.• We developed a dashboard for the businesses to visualize the influential users and their
spread among the people nearby.
References [1] M.E.J. Newman, M. Girvan, Finding and evaluating community structure in networks, Phys. Rev. E 69 (2) (2004) 026113. [2] Blondel, Vincent D., et al. "Fast unfolding of communities in large networks. "Journal of Statistical Mechanics: Theory and Experiment 2008.10 (2008):
P10008. [3]. J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. S. Glance. Cost-effective outbreak detection in networks. In Proceedings of the 13th
ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 420–429, 2007. [4] “Yelp Dataset,” https://www.yelp.com/dataset challenge/dataset. [5]. D. Kempe, J. Kleinberg, E. Tardos. Maximizing the Spread of Influence through a Social Network. Proc. 9th ACM SIGKDD Intl. Conf. on Knowledge Discovery
and Data Mining, 2003. [6]. M. Richardson, P. Domingos. Mining Knowledge-Sharing Sites for Viral Marketing. Eighth Intl. Conf. on Knowledge Discovery and Data Mining, 2002. [7] J. Goldenberg, B. Libai, E. Muller. Talk of the Network: A Complex Systems Look at the Underlying Process of Word-of-Mouth. Marketing Letters 12:3(2001),
211-223 [8] M. Granovetter, Threshold models of collective behavior, the American Journal of sociology, vol. 83, no. 6, pp.1420-1443, May 1978 [9] Chen, Wei, Yajun Wang, and Siyu Yang. "Efficient influence maximization in social networks." Proceedings of the 15th ACM SIGKDD international conference
on Knowledge discovery and data mining. ACM, 2009. [10] Kempe, David, Jon Kleinberg, and Éva Tardos. "Maximizing the spread of influence through a social network." Proceedings of the ninth ACM SIGKDD
international conference on Knowledge discovery and data mining. ACM, 2003. [11] Wang, Yu, et al. "Community-based greedy algorithm for mining top-k influential nodes in mobile social networks." Proceedings of the 16th ACM SIGKDD
international conference on Knowledge discovery and data mining. ACM, 2010. [12] Saito, Kazumi, Ryohei Nakano, and Masahiro Kimura. "Prediction of information diffusion probabilities for independent cascade model." Knowledge-Based
Intelligent Information and Engineering Systems. Springer Berlin Heidelberg, 2008. [13] Newman, Mark EJ. "Analysis of weighted networks." Physical Review E 70.5 (2004): 056131.