SOCIAL SCORES
Supervised by Dr. Dilum Bandara
E.A.M.M Edirisinghe 138211N
Outline
• TunkRank – A Twitter Analog to PageRank
• TwitterRank – Finding Topic-sensitive Influential Twitterers
• Influence Rank – An Efficient Social Influence Measurement
• Why your Klout score is meaningless
• Research Questions Addressed
Research Questions
• How do existing systems calculate social scores?
• Which parameters are representative of a user’s true social influence?
• What are the desirable properties of a social score?
• How to vary the parameter weights across different topics and applications?
• How to come up with a performance-efficient algorithm?
• How to calculate a social score and update it in real time?
TunkRank A Twitter Analog to PageRank
TunkRank
• Proposed by Daniel Tunkelang in 2009
• Implemented by Jason Adams
• Assumptions:
– Influence(X): the expected number of people who will read a tweet that X tweets
– The probability that X will read a tweet posted by Y is 1/||Following(X)||, where X is a member of Followers(Y) and Following(X) is the set of people that X follows
– If X reads a tweet from Y, there is a constant probability p that X will retweet it
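These assumptions lead to the recursive definition given in Tunkelang's post: Influence(X) is the sum, over every Y in Followers(X), of (1 + p · Influence(Y)) / ||Following(Y)||. A minimal Python sketch of a fixed-point iteration over that definition follows. The function name `tunkrank`, the dictionary input format, the value p = 0.05, and the fixed iteration count are illustrative assumptions, not part of the original proposal.

```python
def tunkrank(following, p=0.05, iterations=50):
    """Approximate TunkRank by repeated substitution.

    following: dict mapping each user to the set of users they follow
               (hypothetical input format chosen for this sketch).
    p:         assumed constant retweet probability.
    """
    # Collect every user that appears as a follower or as a followee.
    users = set(following)
    for followed in following.values():
        users |= set(followed)

    # Invert the graph: followers[x] is the set of users who follow x.
    followers = {u: set() for u in users}
    for u, followed in following.items():
        for v in followed:
            followers[v].add(u)

    influence = {u: 0.0 for u in users}
    for _ in range(iterations):
        # Each follower y contributes (1 + p * Influence(y)) / |Following(y)|.
        influence = {
            x: sum((1.0 + p * influence[y]) / len(following[y])
                   for y in followers[x])
            for x in users
        }
    return influence


# Usage: a follows b, and b follows c.
scores = tunkrank({"a": {"b"}, "b": {"c"}})
```

Dividing by ||Following(Y)|| is what makes the score hard to inflate through reciprocal following: a follower who follows many accounts passes on only a small share of attention to each.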
TunkRank
• Hard to game
• Addresses the inflation that occurs when people follow in the hope of reciprocity
• Doesn’t consider how a person allocates their attention among the people they follow
TwitterRank Finding Topic-sensitive Influential
Users
TwitterRank
Two main contributions
• Report homophily in Twitter
• Introduce TwitterRank to measure the topic-sensitive influence of twitterers
Framework for the Proposed Approach
1. Topic Distillation
2. Topic-specific Relationship Network Construction
3. Topic-sensitive User Influence Ranking
Dataset
• Consider a set S of the top-1000 Singapore-based twitterers, |S| = 996.
• Crawled all followers and friends of each s ∈ S and stored them in set S’.
• Let S’’ = S ∪ S’ and S* = {s | s ∈ S’’, s is from Singapore}; |S*| = 6748.
• For each s ∈ S*, crawled all the tweets published, T; |T| = 1,021,039.
Reciprocity in Following Relationships
• 72.4% of the twitterers follow more than 80% of their followers
• 80.5% of the twitterers have 80% of their friends follow them back
Casual following or homophily?
Homophily in Twitter
• Question 1: Are twitterers with “following” relationships more similar than those without, according to the topics they are interested in?
• Question 2: Are twitterers with reciprocal “following” relationships more similar than those without, according to the topics they are interested in?
Topic Modeling
• Goal: automatically identify the topics that twitterers are interested in, based on the tweets they published
• The Latent Dirichlet Allocation (LDA) model is applied
Topic Modeling Results
DT: a D×T matrix
D: number of users
T: number of topics
DTij: number of times a word in user si’s tweets has been assigned to topic tj
Hypothesis Testing
• Applied to the set of twitterers who published more than 10 tweets in total, S*_u, with |S*_u| = 4050.
• Row-normalize the DT matrix as DT’ such that ||DT’i·||1 = 1 for each row DT’i·.
• Thus each row of DT’ is the probability distribution of twitterer si’s interest over the T topics.
• Measure the topical difference between twitterers.
• Formalize each question as a two-sample t-test; the results support the existence of homophily in the Twitter dataset.
• There are twitterers who are serious in following others.
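The row normalization and the topical difference can be sketched in a few lines of Python. The slide does not state which difference measure is used, so this sketch assumes a distance based on Jensen-Shannon divergence, sqrt(2 · D_JS); the function names and the toy count matrix are hypothetical.

```python
import math


def row_normalize(dt):
    """Turn each row of raw topic counts into a probability distribution."""
    normalized = []
    for row in dt:
        total = sum(row)
        if total:
            normalized.append([count / total for count in row])
        else:
            normalized.append([0.0] * len(row))
    return normalized


def kl_divergence(p, q):
    """Kullback-Leibler divergence; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def topical_distance(p, q):
    """Assumed measure: sqrt(2 * Jensen-Shannon divergence) of two rows of DT'."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    js = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(2 * max(js, 0.0))


# Toy DT matrix: 4 users, 2 topics (made-up counts for illustration only).
dt_prime = row_normalize([[2, 2], [1, 1], [4, 0], [0, 3]])
same_interests = topical_distance(dt_prime[0], dt_prime[1])
opposite_interests = topical_distance(dt_prime[2], dt_prime[3])
```

With distances like these computed for pairs with and without a following relationship, the two groups can then be compared with a two-sample t-test.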
Topic Specific Twitter Rank
• Forms a directed graph D(V, E)
– An edge exists between two twitterers if there is a “following” relationship between them
– The edge is directed from follower to friend
• A topic-specific random walk model is applied to calculate each user’s influence score.
• The transition matrix for topic t is denoted P_t. The transition probability of the surfer from follower s_i to friend s_j is:

$$P_t(i,j) = \frac{|T_j|}{\sum_{a:\, s_i \text{ follows } s_a} |T_a|} \times sim_t(i,j)$$

$$sim_t(i,j) = 1 - \left| DT'_{it} - DT'_{jt} \right|$$

where |T_j| is the number of tweets published by s_j, and the sum runs over all friends s_a of s_i.
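The transition probability weights each friend s_j by that friend's share of the tweets published by all of s_i's friends, scaled by the topical similarity 1 − |DT'_it − DT'_jt|. A minimal Python sketch, assuming users are indexed 0..n−1; the function name, the adjacency-list input, and the toy data in the usage are hypothetical.

```python
def transition_matrix(friends, tweet_counts, dt_norm, topic):
    """Build the topic-specific transition matrix P_t as nested lists.

    friends[i]      : list of user indices that user i follows
    tweet_counts[j] : |T_j|, the number of tweets published by user j
    dt_norm[i][t]   : row-normalized DT' entry for user i and topic t
    """
    n = len(friends)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Total number of tweets published by all friends of user i.
        total = sum(tweet_counts[a] for a in friends[i])
        if total == 0:
            continue  # no friends, or friends who never tweeted
        for j in friends[i]:
            sim = 1.0 - abs(dt_norm[i][topic] - dt_norm[j][topic])
            p[i][j] = tweet_counts[j] / total * sim
    return p


# Usage: user 0 follows users 1 and 2; user 1 follows user 2.
p0 = transition_matrix(
    friends=[[1, 2], [2], []],
    tweet_counts=[10, 30, 10],
    dt_norm=[[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]],
    topic=0,
)
```

Rows of this matrix need not sum to 1, because the similarity factor can only shrink each entry.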
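Topic Specific Twitter Rank
• Topic-specific teleportation:

$$E_t = DT''_{\cdot t}$$

where DT'' is the column-normalized form of DT, so E_t is its t-th column.
• The influence scores of twitterers are calculated iteratively:

$$TR_t = \gamma P_t \times TR_t + (1 - \gamma) E_t$$

where γ ∈ [0, 1] controls the probability of teleportation.
• Aggregation of topic-specific TwitterRank:

$$TR = \sum_t r_t \cdot TR_t$$

where r_t is the weight assigned to topic t.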
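The iteration and the aggregation can be sketched as below. This is a minimal sketch under stated assumptions: the update passes score from follower i to friend j, which corresponds to multiplying by the transpose of P_t when P_t(i, j) is the follower-to-friend probability; the value γ = 0.85, the fixed iteration count, and the function names are illustrative choices.

```python
def twitterrank(p_t, e_t, gamma=0.85, iterations=100):
    """Iterate TR_t = gamma * (walk step) + (1 - gamma) * E_t.

    p_t[i][j] : transition probability from follower i to friend j
    e_t[j]    : teleportation probability of user j for this topic
    """
    n = len(e_t)
    tr = list(e_t)  # start from the teleportation vector
    for _ in range(iterations):
        tr = [
            gamma * sum(tr[i] * p_t[i][j] for i in range(n))
            + (1 - gamma) * e_t[j]
            for j in range(n)
        ]
    return tr


def aggregate(tr_by_topic, topic_weights):
    """Combine topic-specific scores: TR = sum over t of r_t * TR_t."""
    n = len(tr_by_topic[0])
    return [
        sum(r * tr[k] for r, tr in zip(topic_weights, tr_by_topic))
        for k in range(n)
    ]


# Usage: two users, user 0 follows user 1, a single topic.
tr_topic = twitterrank([[0.0, 1.0], [0.0, 0.0]], [0.5, 0.5], gamma=0.5)
overall = aggregate([tr_topic], [1.0])
```

Choosing the weights r_t is what lets the same topic-specific scores serve different needs, for example a single topic of interest or a user's own topic distribution.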
Review
• Homophily does exist
• Still, some users follow for reasons other than topical similarity
• Easy to game
• Need to discuss an incremental approach to topic distillation
InfluenceRank An Efficient Social Influence
Measurement
InfluenceRank
• Defines the influence of a user from two perspectives
– User’s relative influence
– User’s network global influence
• Defines the microblog network as SN = (G, B, I)
– G: link network structure
– B: set of interactive behaviors between each pair of associated users
– I: set of profile information of each user
• Graph G = (V, E)
– V: set of nodes, represented by the user’s index
– E: set of directed edges
InfluenceRank
• Defines the set of behaviors B = (R, C, M)
– R: retweets
– C: comments
– M: mentions
• Defines the profile information set I = (P, T, K)
– P: set of numbers of postings
– T: set of users’ interest tags
– K: set of users’ content keywords
Metrics Explored
• Number of followers
• Quality of followers
• Quality of tweets
• Similarity of interests
– Similarity of user interest tags: the function TS(vi,vj) calculates the similarity of interest tags between nodes vi and vj
– Similarity of user content keywords: the function KS(vi,vj) calculates the similarity of interests between nodes vi and vj based on their content keyword sets
– Similarity of two users’ interests: Sim(vi,vj) = TS(vi,vj) + KS(vi,vj)
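The slides do not say how TS and KS are computed. As a rough illustration only, the sketch below assumes both are Jaccard similarities over sets (of interest tags and of content keywords respectively) and adds them to obtain Sim; the function names and sample data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def interest_similarity(tags_i, tags_j, keywords_i, keywords_j):
    """Sim(vi, vj) = TS(vi, vj) + KS(vi, vj), with Jaccard assumed for both."""
    ts = jaccard(tags_i, tags_j)          # similarity of interest tags
    ks = jaccard(keywords_i, keywords_j)  # similarity of content keywords
    return ts + ks


# Usage: one shared tag out of two, and identical keyword sets.
sim = interest_similarity(
    {"music", "sports"}, {"music"},
    {"goal", "match"}, {"goal", "match"},
)
```

Under this assumption Sim ranges from 0 to 2, since each component lies between 0 and 1.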
User Relative Influence Rank
• Considers three aspects
– The quality of postings
– Ratio of retweeting behavior
– Similarity of interests
• User’s relative influence function RI(vi,vj):
RI(vi,vj) = Q(vi) + R(vi,vj) + Sim(vi,vj)
where Q(vi) is the quality of vi’s postings and R(vi,vj) represents the ratio of retweets of user vj to vi.
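The relative influence is a plain sum of its three components. The sketch below keeps that equal weighting as the default and exposes an optional `weights` argument, a hypothetical extension not in the original function, to show where per-metric weights would enter. The inputs are assumed to be precomputed numbers.

```python
def relative_influence(quality, retweet_ratio, similarity,
                       weights=(1.0, 1.0, 1.0)):
    """RI(vi, vj) = Q(vi) + R(vi, vj) + Sim(vi, vj).

    The default weights of 1.0 reproduce the unweighted sum; other values
    are an illustrative extension, not part of the original definition.
    """
    w_q, w_r, w_s = weights
    return w_q * quality + w_r * retweet_ratio + w_s * similarity


# Usage with made-up component values.
ri_equal = relative_influence(0.4, 0.25, 1.5)
ri_quality_only = relative_influence(0.4, 0.25, 1.5, weights=(1.0, 0.0, 0.0))
```

Because the components are on different scales, the unweighted sum lets the largest-ranged metric dominate, which is one reason to consider explicit weights.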
Users Network Global Influence Rank
• Combines structural features and users’ behavior characteristics
• User network global influence rank function: Influence(v)
• Damping factor λ = 0.85
Influence Rank Algorithm
Time complexity - O(e)
Influence Rank Algorithm
• Evaluated with a Tencent Weibo dataset and contrasted with the TunkRank algorithm
• Emphasis on users’ interactive behaviors
• The metrics used to measure the user’s relative influence are all given equal weight
• Considers the similarity of keywords instead of the similarity of topics
• Ignores the impact of negative comments and conversations
• Model is based on a snapshot of current relationships and interactions
Why your Klout score is meaningless
Why your Klout score is meaningless?
Klout is far more similar to a derived measurement: inconsistent, and not trustworthy for individual users
What should a Klout score satisfy?
• Ordering by Klout should make sense in the real world
• The score should not be easy to game
• The score should be monotonic
Klout score comparisons
• A set of individuals with Klout in the 40-49 range
• A set of individuals with Klout in the 55-64 range
• A set of individuals with Klout in the 70-79 range
• A set of individuals with Klout >= 80
Group 3 (Klout 70-79)
• U1: Tim Ferriss – Author of the 4 Hour Workweek and 4 Hour Body
• U2: Jack Dorsey – Executive Chairman of Twitter and CEO of Square
• U3: Matt Cutts – Head of web spam team at Google
• U4: MG Siegler – Writer for TechCrunch
• U5: Klout – Influence score service
• U6: David Pogue – Tech guy from the NYT
• U7: Jeffrey Zeldman – Designer, writer, and publisher
Group 3 (Klout 70-79)
[Figure: comparison of users U1 to U7, as of 29th May 2011]
Klout violates the Desirable Properties
• Connecting an additional account will always increase the Klout score.
• The degree to which followers are influential seems to be irrelevant or to matter very little.
• The differential between the number of people someone follows seems to be irrelevant or to matter very little.
• In terms of value to the Klout score, the ordering follow < retweet < unique retweeter < unique mention can be inconsistent.
• In terms of value to the Klout score, the ordering like < comment can be inconsistent.
Research Questions Addressed
How do existing systems calculate social scores
• TunkRank
– Probability that the follower will read a tweet posted by the followee
– Probability that a tweet that is read will be retweeted
– Number of followers and their influence
• TwitterRank
– Measures the topic-sensitive influence of twitterers
– Considers the similarity between friends on topics
– Number of tweets published by all friends
• InfluenceRank
– Defines a user’s relative and global influence
– Number of followers
– Quality of followers
– Quality of tweets
– Similarity of interests
• Influence Measure with a Network Amplification Score
– Accounts for the content and conversation generated, by considering the indegree and outdegree of the social network at multiple levels
Which parameters are representative of a user’s true social influence
• It’s not just the number of followers or the number of friends
• Link structure
• Following relationship
• Similarity
• Interactions
• Topics and communities
• Quality of followers, tweets, etc.
What are the desirable properties of a social score
• Ordering by the score should make sense in the real world
• The score should not be easy to game
• The score should be monotonic
• Equation should be simple and easy to understand/interpret
• Should be meaningful
References
• Daniel Tunkelang. (2009, Jan 13). A Twitter Analog to PageRank [Online]. Available: http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/
• Neal Richter. (2009, Feb 18). TunkRank Scoring Improvement [Online]. Available: http://aicoder.blogspot.com/2009/02/tunkrank-scoring-improvement.html
• Jianshu Weng et al., “TwitterRank: Finding Topic-sensitive Influential Twitterers,” in WSDM Conf., New York, USA, 2010, pp. 261-270
• Wenlong Chen et al., “InfluenceRank: An Efficient Social Influence Measurement for Millions of Users in Microblog,” in 2nd Int. Conf. on CGC, Xiangtan, 2012, pp. 563-570
• Alex Braunstein. (2011, June 01). Why your Klout score is meaningless [Online]. Available: http://alexbraunstein.com/2011/06/01/why-your-klout-score-is-meaningless/
• Sean Golliher. (2011, June 27). How I Reverse Engineered Klout Score to an R2 = 0.94 [Online]. Available: http://www.seangolliher.com/2011/uncategorized/how-i-reversed-engineered-klout-score-to-an-r2-094/
Thank You