Microsoft Instant Messenger Communication Network
How does the world communicate?
Jure Leskovec ([email protected])
Machine Learning Department
http://www.cs.cmu.edu/~jure
Joint work with: Eric Horvitz, Microsoft Research
Networks: Why?
• Today: large on-line systems leave detailed records of social activity
  • On-line communities: MySpace, Facebook
  • Email, blogging, instant messaging
  • On-line publication repositories: arXiv, MedLine
• Emerging behavior (need lots of data):
  • Actions of individual nodes are independent, but global patterns and regularities emerge
The Largest Social Network
What is the largest social network in the world (that we can relatively easily obtain)?
For the first time, we had a chance to look at the complete (anonymized) communication of the whole planet, using the Microsoft MSN instant messenger network
Instant Messaging
• Contact (buddy) list
• Messaging window
Instant Messaging as a Network
Buddy Conversation
IM – Phenomena at planetary scale
• Observe social phenomena at planetary scale:
  • How does communication change with user demographics (distance, age, sex)?
  • How does geography affect communication?
  • What is the structure of the communication network?
Communication data
• The record of communication:
  • Presence data: user status events (login, status change)
  • Communication data: who talks to whom
  • Demographics data: user age, sex, location
Data description: Presence
• Events:
  • Login, Logout
  • Whether this is the user's first-ever login
  • Add/Remove/Block buddy
  • Add unregistered buddy (invite new user)
  • Change of status (busy, away, BRB, idle, …)
• For each event:
  • User Id
  • Time
Data description: Communication
• For every conversation (session) we have a list of users who participated in the conversation
• There can be multiple people per conversation
• For each conversation and each user:
  • User Id
  • Time Joined
  • Time Left
  • Number of Messages Sent
  • Number of Messages Received
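A minimal sketch of how one such per-user, per-conversation record could be represented and summarized. The class name, field names, and time unit are assumptions for illustration; the slides do not describe the actual log format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Participation:
    """One user's part in one conversation (hypothetical field names)."""
    conversation_id: int
    user_id: int
    time_joined: float      # assumed unit: seconds
    time_left: float
    messages_sent: int
    messages_received: int


def summarize(records):
    """Return {conversation_id: (participants, duration, messages)}."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec.conversation_id].append(rec)
    summary = {}
    for conv_id, recs in grouped.items():
        participants = len({r.user_id for r in recs})
        # Duration spans from the first join to the last leave.
        duration = max(r.time_left for r in recs) - min(r.time_joined for r in recs)
        messages = sum(r.messages_sent for r in recs)
        summary[conv_id] = (participants, duration, messages)
    return summary
```

Summaries of this kind are the raw material for per-conversation statistics such as number of participants, duration, and message counts.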
Data description: Demographics
• For every user (self reported):
  • Age
  • Gender
  • Location (Country, ZIP)
  • Language
  • IP address (we can do reverse geo IP lookup)
Data collection
• Log size: 150 GB/day
  • Just copying over the network takes 8 to 10 h
  • Parsing and processing takes another 4 to 6 h
• After parsing and compressing: ~45 GB/day
• Collected data for 30 days of June 2006:
  • Total: 1.3 TB of compressed data
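A quick arithmetic check of the volumes above, assuming the per-day figures are in gigabytes and that the total is quoted in binary terabytes (both are assumptions; the slide does not say).

```python
RAW_GB_PER_DAY = 150         # raw log size per day (from the slide)
COMPRESSED_GB_PER_DAY = 45   # after parsing and compressing (from the slide)
DAYS = 30                    # June 2006

total_compressed_gb = COMPRESSED_GB_PER_DAY * DAYS
total_compressed_tb = total_compressed_gb / 1024       # binary TB, an assumption
compression_ratio = RAW_GB_PER_DAY / COMPRESSED_GB_PER_DAY

print(total_compressed_gb, round(total_compressed_tb, 2), round(compression_ratio, 1))
```

Thirty days at about 45 GB/day gives 1350 GB, which is consistent with the quoted total of roughly 1.3 TB.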
Network: Conversations
Conversation
Data statistics
• Activity over June 2006 (30 days):
  • 245 million users logged in
  • 180 million users engaged in conversations
  • 17.5 million new accounts activated
  • More than 30 billion conversations
Data statistics per day
• Activity on June 1, 2006:
  • 1 billion conversations
  • 93 million users logged in
  • 65 million different users talked (exchanged messages)
  • 1.5 million invitations for new accounts sent
User characteristics: age
Age pyramid: MSN vs. the world
Conversation: Who talks to whom?
• Cross gender edges:
  • 300 million male-male and 235 million female-female edges
  • 640 million female-male edges
Number of people per conversation
The maximum number of people talking simultaneously is 20, but a conversation can have more people
Conversation duration
Most conversations are short
Conversations: number of messages
Sessions between fewer people run out of steam
Time between conversations
• Individuals are highly diverse
• What is the probability of logging into the system after t minutes?
• Power-law with exponent 1.5
• Task queuing model [Barabasi]
• My email, Darwin's and Einstein's letters follow the same pattern
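A minimal sketch of what a power law with exponent 1.5 means in practice: the density is p(t) ∝ t^(-1.5), so very long gaps are rare but far from negligible. The code draws synthetic gaps from such a law and recovers the exponent with the standard maximum-likelihood estimator for a continuous power law. The slides do not say how the exponent was estimated, so the estimator, the cutoff `t_min`, and the synthetic data are all assumptions.

```python
import math
import random


def sample_power_law(alpha, t_min, n, rng):
    """Draw n samples from p(t) ~ t**(-alpha), t >= t_min, by inverse transform."""
    return [t_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def estimate_exponent(samples, t_min):
    """Maximum-likelihood estimate of alpha for a continuous power law."""
    logs = [math.log(t / t_min) for t in samples if t >= t_min]
    return 1.0 + len(logs) / sum(logs)


rng = random.Random(0)                      # fixed seed for repeatability
gaps = sample_power_law(alpha=1.5, t_min=1.0, n=20000, rng=rng)
alpha_hat = estimate_exponent(gaps, t_min=1.0)
print(round(alpha_hat, 2))
```

On real inter-event times the lower cutoff would have to be chosen from the data, which this sketch does not attempt.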
Age: Number of conversations
[Figure: axis label "User self reported age"; color scale from Low to High]
Age: Total conversation duration
[Figure: axis label "User self reported age"; color scale from Low to High]
Age: Messages per conversation
[Figure: axis label "User self reported age"; color scale from Low to High]
Age: Messages per unit time
[Figure: axis label "User self reported age"; color scale from Low to High]
Who talks to whom: Number of conversations
Who talks to whom: Conversation duration
Geography and communication
Count the number of users logging in from a particular location on Earth
How is Europe talking
Logins from Europe
Users per geo location
Blue circles have more than 1 million logins.
Users per capita
Fraction of population using MSN:
• Iceland: 35%
• Spain: 28%
• Netherlands, Canada, Sweden, Norway: 26%
• France, UK: 18%
• USA, Brazil: 8%
Communication heat map
For each conversation between geo points (A,B) we increase the intensity on the line between A and B
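The accumulation rule above can be sketched as follows. This assumes the two geo points have already been mapped to integer grid cells (row, column) and that the line is a straight segment on that grid; the projection and line-drawing method actually used for the map are not given in the slides.

```python
def make_grid(rows, cols):
    """Create an intensity grid initialized to zero."""
    return [[0.0] * cols for _ in range(rows)]


def add_conversation(grid, a, b, weight=1.0):
    """Add `weight` to every grid cell on the straight segment from a to b."""
    (r0, c0), (r1, c1) = a, b
    steps = max(abs(r1 - r0), abs(c1 - c0))
    for i in range(steps + 1):
        t = i / steps if steps else 0.0
        r = round(r0 + (r1 - r0) * t)
        c = round(c0 + (c1 - c0) * t)
        grid[r][c] += weight


grid = make_grid(5, 5)
add_conversation(grid, (0, 0), (4, 4))   # one conversation between A and B
add_conversation(grid, (0, 0), (0, 4))   # another conversation starting at A
```

Cells shared by many conversation lines end up with the highest intensity, which is what makes heavily used routes stand out.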
Homophily (birds of a feather flock together): Age vs. Age
[Figure panels: Correlation; Probability]
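One common way to quantify age homophily is the Pearson correlation between the ages at the two ends of each communication edge. The sketch below is an illustration under that assumption; the slides do not state which correlation measure was plotted, and the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def age_correlation(edges, age):
    """Correlation of ages across undirected edges (each edge counted both ways)."""
    xs, ys = [], []
    for u, v in edges:
        xs += [age[u], age[v]]
        ys += [age[v], age[u]]
    return pearson(xs, ys)


age = {1: 20, 2: 20, 3: 40, 4: 40}
same_age = age_correlation([(1, 2), (3, 4)], age)        # people talk to peers
mixed_age = age_correlation([(1, 3), (2, 4)], age)       # people talk across ages
```

A value near 1 means people mostly talk to others of similar age, and a value near -1 means the opposite.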
Per country statistics
On a typical day:
Country   # of logins   # of users   # of messages   Messages per user
USA       38,319,363    13,261,337   412,729,278     31.12
Brazil    20,582,613    7,864,424    467,972,522     59.50
France    19,163,131    6,475,858    518,931,785     80.13
Unknown   18,444,352    6,872,347    191,167,085     27.81
Spain     16,868,549    6,140,895    503,759,240     82.03
UK        16,659,009    5,724,826    487,018,470     85.07
Canada    14,558,692    5,021,185    160,249,686     31.91
China     14,225,163    5,314,463    101,003,729     19.00
Turkey    13,619,789    4,696,555    353,540,475     75.27
Mexico    10,756,989    4,359,932    209,195,100     47.98
Note that global usage and market share statistics are higher if we accumulate data over longer time periods.
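The last column of the table is the message count divided by the user count. A short check on three of the rows, using the figures from the table (the variable names are illustrative):

```python
# (users, messages) for one day, copied from the table
rows = {
    "USA": (13_261_337, 412_729_278),
    "Spain": (6_140_895, 503_759_240),
    "UK": (5_724_826, 487_018_470),
}

messages_per_user = {country: msgs / users for country, (users, msgs) in rows.items()}

for country, value in messages_per_user.items():
    print(f"{country}: {value:.2f}")
```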
Per typical user per country
On a typical day, an MSN user from a country …
Country       Logins on a particular day   Users on a particular day   Messages sent   Messages per user
Slovenia      364,988                      130,884                     15,919,892      121.63
Malta         122,846                      41,829                      4,993,316       119.37
Hungary       1,214,268                    427,320                     47,623,604      111.45
Bosnia        105,584                      35,689                      3,254,170       91.18
Reunion       100,335                      33,399                      3,041,635       91.07
Gibraltar     19,096                       6,452                       581,195         90.08
UK            16,659,009                   5,724,826                   487,018,470     85.07
Macedonia     126,729                      43,754                      3,669,977       83.88
Netherlands   7,399,160                    2,696,669                   221,300,210     82.06
Spain         16,868,549                   6,140,895                   503,759,240     82.03
Note that global usage and market share numbers are higher if we accumulate data over longer time periods.
What about Slovenia (per capita)?
Statistic                  Number       Rank (per capita)
Conversations inside       19,868,886   22
Conversations to outside   7,868,483    48
Total conversations        27,737,369   29
Avg. time inside           309.49       147
Avg. time outside          314.39       80
Avg. time inside (pct.)    0.4960
Messages sent inside       9.78         32
Messages sent outside      9.46         19
Messages inside (pct.)     0.5083
Who is Slovenia talking to?
Rank   Target country   Pairs of people   Number of conversations   Avg. time per conv.   Avg. # of messages
1      Slovenia         1,341,250         19,868,886                309.4                 9.78
2      USA              61,794            922,527                   303.4                 9.14
3      Spain            27,650            310,357                   289.4                 7.97
4      UK               14,709            204,335                   325.4                 9.02
5      Germany          9,047             129,551                   350.3                 10.20
6      Bosnia           9,956             114,509                   385.9                 14.62
7      Yugoslavia       8,194             104,270                   381.7                 12.55
8      Italy            8,612             100,698                   358.8                 9.89
9      Croatia          6,838             84,362                    359.0                 11.00
10     Turkey           10,763            77,651                    292.4                 8.08
11     Albania          9,517             76,440                    320.7                 10.88
12     Sweden           5,083             69,019                    306.9                 8.34
13     Netherlands      5,061             68,287                    315.9                 8.87
14     Canada           5,003             60,617                    301.8                 7.38
Instant Messaging as a Network
Buddy
IM Communication Network
• Buddy graph:
  • 240 million people (people who logged in during June '06)
  • 9.1 billion edges (friendship links)
• Communication graph:
  • There is an edge if the users exchanged at least one message in June 2006
  • 180 million people
  • 1.3 billion edges
  • 30 billion conversations
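A minimal sketch of deriving communication-graph edges from conversation records. It assumes the input is a list of (conversation_id, user_id, messages_sent) tuples and treats two participants of the same conversation as connected when at least one of them sent a message in it. That pairing rule is an approximation chosen for illustration; the slides do not spell out how multi-party conversations were handled.

```python
from collections import defaultdict
from itertools import combinations


def communication_edges(participations):
    """Build undirected edges from (conversation_id, user_id, messages_sent) tuples."""
    sent = defaultdict(lambda: defaultdict(int))
    for conv_id, user_id, messages_sent in participations:
        sent[conv_id][user_id] += messages_sent
    edges = set()
    for users in sent.values():
        for a, b in combinations(sorted(users), 2):
            # Keep the pair only if some message was actually exchanged.
            if users[a] + users[b] > 0:
                edges.add((a, b))
    return edges


log = [
    (1, 1, 3), (1, 2, 0),              # user 1 wrote to user 2
    (2, 2, 0), (2, 3, 0),              # window opened, nothing sent
    (3, 1, 1), (3, 2, 1), (3, 4, 0),   # three-party conversation
]
edges = communication_edges(log)
```

Storing each edge as a sorted pair keeps the graph undirected and avoids counting the same pair twice across conversations.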
Buddy network: Number of buddies
Buddy graph: 240 million nodes, 9.1 billion edges (~40 buddies per user)
Network: Small-world
• 6 degrees of separation [Milgram '60s]
• Average distance 5.5
• 90% of nodes can be reached in < 8 hops
Hops Nodes
1 10
2 78
3 396
4 8648
5 3299252
6 28395849
7 79059497
8 52995778
9 10321008
10 1955007
11 518410
12 149945
13 44616
14 13740
15 4476
16 1542
17 536
18 167
19 71
20 29
21 16
22 10
23 3
24 2
25 3
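The hop table can be turned into a cumulative coverage curve to see where the 90% mark falls. The sketch below assumes the table lists how many nodes are first reached at each hop count (for example from a breadth-first search), which is an interpretation of the slide rather than something it states.

```python
from itertools import accumulate

# Nodes first reached at 1, 2, ..., 25 hops (copied from the table)
HOP_COUNTS = [
    10, 78, 396, 8648, 3299252, 28395849, 79059497, 52995778, 10321008,
    1955007, 518410, 149945, 44616, 13740, 4476, 1542, 536, 167, 71, 29,
    16, 10, 3, 2, 3,
]


def coverage(counts):
    """Cumulative fraction of reached nodes after 1, 2, ... hops."""
    total = sum(counts)
    return [running / total for running in accumulate(counts)]


def hops_to_cover(counts, fraction):
    """Smallest hop count whose cumulative coverage reaches `fraction`."""
    for hops, covered in enumerate(coverage(counts), start=1):
        if covered >= fraction:
            return hops
    return None


curve = coverage(HOP_COUNTS)
print(hops_to_cover(HOP_COUNTS, 0.9))
```

On these counts the cumulative fraction passes 90% between 7 and 8 hops.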
Network: Searchability
• Milgram's experiment showed:
  • (1) short paths exist in networks
  • (2) humans are able to find them
• Assume the following setting:
  • Nodes are scattered on a plane
  • Given a starting node u, we want to reach a target node v
  • Algorithm: always navigate to the neighbor that is geographically closest to the target node v
• Surprise: geo-routing finds the short paths (for an appropriate distance measure)
[Figure: start node u and target node v]
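The greedy rule described above can be sketched in a few lines. The sketch assumes Euclidean distance on the plane and stops when no neighbor is strictly closer to the target than the current node; both the distance measure and the stopping rule are assumptions, since the slide only says "appropriate distance measure".

```python
from math import dist


def greedy_geo_route(neighbors, position, source, target):
    """Walk from source toward target, always moving to the neighbor closest to target.

    Returns (path, reached). The walk stops early if it gets stuck in a
    local minimum, that is, when no neighbor is closer to the target.
    """
    path = [source]
    current = source
    while current != target:
        candidates = neighbors[current]
        if not candidates:
            return path, False
        best = min(candidates, key=lambda n: dist(position[n], position[target]))
        if dist(position[best], position[target]) >= dist(position[current], position[target]):
            return path, False
        path.append(best)
        current = best
    return path, True


position = {"u": (0, 0), "a": (1, 0), "b": (2, 0), "v": (3, 0), "c": (0, 5)}
neighbors = {"u": ["a", "c"], "a": ["u", "b"], "b": ["a", "v"], "v": ["b"], "c": ["u"]}
route, reached = greedy_geo_route(neighbors, position, "u", "v")
```

Because every step must strictly reduce the distance to the target, the walk always terminates, although it may fail to reach the target.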
Communication network: Clustering
• How many triangles are closed?
• Clustering normally decays as k^-1
• Communication network is highly clustered: k^-0.37
[Figure labels: High clustering, Low clustering]
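A minimal sketch of the quantity behind this slide: the local clustering coefficient of a node is the fraction of pairs of its neighbors that are themselves connected (closed triangles), and averaging it over all nodes of degree k gives the curve that decays as a power of k. The adjacency-dict representation and function names are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of neighbor pairs of `node` that are connected to each other."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / (k * (k - 1) / 2)


def clustering_by_degree(adj):
    """Average local clustering coefficient for each degree k."""
    sums, counts = defaultdict(float), defaultdict(int)
    for node in adj:
        k = len(adj[node])
        sums[k] += local_clustering(adj, node)
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


# A triangle (1, 2, 3) with one extra node 4 attached to node 3
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
by_degree = clustering_by_degree(adj)
```

Fitting a line to log C(k) against log k would give the decay exponent, which the slide reports as -0.37 for the communication network.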
Communication Network Connectivity
k-Cores decomposition
What is the structure of the core of the network?
k-Cores: core of the network
• People with k<20 are the periphery
• Core is composed of 79 people, each having 68 edges among them
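A k-core is what remains after repeatedly deleting every node with fewer than k neighbors. The sketch below computes each node's core number with the simple peeling procedure (always remove a node of minimum remaining degree). It is a small quadratic-time illustration, not a description of how the decomposition was run on the full network.

```python
def core_numbers(adj):
    """Return {node: core number} for an undirected graph given as {node: set of neighbors}."""
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel off a node with the smallest remaining degree.
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nb in adj[node]:
            if nb in remaining:
                degree[nb] -= 1
    return core


# A triangle (1, 2, 3) with one extra node 4 attached to node 3
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
cores = core_numbers(adj)
```

Nodes with the largest core number form the innermost core, and nodes with small core numbers form the periphery.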
Network robustness
• We delete nodes (in some order) and observe how the network falls apart:
  • Number of edges deleted
  • Size of largest connected component
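The experiment above can be sketched as follows: remove nodes one at a time in a given order and, after each removal, record the cumulative number of deleted edges and the size of the largest connected component. The graph representation and function names are assumptions, and recomputing components from scratch after every removal is only practical for small examples.

```python
from collections import deque


def largest_component_size(adj, removed):
    """Size of the largest connected component, ignoring removed nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


def robustness_curve(adj, removal_order):
    """Return [(edges deleted so far, largest component size)] after each removal."""
    removed = set()
    edges_deleted = 0
    curve = [(0, largest_component_size(adj, removed))]
    for node in removal_order:
        edges_deleted += sum(1 for nb in adj[node] if nb not in removed)
        removed.add(node)
        curve.append((edges_deleted, largest_component_size(adj, removed)))
    return curve


# A path 1-2-3-4-5: removing the middle node splits it in two
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
curve = robustness_curve(path, [3, 1])
```

Different removal orders (random, or by decreasing degree, for example) can be compared by passing different `removal_order` lists.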
Robustness: Nodes vs. Edges
Robustness: Connectivity
Conclusion
• A first look at a planetary-scale social network
  • The largest social network analyzed
• Strong presence of homophily: people who communicate share attributes
• Well connected: in only a few hops one can reach most of the network
• Very robust: many (random) people can be removed and the network is still connected
References
Leskovec and Horvitz: Worldwide Buzz: Planetary-Scale Views on an Instant-Messaging Network, 2007
http://www.cs.cmu.edu/~jure