Geographic routing in social networks. David Liben-Nowell, Jasmine Novak, Ravi Kumar, Prabhakar Raghavan, Andrew Tomkins. Presentation prepared by Dor Medalsy.

  • Slide 1
  • Geographic routing in social networks. David Liben-Nowell, Jasmine Novak, Ravi Kumar, Prabhakar Raghavan, Andrew Tomkins. Presentation prepared by Dor Medalsy.
  • Slide 2
  • Introduction: Anecdotal evidence that we live in a small world, where arbitrary pairs of people are connected through extremely short chains of intermediary friends, is ubiquitous. Experimental studies have verified this property in real social networks, and theoretical models have been advanced to explain it. We introduce a richer model relating geography and social-network friendship.
  • Slide 3
  • Milgram's experiment: Sociological experiments, beginning with the seminal work of Milgram, have shown that a source person can transmit a message to a target through only a small number of intermediate friends, using only scant information about the target's geography and occupation. The successful messages passed from source to target through six intermediaries: the "six degrees of separation".
  • Slide 4
  • Milgram's experiment: The explanation offered was that random graphs have small diameter. Only two minor problems:
      o Random graphs are a bad model of social networks.
      o They do not explain the small-world phenomenon of six degrees of separation.
  • Slide 5
  • Geographic dimension: As part of the recent surge of interest in networks, there has been active research exploring strategies for navigating synthetic and small-scale social networks, including routing through common membership in groups, popularity, and geographic proximity. Subjects report that geography and occupation are by far the two most important dimensions in choosing the next step in the chain. Geography tends to predominate in early steps.
  • Slide 6
  • Adding nongeographic dimensions to routing strategies, especially once the chain has arrived at a point geographically close to the target, can make routing more efficient. However, geography appears to be the single most valuable dimension for routing, and we are thus interested in understanding how powerful geography alone may be. Question: what is the connection between friendship and geography, and to what extent can this connection explain the navigability of large-scale real-world social networks?
  • Slide 7
  • We present a study that combines measurements of the role of geography in a large social network with theoretical modeling of path discovery, using the measurements to validate and inform the theoretical results.
      I. A simulation-based study on a 500,000-person online social network reveals that routing through geographic information alone allows people to discover short paths to a target city.
      II. 70% of friendships are derived from geographical processes.
  • Slide 8
  •   III. Existing models that predict the probability of friendship solely on the basis of geographic distance are too weak to explain these friendships, rendering previous theoretical results inapplicable.
      IV. We introduce a density-aware model of friendship formation called rank-based friendship, which relates the probability that a person befriends a particular candidate to the inverse of the number of closer candidates. We are able to prove that the presence of rank-based friendship, for any population density, implies that the social network will contain discoverable short paths to a small destination region under geographic routing.
  • Slide 9
  • The LiveJournal Community: An online blogging community with 1.3 million users (500,000 in the continental USA) in February 2004. LiveJournal users provide:
      o Disturbingly detailed accounts of their personal lives.
      o Profiles (geographic location, topical interests, and an explicit list of other bloggers whom he or she considers to be a friend).
    From the geographic location we compute the longitude/latitude of each user. The resolution of our geographic data is limited to the level of towns and cities, so we study the problem of global routing.
  • Slide 10
  • The LiveJournal Community: In our study, the goal is to direct a message to the target's city by geographic factors only. Once the proper locality has been reached, a local routing problem must then be solved to move the message from the correct city down to the correct person, using a wide set of potential nongeographic factors such as interests or profession.
  • Slide 11
  • The LiveJournal Community: Graph of the LiveJournal social network:
      o Vertices: a set of users (500,000).
      o Edges: the social relationships linking them (3,959,440 friendships).
      o The "u is a friend of v" relationship is defined by the explicit appearance of blogger u in the list of friends in the profile of blogger v.
      o d(u,v): the geographic distance between two people u and v.
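The graph described above can be held in two small mappings. The following is a minimal sketch under stated assumptions: the names `location`, `friends`, `distance_km` and the toy coordinates are hypothetical, and the great-circle (haversine) formula is one plausible choice for d(u,v), not necessarily the distance used in the study.

```python
import math

# Hypothetical toy data: user -> (latitude, longitude) in degrees.
location = {
    "alice": (40.7128, -74.0060),   # New York
    "bob": (34.0522, -118.2437),    # Los Angeles
    "carol": (41.8781, -87.6298),   # Chicago
}

# Directed friendship lists: friends[v] holds the users listed in v's profile.
friends = {
    "alice": ["bob", "carol"],
    "bob": ["alice"],
    "carol": [],
}

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def distance_km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def d(u, v):
    """Geographic distance d(u, v) between two users."""
    return distance_km(location[u], location[v])


def edge_count(friend_lists):
    """Number of directed friendship edges in the graph."""
    return sum(len(listed) for listed in friend_lists.values())
```

Keeping the friend lists directed mirrors the definition on the slide, where a friendship is whatever one blogger lists in their own profile.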
  • Slide 12
  • The graph is a directed network, with about 8 friends per user; 80% of the friendships are mutual. 77.6% of users form a giant component in which any two people u and v are connected by chains of friends. The clustering coefficient of the network is 0.2 (the proportion of the time that u and v are themselves friends if they have a common friend w).
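To make the two statistics above concrete, here is a small sketch on a hypothetical four-person graph. The function names are illustrative, and the clustering coefficient is computed on the undirected version of the graph as the fraction of "common friend" pairs that are themselves linked, which is one reading of the definition on the slide.

```python
from itertools import combinations


def mutual_fraction(friends):
    """Fraction of directed edges u->v whose reverse edge v->u also exists."""
    edges = [(u, v) for u, listed in friends.items() for v in listed]
    mutual = sum(1 for u, v in edges if u in friends.get(v, ()))
    return mutual / len(edges)


def clustering_coefficient(friends):
    """Share of pairs (u, v) with a common friend w that are friends themselves."""
    # Assumption: ignore edge direction when looking for common friends.
    neighbours = {}
    for u, listed in friends.items():
        for v in listed:
            neighbours.setdefault(u, set()).add(v)
            neighbours.setdefault(v, set()).add(u)
    closed = total = 0
    for w, around in neighbours.items():
        for u, v in combinations(sorted(around), 2):
            total += 1
            closed += v in neighbours[u]
    return closed / total if total else 0.0


# Hypothetical toy graph, not LiveJournal data.
toy = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a"], "d": ["a"]}
```

On the toy graph, four of the six directed edges are reciprocated, and three of the five pairs sharing a common friend are linked.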
  • Slide 13
  • The in-degree log/log plot is more linear than the out-degree plot, but both appear far more parabolic than linear. These curves provide some evidence supporting a log-normal degree distribution in social networks, instead of a power-law distribution.
  • Slide 14
  • Geographic Routing: We perform a simulated version of the message-forwarding experiment in the LiveJournal social network, using only geographic information to choose the next message holder in a chain. The main goals:
      1. Determining whether individuals using purely geographic information in a simple way can succeed in discovering short paths to a destination city.
      2. Analyzing the applicability of existing theoretical models that explain the presence or absence of short discoverable paths in networks.
  • Slide 15
  • Geographic Routing: This approach allows us to investigate the performance of simple routing schemes without relying on the voluntary participation of the people in the network. The information on the location of every friend of every participant then allows us to analyze in detail the underlying geographic basis of friendship in explaining these results. In the simulation, messages are forwarded by using the geographically greedy routing algorithm GEOGREEDY.
  • Slide 16
  • GEOGREEDY Algorithm:
      1) Choose source s and target t randomly.
      2) Try to reach the target's city, not the target itself.
      3) At each step, the message is forwarded from the current message holder u to the friend v of u geographically closest to t, i.e. the v that minimizes d(v,t) over the friends of u.
      4) If d(v,t) > d(u,t), then the chain fails.
      5) Stop when the target's city is reached.
    Problem: these conditions are restrictive. Users can forward a message only to a friend whom they have explicitly listed in their profile, and only to the friend geographically closest to the target.
  • Slide 17
  • Modified GEOGREEDY Algorithm:
      1) Choose source s and target t randomly.
      2) Try to reach the target's city, not the target itself.
      3) At each step, the message is forwarded from the current message holder u to the friend v of u geographically closest to t, i.e. the v that minimizes d(v,t) over the friends of u.
      4) If d(v,t) > d(u,t), then u forwards the message to a person selected at random from u's city; if that is not possible, the chain fails.
      5) Stop when the target's city is reached.
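The two variants of GEOGREEDY listed above can be sketched in one function. This is an illustrative reading of the slides, not the authors' code: the names `geogreedy`, `city`, `position` and the toy data are hypothetical, a holder with no friends is treated like a holder with no closer friend, and the `max_steps` cap is an added safeguard against endless chains.

```python
import random


def geogreedy(source, target, friends, city, dist,
              fallback=False, rng=random, max_steps=1000):
    """Return the number of steps to reach target's city, or None on failure."""
    residents = {}
    for person, name in city.items():
        residents.setdefault(name, []).append(person)

    u, steps = source, 0
    while city[u] != city[target]:            # step 5: stop in the target's city
        if steps >= max_steps:                # assumption: give up on long chains
            return None
        # Step 3: the friend of u geographically closest to the target.
        v = min(friends.get(u, ()), key=lambda w: dist(w, target), default=None)
        if v is None or dist(v, target) > dist(u, target):
            if not fallback:                  # original variant: the chain fails
                return None
            # Modified variant: random person from u's own city, if any.
            others = [p for p in residents[city[u]] if p != u]
            if not others:
                return None
            v = rng.choice(others)
        u, steps = v, steps + 1
    return steps


# Hypothetical toy world: cities on a line, people placed in cities.
position = {"A": 0, "B": 1, "D": 3}
city = {"s": "A", "x": "A", "y": "A", "m": "B", "t": "D"}
friends = {"s": ["m"], "m": ["t"]}


def dist(u, v):
    return abs(position[city[u]] - position[city[v]])
```

In the toy world, a chain from `s` reaches the target's city in two steps, while a chain from `y`, who lists no friends, fails under the original rule and can only succeed through the random same-city hop.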
  • Slide 18
  • Results of the GEOGREEDY algorithm:
      o Stop if d(v,t) > d(u,t): 13% of the chains are completed; median length 4, mean length 4.12.
      o If d(v,t) > d(u,t), pick a neighbor at random in the same city if possible, else stop: 80% of the chains are completed; median length 12, mean length 16.74.
      o f(k): the fraction of pairs in which the chain reaches t's city in exactly k steps.
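The quantities reported above (completion rate, median, mean, and f(k)) can be derived from a list of simulated chain outcomes. The sketch below assumes a hypothetical encoding in which each chain is recorded as its step count, or `None` if it failed; the sample data are invented for illustration.

```python
from collections import Counter
from statistics import mean, median


def summarize(chain_lengths):
    """Summarize chain outcomes; failed chains are recorded as None."""
    completed = [k for k in chain_lengths if k is not None]
    total = len(chain_lengths)
    return {
        "completion_rate": len(completed) / total,
        "median": median(completed) if completed else None,
        "mean": mean(completed) if completed else None,
        # f(k): fraction of all pairs whose chain finishes in exactly k steps.
        "f": {k: n / total for k, n in sorted(Counter(completed).items())},
    }


# Hypothetical outcomes for five source/target pairs.
summary = summarize([2, None, 4, 4, None])
```

Note that f(k) is normalized by all pairs, so its values sum to the completion rate, while the median and mean are taken over completed chains only.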
  • Slide 19
  • GEOGREEDY Algorithm Conclusions:
      1) The algorithm routes messages only to the destination city and does not suffer from problems of voluntary participation, which explains why our completion rate is significantly higher than in earlier experiments.
      2) Even under restrictive forwarding conditions (a narrow choice of actions), geographic information is sufficient to perform global routing in a significant fraction of cases.
      3) The simulated experiment shows that the first GEOGREEDY algorithm gives a lower bound on the presence of short discoverable paths, and the modified GEOGREEDY algorithm gives an upper bound.
  • Slide 20
  • The Geographic Basis Of Friendship: Because a restrictive global-routing scheme enjoys a high success rate, a question naturally arises: Is there some special structure relating friendship and geography that might explain this finding?
  • Slide 21
  • The Geographic Basis Of Friendship: We examine the relationship between friendship probability and geographic distance:
      o $\delta = d(u,v)$: the distance between a pair of people.
      o $P(\delta)$: the proportion of pairs u,v separated by distance $\delta$ who are friends.
    The probability that two people are friends given their distance is approximately $P(\delta) = \varepsilon + 1/\delta$, where $\varepsilon$ is a constant independent of geography. This probability is $5.0 \times 10^{-6}$ for LiveJournal users who are very far apart.
  • Slide 22
  • As $\delta$ increases, $P(\delta)$ decreases, indicating that geographic proximity indeed increases the probability of friendship. Fig. 3A verifies that geography remains crucial in online friendship. For distances larger than 1,000 km, the background friendship probability begins to dominate geography-based friendships.
  • Slide 23
  • Slide 24
  • We remove nongeographic friendships from our plot to see only the geographic friendships, correcting for the background friendship probability: $f(\delta) = P(\delta) - \varepsilon$. $f(\delta)$ decreases smoothly as $\delta$ increases. We use only the average person's 5.5 geographic links to give a sufficient explanation of the navigable small-world phenomenon.
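The background correction above is a single subtraction, shown here as a small sketch. The function name and the sample probabilities are hypothetical, and clamping at zero is an added assumption for measured values that fall below the background level. The last line checks that 5.5 geographic links out of the roughly 8 friends per user quoted earlier is consistent with the 70% figure, assuming those two numbers share a denominator.

```python
EPSILON = 5.0e-6  # background friendship probability quoted in the slides


def geographic_probability(p_delta, epsilon=EPSILON):
    """f(delta) = P(delta) - epsilon, clamped at zero (clamping is an assumption)."""
    return max(p_delta - epsilon, 0.0)


# Hypothetical measured values of P(delta) at increasing distances.
measured = [1.5e-5, 8.0e-6, 4.0e-6]
corrected = [geographic_probability(p) for p in measured]

# Share of geographic links among an average user's friends: 5.5 of about 8.
geographic_share = 5.5 / 8
```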
  • Slide 25
  • Kleinberg's social network model: Put n people on a k-dimensional grid. Connect each to its immediate geographic neighbors. Add one long-distance link per person, chosen with a probability that decays with the distance between the two people.
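The long-distance link in Kleinberg's model is usually stated as a power law in grid distance, with the probability of linking u to v proportional to d(u,v) raised to a negative exponent. The sketch below assumes that standard form on a two-dimensional grid with Manhattan distance; the function names and the exponent parameter `alpha` are illustrative, and `alpha = 2` corresponds to the 1/d(u,v)^2 case quoted on the "Kleinberg's model & GEOGREEDY" slide.

```python
import random


def manhattan(a, b):
    """Grid (lattice) distance between two points of a 2-D grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def long_range_weights(u, nodes, alpha):
    """Unnormalized link weights d(u, v) ** -alpha for every v other than u."""
    return {v: manhattan(u, v) ** -alpha for v in nodes if v != u}


def sample_long_range_link(u, nodes, alpha=2.0, rng=random):
    """Draw one long-distance contact for u, with probability proportional to its weight."""
    weights = long_range_weights(u, nodes, alpha)
    candidates = list(weights)
    return rng.choices(candidates, weights=[weights[v] for v in candidates], k=1)[0]


# Hypothetical 5 x 5 grid with one person per grid point.
grid = [(x, y) for x in range(5) for y in range(5)]
contact = sample_long_range_link((0, 0), grid, rng=random.Random(1))
```

With `alpha = 2`, a candidate at distance 2 is four times less likely to be chosen than one at distance 1, regardless of how many people actually live in between, which is the uniform-population assumption the later slides question.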
  • Slide 26
  • Kleinberg & Watts models: Watts et al. present a model to explain searchability in social networks based on assignments of individuals to locations in multiple hierarchical dimensions. Two individuals are socially similar if they are nearby in any dimension. Disadvantages:
      o Although interests or occupations might be naturally hierarchical, geography is far more naturally expressed in 2D Euclidean space.
      o Their work does not include a theoretical analysis of the model as the network size grows, nor does it include a direct empirical comparison to a real social network.
  • Slide 27
  • Kleinberg's social network model
  • Slide 28
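  • Kleinberg's model & GEOGREEDY: If the probability f[d(u, v)] of geographic friendship between u and v were roughly proportional to $1/d(u, v)^2$, then the short paths found by GEOGREEDY would be explained by Kleinberg's model. The measured friendship probability falls off more slowly with distance, so this explanation does not apply directly.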
  • Slide 29
  • Explaining the contradiction: evidence of the nonuniformity of the LiveJournal population. A dot is shown for every distinct United States location that is home to at least one LiveJournal user. The population of each successive displayed circle increases by 50,000 people. Note that the gap between the 350,000- and 400,000-person circles encompasses almost the entire Western United States.
  • Slide 30
  • Explaining the contradiction: The plot shows a distinction in friendship probability as a function of distance for residents of the East and West coasts. A geographic model of friendship must be based on more than distance alone.
  • Slide 31
  • Why does distance fail? Population density varies widely across the US. The same distance between two vertices (red and blue in the figure) can mean best friends in Minnesota but strangers in Manhattan. To summarize: any model of friendship that is based solely on the distance between people is insufficient to explain the geographic nature of friendships in the LiveJournal network. A model must be based on something beyond distance alone.
  • Slide 32
  • How do we handle non-uniformly distributed populations? Rank-Based Friendship: instead of distance, use rank as the key geographic notion.
      o When examining a friend v of u, the relevant quantity is the number of people who live closer to u than v does: $\mathrm{rank}_u(v) = |\{w : d(u,w) < d(u,v)\}|$.
      o Formally, the probability that u and v are geographic friends is inversely proportional to this rank: $\Pr[u \to v] \propto 1/\mathrm{rank}_u(v)$.
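Rank-based friendship, as described above, replaces distance with the number of closer candidates. The following sketch shows the idea on a hypothetical one-dimensional population; the function names are illustrative, u is counted among the people closer to herself so that the nearest neighbor has rank 1, ranks are floored at 1 to avoid division by zero for co-located people, and ties in distance are handled naively.

```python
import random


def rank(u, v, people, dist):
    """Number of people w (u included) who live strictly closer to u than v does."""
    return sum(1 for w in people if dist(u, w) < dist(u, v))


def rank_based_weights(u, people, dist):
    """Unnormalized friendship weights 1 / rank_u(v) for every v other than u."""
    return {v: 1.0 / max(rank(u, v, people, dist), 1) for v in people if v != u}


def sample_friend(u, people, dist, rng=random):
    """Draw one geographic friend of u with probability proportional to 1 / rank."""
    weights = rank_based_weights(u, people, dist)
    candidates = list(weights)
    return rng.choices(candidates, weights=[weights[v] for v in candidates], k=1)[0]


# Hypothetical population on a line: a dense cluster near u and a far-away pair.
position = {"u": 0, "a": 1, "b": 2, "c": 10, "d": 11}
people = list(position)


def line_dist(x, y):
    return abs(position[x] - position[y])


weights = rank_based_weights("u", people, line_dist)
```

In the toy population, `c` lives five times farther from `u` than `b` does, yet its weight drops only from 1/2 to 1/3, because only one additional person lives in between. This is how the model adapts to population density.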
  • Slide 33
  • Rank-Based Friendship: Rank-based friendship implies that GEOGREEDY will find short paths in any social network. The LiveJournal network exhibits rank-based friendship.
  • Slide 34
  • A rank-based population network consists of:
      o A 2-dimensional grid N of locations.
      o A population P of people living at points in N (|P| = n).
      o A set $E \subseteq P \times P$ of friendships: one local edge from each person in each grid direction, plus a long-range link to a fifth person chosen by rank-based friendship.
    In the LiveJournal population network, locations are rounded to the nearest integral point in longitude/latitude.
  • Slide 35
  • Geographic Linking in the LiveJournal Social Network: We return to the LiveJournal social network to show that rank-based friendship holds in a real network.
  • Slide 36
  • Geographic Linking in the LiveJournal Social Network: Fig. 5A. Because the LiveJournal data contain geographic information limited to the level of towns and cities, they do not have sufficient resolution to distinguish between all pairs of ranks. Fig. 5B. We show the same data, where the probabilities are averaged over a range of 1,306 ranks. This experiment validates that the LiveJournal social network does exhibit rank-based friendship, which thus yields a sufficient explanation for the experimentally observed navigability properties.
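The averaging step described above can be illustrated with a short sketch. The slides do not say whether the averaging uses disjoint bins or a sliding window, so disjoint consecutive bins are assumed here; the function name and the sample values are hypothetical, and in the study the bin width would be 1,306 ranks.

```python
def average_over_rank_bins(prob_by_rank, width):
    """Average P(r) over consecutive, non-overlapping bins of `width` ranks.

    prob_by_rank[i] is the friendship probability at rank r = i + 1.
    The last bin may be shorter than `width`.
    """
    bins = (prob_by_rank[i:i + width] for i in range(0, len(prob_by_rank), width))
    return [sum(chunk) / len(chunk) for chunk in bins]


# Hypothetical probabilities for ranks 1..5, averaged in bins of two ranks.
smoothed = average_over_rank_bins([1.0, 2.0, 3.0, 4.0, 5.0], 2)
```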
  • Slide 37
  • Geographic Linking in the LiveJournal Social Network: The same data are replotted (unaveraged and averaged, respectively), correcting for the background friendship probability: we plot the rank r versus $P(r) - \varepsilon$, where $\varepsilon = 5.0 \times 10^{-6}$.
  • Slide 38
  • The slopes of the lines for the two coasts are nearly the same, and they are much closer together than the distance/friendship-probability slopes shown in Fig. 4B, confirming that probabilities based on ranks are a more accurate representation than distance-based probabilities.
  • Slide 39
  • Summary: The LiveJournal social network displays a surprising and variable relationship between geographic distance and probability of friendship, which is inconsistent with earlier theoretical models. The network evinces short paths discoverable by using geography alone, even though existing models predict the opposite. Rank-based friendship provides two desirable properties:
      o (i) It matches our experimental observations regarding the relationship between geography and friendship.
      o (ii) It admits a mathematical proof that networks exhibiting rank-based friendship will contain discoverable short paths.
  • Slide 40
  • Summary: The LiveJournal social network displays a surprising and variable relationship between geographic distance and probability of friendship, which is inconsistent with earlier theoretical models. Rank-based friendship is a mechanism that has been empirically observed in real networks and that theoretically guarantees small-world properties. Watts et al. suggest that multiple independent dimensions play a role in message routing, and our results confirm this viewpoint: on average, about one-third of LiveJournal friendships are independent of geography. We have shown that the natural mechanisms of friendship formation result in rank-based friendship: people have formed relationships with almost exactly the connection between friendship and rank that is required to produce a navigable small world.