we.b: The web of short URLs

Demetris Antoniades, Iasonas Polakis, Georgios Kontaxis, Elias Athanasopoulos, Sotiris Ioannidis, Evangelos P. Markatos (FORTH-ICS); Thomas Karagiannis (Microsoft Research)

WWW 2011, March 30, 2011

Presented by Somin Kim



Outline
– Introduction
– URL Shortening Services
– Data Collection
– The Web of Short URLs
– Evolution and Lifetime
– Publishers
– Short URLs and Web Performance
– Conclusion


Introduction

URL shortening services make URLs easier to share by providing a short equivalent for a long URL

Short URLs have seen a significant increase in usage
– A result of their extensive use in Online Social Networks (OSNs)

Understanding the usage of short URLs is important
– To provide insight into the interests of users of OSNs or instant messaging (IM) systems
– To understand the performance, scalability, and reliability of URL shortening services
– To define the proper architecture for URL shortening services


URL Shortening Services (1/3)

Popularity of URL shortening services
– The rapid adoption of OSNs has led to increased demand for short URLs
– Short URLs are also useful in traditional systems such as IM, SMS, and e-mail


[Figure: URL shortening with bit.ly. Long URL: http://www.this.is.a.long.url.com/in-deed.html; short URL: http://bit.ly/dv82ka. The URL shortening service publishes the short URL; accessing it redirects the user to the original URL.]
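To make the publish/redirect cycle concrete, here is a minimal in-memory sketch. It is not how bit.ly is implemented; the domain `short.example`, the `Shortener` class, and the base-62 alphabet are all assumptions made for illustration.

```python
import string

# Assumed alphabet: digits, lowercase, uppercase (62 symbols).
# Real services may use a different alphabet or ordering.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode(n: int) -> str:
    """Encode a non-negative integer as a base-62 hash."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


class Shortener:
    """Toy shortener: publish() hands out a short URL, resolve() returns the redirect target."""

    def __init__(self, domain: str = "short.example"):  # hypothetical domain
        self.domain = domain
        self.counter = 0          # serial counter: each new long URL gets the next hash
        self.table = {}           # hash -> long URL

    def publish(self, long_url: str) -> str:
        h = encode(self.counter)
        self.counter += 1
        self.table[h] = long_url
        return f"http://{self.domain}/{h}"

    def resolve(self, short_url: str) -> str:
        # A real service answers with an HTTP 301/302 whose Location header
        # is the long URL; here we simply return that URL.
        h = short_url.rsplit("/", 1)[-1]
        return self.table[h]
```

Handing out hashes from a serial counter keeps them as short as possible, but it also makes the keyspace easy to enumerate.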


URL Shortening Services (2/3)

Some of these services provide statistics about accesses to these URLs
– The number of hits
– The referrer sites the hits came from
– The visitors' countries
– …

Users can create many short URLs for the same long URL
– If a user shortens a long URL that has already been shortened, the service gives that user a different hash
– For each unique long URL, bit.ly also provides a unique global hash with an information page
– Overall statistics are still kept on the global hash's information page
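The per-user versus global hash scheme described above can be sketched as a small registry. The class name, the counter-based hash format, and the aggregation rule are hypothetical; the slides only state that each user gets a distinct hash while the global hash's information page keeps overall statistics.

```python
from collections import Counter
from itertools import count


class HashRegistry:
    """Hypothetical registry: one hash per (user, long URL), plus one global hash per long URL."""

    def __init__(self):
        self._ids = count()        # serial IDs, rendered as 6 hex digits
        self.user_hash = {}        # (user, long URL) -> user hash
        self.global_hash = {}      # long URL -> global hash
        self.long_of = {}          # any hash -> long URL
        self.hits = Counter()      # hash -> number of hits

    def _new_hash(self, long_url: str) -> str:
        h = f"{next(self._ids):06x}"
        self.long_of[h] = long_url
        return h

    def shorten(self, user: str, long_url: str) -> str:
        # The first time a long URL is seen, also create its global hash.
        if long_url not in self.global_hash:
            self.global_hash[long_url] = self._new_hash(long_url)
        key = (user, long_url)
        if key not in self.user_hash:
            self.user_hash[key] = self._new_hash(long_url)
        return self.user_hash[key]

    def hit(self, h: str) -> str:
        """Record an access and return the redirect target."""
        self.hits[h] += 1
        return self.long_of[h]

    def global_hits(self, long_url: str) -> int:
        """Overall statistics: hits summed over every hash that points to long_url."""
        return sum(n for h, n in self.hits.items() if self.long_of[h] == long_url)
```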


URL Shortening Services (3/3)

(Presenter's note: should I show a screenshot of the statistics page?)

[Figure: Global information page]


Outline
– Introduction
– URL Shortening Services
– Data Collection
  – Collection methodology
  – Collected data
– The Web of Short URLs
– Evolution and Lifetime
– Publishers
– Short URLs and Web Performance
– Conclusion


Data Collection (1/3)

Collection methodology: Twitter crawling
– Twitter crawling returns links "gossiped" in a social network
– We collected tweets that contain HTTP URLs
– Only 13% of the HTTP URLs were not shortened by any URL shortening service
– 50% of the HTTP URLs from Twitter were bit.ly URLs
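A breakdown like the 13% and 50% figures above can be computed by classifying each URL found in a tweet by its host. A minimal sketch follows; the `SHORTENERS` set is a small hypothetical sample, not the list the authors used, and the regular expression is deliberately simple (it will, for example, keep trailing punctuation).

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical sample of shortener domains (not the authors' list).
SHORTENERS = {"bit.ly", "ow.ly", "tinyurl.com", "is.gd", "goo.gl", "fb.me", "j.mp"}

# Deliberately simple: any http(s) URL up to the next whitespace.
URL_RE = re.compile(r"https?://\S+")


def shortener_breakdown(tweets):
    """Fraction of HTTP URLs per shortener domain; 'other' means not shortened."""
    counts = Counter()
    for text in tweets:
        for url in URL_RE.findall(text):
            host = (urlsplit(url).hostname or "").removeprefix("www.")
            counts[host if host in SHORTENERS else "other"] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}
```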


Data Collection (2/3)

Collection methodology: brute force
– Brute-forcing the hash space yields short URLs regardless of the medium they were published in or how recent they are
– We gathered the metadata provided by the shortening service
– We monitored the evolution of the keyspace in the ow.ly system
  – Ow.ly iterates serially over the available short URL space
  – About 70,000 new short URLs are created each day
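Because ow.ly assigns hashes serially, both the next hashes to probe and the daily growth of the keyspace follow from the most recently observed hash. The sketch below assumes a base-62 alphabet ordered digits, lowercase, uppercase; ow.ly's real alphabet and ordering are not given in the slides, and the function names are hypothetical.

```python
import string

# Assumption: hashes are base-62 strings over this alphabet, assigned in
# increasing numeric order. The real ow.ly alphabet/ordering may differ.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
INDEX = {c: i for i, c in enumerate(ALPHABET)}


def to_int(h: str) -> int:
    """Position of a hash in the serial keyspace."""
    n = 0
    for c in h:
        n = n * len(ALPHABET) + INDEX[c]
    return n


def to_hash(n: int) -> str:
    """Inverse of to_int."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, len(ALPHABET))
        out.append(ALPHABET[r])
    return "".join(reversed(out))


def next_hashes(last_seen: str, count: int):
    """Candidate hashes to probe right after the newest one observed."""
    start = to_int(last_seen) + 1
    return [to_hash(i) for i in range(start, start + count)]


def daily_growth(newest_day1: str, newest_day2: str, days: float = 1.0) -> float:
    """New short URLs per day, from the newest hash observed at two times."""
    return (to_int(newest_day2) - to_int(newest_day1)) / days
```

If the newest hash advanced by about 70,000 positions between two probes a day apart, `daily_growth` would return roughly 70,000, in line with the rate reported above.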


Data Collection (3/3)

Collected data
– For the Twitter and bit.ly traces, all accompanying metadata for each short URL was also collected


Outline
– Introduction
– URL Shortening Services
– Data Collection
– The Web of Short URLs
  – Where do short URLs come from?
  – Where do short URLs point to?
  – Location
  – Popularity
– Evolution and Lifetime
– Publishers
– Short URLs and Web Performance
– Conclusion


The Web of Short URLs (1/7)

Where do short URLs come from?

Short URLs do not frequently appear in traditional web pages
– The vast majority of users arrive at bit.ly from non-web applications
– Users who arrive through web applications mostly come from social networking channels (Twitter, Facebook)
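Since services such as bit.ly report the referrer sites that hits came from, a breakdown like this can be approximated by bucketing referrers. The sketch assumes that a missing referrer indicates a non-web client (desktop or mobile app, IM, e-mail); that rule and the `SOCIAL` host set are illustrative assumptions, not the authors' documented method.

```python
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import urlsplit

# Hypothetical social-network hosts used for the "social" bucket.
SOCIAL = {"twitter.com", "facebook.com"}


def referrer_category(referrer: Optional[str]) -> str:
    # Assumption: an empty referrer means the click came from a non-web client.
    if not referrer:
        return "non-web"
    host = (urlsplit(referrer).hostname or "").removeprefix("www.")
    return "social" if host in SOCIAL else "other web"


def referrer_shares(referrers: Iterable[Optional[str]]) -> dict:
    """Fraction of hits per referrer category."""
    counts = Counter(referrer_category(r) for r in referrers)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}
```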


The Web of Short URLs (2/7)

Where do short URLs point to?

Most popular types of short URL content
– News and informative content come first
– 4% of the most accessed URLs in the ow.ly trace pointed to other URL shortening services
  – Spammers pack short URLs inside other short URLs to avoid exposing the long URL
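Unwrapping such nested short URLs amounts to following redirects until the target is no longer a shortener. In the sketch below the redirect lookup is injected as a function so that it runs offline; in practice it would issue an HTTP request and read the `Location` header. The `SHORTENERS` set and the hop limit are hypothetical.

```python
from urllib.parse import urlsplit

SHORTENERS = {"bit.ly", "ow.ly", "tinyurl.com", "is.gd"}  # illustrative subset


def unwrap(url: str, redirect, max_hops: int = 10):
    """Follow redirects while the current URL is on a known shortener.

    `redirect(url)` returns the URL the shortener redirects to.
    Returns the chain of URLs visited, ending at the first non-shortener URL.
    """
    chain = [url]
    seen = {url}
    while (urlsplit(chain[-1]).hostname or "") in SHORTENERS:
        if len(chain) > max_hops:
            raise RuntimeError("too many redirects")
        nxt = redirect(chain[-1])
        if nxt in seen:
            raise RuntimeError("redirect loop")
        seen.add(nxt)
        chain.append(nxt)
    return chain
```

A chain longer than two entries means at least one short URL was wrapped inside another, which is the pattern described above.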


The Web of Short URLs (3/7)

Location

The penetration of short URL use differs significantly from that of the Internet/web as a whole
– Most accesses come from the United States, Japan, and Great Britain
– No accesses from China or India were seen, even though China and India rank among the top 5 countries by number of Internet users


The Web of Short URLs (4/7)

Popularity

URL popularity
– Large systems that provide content to users typically exhibit power-law behavior
  – A small fraction of the content is very popular
  – Most of it is considered uninteresting
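A common first check for power-law behavior is a rank-frequency plot: if hits fall off as C * rank**(-alpha), the points lie on a straight line in log-log scale. The sketch fits that line by least squares; it is a rough diagnostic chosen for illustration (maximum-likelihood estimators are generally more reliable for heavy tails), and the function name is hypothetical.

```python
import math


def zipf_exponent(hits):
    """Estimate alpha in hits(rank) ~ C * rank**(-alpha) via least squares on log-log data."""
    ranked = sorted((h for h in hits if h > 0), reverse=True)  # rank 1 = most hits
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(h) for h in ranked]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx  # slope is -alpha
```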


The Web of Short URLs (5/7)

Popularity

URL popularity (cont.)
– We split short URLs into active and inactive
  – Inactive: no hit was observed during the last 7 days of the trace
– 10% of the short URLs are responsible for about 90% of the total hits seen in the trace
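Both the active/inactive split and the top-10% share are simple to compute from per-URL data. In this sketch, days are integer indices and `trace_end_day` is the last day of the trace; these representations and the function names are assumptions made for illustration.

```python
import math


def split_active(last_hit_day: dict, trace_end_day: int, window: int = 7):
    """Active = at least one hit in the last `window` days of the trace; else inactive."""
    active, inactive = set(), set()
    for url, day in last_hit_day.items():
        # The last `window` days are trace_end_day - window + 1 ... trace_end_day.
        (active if day > trace_end_day - window else inactive).add(url)
    return active, inactive


def top_share(hits, fraction: float = 0.10) -> float:
    """Share of all hits received by the most-hit `fraction` of URLs."""
    ranked = sorted(hits, reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)
```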


The Web of Short URLs (6/7)

Popularity

Content popularity
– Besides familiar websites, less well-known websites were also observed
  – Pollpigeon.com (short opinion polls), Mashable.com (social media news), Twibbon.com (Twitter campaigns)
– Short polls are popular content; they are very common on social networking sites


The Web of Short URLs (7/7)

Popularity

Content popularity (cont.)
– Do popular websites change significantly over time?
  – About 6 sites appear in the top 100 on every single day of April 2010
  – 22 sites for March 2010
  – About 400 sites enjoy short bursts of popularity
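Counting the sites that stay in the daily top 100 is a set intersection over per-day rankings, and counting how many days each site spends there separates persistent sites from short bursts. The input layout `{day: {site: hits}}` and the function names are assumptions.

```python
from collections import Counter


def daily_top(hits_by_day, k=100):
    """{day: {site: hits}} -> {day: set of the top-k sites that day}."""
    return {
        day: {site for site, _ in Counter(sites).most_common(k)}
        for day, sites in hits_by_day.items()
    }


def persistent_sites(hits_by_day, k=100):
    """Sites that appear in the top-k list on every day."""
    tops = list(daily_top(hits_by_day, k).values())
    return set.intersection(*tops) if tops else set()


def appearance_counts(hits_by_day, k=100):
    """Number of days each site spent in the top-k (few days = a short burst)."""
    c = Counter()
    for top in daily_top(hits_by_day, k).values():
        c.update(top)
    return c
```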


Outline
– Introduction
– URL Shortening Services
– Data Collection
– The Web of Short URLs
– Evolution and Lifetime
  – Life span of short URLs
  – Temporal evolution
– Publishers
– Short URLs and Web Performance
– Conclusion


Evolution and Lifetime (1/5)

Life span of short URLs

The lifetime of a URL is the number of days between its first and last observed hit

[Figure: Lifetime CDF of the traces (twitter2, bitly)]
– 50% of the short URLs are not ephemeral
– Inactive URLs have a shorter lifespan
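Following the definition above, a lifetime is the gap between the first and last observed hit, and the lifetime plot is an empirical CDF over those values. A minimal sketch, assuming hits are recorded as `datetime.date` objects; the function names are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def lifetime_days(hit_dates) -> int:
    """Days between the first and the last observed hit."""
    return (max(hit_dates) - min(hit_dates)).days


def ecdf(values):
    """Return F with F(x) = fraction of values <= x."""
    xs = sorted(values)
    return lambda x: bisect_right(xs, x) / len(xs)


if __name__ == "__main__":
    hits = [date(2010, 4, 1), date(2010, 4, 5), date(2010, 4, 11)]
    print(lifetime_days(hits))  # 10
```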


Evolution and Lifetime (2/5)

Temporal evolution

[Figure: The daily change in the number of hits for each short URL]
– The number of accesses to a typical short URL varies by as much as 40% from one day to the next
– As less popular URLs are included, larger daily changes are observed
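One way to quantify the day-to-day variation is the relative change between consecutive daily hit counts. The slides do not give the exact formula, so the definition below (absolute change divided by the previous day's hits, skipping days whose previous count is zero) is an assumption.

```python
from statistics import median


def daily_changes(hits):
    """Relative day-over-day change |h[t] - h[t-1]| / h[t-1] for a daily hit series."""
    # Pairs with a zero previous count are skipped: the relative change is undefined.
    return [abs(cur - prev) / prev for prev, cur in zip(hits, hits[1:]) if prev > 0]


def typical_change(hits):
    """Median relative change, as one summary of a URL's volatility."""
    changes = daily_changes(hits)
    return median(changes) if changes else 0.0
```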


Evolution and Lifetime (3/5)

Temporal evolution
– Inactive URLs
  – On average, 60% of hits are observed during their first day
  – After that, the hit rate drops sharply
– Active URLs
  – The first-day effect is also evident
  – A significant hit rate is also observed in recent days

[Figure: The evolution of hit rate across the lifetime of the short URLs]
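The first-day effect can be measured as the share of a URL's total hits that arrive on its first day, averaged across URLs. The sketch assumes each series is a list of daily hit counts starting at the URL's first day; the function names are hypothetical.

```python
def first_day_share(daily_hits):
    """Fraction of a URL's total hits that arrive on its first day."""
    total = sum(daily_hits)
    return daily_hits[0] / total if total else 0.0


def mean_first_day_share(series_by_url):
    """Average first-day share over all URLs that received at least one hit."""
    shares = [first_day_share(s) for s in series_by_url.values() if sum(s) > 0]
    return sum(shares) / len(shares) if shares else 0.0
```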


Evolution and Lifetime (4/5)

Temporal evolution

[Figure: The daily hit rate versus a short URL's lifetime, for inactive short URLs]
– There is no obvious dependence of the daily hit rate on a short URL's lifetime


Evolution and Lifetime (5/5)

Temporal evolution

[Figure: Total number of hits as a function of the short URL's lifetime]
– Active short URLs (bottom panel) appear to exhibit a linear relationship in log-log scale
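A straight line in log-log scale corresponds to a power law. Writing H for a URL's total hits and L for its lifetime (symbols introduced here for illustration), with slope k and intercept log c:

```latex
\log H = \log c + k \log L
\;\Longleftrightarrow\;
H = c\,L^{k},
\qquad
\frac{H}{L} = c\,L^{k-1}
```

The average daily hit rate H/L is therefore independent of lifetime only when k = 1; the slides do not report the fitted slope.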


Publishers (1/4)

Twitter effect

– Short URLs referred from Twitter enjoy significantly higher popularity


Publishers (2/4)

[Figure: CCDF of posted short URLs per Twitter user]
– Most users published only a handful of tweets with short URLs
– The majority of tweets with short URLs are original Twitter messages (not retweets)
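The per-user CCDF can be built by counting short-URL tweets per user and computing, for each count x, the fraction of users with at least x. The `(user, has_short_url)` input format is hypothetical, and this sketch uses P[X >= x]; some plots use P[X > x] instead.

```python
from collections import Counter


def posts_per_user(tweets) -> Counter:
    """tweets: iterable of (user, has_short_url) pairs -> short-URL tweets per user."""
    return Counter(user for user, has_short in tweets if has_short)


def ccdf(values):
    """Empirical CCDF as sorted (x, P[X >= x]) points, one per distinct value."""
    xs = sorted(values)
    n = len(xs)
    points = []
    for i, x in enumerate(xs):
        if i == 0 or x != xs[i - 1]:
            # xs[i] is the first occurrence of x, so n - i values are >= x.
            points.append((x, (n - i) / n))
    return points
```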


Publishers (3/4)

[Figure: Users' daily publishing rate of short URLs]
– The median rate is 1 short URL per day
– 98% of users publish no more than 5 short URLs per day
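The two statistics on this slide are a median and a threshold fraction over per-user rates. The slides do not define the rate precisely; the sketch assumes it is the number of short URLs a user published divided by the number of distinct days on which that user posted.

```python
from statistics import median


def daily_rate(post_days) -> float:
    """Short URLs per active day for one user (assumed definition).

    post_days: one entry (e.g. a date) per short URL the user published.
    """
    post_days = list(post_days)
    return len(post_days) / len(set(post_days))


def publish_stats(rates, cap: int = 5):
    """Median per-user rate and the fraction of users at or below `cap` per day."""
    rates = list(rates)
    return median(rates), sum(r <= cap for r in rates) / len(rates)
```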


Publishers (4/4)

[Figure: Correlation between a user's publishing rate and total number of hits]
– As the number of URLs published by a poster increases, the expected hit rate drops
  – Spamming-type behavior: only a few short URLs from each publisher enjoy high hit rates


Outline
– Introduction
– URL Shortening Services
– Data Collection
– The Web of Short URLs
– Evolution and Lifetime
– Publishers
– Short URLs and Web Performance
  – Space reduction
  – Latency
– Conclusion


Short URLs and Web Performance (1/3)

Space reduction

Space gain for the short URL
– URL shortening services are quite effective at reducing URL size
  – For roughly 50% of the URLs, a 91% reduction in size is observed
– In the Twitter trace, only 31% of the long versions of the short URLs would have stayed under the character limit
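Space gain is the fraction of characters saved by the substitution, and the character-limit check asks whether a message would still fit with the long URL in place of the short one. The 140-character default reflects Twitter's limit at the time and, like the function names, is an assumption of this sketch.

```python
def space_reduction(long_url: str, short_url: str) -> float:
    """Fraction of characters saved by replacing long_url with short_url."""
    return 1 - len(short_url) / len(long_url)


def fits_in_tweet(text: str, url: str, limit: int = 140) -> bool:
    """Would `text` plus a space plus `url` stay within `limit` characters?"""
    return len(text) + 1 + len(url) <= limit
```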


Short URLs and Web Performance (2/3)

Latency

URL shortening services impose additional overhead on the user's web request

We periodically accessed the 10 most popular short URLs
– fb.me and ow.ly exhibit bimodal behavior
– bit.ly appears to be the slowest but shows more consistent behavior
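The redirection overhead can be measured by requesting a short URL without following the redirect and timing the response. A sketch using Python's standard `http.client`; it is not the authors' measurement tool, the function name is hypothetical, and the elapsed time includes connection setup (TCP, and TLS for HTTPS).

```python
import http.client
import time
from urllib.parse import urlsplit


def time_redirect(short_url: str, timeout: float = 10.0):
    """Issue one GET to short_url WITHOUT following redirects.

    Returns (status, Location header, elapsed seconds).
    """
    parts = urlsplit(short_url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)  # netloc may include ":port"
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        start = time.perf_counter()
        conn.request("GET", path)      # the connection is opened here
        resp = conn.getresponse()
        resp.read()                    # drain the (usually tiny) redirect body
        elapsed = time.perf_counter() - start
        return resp.status, resp.getheader("Location"), elapsed
    finally:
        conn.close()
```

Calling it on a schedule (for example with `time.sleep` between rounds) for a fixed set of short URLs yields latency distributions like those compared above.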


Short URLs and Web Performance (3/3)

Latency

[Figure: The redirection overhead of bit.ly]
– For more than 50% of accesses, the URL shortening redirection imposes a relative overhead of 54%
– In a significant fraction of accesses, this additional delay turns out to be comparable to the final web page access time
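Given paired timings for the redirect and for the final page, the relative overhead is their ratio. The slides do not spell out the denominator, so the sketch assumes redirect time divided by page access time, and its 0.5 to 2.0 band for "comparable" is likewise a hypothetical choice.

```python
from statistics import median


def relative_overhead(redirect_s: float, page_s: float) -> float:
    """Redirect time as a fraction of the final page's access time (assumed definition)."""
    return redirect_s / page_s


def summarize(samples, low: float = 0.5, high: float = 2.0):
    """samples: list of (redirect seconds, page seconds) pairs."""
    ratios = [relative_overhead(r, p) for r, p in samples]
    return {
        "median_overhead": median(ratios),
        # Hypothetical band for "comparable to the final page access time".
        "share_comparable": sum(low <= x <= high for x in ratios) / len(ratios),
    }
```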


Conclusion

We have presented a large-scale study of URL shortening services
– Exploring traces from the services themselves and from Twitter

Summary
– Short URLs appear mostly in ephemeral media, with profound effects on their popularity, lifetime, and access patterns
– A small number of URLs receive a very large number of accesses
– A large percentage of short URLs are not ephemeral
– The most popular websites change slowly over time
– These websites differ from the sites that are popular among the broader web community
– URL shortening services are extremely effective at saving space but add overhead to accessing the web page
