45
Two Years of Short URL Internet Measurement http://flic.kr/phretor Federico Maggi, Alessandro Frossi, Stefano Zanero Gianluca Stringhini, Brett Stone-Gross, Christopher Kruegel, Giovanni Vigna

Two Years of Short URL Internet Measurement: security threats and countermeasures

Embed Size (px)

DESCRIPTION

I presented the results of this research work at #www2013. Abstract: URL shortening services have become extremely popular. However, it is still unclear whether they are an effective and reliable tool that can be leveraged to hide malicious URLs, and to what extent these abuses can impact the end users. With these questions in mind, we first analyzed existing countermeasures adopted by popular shortening services. Surprisingly, we found such countermeasures to be ineffective and trivial to bypass. This first measurement motivated us to proceed further with a large-scale collection of the HTTP interactions that originate when web users access live pages that contain short URLs. To this end, we monitored 622 distinct URL shortening services between March 2010 and April 2012, and collected 24,953,881 distinct short URLs. With this large dataset, we studied the abuse of short URLs. Despite short URLs are a significant, new security risk, in accordance with the reports resulting from the observation of the overall phishing and spamming activity, we found that only a relatively small fraction of users ever encountered malicious short URLs. Interestingly, during the second year of measurement, we noticed an increased percentage of short URLs being abused for drive-by download campaigns and a decreased percentage of short URLs being abused for spam campaigns. In addition to these security-related findings, our unique monitoring infrastructure and large dataset allowed us to complement previous research on short URLs and analyze these web services from the user’s perspective.

Citation preview

Page 1: Two Years of Short URL Internet Measurement: security threats and countermeasures

Two Years of Short URL Internet Measurement

http://flic.kr/phretor

Federico Maggi, Alessandro Frossi, Stefano ZaneroGianluca Stringhini, Brett Stone-Gross, Christopher Kruegel, Giovanni Vigna

Page 2: Two Years of Short URL Internet Measurement: security threats and countermeasures

How short URLs work: Creating an alias.

URL shorteningservice

http://example.com/very/long/?url=to&the=landing-pagelong URL

http://ab.cd/d73fYfzshort URL

RANDOM SUFFIXIS GENERATED

"make me shorter"

Page 3: Two Years of Short URL Internet Measurement: security threats and countermeasures

How short URLs work: Resolving an alias.

http://ab.cd/d73fYfz

REDIRECTIONMECHANISM

http://example.com/very/long/...HTTP 302

HTML META Refresh

JavaScript

ActionScript

redirection happens on the browser's side

URL shorteningservice

Page 4: Two Years of Short URL Internet Measurement: security threats and countermeasures

From a security point of view

Perfect mean for masquerading malicious URLs

• Trivially evade naïve filters

• Robust to email clients that break long URLs into multiple lines

• Trendy effect (e.g., Twitter, Facebook)

• Users have grown accustomed to short URLs

http://i.am/harmless http://evil.com/very/long/and/suspicous/path

Page 5: Two Years of Short URL Internet Measurement: security threats and countermeasures

Do shortening services take security measures?

Page 6: Two Years of Short URL Internet Measurement: security threats and countermeasures

Do shortening services take security measures?

• Prepare lists of malicious long URLs

• Malware (e.g., drive-by download) from Wepawet

• Spam from Spamhaus

• Phishing from PhishTank

• Performed 3 small measurement experiments

• Creation

• Resolution

• Periodic checks

Page 7: Two Years of Short URL Internet Measurement: security threats and countermeasures

Creation of malicious short URLs

• Submit for shortening to the top 6 services (as of 2010)

1. bit.ly

2. durl.me

3. goo.gl

4. is.gd

5. migre.me

6. tinyurl.com

• Do they accept malicious long URLs?

Page 8: Two Years of Short URL Internet Measurement: security threats and countermeasures

URL shorteningservice

http://example.com/very/long/?url=to&the=evil.comlong URL

"make me shorter"

Blacklist

"sure, here is your short URL"

?

Page 9: Two Years of Short URL Internet Measurement: security threats and countermeasures

Malicious long URLs accepted by top services

Service Malware Phishing Spam

# % # % # %

bit.ly 2,000 100.0 2,000 100.0 2,000 100.0durl.me 1,999 99.9 1,987 99.4 1,976 98.8goo.gl* 2000 99.9 994 99.4 1,000 100.0

is.gd 1,854 92.7 1,834 91.7 364 18.2migre.me 1,738 86.9 1,266 63.3 1,634 81.7

tinyurl.com 1,959 99.5 1,935 96.8 587 29.4

Overall 9,550 95.5 9,022 90.2 6,561 65.6

Page 10: Two Years of Short URL Internet Measurement: security threats and countermeasures

Resolution of malicious short URLs

• Resolved the malicious short URLs with the same services

• Do they resolve malicious URLs?

Page 11: Two Years of Short URL Internet Measurement: security threats and countermeasures

URL shorteningservice

http://i.am/so-evilshort URL

"please resolve this"

Mappingtable or blacklist

"sure, here is your long URL"

Page 12: Two Years of Short URL Internet Measurement: security threats and countermeasures

Malicious short URLs resolved by top services

Service Malware Phishing Spam

bit.ly 0.05 11.3 0.0durl.me 0.0 0.0 0.0goo.gl* 66.4 96.9 78.7

is.gd 1.08 2.27 0.8migre.me 0.86 14.0 0.0

tinyurl.com 0.66 0.7 2.04

Overall 21.45 26.39 31.38

Page 13: Two Years of Short URL Internet Measurement: security threats and countermeasures

Periodic checks on existing short URLs

• Created a dynamic landing page (i.e., long URL)

• Submission time (day zero): serve benign content (e.g., text & images)

• ...

Page 14: Two Years of Short URL Internet Measurement: security threats and countermeasures

URL shorteningservice

http://our.server/dynamic-page.phpdynamic long URL

"make me shorter"

Blacklist

"is this long URL malicious?"No!

http://ab.cd/good-today

Page 15: Two Years of Short URL Internet Measurement: security threats and countermeasures

...we changed the content of the page after 1 week

• ...

• Resolution time (each subsequent week): serve malicious content

• Do services periodically check and sanitize existing short URLs?

Page 16: Two Years of Short URL Internet Measurement: security threats and countermeasures

URL shorteningservice

http://ab.cd/good-todayshort URL

"please resolve this"

Aliases DB

"is the long URL malicious?"No!

http://our.server/dynamic-page.php

Page 17: Two Years of Short URL Internet Measurement: security threats and countermeasures

Deferred malicious short URLs survived

Threat Shortened Blocked Not Blocked

Malware 5,000 20% 80%Phishing 5,000 20% 80%

Spam 5,000 20% 80%

Overall 15,000 20% 80%durl.me

Page 18: Two Years of Short URL Internet Measurement: security threats and countermeasures

This motivated us to start a measurement experimentto collect short URLs

Page 19: Two Years of Short URL Internet Measurement: security threats and countermeasures

A different perspective what is the impact on users?

• Most of previous work used Twitter as a source of short URLs

• Types of questions

• Do users stumble upon malicious short URLs very often?

• Are users aware of malicious short URLs?

• What kind of short URLs users typically encounter?

User-centered measurement.

Page 20: Two Years of Short URL Internet Measurement: security threats and countermeasures

Measurement infrastructure

JS

Our resolver

Collectors (users)

http://ab.cd/4jaYashttp://ab.cd/4jaYas

http://ab.cd/4jaYas

http://ab.cd/4jaYas

LANDINGPAGE

Page 22: Two Years of Short URL Internet Measurement: security threats and countermeasures

Biased measurements?

• We do not ask a user to become a collector

• We want users to subscribe as collectors spontaneously

• How? We provide a useful service!

Page 23: Two Years of Short URL Internet Measurement: security threats and countermeasures
Page 24: Two Years of Short URL Internet Measurement: security threats and countermeasures

Collected data between 2010 and 2012

• Total 7,000 distinct users (estimate from 1,370,277 distinct IPs)

• about 500 to 1,000 active users per day

• about 20,000 to 50,000 short URLs sent each day (100,000 peaks)

• 24,953,881 distinct short URLs collected overall

Page 25: Two Years of Short URL Internet Measurement: security threats and countermeasures

Dataset statistics (geolocation of the submitters)

Page 26: Two Years of Short URL Internet Measurement: security threats and countermeasures

Dataset statistics (top services)

Distinct URLs Log entries

10,069,846 bit.ly 24,818,239 bit.ly

4,725,125 t.co 12,054,996 t.co

1,418,418 tinyurl.com 5,649,043 tinyurl.com

816,744 ow.ly 2,188,619 goo.gl

800,761 goo.gl 2,053,575 ow.ly

638,483 tumblr.com 1,214,705 j.mp

597,167 fb.me 1,159,536 fb.me

584,377 4sq.com 1,116,514 4sq.com

517,965 j.mp 1,066,325 tumblr.com

464,875 tl.gd 1,045,380 is.gd

Page 27: Two Years of Short URL Internet Measurement: security threats and countermeasures

Do users stumble upon malicious short URLs very often?

Page 28: Two Years of Short URL Internet Measurement: security threats and countermeasures

Blacklist Phishing Malware Spam

Spamhaus - - 10,306

Phishtank 7 - -

Wepawet - 6,057 -

Safe Browsing 913 2,405 -

Malicious short URLs encountered by submitters

Top sources

Social networksBlogsNews

Overall baseline

ChatroomsWebmailsOnline IMs

Page 29: Two Years of Short URL Internet Measurement: security threats and countermeasures

Malicious page

http://ab.cd/sfb4Ac

http://ab.cd/asd31A

http://ab.cd/5aD3B9

http://ab.cd/419E9s

http://ab.cd/sfb4Ac

. . .

http://ab.cd/5aD3B9

http://ab.cd/419E9s

Aliasing of of malicious URLs

Page 30: Two Years of Short URL Internet Measurement: security threats and countermeasures

Aliasing of malicious pages using short URLs

x = #distinct short URL aliases per landing page

Spam pages not aliasedas much as before.

DRAFT COPY

0.82

0.84

0.86

0.88

0.9

0.92

0.94

0.96

0.98

1

1 10 100 1000

CD

F

BenignSpam URLs

Malware URLs

(a) Distinct short URLs per distinct maliciousor benign landing URL from April 2010 to April2011.

0.93

0.94

0.95

0.96

0.97

0.98

0.99

1

1 10 100

CD

FBenign

Spam URLsMalware URLs

(b) Distinct short URLs per distinct maliciousor benign landing URL from April 2010 to April2012.

0.84

0.86

0.88

0.9

0.92

0.94

0.96

0.98

1

1 10 100 1000

CD

F

Benign URLsSpam URLs

Malware URLs

(c) Distinct containers per distinct malicious orbenign short URL from April 2010 to April 2011.

0.86

0.88

0.9

0.92

0.94

0.96

0.98

1

1 10 100 1000

CD

F

Benign URLsSpam URLs

Malware URLs

(d) Distinct containers per distinct malicious orbenign short URL from April 2010 to April 2012.

Figure 5: Comparison of the number of distinct short URLs per unique landing page (a, c) and distinctcontainer page per unique short URL (b, d) after 1 year (a, b) and after 2 years (c, d). The distribution havechanged over time: (a) spam pages were generally aliased with a larger number of short URLs than benignpages; (b) in the following year spammers did not to alias their pages as much as before. Also, (c) spam shortURLs used to be spread over a larger number of container pages than benign short URLs; however, (c) theynow exhibit the same distribution as their benign counterpart. We may argue that short URLs are not seenanymore as a valuable mean of aliasing spam pages.

over their evolution. This is corroborated by the statisticsabout the abuse of short URLs found in latest three APWGreports [12–14]: After a growing trend, at the beginning ofour measurement (2010), the subsequent reports highlight astable (2011) and decreasing (2012) trend.

4.2 The Short URLs EcosystemAs part of their research, Antoniades and colleagues in [2]

have analyzed the category of the pages to which bit.ly andow.ly short URLs typically point to, along with the categoryof the container page, that they had available for bit.ly URLsonly. They assigned categories to a selection of URLs. Wedid a similar yet more comprehensive analysis by character-izing all the short URLs that we collected by means of thecategories described in the following.Categorizing an arbitrary-large number of websites au-

tomatically is a problem that has no solution. However,our goal was to obtain a coarse-grained categorization. Tothis end, we relied on community-maintained directories andblacklists. More precisely, we classified the container pages(about 25,000,000 distinct URLs) and landing pages (about22,000,000 distinct URLs) using the DMOZ Open DirectoryProject (http://www.dmoz.org) and URLBlacklist.com. The

former is organized in a tree structure and includes 3,883,992URLs: URLs are associated to nodes, each with localized,regional mirrors. We expanded these nodes by recursivelymerging the URLs found in these mirrors. The latter comple-ments the DMOZ database with about 1,607,998 URLs anddomains metadata. URLBlacklist.com is used by web-filteringtools such as SquidGuard (http://www.squidguard.org) andcontains URLs belonging to clean categories (e.g., gardening,news), possibly undesired subjects (e.g., adult sites), and alsomalicious pages (i.e., 22.6% of the sites categorized as “anti-spyware”, 18.15% of those categorized as “hacking”, 8.29%of pages falling within “searchengine” domains, and 5.7% ofthe sites classified as “onlinepayment” are in this order, themost rogue categories according to an analysis that we runthrough McAfee SiteAdvisor 3). Overall, we ended up with74 categories. For clearer visualization, we selected the 48most frequent categories. These include, for example, “so-cialnetworking,”“adult,”“abortion,”“contraception,”“chat,”etc. We reserved the word “other” for URLs belonging tothe less meaningful categories that we removed, or for URLs

3http://siteadvisor.com/sites/

Until first year Until second year

Spam pages more aliasedthan benign pages.

Page 31: Two Years of Short URL Internet Measurement: security threats and countermeasures

http://ab.cd/asd31A

http://ab.cd/5aD3B9

http://ab.cd/sfb4Ac

http://ab.cd/419E9s

Container page 1

http://ab.cd/asd31A

http://ab.cd/5aD3B9

http://ab.cd/419E9s

http://ab.cd/sfb4Ac

Container page 2

. . .

http://ab.cd/sfb4Ac

http://ab.cd/asd31A

http://ab.cd/5aD3B9

http://ab.cd/419E9s

Container page N

Dissemination of malicious short URLs

http://ab.cd/sfb4Ac

Page 32: Two Years of Short URL Internet Measurement: security threats and countermeasures

Spreading of malicious short URLs

DRAFT COPY

0.82

0.84

0.86

0.88

0.9

0.92

0.94

0.96

0.98

1

1 10 100 1000

CD

F

BenignSpam URLs

Malware URLs

(a) Distinct short URLs per distinct maliciousor benign landing URL from April 2010 to April2011.

0.93

0.94

0.95

0.96

0.97

0.98

0.99

1

1 10 100

CD

F

BenignSpam URLs

Malware URLs

(b) Distinct short URLs per distinct maliciousor benign landing URL from April 2010 to April2012.

0.84

0.86

0.88

0.9

0.92

0.94

0.96

0.98

1

1 10 100 1000

CD

F

Benign URLsSpam URLs

Malware URLs

(c) Distinct containers per distinct malicious orbenign short URL from April 2010 to April 2011.

0.86

0.88

0.9

0.92

0.94

0.96

0.98

1

1 10 100 1000

CD

FBenign URLs

Spam URLsMalware URLs

(d) Distinct containers per distinct malicious orbenign short URL from April 2010 to April 2012.

Figure 5: Comparison of the number of distinct short URLs per unique landing page (a, c) and distinctcontainer page per unique short URL (b, d) after 1 year (a, b) and after 2 years (c, d). The distribution havechanged over time: (a) spam pages were generally aliased with a larger number of short URLs than benignpages; (b) in the following year spammers did not to alias their pages as much as before. Also, (c) spam shortURLs used to be spread over a larger number of container pages than benign short URLs; however, (c) theynow exhibit the same distribution as their benign counterpart. We may argue that short URLs are not seenanymore as a valuable mean of aliasing spam pages.

over their evolution. This is corroborated by the statisticsabout the abuse of short URLs found in latest three APWGreports [12–14]: After a growing trend, at the beginning ofour measurement (2010), the subsequent reports highlight astable (2011) and decreasing (2012) trend.

4.2 The Short URLs EcosystemAs part of their research, Antoniades and colleagues in [2]

have analyzed the category of the pages to which bit.ly andow.ly short URLs typically point to, along with the categoryof the container page, that they had available for bit.ly URLsonly. They assigned categories to a selection of URLs. Wedid a similar yet more comprehensive analysis by character-izing all the short URLs that we collected by means of thecategories described in the following.Categorizing an arbitrary-large number of websites au-

tomatically is a problem that has no solution. However,our goal was to obtain a coarse-grained categorization. Tothis end, we relied on community-maintained directories andblacklists. More precisely, we classified the container pages(about 25,000,000 distinct URLs) and landing pages (about22,000,000 distinct URLs) using the DMOZ Open DirectoryProject (http://www.dmoz.org) and URLBlacklist.com. The

former is organized in a tree structure and includes 3,883,992URLs: URLs are associated to nodes, each with localized,regional mirrors. We expanded these nodes by recursivelymerging the URLs found in these mirrors. The latter comple-ments the DMOZ database with about 1,607,998 URLs anddomains metadata. URLBlacklist.com is used by web-filteringtools such as SquidGuard (http://www.squidguard.org) andcontains URLs belonging to clean categories (e.g., gardening,news), possibly undesired subjects (e.g., adult sites), and alsomalicious pages (i.e., 22.6% of the sites categorized as “anti-spyware”, 18.15% of those categorized as “hacking”, 8.29%of pages falling within “searchengine” domains, and 5.7% ofthe sites classified as “onlinepayment” are in this order, themost rogue categories according to an analysis that we runthrough McAfee SiteAdvisor 3). Overall, we ended up with74 categories. For clearer visualization, we selected the 48most frequent categories. These include, for example, “so-cialnetworking,”“adult,”“abortion,”“contraception,”“chat,”etc. We reserved the word “other” for URLs belonging tothe less meaningful categories that we removed, or for URLs

3http://siteadvisor.com/sites/

x = #distinct container pages per short URL

Until first year Until second year

Spam short URLs almost undistinguishable from benign ones.

Spam short URLs spread on morecontainer pagesthan benign ones.

Page 33: Two Years of Short URL Internet Measurement: security threats and countermeasures

Total lifespan of a malicious short URL

0.82

0.84

0.86

0.88

0.9

0.92

0.94

0.96

0.98

1

1 10 100 1000

CD

F

Number of unique short URL per distinct landing URL (log-scale)

BenignSpam URLs

Malware URLs

(a) Distinct short URLs per distinct maliciousor benign landing URL.

0.84

0.86

0.88

0.9

0.92

0.94

0.96

0.98

1

1 10 100 1000

CD

F

(log) Number of unique container page per distinct short URL

Benign URLsSpam URLs

Malware URLs

(b) Distinct containers per distinct malicious orbenign short URL.

Figure 10. Number of distinct short URLs per unique landing page and distinct container page perunique short URL. Notably, benign short URLs exhibit different characteristics if compared withmalicious ones: More precisely, (b) spam short URLs are generally spread on a larger number ofcontainer pages than benign ones. Surprisingly, (a) more unique short URLs are created for thesame malicious landing page, whereas benign long URLs are typically less aliased. Phishing shortURLs are a minority with respect to the others, thus are not accounted.

We derived the maximum lifespan of each collected URL based on historical accesslogs to their container pages. We calculated the maximum lifespan (or simply lifespan)as the delta time between first and latest (i.e., more recent) occurrence of each shortURL in our database. More specifically, our definition of lifespan accounts for thefact that short URLs may disappear from some container pages and reappear after awhile on the same or other container pages. Fig. 11 shows the empirical cumulativedistribution frequency of the lifespan of malicious versus benign short URLs. About95% of the benign short URLs have a lifetime around 20 days, whereas 95% of themalicious short URLs lasted about 4 months. For example, we observed a spam campaignspanning between April 1st and June 30th 2010 that involved 1,806 malicious short URLsredirecting to junk landing pages; this campaign lasted about three months before beingremoved by tinyurl.com administrators. The latest MessageLabs Intelligence Annual

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

CD

F

Lifespan (hours between first and latest occurrence of a short URLs in our dataset)

3-months spam campaign disseminatedvia 1,806 tinyurl.com short URLs(April 1st-June 30, 2010)

BenignMalicious

Malicious (excluding 3-months Apr-Jun spam campaign)

Figure 11. Delta time between first and latest occurrence of malicious versus benign short URLs.The “peak” indicates a high, about 50 %, amount of spam short URLs that lasted about threemonths. Malicious URLs are usually found on multiple different container pages even for extendedperiods of time, whereas benign short URLs follow the “one-day-of-fame effect” pattern.

Malicious short URLs typically survive longerthan benign ones.

We found out a spam campaign (Storm botnet?) with1,806 short URLs. Deleted by tinyurl.com's admins after 3 months.

x = #hours between the first and latest occurrence of a short URL

Page 34: Two Years of Short URL Internet Measurement: security threats and countermeasures

Conclusions

• Do shortening services take enough countermeasures to protect the users?

• Some of them use blacklists but do not proactively check existing aliases

• Do users stumble upon malicious short URLs very often?

• Not very often: about 0.1% overall.

• Are users aware of malicious short URLs?

• Not much: almost no one clicked on our "+flag as malicious" link. Also confirmed by [Onarlioglu et al., NDSS 2012]

Page 35: Two Years of Short URL Internet Measurement: security threats and countermeasures

Questions? Federico [email protected]@phretor

Page 36: Two Years of Short URL Internet Measurement: security threats and countermeasures

What happens when users click on a short URL?

Page 37: Two Years of Short URL Internet Measurement: security threats and countermeasures
Page 38: Two Years of Short URL Internet Measurement: security threats and countermeasures

r Category

0.00 naturism 0.18 artnudes 0.36 weapons 0.75 shopping0.01 personalfinance 0.21 antispyware 0.36 cleaning 0.78 games0.01 do-it-yourself 0.23 drinks 0.37 dating 0.80 news0.03 pets 0.25 medical 0.39 vacation 0.82 government0.04 gardening 0.25 weather 0.40 religion 0.88 chat0.07 clothing 0.30 onlinegames 0.42 culinary 0.90 blog0.07 mail 0.32 jobsearch 0.45 filehosting 0.91 socialnetworking0.09 banking 0.33 sportnews 0.52 kidstimewasting 1.00 contraception0.12 abortion 0.33 gambling 0.55 ecommerce 1.00 childcare0.12 instantmessaging 0.36 drugs 0.67 adult 1.00 astrology0.13 jewelry 0.36 searchengines 0.68 audio-video 1.00 cellphones0.18 hacking 0.36 weapons 0.69 sports 1.00 onlineauctions

1.00 onlinepayment

⇢ =In(cat)

In(cat) +Out(cat)

⇢ ! 0⇢ ! 1

Many outbound short URLs (aggregators, e.g., Twitter)Many inbound short URLs (landing pages, e.g., news, blogs)

Page 39: Two Years of Short URL Internet Measurement: security threats and countermeasures
Page 40: Two Years of Short URL Internet Measurement: security threats and countermeasures
Page 41: Two Years of Short URL Internet Measurement: security threats and countermeasures
Page 42: Two Years of Short URL Internet Measurement: security threats and countermeasures

Are the shortening service similarin terms of content aliased?

Page 43: Two Years of Short URL Internet Measurement: security threats and countermeasures

Content-specific vs. general-purpose services

Table 3. Ranking of categories by the ratio of incoming and outgoing connections via short URLs.

r Category

0.00 naturism 0.18 artnudes 0.36 weapons 0.75 shopping0.01 personalfinance 0.21 antispyware 0.36 cleaning 0.78 games0.01 do-it-yourself 0.23 drinks 0.37 dating 0.80 news0.03 pets 0.25 medical 0.39 vacation 0.82 government0.04 gardening 0.25 weather 0.40 religion 0.88 chat0.07 clothing 0.30 onlinegames 0.42 culinary 0.90 blog0.07 mail 0.32 jobsearch 0.45 filehosting 0.91 socialnetworking0.09 banking 0.33 sportnews 0.52 kidstimewasting 1.00 contraception0.12 abortion 0.33 gambling 0.55 ecommerce 1.00 childcare0.12 instantmessaging 0.36 drugs 0.67 adult 1.00 astrology0.13 jewelry 0.36 searchengines 0.68 audio-video 1.00 cellphones0.18 hacking 0.36 weapons 0.69 sports 1.00 onlineauctions

1.00 onlinepayment

landing pages. This is the case, for example, of 4sq.com (about 100%), whose shortURLs always bring from online social-networking sites to pages categorized as “other”.The most popular shortening services (e.g., bit.ly, goo.gl, ow.ly) fall into the second tier(i.e., 32–48%), together with those services that cover a wide variety of categories, andtypically interconnect pages of different categories.

Non-obvious Uses of Short URLs We also analyzed how short URLs interconnecttogether pages of different categories, to understand whether some categories have amajority of container or landing pages. To this end, we calculated the average frequencyof category change from the perspective of the container page and landing page. With thisdata we created a weighted digraph with 48 nodes, each corresponding to a category. Theweights are the frequencies of change, calculated between each pair of categories—andaveraged over all the short URLs and pages within each category; weights are between10.19 and 39.41% and distributed as shown in Fig. 8. We then calculate the averageweight of incoming, In(cat), and outgoing, Out(cat), edges for each category cat, andfinally derive the ratio r(cat) = In(cat)

In(cat)+Out(cat) . When r ! 0, the category has a majorityof outgoing short URLs (i.e., many container pages of such category), whereas r ! 1

tru

nc.

it

om

.ly

slid

esh

a.re

mo

by

.to

mig

re.m

e

wp

.me

tum

blr

.co

m

lnk

.ms

cot.

ag

flic

.kr

icio

.us

tin

y.l

y

amzn

.to

po

st.l

y

yo

utu

.be

p.t

l

tcrn

.ch

tl.g

d

ny

ti.m

s

ht.

ly

altu

rl.c

om

tin

yso

ng

.co

m

dld

.bz

su.p

r

ust

re.a

m

t.co

j.m

p

dlv

r.it

fb.m

e

twu

rl.n

l

go

o.g

l

ow

.ly

ff.i

m

pin

g.f

m

is.g

d

tin

y.c

c

bit

.ly

tin

yu

rl.c

om

mas

h.t

o

ur1

.ca

sqze

.it

cort

.as

shar

.es

4sq

.co

m

0 2

0 4

0 6

0 8

0 1

00

Med

ian

% c

ateg

ory

dri

ft

Most popular shorteners are also general-purpose and cover a wide variety of categories

#categories covered (min. 0, max. 48)

Figure 7. Frequency of change of category (median with 25- and 75-percent quantiles) and numberof categories covered (size of black dot) of the top 50 services. The most popular, general-purposeshortening services highlighted are characterized by an ample set of categories (close to 48, whichis the maximum) and short URLs that, in 32–48% of the cases, are published on pages havingcategories different from the landing page category.

Page 44: Two Years of Short URL Internet Measurement: security threats and countermeasures

Table 3. Ranking of categories by the ratio of incoming and outgoing connections via short URLs.

r Category

0.00 naturism 0.18 artnudes 0.36 weapons 0.75 shopping0.01 personalfinance 0.21 antispyware 0.36 cleaning 0.78 games0.01 do-it-yourself 0.23 drinks 0.37 dating 0.80 news0.03 pets 0.25 medical 0.39 vacation 0.82 government0.04 gardening 0.25 weather 0.40 religion 0.88 chat0.07 clothing 0.30 onlinegames 0.42 culinary 0.90 blog0.07 mail 0.32 jobsearch 0.45 filehosting 0.91 socialnetworking0.09 banking 0.33 sportnews 0.52 kidstimewasting 1.00 contraception0.12 abortion 0.33 gambling 0.55 ecommerce 1.00 childcare0.12 instantmessaging 0.36 drugs 0.67 adult 1.00 astrology0.13 jewelry 0.36 searchengines 0.68 audio-video 1.00 cellphones0.18 hacking 0.36 weapons 0.69 sports 1.00 onlineauctions

1.00 onlinepayment

landing pages. This is the case, for example, of 4sq.com (about 100%), whose shortURLs always bring from online social-networking sites to pages categorized as “other”.The most popular shortening services (e.g., bit.ly, goo.gl, ow.ly) fall into the second tier(i.e., 32–48%), together with those services that cover a wide variety of categories, andtypically interconnect pages of different categories.

Non-obvious Uses of Short URLs We also analyzed how short URLs interconnecttogether pages of different categories, to understand whether some categories have amajority of container or landing pages. To this end, we calculated the average frequencyof category change from the perspective of the container page and landing page. With thisdata we created a weighted digraph with 48 nodes, each corresponding to a category. Theweights are the frequencies of change, calculated between each pair of categories—andaveraged over all the short URLs and pages within each category; weights are between10.19 and 39.41% and distributed as shown in Fig. 8. We then calculate the averageweight of incoming, In(cat), and outgoing, Out(cat), edges for each category cat, andfinally derive the ratio r(cat) = In(cat)

In(cat)+Out(cat) . When r ! 0, the category has a majorityof outgoing short URLs (i.e., many container pages of such category), whereas r ! 1

tru

nc.

it

om

.ly

slid

esh

a.re

mo

by

.to

mig

re.m

e

wp

.me

tum

blr

.co

m

lnk

.ms

cot.

ag

flic

.kr

icio

.us

tin

y.l

y

amzn

.to

po

st.l

y

yo

utu

.be

p.t

l

tcrn

.ch

tl.g

d

ny

ti.m

s

ht.

ly

altu

rl.c

om

tin

yso

ng

.co

m

dld

.bz

su.p

r

ust

re.a

m

t.co

j.m

p

dlv

r.it

fb.m

e

twu

rl.n

l

go

o.g

l

ow

.ly

ff.i

m

pin

g.f

m

is.g

d

tin

y.c

c

bit

.ly

tin

yu

rl.c

om

mas

h.t

o

ur1

.ca

sqze

.it

cort

.as

shar

.es

4sq

.co

m

0 2

0 4

0 6

0 8

0 1

00

Med

ian

% c

ateg

ory

dri

ft

Most popular shorteners are also general-purpose and cover a wide variety of categories

#categories covered (min. 0, max. 48)

Figure 7. Frequency of change of category (median with 25- and 75-percent quantiles) and numberof categories covered (size of black dot) of the top 50 services. The most popular, general-purposeshortening services highlighted are characterized by an ample set of categories (close to 48, whichis the maximum) and short URLs that, in 32–48% of the cases, are published on pages havingcategories different from the landing page category.

Page 45: Two Years of Short URL Internet Measurement: security threats and countermeasures

Table 3. Ranking of categories by the ratio of incoming and outgoing connections via short URLs.

r Category

0.00 naturism 0.18 artnudes 0.36 weapons 0.75 shopping0.01 personalfinance 0.21 antispyware 0.36 cleaning 0.78 games0.01 do-it-yourself 0.23 drinks 0.37 dating 0.80 news0.03 pets 0.25 medical 0.39 vacation 0.82 government0.04 gardening 0.25 weather 0.40 religion 0.88 chat0.07 clothing 0.30 onlinegames 0.42 culinary 0.90 blog0.07 mail 0.32 jobsearch 0.45 filehosting 0.91 socialnetworking0.09 banking 0.33 sportnews 0.52 kidstimewasting 1.00 contraception0.12 abortion 0.33 gambling 0.55 ecommerce 1.00 childcare0.12 instantmessaging 0.36 drugs 0.67 adult 1.00 astrology0.13 jewelry 0.36 searchengines 0.68 audio-video 1.00 cellphones0.18 hacking 0.36 weapons 0.69 sports 1.00 onlineauctions

1.00 onlinepayment

landing pages. This is the case, for example, of 4sq.com (about 100%), whose shortURLs always bring from online social-networking sites to pages categorized as “other”.The most popular shortening services (e.g., bit.ly, goo.gl, ow.ly) fall into the second tier(i.e., 32–48%), together with those services that cover a wide variety of categories, andtypically interconnect pages of different categories.

Non-obvious Uses of Short URLs We also analyzed how short URLs interconnecttogether pages of different categories, to understand whether some categories have amajority of container or landing pages. To this end, we calculated the average frequencyof category change from the perspective of the container page and landing page. With thisdata we created a weighted digraph with 48 nodes, each corresponding to a category. Theweights are the frequencies of change, calculated between each pair of categories—andaveraged over all the short URLs and pages within each category; weights are between10.19 and 39.41% and distributed as shown in Fig. 8. We then calculate the averageweight of incoming, In(cat), and outgoing, Out(cat), edges for each category cat, andfinally derive the ratio r(cat) = In(cat)

In(cat)+Out(cat) . When r ! 0, the category has a majorityof outgoing short URLs (i.e., many container pages of such category), whereas r ! 1

tru

nc.

it

om

.ly

slid

esh

a.re

mo

by

.to

mig

re.m

e

wp

.me

tum

blr

.co

m

lnk

.ms

cot.

ag

flic

.kr

icio

.us

tin

y.l

y

amzn

.to

po

st.l

y

yo

utu

.be

p.t

l

tcrn

.ch

tl.g

d

ny

ti.m

s

ht.

ly

altu

rl.c

om

tin

yso

ng

.co

m

dld

.bz

su.p

r

ust

re.a

m

t.co

j.m

p

dlv

r.it

fb.m

e

twu

rl.n

l

go

o.g

l

ow

.ly

ff.i

m

pin

g.f

m

is.g

d

tin

y.c

c

bit

.ly

tin

yu

rl.c

om

mas

h.t

o

ur1

.ca

sqze

.it

cort

.as

shar

.es

4sq

.co

m

0 2

0 4

0 6

0 8

0 1

00

Med

ian

% c

ateg

ory

dri

ft

Most popular shorteners are also general-purpose and cover a wide variety of categories

#categories covered (min. 0, max. 48)

Figure 7. Frequency of change of category (median with 25- and 75-percent quantiles) and numberof categories covered (size of black dot) of the top 50 services. The most popular, general-purposeshortening services highlighted are characterized by an ample set of categories (close to 48, whichis the maximum) and short URLs that, in 32–48% of the cases, are published on pages havingcategories different from the landing page category.