Blink II: A node ranking system of DHT network using Map Reduce Framework Ruo Ando, Akihiko Shinohara and Takayuki Sugiura NICT National Institute of Information and Communication Technology NetAgent Co. Ltd

Ncm 2012 Ruo Ando


Page 1: Ncm 2012 Ruo Ando

Blink II: A node ranking system of DHT network using Map Reduce Framework

Ruo Ando, Akihiko Shinohara and Takayuki Sugiura

National Institute of Information and Communications Technology (NICT)

NetAgent Co. Ltd

Page 2: Ncm 2012 Ruo Ando


Overview: detecting illegal adoption in a huge network

• BitTorrent has become an irreplaceable network application for distributing software and content. But...

• No one knows its exact scale and dynamics. How many nodes join and leave the BitTorrent network in 24 hours?

• We tackled the challenge of monitoring this largest-scale network with our rapid and massive DHT crawler.

• We succeeded in collecting 10,000,000 nodes in 24 hours.

• We also present rankings of countries and cities by number of BitTorrent nodes.

Page 3: Ncm 2012 Ruo Ando

BitTorrent traffic estimates

① 55% (CableLabs): about half of the upstream traffic of CATV.

② 35% (CacheLogic, “LIVEWIRE - File-sharing network thrives beneath the Radar”).

③ 60% (documents at www.sans.edu): “It is estimated that more than 60% of the traffic on the internet is peer-to-peer.”

Page 4: Ncm 2012 Ruo Ando

Proposed system architecture for monitoring large scale networks

[Figure: several DHT crawlers collect nodes from the DHT network into a key-value store (<key> = node ID, <value> = data: address, port, etc.). The dumped data is processed by Map, Shuffle and Reduce stages, which scale out.]
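The storage step in the architecture can be illustrated with a small Python sketch. It assumes an in-memory dict in place of the real key-value store, and the function names, sample node IDs and documentation-range addresses are hypothetical; the slides do not say which store or dump format Blink II uses.

```python
import json

def store_nodes(store, crawl_results):
    """Insert crawled nodes keyed by node ID; a repeated sighting overwrites the old value."""
    for node_id, address, port in crawl_results:
        store[node_id] = {"address": address, "port": port}
    return store

def dump_store(store):
    """Dump the store as one JSON line per node, a form that is easy to split among map tasks."""
    return [json.dumps({"id": node_id, **value}, sort_keys=True)
            for node_id, value in sorted(store.items())]

# Two hypothetical crawlers report overlapping results (node IDs shown as 40 hex digits).
crawler_a = [("a1" * 20, "192.0.2.10", 6881), ("b2" * 20, "198.51.100.7", 51413)]
crawler_b = [("a1" * 20, "192.0.2.10", 6881), ("c3" * 20, "203.0.113.5", 6881)]

store = {}
store_nodes(store, crawler_a)
store_nodes(store, crawler_b)
lines = dump_store(store)
print(len(store), "unique nodes")
```

Keying on the node ID means that a node seen by several crawlers is counted once, which matters when the crawlers run in parallel.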

Page 5: Ncm 2012 Ruo Ando

BT Ecosystem

[Figure labels: Public, Private, TRACKERS, DHT NETWORKS]

Is it stoppable?

Page 6: Ncm 2012 Ruo Ando

Basic architecture of tracker network

① Ask: Node A (a newcomer) asks the tracker to search for the file.

② Torrent download: the tracker provides the torrent file.

③ Join: Node A queries Node B.

④ Download: Node A can download pieces of the file on the swarm network.

A seeder has the complete file. A leecher has pieces of the file.

PacSec 2011

Page 7: Ncm 2012 Ruo Ando

BitTorrent network: tracker or DHT (trackerless)

Tracker: a dedicated machine that stores torrent files and keeps track of which nodes are downloading and uploading.

DHT: a decentralized network architecture that shares the functionality of the tracker among nodes. DHT is decentralized, but is more scalable than pure P2P.

DHT (Distributed Hash Table) is a method based on <key, value> pairs. The DHT lookup method lets us discover the location of the node that shares the tracker's responsibility for a shared file.

Recently, DHT networks have received much attention due to the Dot-P2P project and The Pirate Bay's confirmation that it was stopping its tracker.
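The lookup idea can be sketched in Python. The slides do not spell out the distance metric; the sketch assumes the Kademlia-style XOR metric that the BitTorrent DHT specification (BEP 5) describes, and the helper names and sample node IDs are hypothetical.

```python
import hashlib

def xor_distance(id_a: bytes, id_b: bytes) -> int:
    """Distance between two 160-bit IDs: XOR of the IDs, read as an unsigned integer."""
    return int.from_bytes(id_a, "big") ^ int.from_bytes(id_b, "big")

def closest_nodes(target: bytes, known_ids, k=8):
    """Return the k known node IDs closest to the target key."""
    return sorted(known_ids, key=lambda node_id: xor_distance(node_id, target))[:k]

# Hypothetical node IDs and key, derived with SHA-1 only to get 20-byte values.
known = [hashlib.sha1(f"node-{i}".encode()).digest() for i in range(100)]
key = hashlib.sha1(b"example torrent").digest()
nearest = closest_nodes(key, known, k=8)
print(len(nearest), "closest nodes selected")
```

A lookup repeats this selection: it asks the closest known nodes for nodes that are closer still, until no closer node is returned.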

Page 8: Ncm 2012 Ruo Ando

DHT Protocol

There are four kinds of query messages in the BitTorrent DHT network: PING, FIND_NODE, GET_PEERS and ANNOUNCE_PEER.

• PING: the basic query for checking that the queried node is alive. Its single argument, "id", is a 20-byte string holding the sender's node ID in network byte order.

• FIND_NODE: used to obtain the contact information of a node given its ID. The response should contain a key "nodes" whose value is the compact node info for the target node or for the K (8) closest nodes in the queried node's routing table.

arguments: {"id" : "<querying nodes id>", "target" : "<id of target node>"}

response: {"id" : "<queried nodes id>", "nodes" : "<compact node info>"}
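The argument dictionary above travels inside a bencoded KRPC message. The following Python sketch builds a FIND_NODE query; the outer keys "t", "y", "q" and "a" follow BEP 5, while the encoder and the function names are hypothetical helpers written for this illustration.

```python
def bencode(obj) -> bytes:
    """Minimal bencoder for ints, byte strings, text strings, lists and dicts."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, str):
        obj = obj.encode("latin-1")
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(item) for item in obj) + b"e"
    if isinstance(obj, dict):
        # Bencoded dictionaries list their keys in sorted order.
        items = [(k.encode("latin-1") if isinstance(k, str) else k, v)
                 for k, v in obj.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(obj).__name__}")

def find_node_query(transaction_id: bytes, my_id: bytes, target_id: bytes) -> bytes:
    """Build a FIND_NODE query: "y" = "q" marks a query, "a" carries the arguments."""
    return bencode({
        "t": transaction_id,
        "y": "q",
        "q": "find_node",
        "a": {"id": my_id, "target": target_id},
    })

packet = find_node_query(b"aa", b"abcdefghij0123456789", b"mnopqrstuvwxyz123456")
print(packet)
```

The resulting bytes would be sent to the queried node in a single UDP datagram.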


Page 9: Ncm 2012 Ruo Ando

DHT Protocol

There are four kinds of query messages in the BitTorrent DHT network: PING, FIND_NODE, GET_PEERS and ANNOUNCE_PEER.

• GET_PEERS: used to get the peers associated with a torrent infohash. If the queried node has peers for the infohash, the response contains a key "values" holding a list of strings. If not, the response contains the K nodes in the queried node's routing table that are closest to the infohash.

• ANNOUNCE_PEER: used to announce that the peer controlling the querying node is downloading a torrent on a port.

arguments: {"id" : "<querying nodes id>", "info_hash" : "<20-byte infohash of target torrent>", "port" : <port number>, "token" : "<opaque token>"}
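A GET_PEERS response therefore has two possible shapes, which a client has to tell apart. The Python sketch below assumes the response dictionary has already been bdecoded, and relies on the compact formats defined in BEP 5 (6 bytes per peer, 26 bytes per node); the function names are hypothetical.

```python
import socket
import struct

def parse_compact_peers(values):
    """Each item is 6 bytes: a 4-byte IPv4 address and a 2-byte port in network byte order."""
    peers = []
    for item in values:
        (port,) = struct.unpack("!H", item[4:6])
        peers.append((socket.inet_ntoa(item[:4]), port))
    return peers

def parse_compact_nodes(blob):
    """The blob is a sequence of 26-byte entries: 20-byte node ID, 4-byte IPv4, 2-byte port."""
    nodes = []
    for offset in range(0, len(blob) - len(blob) % 26, 26):
        node_id = blob[offset:offset + 20]
        ip = socket.inet_ntoa(blob[offset + 20:offset + 24])
        (port,) = struct.unpack("!H", blob[offset + 24:offset + 26])
        nodes.append((node_id, ip, port))
    return nodes

def handle_get_peers_response(response):
    """Return ("peers", [...]) if the node knew peers, else ("nodes", [...]) to query next."""
    if "values" in response:
        return "peers", parse_compact_peers(response["values"])
    return "nodes", parse_compact_nodes(response.get("nodes", b""))

peer = socket.inet_aton("192.0.2.1") + struct.pack("!H", 6881)
node = b"n" * 20 + socket.inet_aton("198.51.100.2") + struct.pack("!H", 51413)
print(handle_get_peers_response({"id": b"q" * 20, "values": [peer]}))
print(handle_get_peers_response({"id": b"q" * 20, "nodes": node}))
```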


Page 10: Ncm 2012 Ruo Ando

DHT network crawling

There are four kinds of query messages in the BitTorrent DHT network: PING, FIND_NODE, GET_PEERS and ANNOUNCE_PEER.

• PING: the basic query for checking that the queried node is alive. Its single argument, "id", is a 20-byte string holding the sender's node ID in network byte order.

• FIND_NODE: used to obtain the contact information of a node given its ID. The response should contain a key "nodes" whose value is the compact node info for the target node or for the K (8) closest nodes in the queried node's routing table.
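One way to turn FIND_NODE into a crawler is a breadth-first traversal: every node returned in a response is queried in turn. The Python sketch below is an assumption about the general approach, not the Blink II implementation. The `find_node` callable is a hypothetical stand-in for one UDP round trip, and the toy topology replaces the real network.

```python
from collections import deque

def crawl(bootstrap, find_node, max_nodes=1000):
    """Breadth-first crawl: query each discovered node once and collect everything it returns."""
    seen = {bootstrap}
    queue = deque([bootstrap])
    while queue:
        node = queue.popleft()
        for neighbour in find_node(node):  # an unresponsive node is modelled as an empty reply
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
                if len(seen) >= max_nodes:
                    return seen
    return seen

# Toy network: each node answers FIND_NODE with part of its routing table.
topology = {"A": ["B", "C"], "B": ["C", "D"], "C": [], "D": ["A", "E"], "E": []}
discovered = crawl("A", lambda node: topology.get(node, []))
print(sorted(discovered))
```

Because each queue entry is independent, the queue can be partitioned among several crawler processes, which fits the scale-out design shown earlier.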

Page 11: Ncm 2012 Ruo Ando

Rapid crawling: 24 hours to reach 10,000,000 nodes

[Figure: number of nodes (axis 0 to 12,000,000) by hour (0 to 26), and the hourly difference ("diff", axis ticks at 10,000, 100,000 and 1,000,000) by hour (0 to 26).]

Page 12: Ncm 2012 Ruo Ando

Map Reduce

[Figure: Input → Map (×3) → Reduce (×3) → Output]

MapReduce is a programming model for processing big data.

map(key1, value1) -> list<key2, value2>
reduce(key2, list<value2>) -> list<value3>

Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December 2004.
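The two signatures can be exercised with a single-process Python sketch. It only imitates the data flow (map, then shuffle by key, then reduce); a real framework distributes these phases across machines. The driver name and the sample records with country codes are hypothetical.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run map over every (key1, value1), group the output by key2, then reduce each group."""
    groups = defaultdict(list)
    for key1, value1 in records:
        for key2, value2 in mapper(key1, value1):
            groups[key2].append(value2)      # shuffle: collect values under their key2
    return {key2: reducer(key2, values) for key2, values in groups.items()}

def country_mapper(node_id, info):
    """map(key1, value1) -> list<key2, value2>: emit (country, 1) for each node."""
    return [(info["country"], 1)]

def sum_reducer(country, ones):
    """reduce(key2, list<value2>) -> list<value3>: a one-element list with the total."""
    return [sum(ones)]

records = [
    ("node-1", {"country": "RU"}),
    ("node-2", {"country": "US"}),
    ("node-3", {"country": "RU"}),
]
result = map_reduce(records, country_mapper, sum_reducer)
print(result)
```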


Page 14: Ncm 2012 Ruo Ando

Map

*.0.194.107,h116-0-194-107.catv02.itscom.jp
*.28.27.107,c-76-28-27-107.hsd1.ct.comcast.net
*.40.239.181,c-68-40-239-181.hsd1.mi.comcast.net
*.253.44.184,pool-96-253-44-184.prvdri.fios.verizon.net
*.27.170.168,cpc11-stok15-2-0-cust167.1-4.cable.virginmedia.com
*.22.23.81,cpc2-stkn10-0-0-cust848.11-2.cable.virginmedia.com

hdsl1   *.0.194.107   comcast   verizon   virginmedia   hdsl1   comcast

1   1   1   1   1   1   1

The log string is divided into words, and each word is assigned “1”. Key-value: {word, 1}
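The map step can be sketched in Python. The slides do not state the tokenization rule, so the sketch assumes one: the hostname is split on every non-alphanumeric character and purely numeric fragments are dropped. The function name is hypothetical.

```python
import re

def map_log_line(line):
    """Split one "address,hostname" log line into words and emit a (word, 1) pair per word."""
    _address, _, hostname = line.partition(",")
    words = [w for w in re.split(r"[^A-Za-z0-9]+", hostname) if w and not w.isdigit()]
    return [(word, 1) for word in words]

pairs = map_log_line("*.28.27.107,c-76-28-27-107.hsd1.ct.comcast.net")
print(pairs)
```

Under this rule a provider name such as comcast becomes a key of its own, which is what lets the later reduce step count nodes per provider.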


Page 15: Ncm 2012 Ruo Ando

Reduce

hdsl1   *.0.194.107   comcast   verizon   virginmedia   hdsl1   comcast

1   1   1   1   1   1   1

Reduce: add up the 1s for each word. Key-value: {hdsl1, 2} / {comcast, 2} / {verizon, 1}

[Figure: the 1s grouped by word: hdsl1 (1, 1), comcast (1, 1), verizon (1).]
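The reduce step on this slide amounts to summing the list of 1s collected for each word. A minimal Python sketch, taking the grouped input from the slide's example and using a hypothetical function name:

```python
def reduce_word(word, ones):
    """Reduce one group: the count for a word is the sum of the 1s emitted for it."""
    return word, sum(ones)

# Grouped input as in the slide's example, after the shuffle phase.
grouped = {"hdsl1": [1, 1], "comcast": [1, 1], "verizon": [1]}
counts = dict(reduce_word(word, ones) for word, ones in grouped.items())
print(counts)
```

Each group is reduced independently of the others, so the groups can be handled by separate reduce tasks.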


Page 16: Ncm 2012 Ruo Ando

Sorting and ranking

hdsl1   *.0.194.107   comcast   verizon   hdsl1   hdsl1   comcast

1   1   1   1   1   1   1

@list1 = reverse sort { (split(/\s/,$a))[1] <=> (split(/\s/,$b))[1] } @list1;

[Figure: the grouped 1s for hdsl1, comcast and verizon, with steps ① and ② marked.]
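For readers who do not use Perl, the sort line above has a close Python counterpart. The sketch assumes, as the Perl code does, that each list entry is a string of the form "word count"; the function name and sample entries are hypothetical. Entries with equal counts may come out in a different order than with Perl's `reverse sort`.

```python
def rank(entries):
    """Sort "word count" strings by the numeric count in the second field, largest first."""
    return sorted(entries, key=lambda entry: int(entry.split()[1]), reverse=True)

ranking = rank(["verizon 1", "comcast 2", "hdsl1 2"])
print(ranking)
```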


Page 17: Ncm 2012 Ruo Ando

Ranking by number of nodes in one day

Rank  Country        # of nodes  Region           Domain
1     Russia         1,488,056   Russia           RU
2     United States  1,177,766   North America    US
3     China          815,934     East Asia        CN
4     UK             414,282     West Europe      GB
5     Canada         408,592     North America    CA
6     Ukraine        399,054     East Europe      UA
7     France         394,005     West Europe      FR
8     India          309,008     South Asia       IN
9     Taiwan         296,856     East Asia        TW
10    Brazil         271,417     South America    BR
11    Japan          262,678     East Asia        JP
12    Romania        233,536     East Europe      RO
13    Bulgaria       226,885     East Europe      BG
14    South Korea    217,409     East Asia        KR
15    Australia      216,250     Oceania          AU
16    Poland         184,087     East Europe      PL
17    Sweden         183,465     North Europe     SE
18    Thailand       183,008     South East Asia  TH
19    Italy          177,932     West Europe      IT
20    Spain          172,969     West Europe      ES

Page 18: Ncm 2012 Ruo Ando

EU: rank 4, UK (code: GB), 414,282 nodes, West Europe

N/A 77490
London 47559 (7550000: 0.6%)
Manchester 9808 (441000: 2%)
Birmingham 6617
Leeds 5111
Glasgow 4841
Brighton 4788
Liverpool 4445
Bristol 3814
Sheffield 3536
Upon 3363
Edinburgh 3140
Nottingham 2412
Newcastle 2297
Bradford 2093
Tyne 2091
Stoke-on-Trent 2021
Coventry 1965
Preston 1902
Reading 1814

[Figure: chart with series GB, RU, JP, CN and US; horizontal axis 1 to 12, vertical axis 0 to 250.]

Page 19: Ncm 2012 Ruo Ando

Rank 11 Japan 262,678

N/A 69648
Tokyo 54531 (13100000: 0.045)
Osaka 7430 (8860000: ??)
Yokohama 6983
Nagoya 4114
Kawasaki 3503
Fukuoka 2989
Kyoto 2875
Chiba 2443
Kobe 2409
Sapporo 2015
Shizuoka 1667
Hamamatsu 1396
Hiroshima 1356
Setagaya 1339
Nara 1239
Sagamihara 1151
Toyonaka 1089
Kawaguchi 1077
Tokorozawa 980

[Figure: chart with series GB, RU, JP, CN and US; horizontal axis 1 to 12, vertical axis 0 to 250.]

Page 20: Ncm 2012 Ruo Ando

Rank 3 China 815,934 East Asia CN

Beijing 240419 (17500000: 1%)
Guangzhou 52981 (10330000: 0.5%?)
Shanghai 27399 (18580000: 0.1%?)
Jinan 26281
N/A 24695
Chengdu 18835
Shenyang 18566
Tianjin 18460
Hebei 17414
Wuhan 15239
Hangzhou 12997
Harbin 10848
Changchun 10411
Nanning 10318
Qingdao 10257
Taiy 9573
Hefei 9455
Changsha 6988
Chongqing 5641
Shenzhen 5600

[Figure: chart with series GB, RU, JP, CN and US; horizontal axis 1 to 12, vertical axis 0 to 250.]
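The parenthesized pairs on the per-city slides appear to give the city population followed by the number of nodes as a share of that population. That reading is an assumption, not something the slides state. Under it, the shares can be recomputed from the figures on the slides with a short Python sketch; the function name is hypothetical.

```python
def node_share_percent(nodes, population):
    """Nodes observed in a city as a percentage of its population."""
    return 100.0 * nodes / population

# (nodes, population) pairs copied from the per-city slides.
cities = {
    "London": (47559, 7550000),
    "Manchester": (9808, 441000),
    "Tokyo": (54531, 13100000),
    "Osaka": (7430, 8860000),
    "Beijing": (240419, 17500000),
    "Guangzhou": (52981, 10330000),
    "Shanghai": (27399, 18580000),
}
shares = {city: round(node_share_percent(n, p), 2) for city, (n, p) in cities.items()}
for city, share in shares.items():
    print(f"{city}: {share}%")
```

A node is not the same as a person (one user may appear under several node IDs or addresses), so these percentages are only a rough indicator.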

Page 21: Ncm 2012 Ruo Ando

Ranking of Cities

Rank  City              # of nodes (city)  Country    # of nodes (country)  Population (×10,000)
1     Moscow            285,097            Russia     1,488,056             1367
2     Beijing           240,419            China      815,934               1755
3     Seoul             180,186            Korea      217,409               970
4     Saint Petersburg  165,735            Russia     1,488,056             -
5     Taipei            161,498            Taiwan     296,856               265
6     Hong Kong         130,920            Hong Kong  -                     -
7     Kiev              117,392            Ukraine    399,054               251
8     Bucharest         79,336             Romania    233,536               194
9     Sofia             78,445             Bulgaria   226,885               126
10    Bangkok           62,882             Thailand   183,008               687
11    Delhi             62,563             India      309,008               2099
12    Tokyo             54,531             Japan      262,678               1300
13    London            53,514             England    414,282               755
14    Guangzhou         52,981             China      815,934               1004
15    Athens            52,656             Greece     -                     300

Page 22: Ncm 2012 Ruo Ando


Conclusion: detecting illegal adoption in a huge network

• BitTorrent has become an irreplaceable network application for distributing software and content. But...

• No one knows its exact scale and dynamics. How many nodes join and leave the BitTorrent network in 24 hours?

• We tackled the challenge of monitoring this largest-scale network with our rapid and massive DHT crawler.

• We succeeded in collecting 10,000,000 nodes in 24 hours.

• We also present rankings of countries and cities by number of BitTorrent nodes.