Blink II: A node ranking system of DHT network using Map Reduce Framework
Ruo Ando, Akihiko Shinohara and Takayuki Sugiura
National Institute of Information and Communications Technology (NICT)
NetAgent Co. Ltd
Overview: detecting illegal adoption in a huge network
• BitTorrent has become an irreplaceable network application for distributing software and content. But...
• No one knows its exact scale and dynamics! How many nodes join and leave the BitTorrent network in 24 hours?
• We tackle the challenge of monitoring this very large network with our rapid, massive DHT crawler.
• We succeeded in collecting 10,000,000 nodes in 24 hours!
• We also present rankings of countries and cities in the BitTorrent network.
BitTorrent Traffic estimations
① 55% - CableLabs: about half of the upstream traffic of CATV.
② 35% - CacheLogic, “LIVEWIRE - File-sharing network thrives beneath the Radar”.
③ 60% - documents at www.sans.edu: “It is estimated that more than 60% of the traffic on the internet is peer-to-peer.”
Proposed system architecture for monitoring large scale networks
[Figure: several DHT crawlers collect nodes from the DHT network into a key-value store (<key> = node ID, <value> = data such as address, port, etc.). The dumped data passes through parallel Map tasks, a Shuffle step and a Reduce step. The system scales out.]
BT ecosystem (public / private): TRACKERS and DHT NETWORKS. Is it stoppable?
Basic architecture of tracker network
① Ask: Node A (a newcomer) asks the tracker to search for the file.
② Torrent download: the tracker provides the torrent file.
③ Join: Node A queries node B.
④ Download: Node A downloads pieces of the file from the swarm network.
A seeder has the complete file. A leecher has only pieces of the file.
PacSec 2011
BitTorrent network: tracker or DHT (trackerless)
Tracker: a dedicated machine that stores torrent files and keeps track of which nodes are downloading and uploading.
DHT: a decentralized network architecture that shares the functionality of the tracker among the nodes. Although it is decentralized, a DHT is more scalable than pure P2P.
A DHT (Distributed Hash Table) is a method based on <key,value> pairs. A DHT lookup lets us discover the node that shares the responsibility of acting as tracker for a file share.
DHT networks have recently attracted much attention because of the Dot-P2P project and The Pirate Bay's confirmation that it has stopped its tracker.
DHT Protocol
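The BitTorrent DHT network uses four kinds of queries: PING, FIND_NODE, GET_PEERS and ANNOUNCE_PEER. They correspond to Kademlia's PING, FIND_NODE, FIND_VALUE and STORE.
• PING : the basic query for checking that the queried node is alive. Its single argument, “id”, is a 20-byte string holding the querying node's ID in network byte order.
• FIND_NODE : used to obtain the contact information of a node given its ID. The response should contain a key “nodes” whose value is the compact node info for the target node or for the K (8) closest good nodes in the queried node's routing table.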
arguments: {"id" : "<querying nodes id>", "target" : "<id of target node>"}
response: {"id" : "<queried nodes id>", "nodes" : "<compact node info>"}
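As an illustration of the FIND_NODE exchange above, here is a minimal Python sketch, not the crawler used in this work. It assumes the KRPC envelope of the BitTorrent DHT specification (BEP 5): the keys "t", "y", "q" and "a", bencoded messages, and compact node info of 26 bytes per node (20-byte ID, 4-byte IPv4 address, 2-byte port). The function names are hypothetical.

```python
import socket
import struct


def bencode(obj):
    """Minimal bencoder for ints, byte/str strings, lists and dicts."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, str):
        obj = obj.encode("ascii")
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        # Bencoded dictionaries store their keys as sorted byte strings.
        items = sorted(
            (k.encode("ascii") if isinstance(k, str) else k, v)
            for k, v in obj.items()
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(obj))


def find_node_query(my_id, target, transaction_id=b"aa"):
    """Build a FIND_NODE query; the t/y/q/a envelope is assumed from BEP 5."""
    return bencode({
        "t": transaction_id,
        "y": "q",
        "q": "find_node",
        "a": {"id": my_id, "target": target},
    })


def parse_compact_nodes(blob):
    """Split compact node info into (node_id, ip, port) tuples, 26 bytes each."""
    nodes = []
    for off in range(0, len(blob) - len(blob) % 26, 26):
        node_id = blob[off:off + 20]
        ip = socket.inet_ntoa(blob[off + 20:off + 24])
        (port,) = struct.unpack("!H", blob[off + 24:off + 26])
        nodes.append((node_id, ip, port))
    return nodes
```

A crawler would send the bytes returned by find_node_query to a node over UDP and pass the "nodes" value of the reply to parse_compact_nodes.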
DHT Protocol
• GET_PEERS : used to get the peers associated with a torrent infohash. If the queried node has peers for the infohash, the response contains a key “values” holding a list of strings. If not, it contains a key “nodes” holding the K nodes in the queried node's routing table closest to the infohash.
• ANNOUNCE_PEER : used to announce that the peer controlling the querying node is downloading a torrent on a port.
arguments: {"id" : "<querying nodes id>", "info_hash" : "<20-byte infohash of target torrent>", "port" : <port number>, "token" : "<opaque token>"}
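The two outcomes of GET_PEERS and the arguments of ANNOUNCE_PEER can be sketched as follows. This is a hedged example: it assumes the response has already been bdecoded into a Python dict with str keys, that each string in "values" is BEP 5 compact peer info (4-byte IPv4 address plus 2-byte port), and that "nodes" uses the 26-byte compact node format. All function names are hypothetical.

```python
import socket
import struct


def parse_compact_peer(blob):
    """Decode 6-byte compact peer info into (ip, port)."""
    ip = socket.inet_ntoa(blob[:4])
    (port,) = struct.unpack("!H", blob[4:6])
    return ip, port


def parse_compact_nodes(blob):
    """Decode 26-byte compact node entries into (node_id, ip, port)."""
    nodes = []
    for off in range(0, len(blob) - len(blob) % 26, 26):
        ip, port = parse_compact_peer(blob[off + 20:off + 26])
        nodes.append((blob[off:off + 20], ip, port))
    return nodes


def handle_get_peers(response):
    """Return ("peers", [...]) if the node knows peers, else ("nodes", [...])."""
    if "values" in response:
        return "peers", [parse_compact_peer(v) for v in response["values"]]
    return "nodes", parse_compact_nodes(response.get("nodes", b""))


def announce_peer_args(my_id, info_hash, port, token):
    """Arguments of ANNOUNCE_PEER; the token comes from an earlier GET_PEERS reply."""
    return {"id": my_id, "info_hash": info_hash, "port": port, "token": token}
```

When the result is "nodes", the querying node repeats GET_PEERS against those closer nodes.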
DHT network crawling
Rapid crawling: 24 hours to reach 10,000,000 nodes!
[Figure: number of nodes found (axis 0 to 12,000,000) against elapsed hours (0 to 26). Second panel, "diff": axis 10,000 to 1,000,000 against elapsed hours (0 to 26).]
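One plausible way to crawl a DHT is a breadth-first walk that repeatedly issues FIND_NODE and stores every new contact under its node ID. The sketch below is an assumption for illustration only, not the crawler used in this work: the network is simulated by a hypothetical in-memory table (FAKE_NETWORK), and find_node stands in for a real UDP query.

```python
from collections import deque


def crawl(bootstrap, find_node, max_nodes=None):
    """Breadth-first crawl.

    bootstrap: list of (node_id, address, port) contacts to start from.
    find_node: callable taking a contact and returning a list of contacts.
    Returns a dict with key = node ID and value = (address, port).
    """
    store = {}
    queue = deque(bootstrap)
    while queue:
        node_id, address, port = queue.popleft()
        if node_id in store:
            continue  # already visited
        store[node_id] = (address, port)
        if max_nodes is not None and len(store) >= max_nodes:
            break
        for contact in find_node((node_id, address, port)):
            if contact[0] not in store:
                queue.append(contact)
    return store


# Hypothetical simulated network: node ID -> contacts it would return.
FAKE_NETWORK = {
    "a": [("b", "192.0.2.2", 6881), ("c", "192.0.2.3", 6881)],
    "b": [("c", "192.0.2.3", 6881), ("d", "192.0.2.4", 6881)],
    "c": [("a", "192.0.2.1", 6881)],
    "d": [],
}


def simulated_find_node(contact):
    return FAKE_NETWORK.get(contact[0], [])


store = crawl([("a", "192.0.2.1", 6881)], simulated_find_node)
print(len(store), "nodes found")
```

The resulting dictionary has the same shape as the key-value store in the architecture: node ID as key, address and port as value.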
Map Reduce
[Figure: Input → Map (×3) → Reduce (×3) → Output]
MapReduce is a programming model for processing big data.
map(key1, value1) -> list<key2, value2>
reduce(key2, list<value2>) -> list<value3>
Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December 2004.
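The map and reduce signatures can be illustrated with a single-process Python sketch. It only mimics the data flow (map, shuffle, reduce) and is not a distributed framework. The records and the country example are hypothetical, and reduce returns one value per key rather than a list, for brevity.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run map, shuffle and reduce in one process over (key1, value1) records."""
    groups = defaultdict(list)
    for key1, value1 in records:
        for key2, value2 in map_fn(key1, value1):
            groups[key2].append(value2)  # shuffle: group values by key2
    return {key2: reduce_fn(key2, values) for key2, values in groups.items()}


def country_map(node_id, country_code):
    # Emit one (country, 1) pair per crawled node.
    return [(country_code, 1)]


def count_reduce(country_code, ones):
    # Sum the 1s collected for one country.
    return sum(ones)


# Hypothetical records: (node ID, country code of the node's address).
records = [("n1", "RU"), ("n2", "US"), ("n3", "RU"), ("n4", "JP")]
counts = map_reduce(records, country_map, count_reduce)
print(counts)
```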
Map
Log lines (address, hostname):
*.0.194.107,h116-0-194-107.catv02.itscom.jp
*.28.27.107,c-76-28-27-107.hsd1.ct.comcast.net
*.40.239.181,c-68-40-239-181.hsd1.mi.comcast.net
*.253.44.184,pool-96-253-44-184.prvdri.fios.verizon.net
*.27.170.168,cpc11-stok15-2-0-cust167.1-4.cable.virginmedia.com
*.22.23.81,cpc2-stkn10-0-0-cust848.11-2.cable.virginmedia.com
Each log string is divided into words, and each word is assigned “1”. Key-value: {word, 1}
{hsd1, 1} {*.0.194.107, 1} {comcast, 1} {verizon, 1} {virginmedia, 1} {hsd1, 1} {comcast, 1}
Reduce
Reduce: the 1s are summed for each word.
Key-value: {hsd1, 2} / {comcast, 2} / {verizon, 1}
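The Map and Reduce steps on the hostname log can be sketched in Python as below. The tokenization is an assumption, since the slides do not state the exact rule: the address field is dropped and the hostname is split on ".". Because all six log lines are counted, virginmedia gets 2 here.

```python
from collections import defaultdict

LOG_LINES = [
    "*.0.194.107,h116-0-194-107.catv02.itscom.jp",
    "*.28.27.107,c-76-28-27-107.hsd1.ct.comcast.net",
    "*.40.239.181,c-68-40-239-181.hsd1.mi.comcast.net",
    "*.253.44.184,pool-96-253-44-184.prvdri.fios.verizon.net",
    "*.27.170.168,cpc11-stok15-2-0-cust167.1-4.cable.virginmedia.com",
    "*.22.23.81,cpc2-stkn10-0-0-cust848.11-2.cable.virginmedia.com",
]


def map_phase(lines):
    """Emit a (word, 1) pair for every dot-separated label of each hostname."""
    pairs = []
    for line in lines:
        _address, hostname = line.split(",", 1)
        for word in hostname.split("."):
            pairs.append((word, 1))
    return pairs


def reduce_phase(pairs):
    """Sum the 1s emitted for each word."""
    totals = defaultdict(int)
    for word, one in pairs:
        totals[word] += one
    return dict(totals)


counts = reduce_phase(map_phase(LOG_LINES))
print(counts["comcast"], counts["hsd1"], counts["verizon"])
```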
Sorting and ranking
The {word, count} pairs from the Reduce step are sorted by count in descending order:
@list1 = reverse sort { (split(/\s/,$a))[1] <=> (split(/\s/,$b))[1] } @list1;
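For readers less familiar with Perl, a rough Python equivalent of the sort above is sketched here. It assumes each list entry is a "word count" string separated by whitespace. The function name rank is hypothetical, and entries with equal counts may come out in a different order than in the Perl version.

```python
def rank(lines):
    """Sort "word count" strings by their numeric count, largest first."""
    return sorted(lines, key=lambda line: int(line.split()[1]), reverse=True)


list1 = ["verizon 1", "comcast 3", "hsd1 2"]
for position, entry in enumerate(rank(list1), start=1):
    print(position, entry)
```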
Ranking of countries by # of nodes in one day

Rank  Country        # of nodes  Region           Domain
1     Russia         1,488,056   Russia           RU
2     United States  1,177,766   North America    US
3     China          815,934     East Asia        CN
4     UK             414,282     West Europe      GB
5     Canada         408,592     North America    CA
6     Ukraine        399,054     East Europe      UA
7     France         394,005     West Europe      FR
8     India          309,008     South Asia       IN
9     Taiwan         296,856     East Asia        TW
10    Brazil         271,417     South America    BR
11    Japan          262,678     East Asia        JP
12    Romania        233,536     East Europe      RO
13    Bulgaria       226,885     East Europe      BG
14    South Korea    217,409     East Asia        KR
15    Australia      216,250     Oceania          AU
16    Poland         184,087     East Europe      PL
17    Sweden         183,465     North Europe     SE
18    Thailand       183,008     South East Asia  TH
19    Italy          177,932     West Europe      IT
20    Spain          172,969     West Europe      ES
EU, rank 4: UK (code: GB), 414,282 nodes, West Europe
# of nodes per city (population: share of population in parentheses):
N/A 77,490
London 47,559 (7,550,000: 0.6%)
Manchester 9,808 (441,000: 2%)
Birmingham 6,617
Leeds 5,111
Glasgow 4,841
Brighton 4,788
Liverpool 4,445
Bristol 3,814
Sheffield 3,536
Upon 3,363
Edinburgh 3,140
Nottingham 2,412
Newcastle 2,297
Bradford 2,093
Tyne 2,091
Stoke-on-Trent 2,021
Coventry 1,965
Preston 1,902
Reading 1,814
[Figure: chart comparing GB, RU, JP, CN and US; horizontal axis 1 to 12, vertical axis 0 to 250.]
Rank 11: Japan (code: JP), 262,678 nodes, East Asia
# of nodes per city (population: share of population in parentheses):
N/A 69,648
Tokyo 54,531 (13,100,000: 0.4%)
Osaka 7,430 (8,860,000: ??)
Yokohama 6,983
Nagoya 4,114
Kawasaki 3,503
Fukuoka 2,989
Kyoto 2,875
Chiba 2,443
Kobe 2,409
Sapporo 2,015
Shizuoka 1,667
Hamamatsu 1,396
Hiroshima 1,356
Setagaya 1,339
Nara 1,239
Sagamihara 1,151
Toyonaka 1,089
Kawaguchi 1,077
Tokorozawa 980
Rank 3: China (code: CN), 815,934 nodes, East Asia
# of nodes per city (population: share of population in parentheses):
Beijing 240,419 (17,500,000: 1%)
Guangzhou 52,981 (10,330,000: 0.5%?)
Shanghai 27,399 (18,580,000: 0.1%?)
Jinan 26,281
N/A 24,695
Chengdu 18,835
Shenyang 18,566
Tianjin 18,460
Hebei 17,414
Wuhan 15,239
Hangzhou 12,997
Harbin 10,848
Changchun 10,411
Nanning 10,318
Qingdao 10,257
Taiy 9,573
Hefei 9,455
Changsha 6,988
Chongqing 5,641
Shenzhen 5,600
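The figures in parentheses in the city lists pair a city's population with the share of that population seen as nodes. The arithmetic can be checked with a short Python sketch that uses only the node counts and populations quoted on the slides. The helper name share_percent is hypothetical.

```python
# (nodes found, population) as quoted in the city lists.
CITIES = {
    "London": (47559, 7_550_000),
    "Manchester": (9808, 441_000),
    "Tokyo": (54531, 13_100_000),
    "Beijing": (240419, 17_500_000),
    "Guangzhou": (52981, 10_330_000),
    "Shanghai": (27399, 18_580_000),
}


def share_percent(nodes, population):
    """Nodes found in a city as a percentage of its population."""
    return 100.0 * nodes / population


for city, (nodes, population) in CITIES.items():
    print(f"{city}: {share_percent(nodes, population):.2f}%")
```

A node is not the same thing as a person, so these percentages are only a rough indicator of how widespread BitTorrent use is in a city.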
Ranking of Cities

Rank  City              # of nodes  Country      # of nodes in country  Population (×10,000)
1     Moscow            285,097     Russia       1,488,056              1367
2     Beijing           240,419     China        815,934                1755
3     Seoul             180,186     South Korea  217,409                970
4     Saint Petersburg  165,735     Russia       1,488,056              -
5     Taipei            161,498     Taiwan       296,856                265
6     Hong Kong         130,920     Hong Kong    -                      -
7     Kiev              117,392     Ukraine      399,054                251
8     Bucharest         79,336      Romania      233,536                194
9     Sofia             78,445      Bulgaria     226,885                126
10    Bangkok           62,882      Thailand     183,008                687
11    Delhi             62,563      India        309,008                2099
12    Tokyo             54,531      Japan        262,678                1300
13    London            53,514      UK           414,282                755
14    Guangzhou         52,981      China        815,934                1004
15    Athens            52,656      Greece       -                      300
(- : not given)
Conclusion: detecting illegal adoption in a huge network
• BitTorrent has become an irreplaceable network application for distributing software and content. But...
• No one knows its exact scale and dynamics! How many nodes join and leave the BitTorrent network in 24 hours?
• We tackle the challenge of monitoring this very large network with our rapid, massive DHT crawler.
• We succeeded in collecting 10,000,000 nodes in 24 hours!
• We also present rankings of countries and cities in the BitTorrent network.