The bandwidth estimation by the visualization of TCP window size
S.Y.Suzuki, H.Matsunaga (KEK)
1
• perfSONAR is widely used in the LHC universe to monitor network stability.
• It periodically runs bwctl and owamp tests against multiple sites, records the results in a database, and shows the histories graphically.
  - bwctl internally uses iperf or nuttcp.
[Figure: bandwidth history / one-way latency history]
2
Recently, computing activity in HEP has been led by the LHC.
Unfortunately, KEK is not one of the LHC tier sites.
[Figure: map showing KEK and the LHC]
3
• Major reasons
  - End nodes are not tuned for long-latency transmission.
    • Easy to solve: just tune the system parameters.
  - An intermediate link is congested.
    • Difficult to solve.
    • The congested link is shared by several activities.
• Sometimes
  - A remote firewall dislikes KEK.
    • Firewalls are NOT always operated by physicists, and KEK is still nameless to non-HEP people.
    • FW admin: "Our firewall already permits the major HEP labs for LHC! KEK? What is that?"
User's voice: "Hi, the transfer from KEK to my institute is very slow. Why?"
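For reference, the "tune the system parameters" fix means sizing the TCP socket buffers to at least the bandwidth-delay product. A minimal sketch with illustrative numbers (a 1 Gbit/s path and 200 ms RTT; neither figure is from the talk):

```shell
# Bandwidth-delay product: the window (socket buffer) a single TCP
# stream needs to fill the path. Both numbers are assumptions.
rate_bps=1000000000                      # 1 Gbit/s path
rtt_ms=200                               # long-haul round-trip time
bdp_bytes=$(( rate_bps / 8 * rtt_ms / 1000 ))
echo "buffer should be >= $bdp_bytes bytes"   # 25000000 bytes = 25 MB
```

The result matches the scale of the ~25 MB window the 2009 KEK-DESY test called for (shown later in the talk).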
4
Responses for them
• If their institute already runs perfSONAR, that is quite useful.
  - "Just temporary congestion. Please try another day."
  - "It is always saturated. Please express the necessity of an upgrade to your upstream network!"
• perfSONAR contains NPAD and NDT for diagnostics of the end nodes.
  - NDT measures the bandwidth.
  - NPAD checks more details and reports several problems if they exist.
5
Bandwidth report of perfSONAR
• The bandwidth report of perfSONAR is made by iperf or nuttcp.
• The reported value is the average speed over 20 or 30 seconds.
• If there is packet loss, the average value is bad.
• In the case of a long RTT, the average value becomes even worse.
[Figure: speed vs. time from test start to test end, for short RTT and long RTT (long-RTT panel assumes no packet loss), showing actual speed, average, and maximum speed.]
6
• TCP is a very modest and cooperative protocol.
• When TCP detects congestion, it reduces its speed pessimistically.
• The recovery speed depends on the RTT.
• Once packet loss has happened, TCP on a long RTT always loses the race.
The TCP behavior
[Figure: speed vs. time after transmission start, short RTT vs. long RTT, with loss events marked; on the long-RTT path the speed stays far below the available bandwidth.]
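The "long RTT loses the race" point can be made quantitative. In Reno-style congestion avoidance the window is halved on loss and regrows by one segment per RTT, so recovery time scales with the RTT. A rough sketch with made-up numbers:

```shell
# After one loss, cwnd is halved and regrows one segment per RTT,
# so recovery takes roughly (W/2) * RTT. All numbers are illustrative.
mss=1460                          # segment size in bytes
rtt_ms=200                        # long-RTT path
win=25000000                      # window before the loss, in bytes
w_seg=$(( win / mss ))            # window in segments
recover_s=$(( w_seg / 2 * rtt_ms / 1000 ))
echo "recovery after a single loss: ~${recover_s} s"   # ~1712 s
```

On a short-RTT path both the window and the per-step time shrink, so the same recovery finishes in seconds; that asymmetry is why one loss hurts the long-haul transfer so much more.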
7
• tcp_probe
  - requires root privilege to insert a kernel module on the sender machine.
• wireshark
  - uses only the information in ACK packets from the receiver, so the congestion window size will not be shown.
  - http://ask.wireshark.org/questions/2820/how-to-get-cwnd-in-wireshark
This behavior can be visualized by
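A third option, not mentioned in the talk: on Linux, `ss -tin` (iproute2) reports the sender's cwnd from userspace with no kernel module. A sketch of pulling the value out of one output line (the sample line here is made up, but follows the `ss -tin` field format):

```shell
# Extract the cwnd field from an `ss -tin`-style line.
# The sample line content is illustrative, not captured output.
line='cubic wscale:7,7 rto:204 rtt:2.5/1.25 mss:1448 cwnd:87 bytes_acked:123456'
cwnd=$(printf '%s\n' "$line" | grep -o 'cwnd:[0-9]*' | cut -d: -f2)
echo "cwnd = $cwnd segments"
```

On a live transfer one would run e.g. `ss -tin dport = :5001` in a loop while iperf sends, and log the extracted value with a timestamp.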
8
What is the congestion window?
• The receiver side announces "the limit you can send" in every ACK packet.
• The sender CAN send within that limit, but actually sends slower in order to avoid congestion.
  - This limit is the "congestion window size", and it exists internally in the sender's kernel.
• The actual speed depends on this congestion window size.
• If packets are lost, the congestion window size shrinks.
[Figure: congestion window size vs. time after transmission start, staying under the advertised window size and dropping at each packet loss.]
9
How to detect the congestion window size?
• Peep directly into the kernel
  - tcp_probe
• Guess from the packet sequence at the sender side
  - using the absolute time information of the captured packets.
  - We use this way.
[Figure: packet sequence between data sender and receiver (data/PSH and ACK packets over time), with the advertised window size and the congestion window size indicated.]
As the congestion window determines the actual speed, recording its behavior is a precise speed measurement.
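The "guess from the packet sequence" idea can be sketched as follows: at any instant, the bytes in flight (roughly the congestion window) are the highest sequence number sent minus the highest ACK seen. A toy awk version over made-up (type, byte offset, time) records:

```shell
# Sketch: in-flight bytes = max SEQ sent - max ACK received.
# Input records (type, byte offset, capture time) are made up;
# real input would come from parsing a tcpdump/tshark capture.
awk '
  $1 == "SEQ" && $2 > seq { seq = $2 }
  $1 == "ACK" && $2 > ack { ack = $2 }
  { printf "t=%s inflight=%d\n", $3, seq - ack }
' <<'EOF'
SEQ 14600 0.001
ACK 1460 0.010
SEQ 29200 0.011
ACK 14600 0.020
EOF
```

Plotting the in-flight bytes against the capture timestamps gives the window-size curve used throughout the rest of the talk.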
10
• For LHC guys
  - grid-ftp using multiple TCP streams
    • Multiple TCP streams enable high-speed transfer with a small window size.
  - Run multiple grid-ftp transfers non-interactively.
    • Never mind, your transmission will be finished someday.
  - Bandwidth allocation, if possible.
• Not always applicable, especially for non-LHC sites
  - including KEK.
  - Data at KEK is not always exportable via GRID.
    • Some of it is exportable, but some of it is not.
For long-distance data transmission
11
Case 1. Nagoya Univ. and KEK
12
• From KEK to Nagoya is about 400 km.
• Nagoya University is the most powerful and nearest Belle collaborator.
[Figure: map showing KEK and Nagoya]
13
perfSONAR shows a very jaggy bandwidth history
What is happening?
14
Observed behaviors at each data point
• Sometimes good, but sometimes bad.
• Most of the packet losses happen above 900 Mbps.
• 980 Mbps on average (good case)
• 200 Mbps on average (bad case)
15
After applying a QoS limitation up to 925 Mbps
• But we do not recommend QoS on the regular perfSONAR tests, because it breaks the comparability of trends with other sites.
• Possibly other activities use about 60-100 Mbps.
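For completeness, a rate cap like the 925 Mbps one can be applied with a token-bucket filter via tc (root required; the interface name `eth0` and the burst/latency values are assumptions, not details from the talk):

```shell
# Hypothetical egress cap at 925 Mbit/s with tc tbf.
# eth0 and the burst/latency figures are illustrative assumptions.
tc qdisc add dev eth0 root tbf rate 925mbit burst 128kb latency 50ms
```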
16
Case 2. DESY and KEK
17
• That is, "slow".
• In 2009, an ILC collaborator at DESY asked about a similar problem.
  - At that time, the problem was just the window size configuration.
  - After proper settings, 900 Mbps was achieved.
  - QoS was also very effective.
• DESY now runs a perfSONAR server for the LHC, but it was not open to KEK at that time.
  - Instead, a DESY collaborator opened an iperf server immediately, so we could test the speed with it.
Last year we were asked by a Belle collaborator at DESY.
18
In 2009, 900 Mbps was achieved with a 24 MB window size.
19
But 1 Gbps was hard to use.
There is significant interference between the connections in the reverse directions. We applied QoS to reduce the maximum speed to 900 Mbps, and then the interference seemed to go away.
Bi-directional test between DESY and KEK
KEK->DESY
DESY->KEK
20
• A problem of the end node, not of the network.
• Sometimes the window size is much smaller than 3.4 MB.
• Is there any time dependency?
  - If perfSONAR had been available at that time, this would have been an easy question.
• It may be hard to get permission to access perfSONAR soon, so we did it ourselves.
This time, the window size is limited to 3.4 MB.
[Figure: window size traces on a better day (reaching 3.4 MB) and a worse day (around 100 kB)]
21
Script for the periodical test:

#!/bin/sh
cd $HOME/desy
suffix=`date +%Y.%m%d.%H%M`
exec > log.$suffix
exec 2>&1
tcpdump -i eth0 -n -w pkts.$suffix -p port 33200 &
tcpdump_pid=$!
sleep 5
iperf -c ******************* -t 10 -w 32M -i 2 -p ******
kill $tcpdump_pid
Test by iperf and capture the packets to make a graph of the window size.
22
Run it every 10 minutes.
23
Plot these graphs as a 2D histogram
[Figure: 2D histogram; axes: when the iperf test was started (date) and time since the iperf began (sec); color: window size]
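The 2D-histogram step can be sketched as plain binning, e.g. with awk; the bin widths (10 min on one axis, 1 MB on the other) and the sample points are made up:

```shell
# Bin (test start time in seconds, window size in bytes) pairs into a
# 2D histogram: x bin = 600 s (10 min), y bin = 1 MB. Data is made up.
awk '{
  bx = int($1 / 600); by = int($2 / 1000000)
  h[bx "," by]++
}
END { for (k in h) print k, h[k] }' <<'EOF'
100 3400000
200 3400000
700 18000000
EOF
```

Each output line is "x-bin,y-bin count"; feeding the real per-run window traces through this gives the histogram plotted on the following slides.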
24
A few days' history
The spikes are caused by a software bug; please ignore them.
25
Shapes of good and bad
Bad
Good
A bad condition makes a "tail".
26
The top view of the histogram
Rare bads
The probability of bad runs is not uniform.
Many bads
27
Weekday activity?
Weekend
Weekend
The iperf test duration was extended from 10 sec to 20 sec.
28
On weekends, we can extend the window size.
• From the 2009 test, the window size should be about 25 MB for 1 Gbps from KEK to DESY.
• A DESY collaborator extended it for this test.
• But a window size > 18 MB causes heavy packet loss.
[Figure: window size extended from 3.4 MB; heavy packet loss at large window sizes]
29
Result of Nov. 2012
weekend
weekend
weekend
30
DESY permitted perfSONAR access from KEK
KEK->DESY
DESY->KEK
Is the available bandwidth 250 Mbps...?
Much faster!
31
Speed improvement
~10MB/s
45-55MB/s (500-600Mbps)
Extending window size...
32
Nov.2012 - Jan.2013
Nov. 2012
Jan. 2013
Xmas+new year
33
Even during the holidays,
34
Available bandwidth is about 60MB/s
35
The best case
The window size reached 20 MB, and the achieved speed was 70 MB/s. But this is very rare.
[Figure: window size and achieved speed traces]
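As a consistency check (the arithmetic is ours to add, not a number from the talk): throughput = window / RTT, so a 20 MB window moving 70 MB/s implies the round-trip time of the path:

```shell
# throughput = window / RTT  =>  RTT = window / throughput.
# 20 MB window and 70 MB/s are the slide's figures; the check is added.
win_mb=20
speed_mb_s=70
rtt_ms=$(( win_mb * 1000 / speed_mb_s ))
echo "implied RTT: ~${rtt_ms} ms"     # ~285 ms
```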
36
Summary
• Although perfSONAR is very effective for finding bandwidth trends, it is not always available.
• Detailed observation of the TCP congestion window is useful for estimating the available bandwidth even in such a case.
  - If we can estimate it precisely, parameter tuning and QoS application will give better performance.
37
Special thanks to
• Kiyoshi Hayasaka (Nagoya University)
• Andreas Gellrich (DESY)
• For their kind coordination of the perfSONAR servers and an iperf server at the remote sites.
38