8/13/2019 High statistics ping results
High statistics ping results
Created: May 14, 1999; last updated by: Les Cottrell on February 27, 2000
Page Contents
Introduction
To understand better how to interpret PingER results we decided to make a series
of one-off, high-statistics ping measurements with shorter time frames than the
normal PingER measurements on both LAN and various WAN paths. The idea is
to look at the frequency distributions, the time variations for various types of
networks in the LAN and WAN environment, and to correlate the results with the
topology, routes and known performance issues. Our goal is also to compare these results with results from other high statistics delay measurements.
Unless otherwise noted, the pings were sent at one second intervals with a timeout
of 20 seconds and a payload (including the 8 ICMP protocol bytes) of 100 bytes.
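For these measurements the RTTs come from the output of the ping client. As an illustration, a short Python sketch of such a parser (the regular expression and the sample lines are hypothetical; real ping output varies by platform and client):

```python
import re

# Matches the sequence number and RTT field in typical Unix ping output, e.g.
# "64 bytes from 198.128.1.11: icmp_seq=3 ttl=250 time=4.21 ms"
RTT_RE = re.compile(r"icmp_seq=(\d+).*?time=([\d.]+)\s*ms")

def parse_ping_output(lines):
    """Return a list of (sequence_number, rtt_ms) tuples."""
    samples = []
    for line in lines:
        m = RTT_RE.search(line)
        if m:
            samples.append((int(m.group(1)), float(m.group(2))))
    return samples

# Hypothetical sample lines for illustration only
sample = [
    "64 bytes from 198.128.1.11: icmp_seq=0 ttl=250 time=4.11 ms",
    "64 bytes from 198.128.1.11: icmp_seq=1 ttl=250 time=4.35 ms",
]
print(parse_ping_output(sample))  # -> [(0, 4.11), (1, 4.35)]
```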
Pings between hosts at same site
To understand the behavior of ping at a single site we used the NIKHEF ping client
(since it has a resolution of 1 usec.) running under Redhat 5.2 Linux on an Intel
400MHz Pentium host (doris) at SLAC to various other hosts at SLAC separated
from doris by various network devices with various interface speeds from 10 Mbps
to 1 Gbps. Doris, the ping client, is connected to the network via a shared 10Mbps
hub that is connected to a 10Mbps edge switch port. The table below shows the
hosts pinged (i.e. acting as ping servers) together with their hardware and software
configurations and the connection between doris and the server. The edge switches
are Cisco Catalyst 5000s, the core switches are Cisco Catalyst 6500s, the farm and
server switches are Cisco Catalyst 5500s, and the core routers are Cisco Catalyst
8500s.
Server name | Server hardware | Server OS | Server interface speed | Network connection devices & speeds
mercury | Sun Ultra 5 | Solaris 5.6 | 10 Mbps HDX shared | Same shared 10 Mbps hub
charon | Sun Ultra 1 | Solaris 5.6 | 10 Mbps HDX shared | 10 Mbps to edge switch (cgb3), 10 Mbps to doris
bronco001 | Sun Ultra 5 | Solaris 5.6 | 100 Mbps FDX switched | 100 Mbps to farm switch, 1 Gbps to core switch, 1 Gbps to core router, 1 Gbps to core switch, 100 Mbps to edge switch, 10 Mbps to doris
mailbox | Sun Ultra 5 | Solaris 5.6 | 100 Mbps FDX switched | 100 Mbps to server switch, 1 Gbps to core switch, 1 Gbps to core router, 1 Gbps to core switch, 100 Mbps to edge switch, 10 Mbps to doris
grouse | Sun Sparc 1+ | SunOS 4.1.3.1 | 10 Mbps HDX shared | 10 Mbps to edge switch, 100 Mbps to core switch, 1 Gbps to core router, 1 Gbps to core switch, 100 Mbps to edge switch, 10 Mbps to doris
A simple model to understand the median or minimum ping response times for an unloaded local area network and lightly loaded hosts is to ignore the hubs (a hub inserts about 1 bit time of delay) and the cable lengths. Some of the apparent structure in the frequency distributions below is an artifact of the variable bin widths used (10 usec. for RTT >= 1 msec. and < 10 msec., 100 usec. for RTT >= 10 msec. and < 100 msec., and 1 msec. for RTT >= 100 msec.) and does not show up when using equally spaced linear RTT bins. The double peak in the frequency distribution for the two hosts on the same subnet is also a binning effect and does not show up when using linear bin widths. On the other hand, by measuring the wire-time difference between packets entering and leaving the server, the double peak seen in the low RTT "peak" of the distribution for the two hosts on the same shared hub is found to be caused by the ping server. For another example of a pathological RTT distribution caused by a ping server, see PingER Measurement Pathologies.
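One common form of such a simple model sums the serialization delay of the frame at each store-and-forward device along the path, once in each direction. The sketch below illustrates the arithmetic only; the 138-byte frame size (100 ICMP bytes plus assumed IP and Ethernet overhead) and the list of link speeds are illustrative assumptions, not measured values:

```python
def serialization_delay_us(frame_bytes, link_mbps):
    """Time to clock one frame onto a link, in microseconds."""
    return frame_bytes * 8 / link_mbps  # bits / (Mbit/s) = microseconds

# Illustrative path loosely modeled on the bronco001 row of the table above,
# assuming every device is store-and-forward; speeds in Mbps.
link_speeds = [10, 100, 1000, 1000, 1000, 100]
frame = 138  # bytes: 100 ICMP + 20 IP + 18 Ethernet framing (assumed)

one_way_us = sum(serialization_delay_us(frame, s) for s in link_speeds)
rtt_us = 2 * one_way_us
print(f"predicted minimum RTT ~ {rtt_us:.0f} usec")
```

The model predicts a minimum RTT of a few hundred microseconds for such a path, dominated by the 10 Mbps hops.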
Pings between 2 hosts on the same shared 10 Mbps hub
The following plots are for the Linux host to a Sun Ultra 5 (mercury) running
Solaris 5.6. The first plot shows the ping RTT in msec. for about 260,000 100 byte
pings started May 30, 1999.
The second plot shows the frequency histogram of the ping RTTs with log scales. From measurements of the "wire time" using NetXray running on a separate Windows NT host on the same hub as the ping server/responder host (mercury), we verified that the two peaks around log10(RTT) = 0.5 are a function of the ping server/responder host itself.
Pings between 2 hosts on the same subnet but different ports on the same switch
The following plots are from the Linux
host to a Sun Ultra 1 (charon) running Solaris 5.6. The hosts are on the same Cisco Catalyst 5000 switch but different 10 Mbps shared ports. The pings were started on May 30, 1999 at 12:38:20 PST and the packet loss was about 0.08%. The first plot shows the behavior of about 260,000 ping RTTs as a function of time.
The second plot shows the frequency distribution of the pings.
Pings between 2 hosts at the same site but on different subnets
The following plots are for 500,000 ping RTTs between a Linux Redhat 5.2 host (doris) and a Sun Sparc 1+ (grouse) running SunOS, and between the same Linux host and a Surveyor host running on a Pentium II under FreeBSD. The grouse pings were started on May 30, 1999. The two hosts are on separate subnets
and are separated by 4 switches and a router. The first plot shows the time
variation of the ping RTT.
The second plot is the frequency histogram of the ping RTT. The blue line shows the cumulative distribution function (CDF). The data was binned into 2 different bin widths to provide a reasonable number of counts in the higher RTT bins: 0.1 msec. bins are shown in magenta and extend out to 10 msec., and 10 msec. bins run from 10 to 100 msec. The counts in the 10 msec. wide bins are normalized to the 0.1 msec. wide bins by dividing them by 100. A simple power series fit to the data between RTT 2.3 msec. and 61 msec. is also shown as a black line.
The distribution has a sharp peak with a median at 1.35 msec and with an Inter
Quartile Range (IQR) of 0.2 msec. There is also a high RTT tail.
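The bin-width normalization used in these histograms, dividing the counts in wider bins by the ratio of the bin widths so that all bins are on a common per-unit-width scale, can be sketched as follows (a minimal illustration with synthetic samples, not the measured data):

```python
def histogram(samples, edges):
    """Counts per bin for bin edges [e0, e1, ..., en], normalized to
    counts per unit width so bins of different widths are comparable."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    widths = [edges[i + 1] - edges[i] for i in range(len(edges) - 1)]
    return [c / w for c, w in zip(counts, widths)]

# Toy example with one narrow bin and one wide bin (widths 0.1 and 9.9 msec):
# the wide-bin count is divided by its width, just as the text divides the
# wider-bin counts by the width ratio.
edges = [0.0, 0.1, 10.0]
samples = [0.05, 0.07, 5.0]
print(histogram(samples, edges))
```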
The third plot in this subsection shows the time variation of the ping RTT for 306,000 pings between the Linux host and the SLAC Surveyor host.
The final plot in this subsection shows the frequency distribution of the ping RTTs between the Linux host and the SLAC Surveyor host. The blue line shows the cumulative distribution function (CDF). The data is binned into 3 different bin widths. The black dots are for bins with a width of 0.1 msec. and are for RTT < 1 msec. The magenta dots are for bin widths of 1 msec. and are for RTTs < 10 msec. The green dots have bin widths of 10 msec. and cover the entire range of data. The binned data is normalized by dividing the counts in the 1 msec. bins by 10 and the counts in the 10 msec. bins by 100. The black line is a simple power series fit to the data between 2.3 msec. and 61 msec. inclusive.
The distribution exhibits a sharp peak with a median at 0.9 msec., an IQR of 0.06 msec., and a high RTT tail. There are also secondary peaks at 2.4 msec. and 10 msec.
Pings between 2 ESnet sites
ESnet sites have excellent connectivity with low packet loss and a high speed
well-provisioned backbone that they connect to. Thus they provide an example of
"how good it can get". The ESnet operations center is at LBNL, and SLAC is an ESnet site with, at the time of the measurements below, a T3 interconnect to the
ESnet ATM backbone cloud. The SLAC link to ESnet is also lightly loaded with
peaks measured over 5 minutes only reaching about 50% utilization for the period
of interest.
The ping distribution for an extensive (500K samples) measurement between a host
at SLAC (minos.slac.stanford.edu) and a host at ESnet at LBNL (hershey.es.net), is
seen below, starting at 9:01am on April 23, 1999 and ending at 3:59am on April 29, 1999. The pings were separated by 1 second and the timeout was 20 seconds. It can
be seen that there is a narrow (IQR = 1msec.) peak at 4 msec. with a very long tail
extending out to beyond 750 msec. The black line is a fit to a power series with the
parameters shown.
If one plots this data on a log-log plot (see below) then it can be seen that there are
two time scales (4-18 msec. and 18-1000 msec.) with quite different behaviors. The
bulk of the data (99.8%) falls in the 4-18 msec. region. In the 4-18 msec. region (the magenta points) the data falls off as y ~ A * RTT^-6.6, whereas beyond 18 msec. (the blue points) it falls off as y ~ B * RTT^-1.7. The parameters of the fits are shown in the chart. Note that in the 4-18 msec. region the data are histogrammed in 1 msec. bins, whereas beyond that they are histogrammed in 10 msec. bins, and the two y scales are adjusted appropriately (the one for the wider bins beyond 18 msec. is a factor of 10
greater than the other). The green points are not used in the fits and are the data
histogrammed in 1 msec. bins for the range 19 msec. to 55 msec. The power law
exponent behavior in the region 4-18 msec. is that exhibited by very chaotic processes such as fully developed turbulence or the stock market, whereas the data beyond 18 msec. is more characteristic of heavy-tailed or long-range self-similar behavior. A guess is that the transition at about 18 msec. reflects a change from delays caused by simple queueing to delays caused by router processing; this needs more work to substantiate.
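The power-law fits are straight-line fits on the log-log scale: taking logs of y = A * RTT^b gives log y = log A + b * log RTT, so ordinary least squares on the logged data recovers the exponent. A self-contained sketch with synthetic data (not the measured distribution):

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = A * x**b on log-log axes; returns (A, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    # Slope of the log-log regression line is the power-law exponent b
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / \
        sum((u - mx) ** 2 for u in lx)
    A = math.exp(my - b * mx)
    return A, b

# Synthetic data drawn exactly from y = 5 * x**-1.7
xs = [4, 6, 8, 10, 14, 18]
ys = [5 * x ** -1.7 for x in xs]
A, b = fit_power_law(xs, ys)
print(round(A, 3), round(b, 3))
```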
The autocorrelation function for the first 64000 RTTs (there was no packet loss in this period) is shown below. It can be seen that in general there is only a very weak correlation. The pathchar results for the route from SLAC to hershey.es.net are shown below:
>pathchar -q 64 hershey.es.net
pathchar to hershey.es.net
mtu limitted to 8192 bytes at local host
doing 64 probes at each of 64 to 8192 by 260
0 FLORA03.SLAC.Stanford.EDU (134.79.16.55)
| 77 Mb/s, 462 us (1.77 ms)
1 RTR-CGB5.SLAC.Stanford.EDU (134.79.19.3)
| 294 Mb/s, 218 us (2.43 ms)
2 RTR-CGB6.SLAC.Stanford.EDU (134.79.135.6)
| 18 Mb/s, 276 us (6.53 ms)
3 RTR-DMZ.SLAC.Stanford.EDU (134.79.111.4)
| ?? b/s, -85 us (2.44 ms)
4 ESNET-A-GATEWAY.SLAC.Stanford.EDU (192.68.191.18)
-> 192.68.191.18 (1)
| ?? b/s, 1.42 ms (5.13 ms)
5?lbl1-atms.es.net (134.55.24.11)
| 245 Mb/s, 71 us (5.54 ms)
6 esnet-lbl.es.net (134.55.23.66)
| 9.7 Mb/s, 95 us (12.5 ms)
7 hershey.es.net (198.128.1.11)
7 hops, rtt 4.91 ms (12.5 ms), bottleneck 9.7 Mb/s, pipe 42418 bytes
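The autocorrelation discussed above can be estimated directly from the RTT series. A minimal sketch (the toy series is synthetic, not the measured data):

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var

# Toy alternating series: strong negative lag-1 correlation, for illustration
rtts = [4.0, 5.0, 4.0, 5.0, 4.0, 5.0, 4.0, 5.0]
print(autocorrelation(rtts, 1))
```

A weakly correlated series, like the measured RTTs here, would give values near zero at all lags.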
Pings between International sites
CERN, at the time of the measurements below, shared an 8Mbps link across the Atlantic with the World Health Organization, IN2P3 in France, and Switch (the Swiss academic network). The shared trans-Atlantic link reached over 80% utilization for 5 minute periods during the measurement period and was normally
the bottleneck. The loading on the link is seen below. The green represents the
average (over 30 minutes) traffic to Switzerland and the blue is the average (over
30 minutes) traffic to the U.S. The dark green and magenta are the 5 minute
maxima. The ping measurements below were for the consecutive days labelled
Sun, Mon, Tue, Wed in the utilization graph below.
To better understand the behavior of ping Round Trip Time (RTT) in the WAN, we pinged CERN (ping.cern.ch) from SLAC (minos.slac.stanford.edu) every second with a timeout of 20 seconds for 260K pings between 8:36am Sunday May 9 and 10:35am Wednesday May 12, 1999 (PDT). The packet loss for these measurements was about 0.053%. The distribution of the RTT is seen in the chart below.
The distribution shows a lot of structure. First there is a sharp peak at about 224 msec. with a width of 9.5 msec. (90% of the peak is contained within this width). On the high RTT side of the peak several smaller peaks are seen, together with a long tail. If we look at the individual RTTs in the high RTT tail beyond 260 msec. then we get the chart shown below:
The clusters of points for Tuesday May 11, also show up in the Surveyor data as
shown in the graphs below:
Of
particular interest is the cluster around 18:00 hours on Tuesday May 11. The ping
RTT and loss data is shown for this data in the chart below. The loss is calculated
by looking for missing ping sequence numbers. The routes are obtained from
Surveyor measurements, which use traceroute to measure the routes about every 15 minutes.
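Calculating loss by looking for missing ping sequence numbers, as described above, can be sketched as:

```python
def count_losses(seqs):
    """Count missing sequence numbers in a sorted list of received ping
    sequence numbers; each gap of k numbers is k lost pings."""
    lost = 0
    for prev, cur in zip(seqs, seqs[1:]):
        lost += cur - prev - 1
    return lost

# Hypothetical received sequence numbers: pings 3 and 4 were lost
received = [0, 1, 2, 5, 6]
print(count_losses(received))  # -> 2
```

Since the pings are sent one second apart, a run of k consecutive missing sequence numbers corresponds to a k-second break in connectivity, as used in the analysis below.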
There is a clear change in behavior starting at about 18:10 hours and stopping at
about 19:20 hours. At the start of this period there is a loss of 169 consecutive ping
packets (or a break in connectivity of 169 seconds, since the pings are sent at one
second intervals, while the network routing converges to a new route), and at the
end a further loss of 36 consecutive ping packets. Apart from this period the route
(as measured by traceroute) to CERN is from SLAC to ESnet to the New York
Sprint NAP, then to West Orange in New Jersey and thence back to Chicago to the
STAR-Tap and onto CERN. During the period from 18:10 hours to 19:20 hours, the
route is from SLAC to ESnet to BBN, which goes via New York and London to Geneva. This route is more congested (hence the increase in packet loss) but avoids the trip back from New Jersey to Chicago (and so saves about 30 msec. in the round
trip). The complete routes can be seen below:
The ping RTT data for the cluster around 1:00am on May 11, 1999 can be seen in
more detail in the chart below. In the chart it can be seen that there is a complete
loss of connectivity (i.e. no pings responded) of about 14 minutes starting at about
1:07am until about 1:21am. After this, performance looks fairly normal. Prior to the
loss of connectivity, there are periods of longer RTT (almost double) followed by
shorter losses of connectivity. For CERN to SLAC, Surveyor shows a change from the normal route between 1:00am and 1:15am, returning to the normal route at 1:35am.
For SLAC to CERN, Surveyor shows a change in route at 0:56am returning to the
normal route at the next measurement at 1:23am. The alternate routes are limited
to the SLAC site. This cluster is coincident with problems occurring as a result of
making changes to a core switch at SLAC.
The cluster around 7:15am on May 11, 1999, shown in more detail below, is actually 3 sudden changes in RTT from about 220 msec. to about 525 msec. and back after 1 to 2 minutes, with top-hat shaped RTT peaks at about 7:14am to 7:16am, 7:19am to 7:20am, and 7:23am to 7:24am. Surveyor traceroute samples did not coincide with any of these peaks and saw no route changes. Only one packet was lost in the period shown below. The black line is a moving average over 10 seconds, inserted to help the eye discern the top-hat peaks.
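A moving average of this kind (the window is a sample count; with pings sent one second apart, a 10-sample window spans 10 seconds) can be sketched as:

```python
def moving_average(series, window):
    """Trailing moving average over up to `window` samples."""
    out = []
    for i in range(len(series)):
        lo = max(0, i - window + 1)
        chunk = series[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Toy RTT series with a top-hat step from ~220 to ~525 msec (synthetic)
rtts = [220.0] * 5 + [525.0] * 5 + [220.0] * 5
smoothed = moving_average(rtts, 3)
print(smoothed[:3])  # -> [220.0, 220.0, 220.0]
```

The smoothing rounds the corners of the top hat but makes the step changes easier to pick out against the sample-to-sample noise.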
Surveyor also does not indicate any route changes for the clusters around 14:00
hours on May 11, 1999 or 15:00 hours or 18:30 hours on May 10, 1999.
The pathchar information for the normal path from SLAC to CERN is shown
below:
>pathchar -q 64 ping.cern.ch
pathchar to dxcoms.cern.ch (137.138.28.176)
mtu limitted to 8192 bytes at local host
doing 64 probes at each of 64 to 8192 by 260
0 FLORA03.SLAC.Stanford.EDU (134.79.16.55)
| 162 Mb/s, 369 us (1.14 ms)
1 RTR-CORE1.SLAC.Stanford.EDU (134.79.19.2)
| 115 Mb/s, 281 us (2.28 ms)
2 RTR-CGB6.SLAC.Stanford.EDU (134.79.159.12)
| 19 Mb/s, 242 us (6.29 ms)
3 RTR-DMZ.SLAC.Stanford.EDU (134.79.111.4)
| ?? b/s, -100 us (2.29 ms)
4 ESNET-A-GATEWAY.SLAC.Stanford.EDU (192.68.191.18)
-> 192.68.191.18 (1)
| ?? b/s, 31.1 ms (64.4 ms)
5?nynap1-atms.es.net (134.55.24.9)
| 914 Mb/s, 118 us (64.7 ms)
6 1-sprint-nap.cw.net (192.157.69.11)
-> 192.157.69.11 (1)
| 1997 Mb/s, 1.72 ms (68.2 ms)
7?core4-hssi6-0-0.WestOrange.cw.net (204.70.10.225)
| 591 Mb/s, 9.52 ms (87.4 ms)
8 bordercore4.WillowSprings.cw.net (166.48.34.1)
-> 166.48.34.1 (2)
| 86 Mb/s, 1.13 ms (90.4 ms)
9?cern-cwe.WillowSprings.cw.net (166.48.34.6)
-> 166.48.34.6 (3)
| 130 Mb/s, 59.9 ms (211 ms)
10?cernh9-ar1-chicago.cern.ch (192.65.184.166)
-> 192.65.184.166 (2)
| ?? b/s, 356 us (211 ms)
11?cgate2.cern.ch (192.65.185.1)
| 2634 Mb/s, 135 us (211 ms)
12 cgate1-dmz.cern.ch (192.65.184.65)
-> 192.65.184.65 (3)
| 551 Mb/s, 327 us (212 ms)
13?r513-c-rci47-15-gb0.cern.ch (128.141.211.41)
-> 128.141.211.41 (1)
| 15 Mb/s, -225 us (216 ms)
14?dxcoms.cern.ch (137.138.28.176)
14 hops, rtt 210 ms (216 ms), bottleneck 15 Mb/s, pipe 425545 bytes
Excerpted from High statistics ping results:
http://www.slac.stanford.edu/comp/net/wan-mon/ping-hi-stat.html