High statistics ping results


Created: May 14, 1999; last updated by Les Cottrell on February 27, 2000


    Introduction

To understand better how to interpret PingER results we decided to make a series of one-off high statistics ping measurements with shorter time frames than the normal PingER measurements, on both LAN and various WAN paths. The idea is to look at the frequency distributions and the time variations for various types of networks in the LAN and WAN environments, and to correlate the results with the topology, routes and known performance issues. Our goal is also to compare these results with results from other high statistics delay measurements.

    Unless otherwise noted, the pings were sent at one second intervals with a timeout

    of 20 seconds and a payload (including the 8 ICMP protocol bytes) of 100 bytes.
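For reference, a measurement with these parameters might be scripted along the following lines (a minimal Python sketch, assuming a Linux iputils ping; the target host is a placeholder and the flag spellings vary between ping implementations):

import re
import subprocess

HOST = "example.net"  # placeholder target, not one of the hosts studied here

# Flags assume Linux iputils ping: -c count, -i 1 second between pings,
# -W 20 second timeout, -s 92 data bytes, i.e. a 100 byte payload once
# the 8 ICMP protocol bytes are included.
proc = subprocess.run(
    ["ping", "-c", "1000", "-i", "1", "-W", "20", "-s", "92", HOST],
    capture_output=True, text=True)

# Pull the RTT in msec. out of each reply line, e.g. "... time=1.35 ms".
rtts = [float(m.group(1)) for m in re.finditer(r"time=([0-9.]+) ms", proc.stdout)]
if rtts:
    print(f"{len(rtts)} replies, min RTT = {min(rtts):.2f} msec.")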

    Pings between hosts at same site

To understand the behavior of ping at a single site we used the NIKHEF ping client (since it has a resolution of 1 usec.) running under Redhat 5.2 Linux on an Intel 400 MHz Pentium host (doris) at SLAC to ping various other hosts at SLAC separated

    from doris by various network devices with various interface speeds from 10 Mbps

    to 1 Gbps. Doris, the ping client, is connected to the network via a shared 10Mbps

    hub that is connected to a 10Mbps edge switch port. The table below shows the

    hosts pinged (i.e. acting as ping servers) together with their hardware and software

    configurations and the connection between doris and the server. The edge switches

    are Cisco Catalyst 5000s, the core switches are Cisco Catalyst 6500s, the farm and

    server switches are Cisco Catalyst 5500s, and the core routers are Cisco Catalyst

    8500s.

Server name | Server hardware | Server OS     | Server interface speed | Network connection devices & speeds
mercury     | Sun Ultra 5     | Solaris 5.6   | 10 Mbps HDX shared     | Same shared 10 Mbps hub
charon      | Sun Ultra 1     | Solaris 5.6   | 10 Mbps HDX shared     | 10 Mbps to edge switch (cgb3), 10 Mbps to doris


bronco001   | Sun Ultra 5     | Solaris 5.6   | 100 Mbps FDX switched  | 100 Mbps to farm switch, 1 Gbps to core switch, 1 Gbps to core router, 1 Gbps to core switch, 100 Mbps to edge switch, 10 Mbps to doris
mailbox     | Sun Ultra 5     | Solaris 5.6   | 100 Mbps FDX switched  | 100 Mbps to server switch, 1 Gbps to core switch, 1 Gbps to core router, 1 Gbps to core switch, 100 Mbps to edge switch, 10 Mbps to doris
grouse      | Sun Sparc 1+    | SunOS 4.1.3.1 | 10 Mbps HDX shared     | 10 Mbps to edge switch, 100 Mbps to core switch, 1 Gbps to core router, 1 Gbps to core switch, 100 Mbps to edge switch, 10 Mbps to doris

A simple model to understand the median or minimum ping response times for an unloaded local area network and lightly loaded hosts is to ignore the hubs (a hub inserts about 1 bit time delay) and the cable lengths (for a site with short cable runs the propagation delay is negligible). Note that some of the structure seen in the frequency distributions is an artifact of the logarithmically increasing bin widths (10 usec. for RTT >= 1 msec. and < 10 msec., 100 usec. for RTT >= 10 msec. and < 100 msec., and 1 msec. for RTT >= 100 msec.) and does not show up when using equally spaced linear RTT bins. The double peak in the frequency distribution for the two hosts on the same subnet is also a binning effect and does not show up when using linear bin widths. On the other hand, by measuring the wire-time difference between packets entering and leaving the server, the double peak seen in the low RTT "peak" of the distribution for the two hosts on the same shared hub is found to be caused by the ping server. For another example of a pathological RTT distribution caused by a ping server, see PingER Measurement Pathologies.
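As a rough illustration of the simple model above, the sketch below sums per-link store-and-forward serialization delays along the doris-to-grouse path from the table; the header byte counts and the store-and-forward assumption are illustrative assumptions rather than values from the text, and host and device processing times would add to the result.

# Illustrative store-and-forward model of the minimum RTT. Assumes a
# 100 byte ICMP payload plus 20 IP + 14 Ethernet header bytes, and that
# each device fully receives a frame before forwarding it.
FRAME_BITS = (100 + 20 + 14) * 8

# One direction of the doris -> grouse path, in Mbps per link (from the table).
LINKS_MBPS = [10, 100, 1000, 1000, 100, 10]

def one_way_us(frame_bits, links_mbps):
    # Serialization delay per link is frame_bits / rate; with the rate in
    # Mbps, bits / Mbps conveniently comes out in microseconds.
    return sum(frame_bits / mbps for mbps in links_mbps)

# Roughly 2 x one-way serialization, i.e. about 0.5 msec. for this path.
print(f"model minimum RTT ~ {2 * one_way_us(FRAME_BITS, LINKS_MBPS):.0f} usec.")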


    Pings between 2 hosts on the same shared 10 Mbps hub

    The following plots are for the Linux host to a Sun Ultra 5 (mercury) running

    Solaris 5.6. The first plot shows the ping RTT in msec. for about 260,000 100 byte

    pings started May 30, 1999.

The second plot shows the frequency histogram of the ping RTTs with log scales. From measurements of the "wire time" using NetXray running on a separate Windows NT host on the same hub as the ping server/responder host (mercury), we verified that the two peaks around log10(RTT) = 0.5 are a function of the ping server/responder host itself.

Pings between 2 hosts on the same subnet but different ports on the same switch

The following plots are from the Linux host to a Sun Ultra 1 (charon) running Solaris 5.6. The hosts are on the same Cisco Catalyst 5000 switch but on different 10 Mbps shared ports. The pings were started on May 30, 1999 at 12:38:20 PST and the packet loss was about 0.08%. The first plot shows the behavior of about 260,000 ping RTTs as a function of time.

The second plot shows the frequency distribution of the pings.

Pings between 2 hosts at the same site but on different subnets

The following plots are for 500,000 ping RTTs between a Linux Redhat 5.2 host (doris) and a Sun Sparc 1+ (grouse) running SunOS, and between the same Linux host and a Surveyor host running on a Pentium II under FreeBSD. The grouse pings were started on May 30, 1999. The two hosts are on separate subnets


    and are separated by 4 switches and a router. The first plot shows the time

    variation of the ping RTT.

The second plot is the frequency histogram of the ping RTT. The blue line shows the cumulative distribution function (CDF). The data was binned into 2 different bin widths to provide a reasonable number of counts in the higher RTT bins: 0.1 msec. bins are shown in magenta and extend out to 10 msec., and 1 msec. bins run from 10 to 100 msec. The counts in the 1 msec. wide bins are normalized to the 0.1 msec. wide bins by dividing the count in the 1 msec. bins by 10. A simple power series fit to the data between RTT 2.3 msec. and 61 msec. is also shown as a black line.

The distribution has a sharp peak with a median at 1.35 msec. and an Inter Quartile Range (IQR) of 0.2 msec. There is also a high RTT tail.
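The variable-width binning and normalization can be sketched as follows (assuming numpy and a file of RTT samples in msec., one per line; both are assumptions for illustration):

import numpy as np

rtt_ms = np.loadtxt("rtt_ms.txt")  # hypothetical input: one RTT (msec.) per line

fine_edges = np.arange(0.0, 10.0 + 0.1, 0.1)      # 0.1 msec. bins out to 10 msec.
coarse_edges = np.arange(10.0, 100.0 + 1.0, 1.0)  # 1 msec. bins from 10 to 100

fine_counts, _ = np.histogram(rtt_ms, bins=fine_edges)
coarse_counts, _ = np.histogram(rtt_ms, bins=coarse_edges)

# Dividing the 1 msec. counts by the ratio of bin widths (10) puts both
# histograms on the same per-0.1-msec. scale, as in the plot.
coarse_norm = coarse_counts / 10.0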

The third plot in this subsection shows the time variation of the ping RTT for 306,000 pings between the Linux host and the SLAC Surveyor host.


The final plot in this subsection shows the frequency distribution of the ping RTTs between the Linux host and the SLAC Surveyor host. The blue line shows the cumulative distribution function (CDF). The data is binned into 3 different bin widths. The black dots are for bins with a width of 0.1 msec. and are for RTT < 1 msec. The magenta dots are for bin widths of 1 msec. and are for RTTs < 10 msec. The green dots have bin widths of 10 msec. and cover the entire range of data. The binned data is normalized to the 0.1 msec. bins by dividing the counts in the 1 msec. bins by 10 and the counts in the 10 msec. bins by 100. The black line is a simple power series fit to the data between 2.3 msec. and 61 msec. inclusive.

The distribution exhibits a sharp peak with a median at 0.9 msec., an IQR of 0.06 msec., and a high RTT tail. There are also secondary peaks at 10 msec. and 2.4 msec.
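The quoted medians and IQRs can be reproduced from the raw samples along these lines (a sketch; the input file and the tail cut-off are assumptions, since the text does not define where the tail starts):

import numpy as np

rtt_ms = np.loadtxt("rtt_ms.txt")  # hypothetical input: one RTT (msec.) per line

q25, median, q75 = np.percentile(rtt_ms, [25, 50, 75])
iqr = q75 - q25
# Assumed tail definition: more than 3 IQRs above the upper quartile.
tail_fraction = np.mean(rtt_ms > q75 + 3 * iqr)
print(f"median = {median:.2f} msec., IQR = {iqr:.2f} msec., "
      f"tail fraction = {tail_fraction:.4%}")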

    Pings between 2 ESnet sites

ESnet sites have excellent connectivity, with low packet loss and a high speed, well-provisioned backbone that they connect to. Thus they provide an example of "how good it can get". The ESnet operations center is at LBNL, and SLAC is an ESnet site with, at the time of the measurements below, a T3 interconnect to the ESnet ATM backbone cloud. The SLAC link to ESnet was also lightly loaded, with peaks measured over 5 minutes only reaching about 50% utilization for the period of interest.

The ping distribution for an extensive (500K samples) measurement between a host at SLAC (minos.slac.stanford.edu) and a host at ESnet at LBNL (hershey.es.net) is seen below, starting at 9:01am on April 23, 1999 and ending at 3:59am on April 29, 1999. The pings were separated by 1 second and the timeout was 20 seconds. It can be seen that there is a narrow (IQR = 1 msec.) peak at 4 msec. with a very long tail extending out to beyond 750 msec. The black line is a fit to a power series with the parameters shown.


If one plots this data on a log-log plot (see below) then it can be seen that there are two time scales (4-18 msec. and 18-1000 msec.) with quite different behaviors. The bulk of the data (99.8%) falls in the 4-18 msec. region. In the 4-18 msec. region (the magenta points) the data falls off as y ~ A * RTT^-6.6, whereas beyond 18 msec. (the blue points) it falls off as y ~ B * RTT^-1.7. The parameters of the fits are shown in the chart. Note that in the 4-18 msec. region the data are histogrammed in 1 msec. bins, whereas beyond that they are histogrammed in 10 msec. bins, and the 2 y scales are adjusted appropriately (the one for the wider bins beyond 18 msec. is a factor of 10 greater than the other). The green points are not used in the fits and are the data histogrammed in 1 msec. bins for the range 19 msec. to 55 msec. The power law exponent in the 4-18 msec. region is that exhibited by very chaotic processes such as fully developed turbulence or the stock market, whereas the data beyond 18 msec. is more characteristic of heavy-tailed or long-range similarity behavior. A guess is that the transition at about 20 msec. reflects a change from delays caused by simple queueing to delays caused by router processing; this needs more work to substantiate.
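Such power law fits can be made by a linear least squares fit in log-log space, sketched below (the input file, the uniform 1 msec. binning and the least squares choice are assumptions; the original fitting details are not given in the text):

import numpy as np

rtt_ms = np.loadtxt("rtt_ms.txt")  # hypothetical input: one RTT (msec.) per line

edges = np.arange(4.0, 1000.0 + 1.0, 1.0)  # 1 msec. bins over 4-1000 msec.
counts, _ = np.histogram(rtt_ms, bins=edges)
centers = 0.5 * (edges[:-1] + edges[1:])

def fit_power_law(x, y):
    # y ~ A * x**(-k) is a straight line in log-log space, so fit
    # log10(y) = intercept + slope * log10(x) by least squares.
    mask = y > 0
    slope, intercept = np.polyfit(np.log10(x[mask]), np.log10(y[mask]), 1)
    return 10.0 ** intercept, -slope

low = centers < 18.0
A, k_low = fit_power_law(centers[low], counts[low])
B, k_high = fit_power_law(centers[~low], counts[~low])
print(f"exponents: -{k_low:.1f} (4-18 msec.), -{k_high:.1f} (beyond 18 msec.)")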


The autocorrelation function for the first 64,000 RTTs (there was no packet loss in this period) is shown below. It can be seen that in general there is a very weak correlation.
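The autocorrelation can be computed along the following lines (a sketch, assuming the RTT series is in a file with one sample per line; the estimator normalizes the autocovariance at each lag by the overall variance):

import numpy as np

rtt = np.loadtxt("rtt_ms.txt")[:64000]  # first 64,000 samples, as above

def autocorrelation(x, max_lag):
    # Normalized autocovariance at each lag, valid for an evenly spaced
    # series (here one ping per second with no losses).
    x = x - x.mean()
    var = np.dot(x, x)
    return np.array([np.dot(x[:x.size - lag], x[lag:]) / var
                     for lag in range(1, max_lag + 1)])

acf = autocorrelation(rtt, max_lag=100)
print(f"lag-1 autocorrelation = {acf[0]:.3f}")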

The pathchar information for the path from SLAC to hershey.es.net is shown below:

>pathchar -q 64 hershey.es.net
pathchar to hershey.es.net (198.128.1.11)
mtu limitted to 8192 bytes at local host
doing 64 probes at each of 64 to 8192 by 260
0 FLORA03.SLAC.Stanford.EDU (134.79.16.55)

    | 77 Mb/s, 462 us (1.77 ms)

    1 RTR-CGB5.SLAC.Stanford.EDU (134.79.19.3)

    | 294 Mb/s, 218 us (2.43 ms)

    2 RTR-CGB6.SLAC.Stanford.EDU (134.79.135.6)


    | 18 Mb/s, 276 us (6.53 ms)

    3 RTR-DMZ.SLAC.Stanford.EDU (134.79.111.4)

    | ?? b/s, -85 us (2.44 ms)

    4 ESNET-A-GATEWAY.SLAC.Stanford.EDU (192.68.191.18)

    -> 192.68.191.18 (1)

| ?? b/s, 1.42 ms (5.13 ms)
5 lbl1-atms.es.net (134.55.24.11)

    | 245 Mb/s, 71 us (5.54 ms)

    6 esnet-lbl.es.net (134.55.23.66)

    | 9.7 Mb/s, 95 us (12.5 ms)

    7 hershey.es.net (198.128.1.11)

    7 hops, rtt 4.91 ms (12.5 ms), bottleneck 9.7 Mb/s, pipe 42418 byte

    Pings between International sites

CERN, at the time of the measurements below, shared an 8 Mbps link across the Atlantic with the World Health Organization, IN2P3 in France and Switch (the Swiss academic network). The shared trans-Atlantic link reached over 80% utilization for a 5 minute period during the measurement period and was normally the bottleneck. The loading on the link is seen below. The green represents the average (over 30 minutes) traffic to Switzerland and the blue is the average (over 30 minutes) traffic to the U.S. The dark green and magenta are the 5 minute maxima. The ping measurements below were for the consecutive days labelled Sun, Mon, Tue, Wed in the utilization graph below.

To better understand the behavior of ping Round Trip Time (RTT) in the WAN, we pinged CERN (ping.cern.ch) from SLAC (minos.slac.stanford.edu) every second with a timeout of 20 seconds for 260K pings between 8:36am Sunday May 9 and 10:35am Wednesday May 12, 1999 (PDT). The packet loss for these measurements was about 0.053%. The distribution of the RTT is seen in the chart below.


The distribution shows a lot of structure. First there is a sharp peak at about 224 msec. with a width of 9.5 msec. (90% of the peak is contained within this width). On the high RTT side of the peak several smaller peaks are seen, together with a long tail. If we look at the individual RTTs in the high RTT tail beyond 260 msec. then we get the chart shown below:

The clusters of points for Tuesday May 11 also show up in the Surveyor data, as shown in the graphs below:

Of particular interest is the cluster around 18:00 hours on Tuesday May 11. The ping

RTT and loss data for this period is shown in the chart below. The loss is calculated by looking for missing ping sequence numbers. The routes are obtained from Surveyor measurements, which use traceroute to measure the routes about every 15 minutes.
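That calculation can be sketched as follows (the sequence numbers below are synthetic, purely for illustration):

# Gaps in the echoed-back ICMP sequence numbers mark lost pings; a long
# run of consecutive misses marks a break in connectivity. seqs is a
# hypothetical sorted list of the sequence numbers that were echoed back.
def loss_and_breaks(seqs):
    sent = seqs[-1] - seqs[0] + 1
    breaks = [(prev + 1, cur - prev - 1)  # (first missing, run length)
              for prev, cur in zip(seqs, seqs[1:]) if cur - prev > 1]
    lost = sum(run for _, run in breaks)
    return lost / sent, breaks

# Example: a single 169-packet break, like the one at about 18:10 hours.
seqs = list(range(0, 1000)) + list(range(1169, 2000))
loss, breaks = loss_and_breaks(seqs)
print(f"loss = {loss:.2%}; breaks = {breaks}")  # one 169 second outage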


    There is a clear change in behavior starting at about 18:10 hours and stopping at

    about 19:20 hours. At the start of this period there is a loss of 169 consecutive ping

    packets (or a break in connectivity of 169 seconds, since the pings are sent at one

    second intervals, while the network routing converges to a new route), and at the

end a further loss of 36 consecutive ping packets. Apart from this period, the route (as measured by traceroute) to CERN is from SLAC to ESnet to the New York Sprint NAP, then to West Orange in New Jersey and thence back to Chicago to the STAR TAP and on to CERN. During the period from 18:10 hours to 19:20 hours, the route is from SLAC to ESnet to BBN, going via New York and London to Geneva; it is more congested (hence the increase in packet loss), but avoids the trip back from New Jersey to Chicago (and so saves an extra 30 msec. in the round trip). The complete routes can be seen below:


The ping RTT data for the cluster around 1:00am on May 11, 1999 can be seen in more detail in the chart below. In the chart it can be seen that there is a complete loss of connectivity (i.e. no pings were responded to) of about 14 minutes, starting at about 1:07am and lasting until about 1:21am. After this, performance looks fairly normal. Prior to the

    loss of connectivity, there are periods of longer RTT (almost double) followed by

    shorter losses of connectivity. For CERN to SLAC, Surveyor shows a change from

    the normal route at 1:00am and 1:15am returning to the normal route at 1:35am.

    For SLAC to CERN, Surveyor shows a change in route at 0:56am returning to the

    normal route at the next measurement at 1:23am. The alternate routes are limited

    to the SLAC site. This cluster is coincident with problems occurring as a result of


    making changes to a core switch at SLAC.

The cluster around 7:15am on May 11, 1999, shown in more detail below, is actually 3 sudden changes in RTT from about 220 msec. to about 525 msec. and back after 1 to 2 minutes, with top-hat-shaped RTT peaks at about 7:14am to 7:16am, 7:19am to 7:20am, and 7:23am to 7:24am. Surveyor traceroute samples did not coincide with any of these peaks and saw no route changes. Only one packet was lost in the period shown below. The black line is a moving average over 10 seconds, inserted to help the eye discern the top-hat peaks.
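Such a moving average is straightforward to compute; a sketch, assuming one RTT sample per second so that a 10-point window spans 10 seconds:

import numpy as np

rtt = np.loadtxt("rtt_ms.txt")  # hypothetical RTT series, one sample per second

# 10-point moving average smooths the series enough to make the
# top-hat steps stand out against the per-ping jitter.
window = 10
smoothed = np.convolve(rtt, np.ones(window) / window, mode="valid")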

Surveyor also does not indicate any route changes for the clusters around 14:00 hours on May 11, 1999, or around 15:00 hours or 18:30 hours on May 10, 1999.

    The pathchar information for the normal path from SLAC to CERN is shown

    below:

    >pathchar -q 64 ping.cern.ch

    pathchar to dxcoms.cern.ch (137.138.28.176)

    mtu limitted to 8192 bytes at local host

    doing 64 probes at each of 64 to 8192 by 260


0 FLORA03.SLAC.Stanford.EDU (134.79.16.55)
| 162 Mb/s, 369 us (1.14 ms)
1 RTR-CORE1.SLAC.Stanford.EDU (134.79.19.2)
| 115 Mb/s, 281 us (2.28 ms)
2 RTR-CGB6.SLAC.Stanford.EDU (134.79.159.12)
| 19 Mb/s, 242 us (6.29 ms)
3 RTR-DMZ.SLAC.Stanford.EDU (134.79.111.4)
| ?? b/s, -100 us (2.29 ms)
4 ESNET-A-GATEWAY.SLAC.Stanford.EDU (192.68.191.18)
-> 192.68.191.18 (1)
| ?? b/s, 31.1 ms (64.4 ms)
5 nynap1-atms.es.net (134.55.24.9)
| 914 Mb/s, 118 us (64.7 ms)
6 1-sprint-nap.cw.net (192.157.69.11)
-> 192.157.69.11 (1)
| 1997 Mb/s, 1.72 ms (68.2 ms)
7 core4-hssi6-0-0.WestOrange.cw.net (204.70.10.225)
| 591 Mb/s, 9.52 ms (87.4 ms)
8 bordercore4.WillowSprings.cw.net (166.48.34.1)
-> 166.48.34.1 (2)
| 86 Mb/s, 1.13 ms (90.4 ms)
9 cern-cwe.WillowSprings.cw.net (166.48.34.6)
-> 166.48.34.6 (3)
| 130 Mb/s, 59.9 ms (211 ms)
10 cernh9-ar1-chicago.cern.ch (192.65.184.166)
-> 192.65.184.166 (2)
| ?? b/s, 356 us (211 ms)
11 cgate2.cern.ch (192.65.185.1)
| 2634 Mb/s, 135 us (211 ms)
12 cgate1-dmz.cern.ch (192.65.184.65)
-> 192.65.184.65 (3)
| 551 Mb/s, 327 us (212 ms)
13 r513-c-rci47-15-gb0.cern.ch (128.141.211.41)
-> 128.141.211.41 (1)
| 15 Mb/s, -225 us (216 ms)
14 dxcoms.cern.ch (137.138.28.176)
14 hops, rtt 210 ms (216 ms), bottleneck 15 Mb/s, pipe 425545 byte



Excerpted from High statistics ping results: http://www.slac.stanford.edu/comp/net/wan-mon/ping-hi-stat.html
