Internet Monitoring - Results. Les Cottrell & Warren Matthews, SLAC <cottrell@slac.stanford.edu> <[email protected]>. Presented at the XIWT Meeting, San Francisco, Mar 1998. http://www.slac.stanford.edu/grp/scs/net/talk/xiwt-mar98/ Partially funded by MICS joint SLAC/LBL proposal on Internet End-to-end Performance Monitoring (IEPM).


Page 1: Internet Monitoring - Results

\\pcbackup\users\cottrell\xiwg\xiwt-mar98.ppt

Internet Monitoring - Results

Les Cottrell & Warren Matthews, SLAC <cottrell@slac.stanford.edu>

<[email protected]>

Presented at the XIWT Meeting, San Francisco, Mar 1998

http://www.slac.stanford.edu/grp/scs/net/talk/xiwt-mar98/

Partially funded by MICS joint SLAC/LBL proposal on Internet End-to-end Performance Monitoring (IEPM)

Page 2: Internet Monitoring - Results

Outline of Talk

• What, why & how are we (ESnet/HENP community) measuring?

• What PingER measurement reports are available and what do they show?

– short, intermediate & long term

• Traffic volume & Traceroute measurements

• Summary

– Deployment/development, Internet Performance, Next Steps

– Collaborations

Page 3: Internet Monitoring - Results

Why go to the effort?

• Apparent quality of Internet getting worse as size and demands increase

• Internet woefully under-measured & under-instrumented

• Internet very diverse - no single path typical

• Users need:

– realistic expectations, planning information

– guidelines for setting SLAs

– information to help in identifying problems

– help to decide where to apply resources

Page 4: Internet Monitoring - Results

Why the focus on Ping

• “Universally available”, easy to understand

– no software for clients to install

• Low network impact

• Select hosts carefully, concerns over routers, loaded hosts etc.

• Provides end-to-end (user view vs network infrastructure view) loss, response time, reachability, unpredictability

Page 5: Internet Monitoring - Results

Importance of Response Time

• Time is scarcest and most valuable commodity

– Studies in the late 70s and early 80s showed the economic value of Rapid Response Time

    • 0-0.4s High productivity interactive response

    • 0.4-2s Fully interactive regime

    • 2-12s Sporadically interactive regime

    • 12s-600s Break in contact regime

    • >600s Batch regime

– Threshold around 4-5s, above which complaints increase rapidly

– Voice has threshold around 100ms
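The regimes above can be read as a simple lookup from response time to label. A minimal Python sketch, assuming each boundary value belongs to the faster regime (the slide does not say which side owns the boundary); the function name `response_regime` is made up for illustration:

```python
# Response-time regimes from the slide; upper bounds are in seconds.
# ASSUMPTION: an exact boundary value (e.g. 2.0 s) falls in the faster regime.
REGIMES = [
    (0.4, "High productivity interactive response"),
    (2.0, "Fully interactive regime"),
    (12.0, "Sporadically interactive regime"),
    (600.0, "Break in contact regime"),
]


def response_regime(seconds: float) -> str:
    """Return the regime label for a response time given in seconds."""
    if seconds < 0:
        raise ValueError("response time cannot be negative")
    for upper_bound, label in REGIMES:
        if seconds <= upper_bound:
            return label
    return "Batch regime"  # anything above 600 s


if __name__ == "__main__":
    for t in (0.1, 1.0, 4.5, 60.0, 900.0):
        print(f"{t:>6.1f} s -> {response_regime(t)}")
```

A 4.5 s response lands in the sporadically interactive regime, which is also where the slide places the 4-5 s complaint threshold.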

Page 6: Internet Monitoring - Results

Perception of Poor Packet Loss

• Above 4-6% packet loss, video conferencing becomes irritating, and non-native-language speakers become unable to communicate.

• Long delays of 4 seconds or more occurring at a frequency of 4-5% or more are also irritating for interactive activities such as telnet and X windows.

• Above 10-12% packet loss there is an unacceptable level of back-to-back packet loss and extremely long timeouts; connections start to get broken, and video conferencing is unusable.

Page 7: Internet Monitoring - Results

Our Main Metric is Ping

• “Universally available”, easy to understand

– no software for clients to install

• Low network impact

• Select hosts carefully, concerns over routers, loaded hosts etc.

• Provides loss, response time, reachability, unpredictability

• Provides useful real world measures

Page 8: Internet Monitoring - Results

Ping Response vs Web Response

[Figure: HTTP GET response (ms, 0-2000) plotted against minimum ping response (ms, 0-500). Lines shown: y = 2x; y = 1.71x + 720 (fit: y = 1.7135x + 719.83); y = 2.57x (fit: y = 2.5726x).]

Page 9: Internet Monitoring - Results

Method - Measurement

• Each Collection site keeps list of remote hosts to ping at sites it is interested in

• Every 30 mins ping each remote host with 11 * 100 byte followed by 10 * 1000 byte pings

• Min separation of pings is 1 second, timeout 20 seconds

• Throw away first ping

• Measure response, packet loss, host unreachable (no answer to any ping)

• Record loss, min/avg/max response time and make available

• Have Poisson sampling & median measurement in beta
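The per-batch bookkeeping described above (throw away the first ping, then record loss and min/avg/max response) can be sketched as follows. This is an illustration only, not the PingER code: the names `BatchSummary` and `summarize_batch` are hypothetical, a lost ping is represented as `None`, and "unreachable" is assumed to be judged on the pings that remain after the first is discarded.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class BatchSummary:
    sent: int                 # pings counted (first one already discarded)
    received: int             # pings that were answered
    loss_pct: float           # packet loss in percent
    min_ms: Optional[float]   # None when nothing was answered
    avg_ms: Optional[float]
    max_ms: Optional[float]
    unreachable: bool         # no answer to any counted ping


def summarize_batch(rtts_ms: Sequence[Optional[float]]) -> BatchSummary:
    """Summarize one batch of pings; None marks a ping that timed out."""
    kept = list(rtts_ms[1:])  # throw away the first ping
    if not kept:
        raise ValueError("need at least two pings in a batch")
    answered = [r for r in kept if r is not None]
    sent, received = len(kept), len(answered)
    loss_pct = 100.0 * (sent - received) / sent
    if not answered:
        return BatchSummary(sent, 0, loss_pct, None, None, None, True)
    return BatchSummary(
        sent, received, loss_pct,
        min(answered), sum(answered) / received, max(answered), False,
    )


if __name__ == "__main__":
    # Eleven 100-byte pings: the first is discarded, one of the rest is lost.
    batch = [250.0, 180.0, 182.0, None, 179.0, 181.0,
             180.0, 185.0, 178.0, 190.0, 183.0]
    print(summarize_batch(batch))
```

With the 11 * 100 byte batch from the slide, discarding the first ping leaves ten samples, so one lost ping shows up as 10% loss.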

Page 10: Internet Monitoring - Results

Architecture

[Diagram: three tiers labeled Remote, Collecting and Analysis (e.g. HEPNRC, e.g. SLAC), plus WWW. Link labels: Pings, HTTP. Other labels: Archive, Reports & Data, Cache.]

Page 11: Internet Monitoring - Results

Long Term Reports

• Currently only available from SLAC

• Tabular reports generated automatically by SAS

• Monthly averages:

– Response time, packet loss for prime time (SLAC)

– Quiescent frequency

– Reachability & Unpredictability

Page 12: Internet Monitoring - Results

Monthly Packet Loss, sorted to show worst at top

[Screenshot: monthly packet loss table. Callouts: click here for 180 day plot; click here for Excel; colored for quality.]

Page 13: Internet Monitoring - Results

Graphical Analysis

• Use Excel manually or with macros for more detailed analysis

– graphs

– means, medians, standard deviations, distributions, percentiles

– fits

Page 14: Internet Monitoring - Results

Ranked packet loss for 3 months

[Figure: ranked packet loss for 3 months. Labeled series: Rome, UK, Stanford, Cincinnati.]

Page 15: Internet Monitoring - Results

Sawtooth Effect

[Figure: Packet loss for UK sites, Jan-95 thru Jan-98, seen from SLAC. Y-axis: % prime time packet loss (0-40). X-axis: month (Jan-95 to Jan-98). Series: gla.ac.uk, rl.ac.uk. Annotations: 2 * capacity (+ 2 Mbps); 3 * capacity; + 9 Mbps; added 45 Mbps (quadrupled capacity); holiday.]

Page 16: Internet Monitoring - Results

RAL Last 180 Days plot

Lines are simply cubic spline fits to aid the eye

Upper green and black points are response time in ms

Red & blue are weekday loss

Cyan are weekend loss

Note weekend/weekday differences

Note Xmas/New Year lull

Also note quick onset of saturation at end August & September

Page 17: Internet Monitoring - Results

Ping Response & Loss between HEPNRC & Manchester, Dec-Jan ‘97/98

Page 18: Internet Monitoring - Results

Italian sites look similar to each other

[Figure: Packet loss for Italian sites, Jan-95 thru Jan-98, seen from SLAC. Y-axis: % prime time packet loss (0-40). X-axis: month (Jan-95 to Jan-98). Series: ge.infn.it, lnf.infn.it, na.infn.it, pd.infn.it, roma1.infn.it, ts.infn.it. Further legend entries in the extracted text: cern.ch, desy.de, ethz.ch, fzu.cz, gla.ac.uk, ihep.ac.cn, inp.nsk.su, kek.jp, phy.tu-dresden.de, rl.ac.uk, rmki.kfki.hu.]

Page 19: Internet Monitoring - Results

Representative International HENP Site Loss Jan-95 thru Nov-97

• Note RL (UK) sawtooths as UK-US bandwidth is added (Apr-96, Feb-97, Aug-97)

Page 20: Internet Monitoring - Results

Aggregation

• Group measurements, for example:

– by area (e.g. N. America E, N. America W, W. Europe/Japan, others, by country)

– trans-oceanic links

– separation, e.g. number of hops, time zones crossed, IXPs crossed

– ISP (ESnet, vBNS/I2, ...)

– by monitoring site

– one site seen from multiple sites

– common interest/affiliation (XIWT, HENP …)

– user selectable

Page 21: Internet Monitoring - Results

Group Selection for Ping Loss Plots

• Allow wild cards

• Allow pre-selected groups

• In beta test
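Group selection by wild card or by pre-selected group could look roughly like the sketch below. It is not the beta tool itself: the `GROUPS` table, its group names, and the `select_sites` function are invented for illustration, and shell-style wild cards (`*`, `?`) are an assumption about the pattern syntax. The site domains are taken from the legends of the loss plots.

```python
from fnmatch import fnmatchcase

# HYPOTHETICAL pre-selected groups: group name -> list of wild-card patterns.
GROUPS = {
    "italy": ["*.infn.it"],
    "uk": ["*.ac.uk"],
}


def select_sites(sites, selector):
    """Return the sites matching a pre-selected group name or a wild card.

    If `selector` names a group in GROUPS, that group's patterns are used;
    otherwise `selector` itself is treated as a single wild-card pattern.
    Matching is case-insensitive.
    """
    patterns = GROUPS.get(selector, [selector])
    return [
        site for site in sites
        if any(fnmatchcase(site.lower(), p.lower()) for p in patterns)
    ]


if __name__ == "__main__":
    sites = ["ge.infn.it", "roma1.infn.it", "rl.ac.uk",
             "gla.ac.uk", "cern.ch", "kek.jp"]
    print(select_sites(sites, "italy"))  # pre-selected group
    print(select_sites(sites, "*.ch"))   # ad hoc wild card
```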

Page 22: Internet Monitoring - Results

Group Response Time, Jan-95 to Nov-97

• Improved between 1 and 2.5% / month

• Response & Loss show similar improvements

Page 23: Internet Monitoring - Results

Network Quiescence

• Frequency of zero packet loss (for all time - not cut on prime time)
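Read literally, the quiescence measure is the share of measurement samples that saw no loss at all. A minimal sketch under that reading, with a hypothetical function name and a list of per-sample loss percentages as input:

```python
def quiescence_pct(loss_samples):
    """Percentage of samples with exactly zero packet loss.

    `loss_samples` holds one packet-loss percentage per measurement,
    taken over all hours (no prime-time cut).
    """
    samples = list(loss_samples)
    if not samples:
        raise ValueError("no samples given")
    zero_loss = sum(1 for loss in samples if loss == 0)
    return 100.0 * zero_loss / len(samples)


if __name__ == "__main__":
    # Four half-hourly samples, one of which lost 10% of its pings.
    print(quiescence_pct([0, 0, 10, 0]))
```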

Page 24: Internet Monitoring - Results

Ping Loss Quality

• Want quick to grasp indicator of link quality

• Loss is the most sensitive indicator

– loss of packet requires ~ 4 sec TCP retry timeout

– Studies on economic value of response time by IBM showed there is a threshold around 4-5 secs where complaints increase

– 0-1% = Good

– 1-2.5% = Acceptable

– 2.5-5% = Poor

– 5-12% = Very Poor

– > 12% = Bad
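The quality bands translate directly into a threshold lookup. A small sketch, assuming that a value sitting exactly on a boundary takes the better label (the slide leaves this open); `loss_quality` is an illustrative name:

```python
def loss_quality(loss_pct: float) -> str:
    """Map a packet-loss percentage to the quality label from the slide.

    ASSUMPTION: boundary values (1, 2.5, 5, 12) get the better label.
    """
    if not 0 <= loss_pct <= 100:
        raise ValueError("loss must be between 0 and 100 percent")
    if loss_pct <= 1:
        return "Good"
    if loss_pct <= 2.5:
        return "Acceptable"
    if loss_pct <= 5:
        return "Poor"
    if loss_pct <= 12:
        return "Very Poor"
    return "Bad"


if __name__ == "__main__":
    for loss in (0.5, 2.0, 4.0, 10.0, 20.0):
        print(f"{loss:5.1f}% -> {loss_quality(loss)}")
```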

Page 25: Internet Monitoring - Results

Quality Distributions

• ESnet median good quality

• All other groups poor or very poor

• Critical to have good peering

Page 26: Internet Monitoring - Results

Traffic Growth

• Read out of external router

• Exponential growth of 2.5-6% per month
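To put the monthly growth figures in perspective, compound growth at rate r per month doubles the traffic in ln 2 / ln(1 + r) months. The sketch below works this out for the two ends of the quoted range; the function names are illustrative, and steady compounding is an assumption.

```python
import math


def doubling_time_months(monthly_growth_pct: float) -> float:
    """Months needed to double at a steady compound monthly growth rate."""
    if monthly_growth_pct <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / math.log(1 + monthly_growth_pct / 100.0)


def annual_growth_pct(monthly_growth_pct: float) -> float:
    """Equivalent growth over 12 months, in percent."""
    return ((1 + monthly_growth_pct / 100.0) ** 12 - 1) * 100.0


if __name__ == "__main__":
    for rate in (2.5, 6.0):
        print(f"{rate}%/month: doubles in {doubling_time_months(rate):.1f} months, "
              f"{annual_growth_pct(rate):.0f}% per year")
```

At 2.5% per month traffic doubles in roughly 28 months (about 34% per year); at 6% per month it doubles in roughly 12 months (about 101% per year).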

Page 27: Internet Monitoring - Results

Traffic Volume for Germany (DFN)

[Figure: DFN T1 Utilization for 15 Jan ‘98 (5 min averages). Green = to US, blue = from US.]

[Figure: ESnet Serial Traffic Peaks -- December 1997 (created Jan 15, 1998), Line Speed = 15360000 bits/sec. Number of Peaks vs Percent of Utilization. Axes: # samples (0-25,000) and utilization (0-100%). Series: cebaf1-dfn2 and dfn2-cebaf1, each at 2 min, 10 min and 60 min. Callout: # of 2 min periods in Dec-96 with peak utilization > y %.]
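The "number of periods with peak utilization > y %" curve is an exceedance count over the utilization samples. A minimal sketch of how such a curve could be tabulated; the function name and the 10% threshold step are illustrative choices, not taken from the ESnet tooling.

```python
def periods_above(utilization_pct, thresholds=range(0, 101, 10)):
    """Count, for each threshold y, the samples with utilization > y percent.

    `utilization_pct` holds one peak-utilization value (in percent of line
    speed) per averaging period, e.g. one per 2-minute period.
    """
    samples = list(utilization_pct)
    return {y: sum(1 for u in samples if u > y) for y in thresholds}


if __name__ == "__main__":
    # Four made-up 2-minute peaks, in percent of line speed.
    peaks = [5, 15, 15, 95]
    for y, count in periods_above(peaks).items():
        print(f"> {y:3d}%: {count} periods")
```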

Page 28: Internet Monitoring - Results

Traffic Volume for ESnet Italy Link

[Figure: INFN T1 Link Utilization for 16 Jan ’98. Labels: From US, To US.]

[Figure: ESnet Serial Traffic Peaks -- December 1997 (created Jan 15, 1998), Line Speed = 15360000 bits/sec. Number of Peaks vs Percent of Utilization. Axes: # samples (0-25,000) and utilization (0-100%). Series: pppl1-infn1 and infn1-pppl1, each at 2 min, 10 min and 60 min. Callout: # of 2 min periods in Dec-96 with peak utilization > y %.]

Page 29: Internet Monitoring - Results

Traceroute

• Reverse traceroute servers

– provides traceroute from Web server to client

– available at about 30 HENP & ESnet sites

Page 30: Internet Monitoring - Results

TracePing

[Screenshot: TracePing output. Callout: multiple routes seen.]

Page 31: Internet Monitoring - Results

Traceroute

• Reverse traceroute servers

• Traceping

• TopologyMap

– Ellipses show nodes on route

– Open ellipse is measurement node

– Blue ellipse is not reachable

• Keeping history

[Figure: topology map, from TRIUMF.]

Page 32: Internet Monitoring - Results

Summary

• Deployment/Development

– ESnet/HENP has 14 Collection sites in 8 countries collecting data on > 500 links involving 22 countries

– 600MB/month/link, 6 bps/link, .25 FTE @ analysis site, 1.5-2.5 FTE on analysis

– HEPNRC gathering, archiving

– Long term reports being ported to HEPNRC from SLAC

– Long term analysis today requires tool like SAS

    • Cost of SAS (or Oracle) license problem for analysis site

– XIWT/IPWT deployed ~ 6 collection sites using PingER tools

Page 33: Internet Monitoring - Results

Summary

• Deployment/Development

• Internet Performance

– Performance within ESnet is good

– Performance between ESnet & other sites is poor to very poor on average

    • one of main causes is congestion points, so peering is critical

– ESnet traffic accepted from major HENP labs growing by 2.5-6% per month

– Response time improving by 1-2% / month

– Packet loss improving between SLAC & other sites by 3% / month

Page 34: Internet Monitoring - Results

Summary

• Deployment/Development

• Internet Performance (continued):

– Links to sites outside N. America vary from good (KEK) to bad

– Some of the bad sites are to be expected, e.g. FSU, China, Czech Republic; some are surprises, such as UK

– CERN, France, Germany acceptable to poor

Page 35: Internet Monitoring - Results

Summary

• Deployment/Development

• Internet Performance

• Next Steps

– Improve tools:

    • Deploy Poisson sampling & median measurements

    • Extend MapPing & bring to production (work with NLANR), port traceping to Unix, extend deployment of traceroute topology map

– Make long term reports at Analysis site available & understandable

    • Get group defining/selection going

    • Look at & compare site performance seen from multiple sites

    • Look at new visualization techniques

– Look into prediction (extrapolations, develop models, configure and validate with data)

– Pursue IETF Surveyor & NIMI deployment
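Poisson sampling, listed among the tool improvements, means drawing the gaps between measurements from an exponential distribution instead of using a fixed interval. A short sketch of generating such a schedule; the function name is illustrative, and keeping the mean gap at the current 30 minutes is an assumption rather than something the slides state.

```python
import random


def poisson_schedule(mean_interval_s, duration_s, seed=None):
    """Return measurement start times (seconds from 0) for Poisson sampling.

    Gaps between consecutive measurements are exponentially distributed
    with mean `mean_interval_s`; times at or beyond `duration_s` are dropped.
    """
    if mean_interval_s <= 0 or duration_s <= 0:
        raise ValueError("interval and duration must be positive")
    rng = random.Random(seed)
    times = []
    t = 0.0
    while True:
        t += rng.expovariate(1.0 / mean_interval_s)  # rate = 1 / mean gap
        if t >= duration_s:
            break
        times.append(t)
    return times


if __name__ == "__main__":
    # ASSUMPTION: keep the 30-minute mean of the fixed schedule, over one day.
    schedule = poisson_schedule(mean_interval_s=1800, duration_s=86400, seed=1)
    print(len(schedule), "measurements;",
          "first few (s):", [round(t) for t in schedule[:5]])
```

Because the gaps are random, the samples do not lock onto periodic network behavior the way a fixed 30-minute schedule can.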

Page 36: Internet Monitoring - Results

National Internet Measurement Infrastructure (NIMI)

• Secure, scalable infrastructure for scheduling monitoring, gathering data

• Minimal amount of human intervention

• Inexpensive probe built on PC FreeBSD platform

• Dynamic - can add/modify measurement suites, initially includes:

– Traceroute

– TReno - measures bulk transfer throughput

– Poip - one way ping

Page 37: Internet Monitoring - Results

Asymmetric One-way Delays

[Figure: two panels, Advanced to U Chicago and U Chicago to Advanced, each showing loss (0-20%) and delay (0-300 ms) over an x-axis running from 0 to 24.]

Page 38: Internet Monitoring - Results

Summary

• Deployment/Development

• Internet Performance

• Next Steps

• Lots of collaboration:

– SLAC & HEPNRC

– 14 collection sites, ~ 400 remote sites

– Collection site tools CERN & CNAF/ICFA

– Oxford/TracePing

– MapPing/MAPNet/NLANR

– TRIUMF Traceroute topology Map

– NIMI/LBNL & Surveyor/IETF

– XIWT/IPWT

– Talks at IETF, XIWT, ICFA, ESCC ...

Page 39: Internet Monitoring - Results

Summary

• Deployment/Development

• Internet Performance

• Next Steps

• Lots of collaboration

• To join:

– Collection site needs:

    • perl5 & HTTP server

    • install timeping & pingdata (need only cgi-bin access, not root)

    • decide on links to monitor

    • Get an analysis site to retrieve & generate graphs, or at least get connectivity.pl & ping_data_plot.pl

– Need volunteers to work on analysis scripts, some of which will require SAS; also need Java applets to visualize

Page 40: Internet Monitoring - Results

More Information

• ICFA Monitoring WG home page (links to status report, meeting notes, how to access data, and code)

– http://www.slac.stanford.edu/xorg/icfa/ntf/home.html

• WAN Monitoring at SLAC has lots of links

– http://www.slac.stanford.edu/comp/net/wan-mon.html

• Tutorial on WAN Monitoring

– http://www.slac.stanford.edu/comp/net/wan-mon/tutorial.html

• MapPing Tool:

– http://www.slac.stanford.edu/~warrenm/work/java/newjava/mapping.html

• NIMI

– http://www.psc.edu/~mahdavi/nimi_paper/NIMI.html

Page 41: Internet Monitoring - Results

Internet Monitoring