\\pcbackup\users\cottrell\xiwg\xiwt-mar98.ppt
1
Internet Monitoring - Results
Les Cottrell & Warren Matthews SLAC <[email protected]>
Presented at the XIWT Meeting, San Francisco, Mar 1998
http://www.slac.stanford.edu/grp/scs/net/talk/xiwt-mar98/
Partially funded by MICS joint SLAC/LBL proposal on Internet End-to-end Performance Monitoring (IEPM)
2
Outline of Talk
• What, why & how are we (ESnet/HENP community) measuring?
• What PingER measurement reports are available and what do they show
– short, intermediate & long term
• Traffic volume & Traceroute measurements
• Summary
– Deployment/development, Internet Performance, Next Steps
– Collaborations
3
Why go to the effort?
• Apparent quality of the Internet is getting worse as its size and demands increase
• Internet woefully under-measured & under-instrumented
• Internet very diverse - no single path typical
• Users need:
– realistic expectations, planning information
– guidelines for setting SLAs
– information to help in identifying problems
– help to decide where to apply resources
4
Why the focus on Ping
• “Universally available”, easy to understand
– no software for clients to install
• Low network impact
• Select hosts carefully, concerns over routers, loaded hosts etc.
• Provides end-to-end (user view vs network infrastructure view) loss, response time, reachability, unpredictability
5
Importance of Response Time
• Time is the scarcest and most valuable commodity
– Studies in the late 70s and early 80s showed the economic value of Rapid Response Time
• 0-0.4s High productivity interactive response
• 0.4-2s Fully interactive regime
• 2-12s Sporadically interactive regime
• 12s-600s Break in contact regime
• >600s Batch regime
– Threshold around 4-5s, above which complaints increase rapidly
– Voice has threshold around 100ms
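The response time regimes listed above can be sketched as a small lookup. This is only an illustration: the function name is hypothetical, and which regime a value exactly on a boundary (0.4s, 2s, 12s, 600s) falls into is an assumption, since the slide does not say.

```python
def response_regime(seconds: float) -> str:
    """Map a response time in seconds to the regime named on the slide.

    Boundary values are assigned to the faster regime (an assumption).
    """
    if seconds < 0:
        raise ValueError("response time cannot be negative")
    if seconds <= 0.4:
        return "High productivity interactive response"
    if seconds <= 2:
        return "Fully interactive regime"
    if seconds <= 12:
        return "Sporadically interactive regime"
    if seconds <= 600:
        return "Break in contact regime"
    return "Batch regime"


if __name__ == "__main__":
    # Illustrative values only, not measurements from the talk.
    for t in (0.1, 1.5, 5, 60, 900):
        print(f"{t:>6} s -> {response_regime(t)}")
```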
6
Perception of Poor Packet Loss
• Above 4-6% packet loss video conferencing becomes irritating, and non native language speakers become unable to communicate.
• The occurrence of long delays of 4 seconds or more at a frequency of 4-5% or more is also irritating for interactive activities such as telnet and X windows.
• Above 10-12% packet loss there is an unacceptable level of back to back loss of packets and extremely long timeouts, connections start to get broken, and video conferencing is unusable.
7
Our Main Metric is Ping
• “Universally available”, easy to understand
– no software for clients to install
• Low network impact
• Select hosts carefully, concerns over routers, loaded hosts etc.
• Provides loss, response time, reachability, unpredictability
• Provides useful real world measures
8
Ping Response vs Web Response
[Figure: HTTP GET response (ms, 0-2000) vs. minimum ping response (ms, 0-500), with lines y = 2x, y = 1.71x + 720 and y = 2.57x]
9
Method - Measurement
• Each Collection site keeps list of remote hosts to ping at sites it is interested in
• Every 30 mins ping each remote host with 11 * 100 byte followed by 10 * 1000 byte pings
• Min separation of pings is 1 second, timeout 20 seconds
• Throw away first ping
• Measure response, packet loss, host unreachable (no answer to any ping)
• Record loss, min/avg/max response time and make available
• Have Poisson sampling & median measurement in beta
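The per-host bookkeeping described above (throw away the first ping, then record loss and min/avg/max response time) can be sketched as follows. This is not the PingER code: the function name and the use of `None` for an unanswered ping are assumptions made for illustration, and "unreachable" is evaluated after the first ping is discarded.

```python
from statistics import mean
from typing import Optional, Sequence


def summarise_pings(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarise one set of pings to a host; None marks a ping with no reply."""
    kept = list(rtts_ms[1:])  # throw away the first ping
    if not kept:
        raise ValueError("need at least two pings")
    replies = [r for r in kept if r is not None]
    lost = len(kept) - len(replies)
    summary = {
        "sent": len(kept),
        "loss_pct": 100.0 * lost / len(kept),
        "unreachable": not replies,  # no answer to any ping
        "min_ms": None,
        "avg_ms": None,
        "max_ms": None,
    }
    if replies:
        summary.update(
            min_ms=min(replies), avg_ms=mean(replies), max_ms=max(replies)
        )
    return summary


if __name__ == "__main__":
    # Illustrative round-trip times in ms, not measured data.
    print(summarise_pings([250.0, 180.0, None, 190.0, 200.0]))
```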
10
Architecture
[Diagram: three tiers labelled Remote, Collecting and Analysis (e.g. HEPNRC, e.g. SLAC), linked by Pings and HTTP, with Archive, Cache, and Reports & Data on the WWW]
11
Long Term Reports
• Currently only available from SLAC
• Tabular reports generated automatically by SAS
• Monthly averages:
– Response time, packet loss for prime time (SLAC)
– Quiescent frequency
– Reachability & Unpredictability
12
Monthly Packet Loss Sorted to show worst at top
Click here for 180 day plot
Click here for Excel
Colored for quality
13
Graphical Analysis
• Use Excel manually or with macros for more detailed analysis
– graphs
– means, medians, standard deviations, distributions, percentiles
– fits
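The same summary statistics can also be computed outside Excel. The sketch below uses Python's standard `statistics` module instead of the Excel macros the slide refers to; the function name and the choice of the 90th percentile are assumptions for illustration.

```python
from statistics import mean, median, quantiles, stdev


def describe(samples):
    """Return mean, median, sample standard deviation and 90th percentile."""
    data = list(samples)
    if len(data) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=10) returns the nine decile cut points; index 8 is the 90th.
    deciles = quantiles(data, n=10, method="inclusive")
    return {
        "mean": mean(data),
        "median": median(data),
        "stdev": stdev(data),
        "p90": deciles[8],
    }


if __name__ == "__main__":
    # Illustrative values only, not data from the talk.
    print(describe(range(11)))
```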
14
Ranked packet loss for 3 months
[Figure: ranked packet loss curves labelled Rome, UK, Stanford, Cincinnati]
15
Sawtooth Effect
[Figure: Packet loss for UK sites Jan-95 thru Jan-98 seen from SLAC. Y-axis: % prime time packet loss (0-40); x-axis: Jan-95 to Jan-98; series: gla.ac.uk, rl.ac.uk. Annotations: "2 * capacity (+ 2 Mbps)", "3 * capacity + 9 Mbps", "Added 45 Mbps (quadrupled capacity)", "Holiday"]
16
RAL Last 180 Days plot
Lines are simply cubic spline fits to aid the eye
Upper green and black points are response time in ms
Red & blue are weekday loss
Cyan are weekend loss
Note weekend/weekday differences
Note Xmas/New Year lull
Also note quick onset of saturation at end August & September
17
Ping Response & Loss between HEPNRC & Manchester Dec-Jan ‘97/98
18
Italian sites look similar to each other
[Figure: Packet loss for Italian sites Jan-95 thru Jan-98 seen from SLAC. Y-axis: % prime time packet loss (0-40); x-axis: Jan-95 to Jan-98. Legend entries: ge.infn.it, lnf.infn.it, na.infn.it, pd.infn.it, roma1.infn.it, ts.infn.it, cern.ch, desy.de, ethz.ch, fzu.cz, gla.ac.uk, ihep.ac.cn, inp.nsk.su, kek.jp, phy.tu-dresden.de, rl.ac.uk, rmki.kfki.hu]
19
Representative International HENP Site Loss Jan-95 thru Nov-97
• Note RL (UK) saw-tooths as UK-US bandwidth is added (Apr-96, Feb-97, Aug-97)
20
Aggregation
• Group measurements, for example:
– by area (e.g. N. America E, N. America E, W. Europe/Japan, others, by country)
– trans-oceanic links
– separation e.g. number of hops, time zones crossed, IXPs crossed
– ISP (ESnet, vBNS/I2, ...)
– by monitoring site
– one site seen from multiple sites
– common interest/affiliation (XIWT, HENP …)
– user selectable
21
Group Selection for Ping Loss Plots
• Allow wild cards
• Allow pre-selected groups
• In beta test
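Wild-card and pre-selected group selection could look roughly like the sketch below. It is not the beta tool: the `GROUPS` table, the function names and the loss values are hypothetical, and shell-style patterns via `fnmatch` are an assumed wild-card syntax. The host names are ones that appear in the plots.

```python
from fnmatch import fnmatchcase
from statistics import median

# Hypothetical pre-selected groups, keyed by a short name.
GROUPS = {
    "uk": ["*.ac.uk"],
    "infn": ["*.infn.it"],
}


def select_hosts(hosts, patterns):
    """Return the hosts matching any of the shell-style wild-card patterns."""
    return [h for h in hosts if any(fnmatchcase(h, p) for p in patterns)]


def group_median_loss(loss_by_host, patterns):
    """Median packet loss (%) over the hosts selected by the patterns."""
    chosen = select_hosts(loss_by_host, patterns)
    if not chosen:
        raise ValueError("no host matches the given patterns")
    return median(loss_by_host[h] for h in chosen)


if __name__ == "__main__":
    # Illustrative loss percentages, not values from the talk.
    loss = {"gla.ac.uk": 10.0, "rl.ac.uk": 20.0, "ge.infn.it": 8.0, "cern.ch": 2.0}
    print(select_hosts(loss, GROUPS["uk"]))
    print(group_median_loss(loss, GROUPS["uk"]))
```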
22
Group Response Time Jan-95 to Nov-97
• Improved between 1 and 2.5% / month
• Response & Loss similar improvements
23
Network Quiescence
• Frequency of zero packet loss (for all time - not cut on prime time)
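A minimal sketch of the quiescence metric, assuming it is the percentage of measurement sets in which no packet was lost. The function name is hypothetical and the exact definition used in the reports is not given on the slide.

```python
def quiescence_pct(loss_samples):
    """Percentage of samples with zero packet loss (assumed definition)."""
    samples = list(loss_samples)
    if not samples:
        raise ValueError("need at least one sample")
    zero_loss = sum(1 for loss in samples if loss == 0)
    return 100.0 * zero_loss / len(samples)


if __name__ == "__main__":
    # Illustrative loss percentages per measurement set, not real data.
    print(quiescence_pct([0, 0, 5.0, 0]))
```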
24
Ping Loss Quality
• Want quick to grasp indicator of link quality
• Loss is the most sensitive indicator
– loss of packet requires ~ 4 sec TCP retry timeout
– Studies on economic value of response time by IBM showed there is a threshold around 4-5 secs where complaints increase.
– 0-1% = Good, 1-2.5% = Acceptable
– 2.5%-5% = Poor, 5%-12% = Very Poor
– > 12% = Bad
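The quality bands above translate directly into a classifier. In this sketch the function name is hypothetical, and a value exactly on a boundary (1%, 2.5%, 5%, 12%) is assigned to the better band, which is an assumption since the slide's ranges overlap at their edges.

```python
def loss_quality(loss_pct: float) -> str:
    """Map a packet loss percentage to the quality label used on the slide."""
    if not 0 <= loss_pct <= 100:
        raise ValueError("loss must be between 0 and 100 percent")
    if loss_pct <= 1:
        return "Good"
    if loss_pct <= 2.5:
        return "Acceptable"
    if loss_pct <= 5:
        return "Poor"
    if loss_pct <= 12:
        return "Very Poor"
    return "Bad"


if __name__ == "__main__":
    # Illustrative loss values only.
    for loss in (0.5, 2.0, 4.0, 8.0, 20.0):
        print(f"{loss:5.1f}% -> {loss_quality(loss)}")
```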
25
Quality Distributions
• ESnet median good quality
• All other groups poor or very poor
• Critical to have good peering
26
Traffic Growth
• Read out of external router
Exponential growth from 2.5-6% per month
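To put the growth figure in perspective, the sketch below converts a compound monthly growth rate into a doubling time and an annual growth factor. It assumes the 2.5-6% range is a monthly rate, as the summary slide states ("growing by 2.5-6% per month"); the function names are hypothetical. Under that assumption, traffic doubles in roughly 28 months at 2.5% and roughly 12 months at 6%.

```python
import math


def doubling_time_months(monthly_growth_pct: float) -> float:
    """Months needed to double at a constant compound monthly growth rate."""
    if monthly_growth_pct <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / math.log1p(monthly_growth_pct / 100.0)


def annual_growth_factor(monthly_growth_pct: float) -> float:
    """Factor by which traffic grows over 12 months of compound growth."""
    return (1 + monthly_growth_pct / 100.0) ** 12


if __name__ == "__main__":
    for rate in (2.5, 6.0):
        print(
            f"{rate}%/month: doubles in {doubling_time_months(rate):.1f} months, "
            f"x{annual_growth_factor(rate):.2f} per year"
        )
```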
27
Traffic Volume for Germany (DFN)
[Figure: DFN T1 Utilization for 15 Jan ‘98 (5 min averages). Green = to US, Blue = from US]
[Figure: ESnet Serial Traffic Peaks -- December 1997 (created Jan 15, 1998), Line Speed = 15360000 bits/sec. Number of Peaks vs Percent of Utilization: # of 2 min periods in Dec-96 with peak utilization > y % (0-100%) against # Samples (0-25,000). Series: cebaf1-dfn2 and dfn2-cebaf1 at 2 min, 10 min and 60 min]
28
Traffic Volume for ESnet Italy Link
[Figure: INFN T1 Link Utilization for 16 Jan ‘98. Series: From US, To US]
[Figure: ESnet Serial Traffic Peaks -- December 1997 (created Jan 15, 1998), Line Speed = 15360000 bits/sec. Number of Peaks vs Percent of Utilization: # of 2 min periods in Dec-96 with peak utilization > y % (0-100%), x-axis 0-25,000. Series: pppl1-infn1 and infn1-pppl1 at 2 min, 10 min and 60 min]
29
Traceroute
• Reverse traceroute servers
– provides traceroute from Web server to client
– available at about 30 HENP & ESnet sites
30
TracePing
[Screenshot annotation: Multiple routes seen]
31
Traceroute
• Reverse traceroute servers
• Traceping
• TopologyMap
– Ellipses show node on route
– Open ellipse is measurement node
– Blue ellipse not reachable
• Keeping history
From TRIUMF
32
Summary
• Deployment Development
– ESnet/HENP has 14 Collection sites in 8 countries collecting data on > 500 links involving 22 countries
– 600MB/month/link, 6 bps/link, .25 FTE @ analysis site, 1.5-2.5 FTE on analysis
– HEPNRC gathering, archiving
– Long term reports being ported to HEPNRC from SLAC
– Long term analysis today requires tool like SAS
• Cost of SAS (or Oracle) license problem for analysis site
– XIWT/IPWT deployed ~ 6 collection sites using PingER tools
33
Summary
• Deployment Development
• Internet Performance
– Performance within ESnet is good
– Performance between ESnet & other sites is poor to very poor on average
• one of main causes is congestion points, so peering is critical
– ESnet traffic accepted from major HENP labs growing by 2.5-6% per month
– Response time improving by 1-2% / month
– Packet loss improving between SLAC & other sites by 3% / month
34
Summary
• Deployment Development
• Internet Performance (continued):
– Links to sites outside N. America vary from good (KEK) to bad
– Some of the bad sites are to be expected, e.g. FSU, China, Czech Republic; some are surprises, such as the UK
– CERN, France, Germany acceptable to poor
35
Summary
• Deployment Development
• Internet Performance
• Next Steps
– Improve tools:
• Deploy Poisson sampling & median measurements
• Extend MapPing & bring to production (work with NLANR), port traceping to Unix, extend deployment of traceroute topology map
– Make long term reports at Analysis site available & understandable
• Get group defining/selection going
• Look at & compare site performance seen from multiple sites
• Look at new visualization techniques
– Look into prediction (extrapolations, develop models, configure and validate with data)
– Pursue IETF Surveyor & NIMI deployment
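Poisson sampling, named in the tool improvements above, replaces a fixed measurement interval with exponentially distributed gaps between measurements. The sketch below generates such a schedule; the function name, the 30-minute mean (borrowed from the fixed 30-minute interval of the current method) and the seed are assumptions, and nothing here reflects how the beta implementation actually works.

```python
import random


def poisson_schedule(mean_interval_s, duration_s, seed=None):
    """Return measurement start times (s) with exponential gaps between them."""
    if mean_interval_s <= 0 or duration_s <= 0:
        raise ValueError("mean interval and duration must be positive")
    rng = random.Random(seed)
    times = []
    t = 0.0
    while True:
        # expovariate takes the rate, i.e. 1 / mean gap.
        t += rng.expovariate(1.0 / mean_interval_s)
        if t >= duration_s:
            break
        times.append(t)
    return times


if __name__ == "__main__":
    # One day of measurements averaging one every 30 minutes.
    schedule = poisson_schedule(1800, 86400, seed=1)
    print(len(schedule), "measurements scheduled")
```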
36
National Internet Measurement Infrastructure (NIMI)
• Secure, scalable infrastructure for scheduling monitoring, gathering data
• Minimal amount of human intervention
• Inexpensive probe built on PC FreeBSD platform
• Dynamic - can add/modify measurement suites, initially includes:
– Traceroute
– TReno - measures bulk transfer thruput
– Poip - one way ping
37
Asymmetric One-way Delays
[Figure: Loss (0-20%) and Delay (0-300 ms) for each direction, Advanced to U Chicago and U Chicago to Advanced; x-axis 0 to 24]
38
Summary
• Deployment Development
• Internet Performance
• Next Steps
• Lots of collaboration:
– SLAC & HEPNRC
– 14 collection sites, ~ 400 remote sites
– Collection site tools CERN & CNAF/ICFA
– Oxford/TracePing
– MapPing/MAPNet/NLANR
– TRIUMF Traceroute topology Map
– NIMI/LBNL & Surveyor/IETF
– XIWT/IPWT
– Talks at IETF, XIWT, ICFA, ESCC ...
39
Summary
• Deployment Development
• Internet Performance
• Next Steps
• Lots of collaboration:
• To join:
– Collection site needs:
• perl5 & HTTP server
• install timeping & pingdata (need only cgi-bin access, not root)
• decide on links to monitor
• Get an analysis site to retrieve & generate graphs, or at least get connectivity.pl & ping_data_plot.pl
– Need volunteers to work on analysis scripts, some of it will require SAS; also need Java applets to visualize
40
More Information
• ICFA Monitoring WG home page (links to status report, meeting notes, how to access data, and code)
– http://www.slac.stanford.edu/xorg/icfa/ntf/home.html
• WAN Monitoring at SLAC has lots of links
– http://www.slac.stanford.edu/comp/net/wan-mon.html
• Tutorial on WAN Monitoring
– http://www.slac.stanford.edu/comp/net/wan-mon/tutorial.html
• MapPing Tool:
– http://www.slac.stanford.edu/~warrenm/work/java/newjava/mapping.html
• NIMI
– http://www.psc.edu/~mahdavi/nimi_paper/NIMI.html
41
Internet Monitoring