1 Internet Monitoring Les Cottrell – SLAC Presented at NUST Institute of Information Technology (NIIT) Rawalpindi, Pakistan, March 15, 2005 Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM), also supported by IUPAP


Internet Monitoring

Les Cottrell – SLAC
Presented at NUST Institute of Information Technology (NIIT), Rawalpindi, Pakistan, March 15, 2005

Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM), also supported by IUPAP


Overview
• Why is measurement difficult yet important?
• LAN vs WAN
• SNMP
• Effects of measurement interval
• Passive
• Active
  – Tools including some results on Digital Divide
• Trouble shooting
  – Tools, how to find things & who to tell
• New challenges


Why is measurement difficult?
• Internet's evolution as a composition of independently developed and deployed protocols, technologies, and core applications
• Diversity, highly unpredictable, hard to find “invariants”
• Rapid evolution & change, no equilibrium so far
  – Findings may be out of date
• Measurement not high on vendors' list of priorities
  – Resources/skill focus on more interesting and profitable issues
  – Tools lacking or inadequate
  – Implementations poor & not fully tested with new releases
• ISPs worried about providing access to core, making results public, & privacy issues
• The phone connection-oriented model (Poisson distributions of session length etc.) does not work for Internet traffic (heavy tails, self-similar behavior, multi-fractals etc.)


Add to that …
• Distributed systems are very hard
  – “A distributed system is one in which I can't get my work done because a computer I've never heard of has failed.” (Butler Lampson)
• Network is deliberately transparent
• The bottlenecks can be in any of the following components:
  – the applications
  – the OS
  – the disks, NICs, bus, memory, etc. on sender or receiver
  – the network switches and routers, and so on
• Problems may not be logical
  – Most problems are operator errors, configurations, bugs
• When building distributed systems, we often observe unexpectedly low performance
  – the reasons for which are usually not obvious
• Just when you think you’ve cracked it, in steps security


Why is measurement important?
• End users & network managers need to be able to identify & track problems
• Choosing an ISP, setting a realistic service level agreement, and verifying it is being met
• Choosing routes when more than one is available
• Setting expectations:
  – Deciding which links need upgrading
  – Deciding where to place collaboration components such as a regional computing center, software development
  – How well will an application work (e.g. VoIP)
• Application steering (e.g. forecasting)
  – Grid middleware, e.g. replication manager


LAN vs WAN
• Measuring the LAN
  – Network admin has control so:
    • Can read MIBs from devices
    • Can within limits passively sniff traffic
    • Know the routes between devices
  – Manually for small networks
  – Automated for large networks
• Measuring the WAN
  – No admin control, unless you are an ISP
    • Can’t read information out of routers
    • May not be able to sniff/trace traffic due to privacy/security concerns
    • Don’t know route details between points; they may change and are not under your control, though you may be able to deduce some of them
  – So typically have to make do with what can be measured from end to end with very limited information from intermediate equipment hops.


SNMP (Simple Network Management Protocol)
• Example of an application, usually built on UDP
• De facto standard for network management
• Created by IETF to address short term needs of TCP/IP
• Consists of:
  – Management Information Bases (MIBs)
    • Store information about managed objects (host, router, switch etc.): system & status info, performance & configuration data
  – Remote Network Monitoring (RMON) is a management tool for passively watching line traffic
  – SNMP communication protocol to read out data and set parameters
    • Polling protocol, manager asks questions & agent responds


SNMP Model
• NMS contains manager software to send & receive SNMP messages to and from Agents
• Agent is a software component residing on a managed node, responds to SNMP queries, performs updates & reports problems
• MIB resides on nodes and at NMS and is a logical description of all network management data.
[Figure: a Network Management Station (NMS) connected over a TCP/IP net to six managed nodes, each with an Agent and MIB]


SNMP version 1 limitations
• Authentication is inadequate:
  – Password (community string) placed in clear in SNMP messages
• MIB variables must be polled separately, i.e. entire MIB cannot be fetched with single command
• SNMPv2 and v3 attempt to address these and other limitations
• Despite limitations, SNMP has been a huge success
  – Provides device and link utilization (bytes, packets) and errors
  – Lots of facilities/tools built around SNMP to provide reports for sites
  – Security concerns typically limit access to a very limited set of owners/admins
    • E.g. ISPs won’t let you poll their devices


SNMP Examples
• Using MRTG to display Router bits/s MIB variable
[Figure: MRTG plot of CERN trans-Atlantic traffic]
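A bits/s figure like the one plotted here is typically derived from two successive readings of an interface octet counter. The following is a minimal sketch of that arithmetic, assuming a 32-bit counter that may wrap at most once between polls; the function name and the sample readings are hypothetical and are not taken from MRTG itself.

```python
COUNTER32_MODULUS = 2 ** 32  # assumed: a 32-bit octet counter that wraps to zero


def bits_per_second(prev_octets, curr_octets, interval_seconds):
    """Average bits/s between two readings of an octet counter."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    # Modular subtraction copes with a single counter wrap between the polls.
    delta_octets = (curr_octets - prev_octets) % COUNTER32_MODULUS
    return delta_octets * 8 / interval_seconds


# Hypothetical readings taken 300 s (5 minutes) apart.
print(bits_per_second(1_000_000, 38_500_000, 300))
# A reading that wrapped past 2**32 between the two polls.
print(bits_per_second(2 ** 32 - 1000, 500, 300))
```

On a fast link a 32-bit counter can wrap more than once within a long polling interval, in which case this calculation silently under-reports the rate.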


Averaging/Sampling intervals
• Typical measurements of utilization are made for 5 minute intervals or longer in order not to create much impact.
• Interactive human interactions require second or sub-second response
• So it is interesting to see the difference between measurements made with different time frames.


Utilization with different averaging times
• Same data, measured Mbits/s every 5 secs
• Average over different time intervals
• Does not get a lot smoother
• May indicate multi-fractal behavior
[Figure: utilization plots averaged over 5 secs, 5 mins and 1 hour]


Averages vs maxima
• Maximum of all 5 sec samples can be a factor of 2 or more greater than the average over 5 minute intervals
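The comparison between short samples and longer averages can be sketched as follows. This is an illustration only: the sample values are made up, and the helper names are hypothetical. With 5 second samples, a 5 minute window would hold 60 samples; the example uses a window of 4 to keep the data short.

```python
def window_averages(samples, samples_per_window):
    """Average consecutive groups of samples; a trailing partial group is dropped."""
    full = len(samples) - len(samples) % samples_per_window
    return [
        sum(samples[i:i + samples_per_window]) / samples_per_window
        for i in range(0, full, samples_per_window)
    ]


def peak_to_mean(samples):
    """Ratio of the largest sample to the overall average."""
    return max(samples) / (sum(samples) / len(samples))


# Made-up 5-second utilization samples in Mbits/s.
five_sec = [10, 80, 15, 20, 90, 5, 10, 70, 20, 30, 85, 25]
print(window_averages(five_sec, 4))  # coarser averages hide the bursts
print(peak_to_mean(five_sec))        # how far the peak sits above the mean
```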


Lot of heavy FTP activity
• The difference depends on traffic type
• Only 20% difference in max & average


Passive vs. Active Monitoring
• Active injects traffic on demand
• Passive watches things as they happen
  – Network device records information
    • Packets, bytes, errors … kept in MIBs retrieved by SNMP
  – Devices (e.g. probe) capture/watch packets as they pass
    • Router, switch, sniffer, host in promiscuous mode (tcpdump)
• Complementary to one another:
  – Passive:
    • Does not inject extra traffic, measures real traffic
    • Polling to gather data generates traffic, also gathers large amounts of data
  – Active:
    • Provides explicit control on the generation of packets for measurement scenarios
    • Testing what you want, when you need it
    • Injects extra artificial traffic
• Can do both, e.g. start active measurement and look at it passively


Passive tools
• SNMP
• Hardware probes e.g. Sniffer, NetScout, can be stand-alone or remotely accessed from a central management station
• Software probes: snoop, tcpdump, require promiscuous access to the NIC, i.e. root/sudo access
• Flow measurement: netramet, OCxMon/CoralReef, Netflow


Example: Passive site border monitoring
• Use Cisco Netflow in Catalyst 6509 with MSFC, on SLAC border
• Gather about 200MBytes/day of flow data
• The raw data records include source and destination addresses and ports, the protocol, packet, octet and flow counts, and start and end times of the flows
  – Much less detailed than saving headers of all packets, but good compromise
  – Top talkers history and daily (from & to), tlds, vlans, protocol and application utilization
• Use for network & security
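A "top talkers" report of the kind mentioned above amounts to summing octets per source address over the flow records. The sketch below assumes a simplified, hypothetical record layout (source, destination, protocol, octets) rather than the actual Netflow export format, and the addresses are documentation examples.

```python
from collections import Counter

# Simplified, hypothetical flow records: (src_addr, dst_addr, protocol, octets).
flows = [
    ("192.0.2.10", "198.51.100.7", "tcp", 5_000_000),
    ("192.0.2.11", "198.51.100.7", "udp", 600),
    ("192.0.2.10", "203.0.113.9", "tcp", 2_500_000),
    ("192.0.2.12", "198.51.100.8", "tcp", 40_000),
]


def top_talkers(flow_records, n=3):
    """Return the n source addresses that sent the most octets."""
    totals = Counter()
    for src, _dst, _proto, octets in flow_records:
        totals[src] += octets
    return totals.most_common(n)


for host, octets in top_talkers(flows):
    print(host, octets)
```

Keying the counter on protocol or destination port instead of source address would give the per-protocol and per-application breakdowns in the same way.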


SLAC Traffic profile
SLAC offsite links: OC3 to ESnet, 1Gbps to Stanford U & thence OC12 to I2, OC48 to NTON
Profile: bulk-data xfer dominates
[Figure: Mbps in and Mbps out by application (SSH, FTP, HTTP, bbftp, iperf), for the last 6 months and for 2 days]


Top talkers by protocol
Volume dominated by single application - bbcp
[Figure: bar chart of top talkers; axes: hostname vs. MBytes/day (log scale)]


Flow sizes
Heavy tailed, in ~ out, UDP flows shorter than TCP, packet ~ bytes
75% TCP-in < 5kBytes, 75% TCP-out < 1.5kBytes (<10pkts)
UDP 80% < 600Bytes (75% < 3 pkts), ~10 * more TCP than UDP
Top UDP = AFS (>55%), Real (~25%), SNMP (~1.4%)
[Figure: flow size distributions, with labels SNMP, RealA/V and AFS fileserver]


Flow lengths
• 60% of TCP flows less than 1 second
• Would expect TCP streams longer lived
  – But 60% of UDP flows over 10 seconds, maybe due to heavy use of AFS


Some Active Measurement Tools
• Ping: connectivity, RTT & loss
  – flavors of ping, fping, Linux vs Solaris ping
  – but blocking & rate limiting
• Alternative synack, but can look like DoS attack
• Sting: measures one way loss
• Traceroute
  – How it works, what it provides
  – Reverse traceroute servers
  – Traceroute archives
• Combining ping & traceroute
  – traceping, pingroute
• Pathchar, pchar, pipechar, bprobe, abing etc.
• Iperf, netperf, ttcp, FTP …


Ping
• ICMP client/server application built on IP
  – Client sends ICMP echo request, server sends reply
  – Server usually in kernel, so reliable & fast
• User can specify number of data bytes. Client puts timestamp in data bytes. Compares timestamp with time when echo comes back to get RTT
• Many flavors (e.g. fping) and options
  – packet length, number of tries, timeout, separation …
• Ping localhost (127.0.0.1) first, then gateway IP address etc.
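The timestamp-in-payload mechanism described above can be sketched without sending anything on the wire. The code below builds an ICMP echo request (type 8) with the standard Internet checksum and recovers an RTT from an echoed payload. The 8-byte double timestamp format and the function names are assumptions for illustration; real ping implementations differ in how they encode the send time, and actually sending the packet needs a raw socket and usually root privileges.

```python
import struct
import time

ICMP_ECHO_REQUEST = 8  # ICMP type for an echo request; the reply is type 0


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload_size: int = 56) -> bytes:
    """Echo request whose payload starts with the send time (assumed 8-byte double)."""
    timestamp = struct.pack("!d", time.time())
    payload = timestamp + bytes(max(0, payload_size - len(timestamp)))
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


def rtt_ms(echoed_payload: bytes, received_at: float) -> float:
    """RTT in milliseconds from the timestamp the server echoed back."""
    (sent_at,) = struct.unpack("!d", echoed_payload[:8])
    return (received_at - sent_at) * 1000.0


packet = build_echo_request(identifier=1, sequence=0)
print(len(packet), "byte ICMP message; checksum verifies:", internet_checksum(packet) == 0)
```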


Ping example

syrup:/home$ ping -c 6 -s 64 thumper.bellcore.com
PING thumper.bellcore.com (128.96.41.1): 64 data bytes
72 bytes from 128.96.41.1: icmp_seq=0 ttl=240 time=641.8 ms
72 bytes from 128.96.41.1: icmp_seq=2 ttl=240 time=1072.7 ms
72 bytes from 128.96.41.1: icmp_seq=3 ttl=240 time=1447.4 ms
72 bytes from 128.96.41.1: icmp_seq=4 ttl=240 time=758.5 ms
72 bytes from 128.96.41.1: icmp_seq=5 ttl=240 time=482.1 ms
--- thumper.bellcore.com ping statistics ---
6 packets transmitted, 5 packets received, 16% packet loss
round-trip min/avg/max = 482.1/880.5/1447.4 ms

Callouts: -c 6 is the repeat count, -s 64 the packet size, and thumper.bellcore.com the remote host; time= gives the RTT; icmp_seq=1 is missing (the lost packet); the last two lines are the summary.
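The missing sequence number and the summary statistics in the example above can be recomputed from the reply lines. This is a rough sketch that assumes the output format shown on this slide (sequence numbers starting at 0, `time=` in ms); other ping flavors format their output differently.

```python
import re

SAMPLE_OUTPUT = """\
72 bytes from 128.96.41.1: icmp_seq=0 ttl=240 time=641.8 ms
72 bytes from 128.96.41.1: icmp_seq=2 ttl=240 time=1072.7 ms
72 bytes from 128.96.41.1: icmp_seq=3 ttl=240 time=1447.4 ms
72 bytes from 128.96.41.1: icmp_seq=4 ttl=240 time=758.5 ms
72 bytes from 128.96.41.1: icmp_seq=5 ttl=240 time=482.1 ms
"""


def summarize_ping(output: str, sent: int):
    """Return (missing sequence numbers, loss fraction, (min, avg, max) RTT in ms)."""
    replies = {
        int(seq): float(rtt)
        for seq, rtt in re.findall(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms", output)
    }
    missing = [seq for seq in range(sent) if seq not in replies]
    loss = 1 - len(replies) / sent
    rtts = list(replies.values())
    stats = (min(rtts), sum(rtts) / len(rtts), max(rtts)) if rtts else None
    return missing, loss, stats


missing, loss, stats = summarize_ping(SAMPLE_OUTPUT, sent=6)
print("missing:", missing)
print("loss: %.1f%%" % (loss * 100))
print("min/avg/max = %.1f/%.1f/%.1f ms" % stats)
```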


Traceroute
• UDP/ICMP tool to show route packets take from local to remote host

17cottrell@flora06:~>traceroute -q 1 -m 20 lhr.comsats.net.pk
traceroute to lhr.comsats.net.pk (210.56.16.10), 20 hops max, 40 byte packets
 1 RTR-CORE1.SLAC.Stanford.EDU (134.79.19.2) 0.642 ms
 2 RTR-MSFC-DMZ.SLAC.Stanford.EDU (134.79.135.21) 0.616 ms
 3 ESNET-A-GATEWAY.SLAC.Stanford.EDU (192.68.191.66) 0.716 ms
 4 snv-slac.es.net (134.55.208.30) 1.377 ms
 5 nyc-snv.es.net (134.55.205.22) 75.536 ms
 6 nynap-nyc.es.net (134.55.208.146) 80.629 ms
 7 gin-nyy-bbl.teleglobe.net (192.157.69.33) 154.742 ms
 8 if-1-0-1.bb5.NewYork.Teleglobe.net (207.45.223.5) 137.403 ms
 9 if-12-0-0.bb6.NewYork.Teleglobe.net (207.45.221.72) 135.850 ms
10 207.45.205.18 (207.45.205.18) 128.648 ms
11 210.56.31.94 (210.56.31.94) 762.150 ms
12 islamabad-gw2.comsats.net.pk (210.56.8.4) 751.851 ms
13 *
14 lhr.comsats.net.pk (210.56.16.10) 827.301 ms

Callouts: -q 1 is the number of probes per hop, -m 20 the maximum number of hops, and lhr.comsats.net.pk the remote host; the * at hop 13 means no response (lost packet or router ignores the probe).


Reverse traceroute servers
• Reverse traceroute server runs as CGI script in web server
• Allow measurement of route from other end. Important for asymmetric routes. See e.g.
  – www.slac.stanford.edu/comp/net/wan-mon/traceroute-srv.html
• CAIDA map of reverse traceroute servers
  – www.caida.org/analysis/routing/reversetrace/


Pingroute
• Run traceroute, then ping each router n times
  – helps identify where in route the problems start to occur
• Routers may not respond to pings, or may treat pings directed at them differently from other packets


Path characterization
• Pathchar
  – sends multiple packets of varying sizes to each router along route
  – measures minimum response time
  – plot min RTT vs packet size to get bandwidth
  – calculate differences to get individual hop characteristics
  – measures for each hop: BW, queuing, delay/hop
  – can take a long time
• Pipechar/abing
  – Also sends back-to-back packets and measures separation on return
  – Much faster
  – Finds bottleneck
[Figure: packet pair passing through a bottleneck; the minimum spacing is set at the bottleneck and the spacing is preserved on higher speed links]
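Both ideas on this slide reduce to short calculations, sketched below under simplifying assumptions. The first function turns the minimum observed spacing of back-to-back packets into a bottleneck estimate; the second fits a straight line to minimum RTT versus packet size and inverts the slope, which is the single-hop version of what pathchar does before taking differences between hops. The function names and sample numbers are hypothetical, and neither function is the actual implementation of any of the tools named.

```python
def packet_pair_bandwidth(packet_bytes, spacings_seconds):
    """Bottleneck estimate in bits/s from the minimum spacing of back-to-back packets."""
    return packet_bytes * 8 / min(spacings_seconds)


def slope_bandwidth(sizes_bytes, min_rtts_seconds):
    """Least-squares slope of min RTT vs packet size, inverted to give bits/s."""
    n = len(sizes_bytes)
    mean_x = sum(sizes_bytes) / n
    mean_y = sum(min_rtts_seconds) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes_bytes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes_bytes, min_rtts_seconds))
    seconds_per_byte = sxy / sxx
    return 8 / seconds_per_byte


# Made-up measurements: 1500-byte pairs arriving at least 120 microseconds apart.
print(packet_pair_bandwidth(1500, [0.00015, 0.00012, 0.0002]))

# Made-up min RTTs: 10 ms fixed delay plus serialization on a 10 Mbit/s link.
sizes = [100, 500, 1000, 1500]
rtts = [0.010 + size * 8 / 10e6 for size in sizes]
print(slope_bandwidth(sizes, rtts))
```

Using the minimum RTT at each size is what filters out queuing delay; with noisy data many probes per size are needed, which is why the slope method can take a long time.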


Network throughput
• Iperf
  – Client generates & sends UDP or TCP packets
  – Server receives packets
  – Can select port, maximum window size, duration, Mbytes to send etc.
  – Client/server communicate packets seen etc.
  – Reports on throughput
• Requires server to be installed at remote site, i.e. friendly administrators or logon account and password


Iperf example

25cottrell@flora06:~>iperf -p 5008 -w 512K -P 3 -c sunstats.cern.ch

------------------------------------------------------------

Client connecting to sunstats.cern.ch, TCP port 5008

TCP window size: 512 KByte

------------------------------------------------------------

[ 6] local 134.79.16.101 port 57582 connected with 192.65.185.20 port 5008

[ 5] local 134.79.16.101 port 57581 connected with 192.65.185.20 port 5008

[ 4] local 134.79.16.101 port 57580 connected with 192.65.185.20 port 5008

[ ID] Interval Transfer Bandwidth

[ 4] 0.0-10.3 sec 19.6 MBytes 15.3 Mbits/sec

[ 5] 0.0-10.3 sec 19.6 MBytes 15.3 Mbits/sec

[ 6] 0.0-10.3 sec 19.7 MBytes 15.3 Mbits/sec

• Total throughput = 3 * 15.3 Mbits/s = 45.9 Mbits/s

Callouts: -p 5008 is the TCP port, -w 512K the maximum window size, -P 3 the number of parallel streams, and -c sunstats.cern.ch the remote host.
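The total for the parallel streams can be recomputed from the per-stream report lines. A small sketch, assuming the report format shown in this example (rates reported in Mbits/sec); the regular expression would need adjusting for other units or iperf versions.

```python
import re

SAMPLE_REPORT = """\
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.3 sec 19.6 MBytes 15.3 Mbits/sec
[ 5] 0.0-10.3 sec 19.6 MBytes 15.3 Mbits/sec
[ 6] 0.0-10.3 sec 19.7 MBytes 15.3 Mbits/sec
"""

STREAM_LINE = re.compile(
    r"\[\s*\d+\]\s+[\d.]+-\s*[\d.]+ sec\s+[\d.]+ MBytes\s+([\d.]+) Mbits/sec"
)


def total_mbits_per_sec(report: str) -> float:
    """Sum the per-stream bandwidth figures in an iperf client report."""
    return sum(float(rate) for rate in STREAM_LINE.findall(report))


print("%.1f Mbits/s" % total_mbits_per_sec(SAMPLE_REPORT))
```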


Active Measurement Projects
• PingER – running at NIIT
• AMP – coming soon to NIIT
• One way delay:
  – Surveyor (now defunct), RIPE (mainly Europe), owamp
• IEPM-BW – running at NIIT
• NIMI (mainly a design infrastructure)
• NWS (mainly for forecasting)
• Skitter
• All projects measure routes
• For a detailed comparison see:
  – www.slac.stanford.edu/comp/net/wan-mon/iepm-cf.html
  – www.slac.stanford.edu/grp/scs/net/proposals/infra-mon.html


AMP
• http://amp.nlanr.net/AMP/
  – AMP uses dedicated PCs as monitors, ~ 150 (June, 2005)
  – Today mainly does pings
  – Oriented to Internet 2, ~ 10 countries
  – Does mainly full mesh pinging
  – Being re-written to provide support for more probes


PingER
• Measure the network performance for developing regions
  – From developed to developing & vice versa
  – Between developing regions & within developing regions
• Use simple tool (PingER/ping)
  – Ping installed on all modern hosts, low traffic interference
  – 21 pings each 30 mins to remote hosts (< 100bits/s average)
• Provides very useful measures
• Originated in High Energy Physics, now focused on Digital Divide (DD)
• Persistent (data goes back to 1995), interesting history
[Figure: map of PingER coverage, Feb 2005, showing monitoring sites and remote sites]
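The "< 100bits/s average" figure can be sanity-checked with a short calculation. The slide gives only the count (21 pings per 30 minutes), so the packet sizes below are an assumption made for illustration: 11 pings of 100 bytes plus 10 pings of 1000 bytes, counting payload only and one direction only.

```python
def average_bits_per_second(packets, interval_seconds):
    """packets: iterable of (count, bytes_per_packet) sent once per interval."""
    total_bits = sum(count * size * 8 for count, size in packets)
    return total_bits / interval_seconds


# Assumed mix (not stated on the slide): 11 x 100 bytes and 10 x 1000 bytes.
assumed_mix = [(11, 100), (10, 1000)]
print("%.1f bits/s per remote host" % average_bits_per_second(assumed_mix, 30 * 60))
```

Under that assumption the load per monitored host comes out at roughly half the quoted ceiling, leaving room for header overhead.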


Examples: World View

S.E. Europe, Russia: catching up

Latin Am., Mid East, China: keeping up

India, Africa: falling behind

C. Asia, Russia, S.E. Europe, L. America, M. East, China: 4-5 yrs behind

India, Africa: 7 yrs behind

Important for policy makers

Many institutes in developing world have less performance than a household in N. America or Europe


Losses

• US residential Broadband users have better access than sites in many regions


Loss to Africa (example of variability)

From PingER project


Compare with TAI
• UN Technology Achievement Index (TAI)
  – Measures creation & diffusion of technology and building human skills

Note how bad Africa is


E2E Troubleshooting
• Solving the E2E performance problem is the critical problem for the user
  – Improve e2e throughput for data intensive apps in high-speed WANs
  – Provide ability to do performance analysis & fault detection in Grid computing environment
  – Provide accurate, detailed, & adaptive monitoring of all distributed components including the network


Anatomy of a Problem
How do you solve a problem along a path?
[Figure: the end-to-end path runs Applications Developer, System Administrator, LAN Administrator, Campus Networking, Gigapop, Backbone, Gigapop, Campus Networking, LAN Administrator, System Administrator, Applications Developer. The user says “Hey, this is not working right!” and the responses along the path are: “The computer is working OK”, “Talk to the other guys”, “Everything is AOK”, “No other complaints”, “The network is lightly loaded”, “All the lights are green”, “We don’t see anything wrong”, “Looks fine”, “Others are getting in ok”, “Not our problem”.]
From an Internet2 E2E presentation by Russ Hobby


Needs
• Measurement tools to quickly, accurately and automatically identify problems
  – Automatically take action to investigate and gather information, on-demand measurements
• Standard ways to discover, request and report results of measurements, for applications
  – GGF/NMWG schemas
  – Share information with people and apps across a federation of measurement infrastructures


Trouble shooting
• Ping to localhost, ping to gateway & to remote host
  – Use IP address to avoid nameserver problems
  – Look for connectivity, loss & RTT
  – May need to run for a long time to see some pathologies (e.g. bursty loss due to DSL loss of sync)
  – Use synack or sting if ICMP blocked
• Traceroute to remote host
• Reverse traceroute from remote host to you
• Ping routers along route
• Look at history plots (PingER, AMP), when did problem start, how big an effect is it?


Trouble shooting
• Try user application
• Iperf to test throughput


Where is a host?
• Name server lookup to find hostname given IP address

47cottrell@netflow:~>nslookup 210.56.16.10
Server: localhost
Address: 127.0.0.1
Name: lhr.comsats.net.pk
Address: 210.56.16.10

• Triangulate position based on RTT measurements made to unknown host from several hosts at known locations.
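The triangulation idea rests on the fact that an RTT puts an upper bound on distance: the signal cannot have travelled further than half the round trip at the propagation speed. The sketch below assumes roughly 200 km per millisecond (about two thirds of the speed of light, a common rule of thumb for fiber) and uses made-up landmark names, RTTs and distances; real paths are longer than the great-circle distance, so the bound is loose.

```python
FIBER_KM_PER_MS = 200.0  # assumed propagation speed in fiber, about 2/3 of c


def max_distance_km(min_rtt_ms):
    """Upper bound on the one-way distance implied by a minimum RTT."""
    return (min_rtt_ms / 2) * FIBER_KM_PER_MS


def consistent_candidates(landmark_rtts_ms, candidate_distances_km):
    """Keep candidate locations that lie within every landmark's distance bound.

    landmark_rtts_ms: {landmark: min RTT to the unknown host, in ms}
    candidate_distances_km: {candidate: {landmark: distance in km}}
    """
    return [
        name
        for name, distances in candidate_distances_km.items()
        if all(
            distances[landmark] <= max_distance_km(rtt)
            for landmark, rtt in landmark_rtts_ms.items()
        )
    ]


# Hypothetical measurements from two landmarks at known locations.
rtts = {"landmark_A": 10, "landmark_B": 40}
candidates = {
    "city_near_A": {"landmark_A": 800, "landmark_B": 3000},
    "city_far_from_A": {"landmark_A": 5000, "landmark_B": 3000},
}
print(consistent_candidates(rtts, candidates))
```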


Where is a host
• Do a Google search on IP address to location, e.g.
  – http://www.geobytes.com/IpLocator.htm


Hi-perf Challenges
• Packet loss hard to measure by ping
  – For 10% accuracy on BER 1/10^8 ~ 1 day at 1/sec
  – Ping loss ≠ TCP loss
• Iperf/GridFTP throughput at 10Gbits/s
  – To measure stable (congestion avoidance) state for 90% of test takes ~ 60 secs ~ 75GBytes
  – Requires scheduling, which implies authentication etc.
• Using packet pair dispersion can use only few tens or hundreds of packets, however:
  – Timing granularity in host is hard (sub μsec)
  – NICs may buffer (e.g. coalesce interrupts, or TCP offload) so need info from NIC or before
• Security: blocked ports, firewalls, keys vs. one time passwords, varying policies … etc.
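The "~ 60 secs ~ 75GBytes" figure above follows directly from the line rate, as this short check shows (decimal units assumed: 1 GByte = 10^9 bytes; the function name is illustrative).

```python
def bytes_transferred(rate_bits_per_sec, seconds):
    """Data volume moved by a test running at a constant rate."""
    return rate_bits_per_sec * seconds / 8


# A 60 second test at 10 Gbits/s.
print(bytes_transferred(10e9, 60) / 1e9, "GBytes")
```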


Dedicated Optical Circuits
• Could be whole new playing field, today’s tools no longer applicable:
  – No jitter (so packet pair dispersion no use)
  – Instrumented TCP stacks a la Web100 may not be relevant
  – Layer 1 & 2 switches make traceroute less useful
  – Losses so low, ping not viable to measure
  – High speeds make some current techniques fail or more difficult (timing, amounts of data etc.)


More Information
• Tutorial on monitoring
  – www.slac.stanford.edu/comp/net/wan-mon/tutorial.html
• RFC 2151 on Internet tools
  – www.freesoft.org/CIE/RFC/Orig/rfc2151.txt
• Network monitoring tools
  – www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html
• Ping
  – http://www.ping127001.com/pingpage.htm
• IEPM/PingER home site
  – www-iepm.slac.stanford.edu/
• IEEE Communications, May 2000, Vol 38, No 5, pp 130-136


Simplified SLAC DMZ Network, 2001
[Figure: network diagram. Devices: swh-root, Swh-dmz, rtr-msfc-dmz, slac-rt1.es.net. Connected networks: ESnet, Stanford, Internet2, NTON, dial up & ISDN, SLAC Internal Network. Link labels: OC12 link 622Mbps; OC3 link 155Mbps (*); OC48 link 2.4Gbps (#); Etherchannel 4 gbps; 1Gbps Ethernet; 100Mbps Ethernet; 10Mbps Ethernet.]
(*) Upgrade to OC12 has been requested
(#) This link will be replaced with an OC48 POS card for the 6500 when available


Flow lengths
• Distribution of netflow lengths for SLAC border
  – Log-log plots, linear trendline = power law
  – Netflow ties off flows after 30 minutes
  – TCP, UDP & ICMP “flows” are ~log-log linear for longer (hundreds to 1500 seconds) flows (heavy-tails)
  – There are some peaks in TCP distributions, timeouts?
    • Web server CGI script timeouts (300s), TCP connection establishment (default 75s), TIME_WAIT (default 240s), tcp_fin_wait (default 675s)
[Figure: log-log flow length distributions for TCP, UDP and ICMP]
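A straight trendline on a log-log plot corresponds to a power law y = a * x^b, whose two parameters can be recovered with an ordinary least-squares fit on the logarithms. The sketch below uses synthetic data rather than the SLAC measurements, and the function name is hypothetical.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log10 x, log10 y); returns (a, b)."""
    log_x = [math.log10(x) for x in xs]
    log_y = [math.log10(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    slope = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y)) / sum(
        (u - mean_x) ** 2 for u in log_x
    )
    intercept = mean_y - slope * mean_x
    return 10 ** intercept, slope


# Synthetic flow-length histogram: counts fall off as 5 * length**-2.
lengths = [1, 10, 100, 1000]
counts = [5 * length ** -2 for length in lengths]
a, b = fit_power_law(lengths, counts)
print("a = %.3f, b = %.3f" % (a, b))
```

The exponent b is the slope of the trendline and a is the value at x = 1; all inputs must be positive, so empty histogram bins have to be dropped before fitting.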


Traceroute technical details
Rough traceroute algorithm

ttl=1;       #To 1st router
port=33434;  #Starting UDP port
while we haven’t got UDP port unreachable {
    send UDP packet to host:port with ttl
    get response
    if time exceeded
        note roundtrip time
    else if UDP port unreachable
        quit
    print output
    ttl++; port++
}

• Can appear as a port scan
  – SLAC gets about one complaint every 2 weeks.
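The loop in the rough algorithm can be exercised without raw sockets by replacing the network with a stand-in. In this sketch `simulated_probe` is a hypothetical substitute for sending a UDP probe and reading the ICMP reply: routers before the destination answer "time exceeded" and the destination answers "port unreachable". It illustrates the control flow only and is not a working traceroute.

```python
START_PORT = 33434  # starting UDP port, as in the rough algorithm


def simulated_probe(path, ttl, port):
    """Stand-in for one UDP probe; returns (responder, reply_type)."""
    if ttl < len(path):
        return path[ttl - 1], "time_exceeded"  # TTL expired at an intermediate router
    return path[-1], "port_unreachable"        # probe reached the destination host


def trace(path, max_hops=30):
    """Walk the TTL upward until the destination answers or max_hops is reached."""
    hops = []
    ttl, port = 1, START_PORT
    while ttl <= max_hops:
        responder, reply = simulated_probe(path, ttl, port)
        hops.append((ttl, responder))
        if reply == "port_unreachable":
            break
        ttl += 1
        port += 1  # a new port per probe is why this can resemble a port scan
    return hops


for ttl, responder in trace(["router-1", "router-2", "remote-host"]):
    print(ttl, responder)
```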


Time series
[Figure: time series of outgoing and incoming UDP and TCP traffic; annotation: Cat 4000 802.1q vs. ISL]


Power law fit parameters by time

Just 2 parameters provide a reasonable description of the flow size distributions


Not your normal Internet site

Ames IXP: approximately 60-65% was HTTP, about 13% was NNTP

Uwisc: 34% HTTP, 24% FTP, 13% Napster


PingER cont.
• Monitor timestamps and sends ping to remote site at regular intervals (typically about every 30 minutes)
• Remote site echoes the ping back
• Monitor notes current and send time and gets RTT
• Discussing installing monitor site in Pakistan
  – provide real experience of using techniques
  – get real measurements to set expectations, identify problem areas, make recommendations
  – provide access to data for developing new analysis techniques, for statisticians etc.


PingER
• Measurements from
  – 38 monitors in 14 countries
  – Over 600 remote hosts
  – Over 120 countries
  – Over 3300 monitor-remote site pairs
  – Measurements go back to Jan-95
  – Reports on RTT, loss, reachability, jitter, reorders, duplicates …
• Uses ubiquitous “ping” facility of TCP/IP
• Countries monitored
  – Contain over 80% of world population
  – 99% of online users of Internet


Surveyor & RIPE, NIMI
• Surveyor & RIPE use dedicated PCs with GPS clocks for synchronization
  – Measure 1 way delays and losses
  – Surveyor mainly for Internet 2
  – RIPE mainly for European ISPs
• NIMI (National Internet Measurement Infrastructure) more of an infrastructure for measurements and some tools (i.e. currently does not have publicly available, regularly updated data)
  – Mainly full mesh measurements on demand


Skitter
• Makes ping & route measurements to tens of thousands of sites around the world. Site selection varies based on web site hits.
  – Provide loss & RTTs
  – Skitter & PingER are main 2 sites to monitor developing world.


“Where is” a host – cont.
• Find the Autonomous System (AS) administering
  – Use reverse traceroute server with AS identification, e.g.:
    • www.slac.stanford.edu/cgi-bin/nph-traceroute.pl
      …
      14 lhr.comsats.net.pk (210.56.16.10) [AS7590 - COMSATS] 711 ms (ttl=242)
  – Get contacts for ISPs (if know ISP or AS):
    • http://puck.nether.net/netops/nocs.cgi
    • Gives ISP name, web page, phone number, email, hours etc.
  – Review list of AS's ordered by Upstream AS Adjacency
    • www.telstra.net/ops/bgp/bgp-as-upsstm.txt
    • Tells what AS is upstream of an ISP
  – Look at real-time information about the global routing system from the perspectives of several different locations around the Internet
    • Use route views at www.antc.uoregon.edu/route-views/
• Triangulate RTT measurements to unknown host from multiple places


Who do you tell
• Local network support people
• Internet Service Provider (ISP), usually done by local networker
  – Use puck.nether.net/netops/nocs.cgi to find ISP
  – Use www.telstra.net/ops/bgp/bgp-as-upsstm.txt to find upstream ISPs
• Give them the ping and traceroute results


Achieving throughput
• User can’t achieve throughput available (Wizard gap)
• Big step just to know what is achievable