Optimum Performance,
Maximum Insight:
Behind the Scenes with Network Measurement Tools
Peter Van Epp, Network Director, Simon Fraser University
Loki Jorgenson, Chief Scientist
Conference, April 17-18, 2007
Overview
Network measurement, performance analysis and troubleshooting are critical elements of effective network management.
Recommended tools, methodologies, and practices with a bit of hands-on
Overview
Quick-start hands-on
Elements of Network Performance
Realities: Industry and Campus Contexts
Methodologies
Tools
Demos
Q&A
Troubleshooting the LAN: NDT (Public Domain)
Preview - NDT
Source - http://e2epi.internet2.edu/ndt/
Local server - http://ndtbby.ucs.sfu.ca:7123
http://192.75.244.191:7123 http://142.58.200.253:7123
Local instructions – http://XXX.XXX
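NDT is normally driven from the Java applet in a browser pointed at port 7123 on one of the servers above; recent NDT distributions also ship a command-line client. A minimal sketch, assuming the web100clt client is installed on the test machine:

web100clt -n ndtbby.ucs.sfu.ca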
Troubleshooting the LAN: AppCritical (Commercial)
Preview - AppCritical
Source - http://apparentNetworks.com
Local server - http://XXX.XXXX.XXX
Local instructions - http://XXX.XXX
Login: "guest", "bcnet2007"
Downloads: User Interface
Download the User Interface, install it, then start and log in (see above)
INTRO
Network Performance Measurement
Measurement: How big? How long? How much? Quantification and characterization
Troubleshooting: Where is the problem? What is causing it? Diagnosis and remediation
Optimization: What is the limiter? Which applications are affected? Design analysis and planning
"Functional" vs. "Dysfunctional"
Functional networks operate as spec'd
Consistent with design; the only problem is congestion, so "bandwidth" (or QoS) is the answer
Dysfunctional networks operate otherwise
"Broken", but ping still works
Does not meet application requirements
Bandwidth and QoS will NOT help
Causes of Degradation
Five categories of degradation:
Exceeds specification: insufficient capacity
Diverges from design: failed over to T1; auto-negotiate selects half-duplex
Presents dysfunction: EM interference on cable
Includes devices and interfaces that are mis-configured: duplex mismatch (a quick check is sketched below)
Manifests emergent features: extreme burstiness on high capacity links; TCP
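Duplex mismatch in particular is cheap to rule out from the host end. A minimal sketch, assuming a Linux host with ethtool installed; the interface name eth0 is illustrative:

ethtool eth0 | grep -E 'Speed|Duplex|Auto-negotiation'
# Both ends of the link should agree; "Duplex: Half" facing a port forced to full duplex is the classic mismatch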
STATS AND EXPERIENCE
Trillions of Dollars
Global annual spend on telecom = $2 trillion
Network/Systems Mgmt = $10 Billion
82% of network problems identified by end users complaining about application performance (Network World)
38% of 20,000 helpdesk tests showed network issues impacting application performance (Apparent Networks)
78% of network problems are beyond our control (TELUS)
50% of network alerts are false positives (Netuitive)
85% of networks are not ready for VoIP (Gartner 2004)
60% of IT problems are due to human error (Networking/CompTIA 2006)
Real World Customer Feedback
Based on a survey of 20,000 customer tests:
A serious network issue was found 38% of the time
20% of networks have bad NIC card drivers
29% of devices have packet loss, caused by:
  50% high utilization
  20% duplex conflicts
  11% rate-limiting behaviors
  8% media errors
  8% firewall issues
Last Mile
  Last 100m / LAN
  Workstations, office environment, servers
WAN
  Leased lines, limited capacities
Service providers / core networks
METHODOLOGIES
Real examples from the SFU network
Two links out: one to CA*net4 at 1G, usually empty, and a heavily loaded 100M commodity link
(the commodity link typically carries 6 times the volume of the C4 link)
A physics grad student was doing something data-intensive to a grid site in Taiwan
First indication: total saturation of the commodity link
Argus pointed at the grid transfer as the symptom and a routing problem as the cause
Real examples (cont.)
The problem: an asymmetric route
12:45:52 tcp taiwan_ip.port -> sfu_ip.port 809 0 1224826 0
(columns: packets in, packets out, bytes in, bytes out; the zero counters in one direction are the signature of the asymmetric route)
Reported the problem to the CANARIE NOC, who quickly got it fixed
The user's throughput increased greatly, and the commodity link became less saturated!
Use of NDT might have increased the stress!
Network Life Cycle (NLC)
Business case -> Requirements -> Request for Proposal -> Planning -> Staging -> Deployment -> Operation -> Review
[Diagram: the Network Life Cycle phases arranged as a loop]
NLC: Staging/Deployment
Two hosts with a crossover cable ensure the end points work
Move one segment closer to the end (testing each time)
  Not easy to do if sites are geographically/politically distinct
Establish connectivity to the end points
Tune for the required throughput
  One of multiple possible points of failure - lack of visibility
Tools (even very disruptive tools) can help by stressing the network: localize and characterize
NLC: Staging/Deployment (cont.)
Various bits of hardware (typically network cards) and software (the IP stack) have flaws and default configurations that are inappropriate for very high throughput networks
Be careful what you buy (cheapest is not best and may be disastrous); optical is much better, but also much more expensive than copper
Tune the IP stack for high performance (a sysctl sketch follows below)
If possible, try whatever you want to buy in a similar environment (RFP/Staging)
Staging won't guarantee anything; something unexpected will always bite you
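A minimal sketch of what "tune the IP stack" usually means on a Linux end host: raising socket buffer limits so TCP can fill a large bandwidth-delay-product path. The 16 MB figures are illustrative and should be sized to bandwidth x RTT for your own paths:

sysctl -w net.core.rmem_max=16777216                 # max receive socket buffer (bytes)
sysctl -w net.core.wmem_max=16777216                 # max send socket buffer (bytes)
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"    # TCP receive buffer: min default max
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"    # TCP send buffer: min default max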
NLC: Operation
Easier if the network was known to work at implementation
Probably disrupting work, so the pressure is higher
  May not be able to use the disruptive tools
  Problems may occur at a time when staff are unavailable
Support the user (e.g. NDT)
  The researcher can point the web browser on their machine at an NDT server
  Save the results (even if they don't understand them) for a network person to look at and comment on later
NLC: Operation (cont.)
Automated monitoring / data collection
  Can be very expensive to implement, and someone must eventually interpret it
  Consider these issues/costs when applying for funding
A passive continuous monitor on the network can make your life (and success) much easier
Multiple lightpath endpoints or a dynamically routed network can be challenging
  Issues may be (or appear to be) intermittent because changes happen automatically - this can be maddening
NLC Dependencies
[Chart: relative Time, Cost, and Risk (scale 0-40) across the NLC phases - Business Case, Requirements, R.F.P., Planning, Staging, Deployment, Ops/Maintenance, Review]
METHODOLOGIES: Measurement
Visibility
The basic problem is lack of visibility at the network level
Performance "depends" on:
  Application type
  End-user / task
  Benchmarks
Healthy networks have design limits; broken networks are everything else
Measurement Methodologies
Device-centric (NMS)
  SNMP, RTCP/XR, NETCONF (an snmpget example follows below)
  E.g. HP OpenView
Network behaviors
  Passive
    Flow-based - e.g. Cisco NetFlow
    Packet-based - e.g. Network General "Sniffer"
  Active
    Flooding - e.g. AdTech AX/4000
    Probing - e.g. AppCritical
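Device-centric polling is mostly SNMP counter collection under the hood; a minimal sketch, assuming a router that speaks SNMPv2c and an illustrative hostname and community string (these are the per-interface octet counters that MRTG graphs):

snmpget -v2c -c public router1.example.org IF-MIB::ifInOctets.1 IF-MIB::ifOutOctets.1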
E2E Measurement Challenges
Layer 1: optical / light paths, wireless
Layer 2: MPLS, Ethernet switch fabric, wireless
Layer 3
Layer 4: TCP
Layer 5
Federation
Existing Observatory Capabilities
One-way latency, jitter, loss - IPv4 and IPv6 ("owamp")
Regular TCP/UDP throughput tests - ~1 Gbps, IPv4 and IPv6; on-demand available ("bwctl")
SNMP - octets, packets, errors; collected 1/min
Flow data - addresses anonymized by zeroing the low-order 11 bits
Routing updates - both IGP and BGP; the measurement device participates in both
Router configuration - visible backbone; collected 1/hr from all routers
Dynamic updates - syslog; also alarm generation (~Nagios); polling via router proxy
Observatory Databases - Data Types
Data is collected locally and stored in distributed databases
Databases: usage data, Netflow data, routing data, latency data, throughput data, router data, syslog data
GARR User Interface
METHODOLOGIES: Troubleshooting
Challenges to Troubleshooting
Need resolution quickly
Operational networks - may not be able to instrument everywhere
Often relies on expert engineers
Does not work across 3rd party networks (authorization/access)
Converged networks
Application-specific symptoms
End-user driven
HPC Networks
Three potential problem sources: the user site to the edge (x 2) and the core network
Quickly eliminate as many of these as possible (binary search)
Easiest during the implementation phase
Ideally: two boxes at the same site, then move them one link at a time
Often impractical - deploy and pray (and troubleshoot)
HPC Networks (cont.)
Major difference between dedicated lightpaths and a shared network
Lightpath: end-to-end test with iperf/netperf on loopback (see the sketch below)
  This is likely too disruptive on a shared network - DANGEROUS
Alternatively, use NDT to a local server to isolate the problem
Recommended to have at least mid-path ping!
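A minimal sketch of that lightpath end-to-end test, assuming iperf is installed at both ends; the hostname and window size are illustrative, and on any shared production path this should only be run with the operators' agreement:

iperf -s                                      # on the far-end host
iperf -c far-end.example.org -t 30 -w 8M      # on the near end: 30 s TCP test with an 8 MB window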
HPC Networks (cont.)
Shared network: see if other users have problems
  If not, a core problem is unlikely; if it is in the core, outside agencies get involved
Start troubleshooting both end-user segments in parallel
Preventive measures
  Support user-runnable diagnostics
  ping and owamp for low-impact monitoring
E2EPI Problem Statement: "The Network is Broken"
How can the user self-diagnose first-mile problems without being a network expert?
How can the user do partial path decomposition across multiple administrative domains?
Strategy
Most problems are local… Test ahead of time!
Is there connectivity & reasonable latency? (ping -> OWAMP)
Is routing reasonable? (traceroute)
Is the host reasonable? (NDT; Web100)
Is the path reasonable? (iperf -> BWCTL)
(a minimal command sequence is sketched below)
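A sketch of that first-pass sequence as run from the user's machine; hostnames are placeholders, and the OWAMP/BWCTL versions of the latency and path tests appear in the Tools section:

ping -c 5 far-end.example.org        # connectivity and rough round-trip latency
traceroute far-end.example.org       # is the routing reasonable?
web100clt -n ndt.example.org         # NDT check of the host and first mile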
What Are The Problems?
TCP: lack of buffer space
Forces the protocol into stop-and-wait
Number one TCP-related performance problem
70ms * 1Gbps = 70*10^6 bits, or 8.4MB
70ms * 100Mbps = 855KB
Many stacks default to 64KB, or 7.4Mbps
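The general rule behind those numbers: the socket buffer must cover the bandwidth-delay product, and conversely the advertised window caps the achievable rate:

buffer needed >= bandwidth x RTT (the bandwidth-delay product)
max TCP throughput <= window size / RTT

So a fixed 64 KB window over a 70 ms path holds TCP to a few Mbps no matter how fast the underlying link is.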
What Are The Problems?
Video/Audio: lack of buffer space
Makes broadcast streams very sensitive to the previous problems
Application behaviors
Stop-and-wait behavior; can't stream
Lack of robustness to network anomalies
The Usual Suspects
Host configuration errors (TCP buffers)
Duplex mismatch (Ethernet)
Wiring/fiber problems
Bad equipment
Bad routing
Congestion
  "Real" traffic
  Unnecessary traffic (broadcasts, multicast, denial of service attacks)
Typical Sources of Performance Degradation
Half/Full-Duplex Conflicts
Poorly Performing NICs
MTU Conflicts
Bandwidth Bottlenecks
Rate-Limiting Queues
Media Errors
Overlong Half-duplex
High Latency
Self-Diagnosis
Find a measurement server "near me"
Detect common problems in the first mile
Don't need to be a network engineer
Instead of: "The network is broken."
Hoped-for result: "I don't know what I'm talking about, but I think I have a duplex mismatch problem."
Partial Path Decomposition
Identify the end-to-end path
Discover measurement nodes "near to" and "representative of" hops along the route
Authenticate to multiple measurement domains (locally-defined policies)
Initiate tests between remote hosts
See test data for already-run tests (future)
Partial Path Decomposition
Instead of:
"Can you give me an account on your machine?"
"Can you set up and leave up an Iperf server?"
"Can you get up at 2 AM to start up Iperf?"
"Can you make up a policy on the fly for just me?"
Hoped-for result:
Regular means of authentication
Measurement peering agreements
No chance of polluted test results
Regular and consistent policy for access and limits
METHODOLOGIES: Application Performance
Network dependent
Vendors
Applications groups (e.g. VoIP)
Field engineers
Industry focused on QoE
Simplified Three Layer Model
Three layers: User Experience, Application Behaviors, Network Behaviors
(compare to the OSI layers: 7 Application, 6 Presentation, 5 Session, 4 Transport, 3 Network, 2 Data Link, 1 Physical)
New Layer Model
The OSI layers (7 Application, 6 Presentation, 5 Session, 4 Transport, 3 Network, 2 Data Link, 1 Physical) map onto three layers - User Experience, App Behaviors, Network Behaviors - joined by App-to-Net Coupling
[Diagram: an Application Model relates codec, dynamics, and requirements to outcomes, given network loss, jitter, and latency]
E-Model Mapping: R -> MOS
The E-model generates an "R-value" (0-100) that maps to the well-known MOS score

R-value range    Speech transmission quality category
90 - 100         best
80 - 90          high
70 - 80          medium
60 - 70          low
0 - 60 *         (very) poor
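The standard E-model conversion from R to MOS (ITU-T G.107), for 0 < R < 100, is:

MOS = 1 + 0.035 R + 7x10^-6 R (R - 60)(100 - R)

with MOS = 1 for R <= 0 and MOS = 4.5 for R >= 100.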
[Diagram: E-Model Analysis combines Application Behaviors, Network Behaviors, and Application Models to produce MOS (QoE)]
Coupling the Layers
[Diagram: the User / Task / Process layer is tested/monitored for QoE and drives network requirements (QoS/SLA)]
METHODOLOGIES: Optimization
Network visibility
End-to-end visibility
App-to-net coupling
End-to-end network path
Iterating to Performance
Wizard Gap
Reprinted with permission (Matt Mathis, PSC) - http://www.psc.edu/~mathis/
Wizard Gap
Working definition:
Ratio of effective network performance attained by an average user to that attainable by a network wizard….
Fix the Network First
Three Steps to Performance
1. Clean the network
   a) Pre-deployment
   b) Monitoring
2. Model traffic
   a) Application requirements for QoS/SLA
   b) Monitoring for application performance
3. Deploy QoS
Lessons Learned
Guy Almes, chief engineer, Abilene:
"The general consensus is that it's easier to fix a performance problem by host tuning and healthy provisioning rather than reserving. But it's understood that this may change over time. [...] For example, of the many performance problems being reported by users, very few are problems that would have been solved by QoS if we'd have had it."
Tools
CAIDA Tools (Public) - http://www.caida.org/tools/
Taxonomies: topology, workload, performance, routing, multicast
Recommended (Public) Tools
MRTG (SNMP-based router stats)
iPerf / NetPerf (active stress testing)
Ethereal/Wireshark (passive sniffing)
NDT (TCP/UDP e2e active probing)
Argus (flow-based traffic monitoring)
perfSONAR (test/monitor infrastructure)
  Including OWAMP, BWCTL (iPerf), etc.
Tools: OWAMP/BWCTL
OWAMP: One-Way Active Measurement Protocol
  Ping by any other name would smell as sweet
  Depends on a stratum 1 time server at both ends
  Allows finding one-way latency problems
BWCTL: Bandwidth Control - a management front end to iperf (see the sketch below)
  Prevents disruption of the network with iperf
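A minimal sketch of both clients in use, assuming owampd and bwctld are already running on the far host; the hostnames are illustrative:

owping owamp.example.org             # one-way delay, jitter, and loss in each direction
bwctl -c bwctl.example.org -t 30     # 30-second iperf TCP test brokered by the remote bwctld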
Tools: BWCTL
Typical constraints to running "iperf":
Need software on all test systems
Need permissions on all systems involved (usually full shell accounts *)
Need to coordinate testing with others *
Need to run software on both sides with specified test parameters *
(* BWCTL was designed to help with these)
Tools: ARGUS http://www.qosient.com/argus
open source IP auditing tool
entirely passive
operates from network taps
network accounting down to the port level
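A minimal sketch of a typical deployment, assuming the stock argus daemon and the ra client from argus-clients; the interface, file name, and host address are illustrative:

argus -i eth1 -w /var/log/argus/argus.out         # passive capture on the tap/SPAN interface
ra -r /var/log/argus/argus.out - host 192.0.2.1   # report the flows involving one host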
Traffic Summary from Argus
From: Wed Aug 25 5:59:00 2004  To: Thu Aug 26 5:59:00 2004

18,972,261,362 Total     10,057,240,289 Out      8,915,021,073 In

aaa.bb.cc.ddd            6,064,683,683 Tot       5,009,199,711 Out    1,055,483,972 In
ww.www.ww.www            1,490,107,096           1,396,534,031           93,573,065
ww.www.ww.www:11003      1,490,107,096           1,396,534,031           93,573,065
xx.xx.xx.xxx               574,727,508             548,101,513           26,625,995
xx.xx.xx.xxx:6885          574,727,508             548,101,513           26,625,995
yy.yyy.yyy.yyy             545,320,698             519,392,671           25,928,027
yy.yyy.yyy.yyy:6884        545,320,698             519,392,671           25,928,027
zzz.zzz.zz.zzz             428,146,146             414,054,598           14,091,548
zzz.zzz.zz.zzz:6890        428,146,146             414,054,598           14,091,548
Tools: ARGUS
Using ARGUS to identify retransmission-type problems: compare total packet size to application data size
full (complete packet including IP headers):
12:59:06 d tcp sfu_ip.port -> taiwan_ip.port 9217 18455 497718 27940870
app (application data bytes delivered to the user):
12:59:06 d tcp sfu_ip.port -> taiwan_ip.port 9217 18455 0 26944300
The data transfer is one way; the acks coming back carry no user data
Tools: ARGUS
Compare to a misconfigured IP stack
full:
15:27:38 * tcp outside_ip.port -> sfu_ip.port 967 964 65885 119588
app:
15:27:38 * tcp outside_ip.port -> sfu_ip.port 967 964 2051 55952
The retransmit rate is constantly above 50%, giving poor throughput
This should (and did) set off alarm bells
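To reproduce this kind of comparison, newer argus-clients let ra print both total and application bytes per flow; a sketch assuming a version that supports these field names (the archive path is illustrative):

ra -r /var/log/argus/argus.out -s stime saddr daddr pkts bytes appbytes
# on a bulk transfer, total bytes far in excess of appbytes (beyond normal header overhead) suggests heavy retransmission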
Tools: NDT (many thanks to Lixin Liu)
Test 1: 50% signal on 802.11G
WEB100 Enabled Statistics:
Checking for Middleboxes . . . . . . . . . . . . . . . . . . Done
checking for firewalls . . . . . . . . . . . . . . . . . . . Done
running 10s outbound test (client-to-server [C2S]) . . . . . 12.00Mb/s
running 10s inbound test (server-to-client [S2C]) . . . . . . 13.90Mb/s
------ Client System Details ------
OS data: Name = Windows XP, Architecture = x86, Version = 5.1
Java data: Vendor = Sun Microsystems Inc., Version = 1.5.0_11
Tools: NDT
------ Web100 Detailed Analysis ------
45 Mbps T3/DS3 link found.
Link set to Full Duplex mode
No network congestion discovered.
Good network cable(s) found
Normal duplex operation found.
Web100 reports the Round trip time = 13.09 msec; the Packet size = 1460 Bytes; and
There were 63 packets retransmitted, 447 duplicate acks received, and 0 SACK blocks received
The connection was idle 0 seconds (0%) of the time
C2S throughput test: Packet queuing detected: 0.10%
S2C throughput test: Packet queuing detected: 22.81%
This connection is receiver limited 3.88% of the time.
This connection is network limited 95.87% of the time.
Web100 reports TCP negotiated the optional Performance Settings to:
RFC 2018 Selective Acknowledgment: OFF
RFC 896 Nagle Algorithm: ON
RFC 3168 Explicit Congestion Notification: OFF
RFC 1323 Time Stamping: OFF
RFC 1323 Window Scaling: ON
Tools: NDT
Server 'sniffer.ucs.sfu.ca' is not behind a firewall. [Connection to the ephemeral port was successful]
Client is not behind a firewall. [Connection to the ephemeral port was successful]
Packet size is preserved End-to-End
Server IP addresses are preserved End-to-End
Client IP addresses are preserved End-to-End
... (lots of web100 stats removed!)
aspd: 0.00000
CWND-Limited: 4449.30
The theoretical network limit is 23.74 Mbps
The NDT server has a 8192.0 KByte buffer which limits the throughput to 9776.96 Mbps
Your PC/Workstation has a 63.0 KByte buffer which limits the throughput to 38.19 Mbps
The network based flow control limits the throughput to 38.29 Mbps
Client Data reports link is 'T3', Client Acks report link is 'T3'
Server Data reports link is 'OC-48', Server Acks report link is 'OC-12'
Tools: NetPerf
netperf on the same link; the available throughput is less than the max

liu@CLM ~$ netperf -l 60 -H sniffer.ucs.sfu.ca -- -s 1048576 -S 1048576 -m 1048576
TCP STREAM TEST from CLM (0.0.0.0) port 0 AF_INET to sniffer.ucs.sfu.ca (142.58.200.252) port 0 AF_INET
Recv    Send    Send
Socket  Socket  Message  Elapsed
Size    Size    Size     Time     Throughput
bytes   bytes   bytes    secs.    10^6bits/sec

2097152 1048576 1048576  60.10    9.91
(second run)
2097152 1048576 1048576 61.52 5.32
Tools: NDT
Test 3: 80% on 802.11A
WEB100 Enabled Statistics:
Checking for Middleboxes . . . . . . . . . . . . . . . . . . Done
checking for firewalls . . . . . . . . . . . . . . . . . . . Done
running 10s outbound test (client-to-server [C2S]) . . . . . 20.35Mb/s
running 10s inbound test (server-to-client [S2C]) . . . . . . 20.61Mb/s
...
The theoretical network limit is 26.7 Mbps
The NDT server has a 8192.0 KByte buffer which limits the throughput to 9934.80 Mbps
Your PC/Workstation has a 63.0 KByte buffer which limits the throughput to 38.80 Mbps
The network based flow control limits the throughput to 38.90 Mbps
Client Data reports link is 'T3', Client Acks report link is 'T3'
Server Data reports link is 'OC-48', Server Acks report link is 'OC-12'
Tools: NetPerf
liu@CLM ~$ netperf -l 60 -H sniffer.ucs.sfu.ca -- -s 1048576 -S 1048576 -m 1048576
TCP STREAM TEST from CLM (0.0.0.0) port 0 AF_INET to sniffer.ucs.sfu.ca (142.58.200.252) port 0 AF_INET
Recv    Send    Send
Socket  Socket  Message  Elapsed
Size    Size    Size     Time     Throughput
bytes   bytes   bytes    secs.    10^6bits/sec

2097152 1048576 1048576  60.25    21.86

No one else was using wireless on A (i.e. the case on a lightpath)
NetPerf gets full throughput, unlike the G case
Tools: perfSONAR - Performance Middleware
perfSONAR is an international consortium in which Internet2 is a founder and leading participant
perfSONAR is a set of protocol standards for interoperability between measurement and monitoring systems
perfSONAR is a set of open source web services that can be mixed-and-matched and extended to create a performance monitoring framework
Design goals: standards-based, modular, decentralized, locally controlled, open source, extensible
perfSONAR Integrates
Network measurement tools
Network measurement archives
Discovery
Authentication and authorization
Data manipulation
Resource protection
Topology
Performance Measurement: Project Phases
Phase 1: Tool Beacons (Today)
  BWCTL (Complete), http://e2epi.internet2.edu/bwctl
  OWAMP (Complete), http://e2epi.internet2.edu/owamp
  NDT (Complete), http://e2epi.internet2.edu/ndt
Phase 2: Measurement Domain Support
  General Measurement Infrastructure (Prototype in Progress)
  Abilene Measurement Infrastructure Deployment (Complete), http://abilene.internet2.edu/observatory
Phase 3: Federation Support (Future)
  AA (Prototype - optional AES key, policy file, limits file)
  Discovery (Measurement Nodes, Databases) (Prototype - nearest NDT server, web page)
  Test Request/Response Schema Support (Prototype - GGF NMWG Schema)
Implementation
Applications
  bwctld daemon
  bwctl client
Built upon a protocol abstraction library
  Supports one-off applications
  Allows authentication/policy hooks to be incorporated
LIVE DEMOS
NDT
AppCritical
Q&A
Application Ecology
Paraphrasing the ITU categories:
Real-time - jitter sensitive: voice, video, collaborative
Synchronous/transactional - response time (RTT) sensitive: database, remote control
Data - bandwidth sensitive: transfer, backup/recover
Best-effort - not sensitive