IEPM-BW Deployment Experiences

Connie Logg, SLAC
Joint Techs Workshop, February 4-9, 2006

Background

• Originally conceived on September 13, 2001 and developed on Solaris as an exhibit for SC2001 in November 2001.
• It looked useful, so development continued.
• After SC2001, it was installed on a Solaris host shared with other applications, and those applications interfered with it.
• The original configuration file was a set of perl commands that defined the nodes, their configuration information, and the probes and parameters for each node. It was very hard to understand, maintain, modify, and manage.
• It was quickly ported to Linux and moved to its own host, but still used perl commands as the configuration database.

Background - continued

• As development proceeded, it became obvious that the configuration information for nodes and probes was no longer manageable. Enter the MySQL database.
• The whole package was redesigned around the MySQL database.
• Node specifications, monitoring host specifications, probe specifications, plot specifications, path specifications, and data are all maintained in the MySQL database.

The result is much more readable (web pages display the contents of the monitoring configuration), more manageable, and more adaptable to changing needs and specifications.

Conceptual Changes/Challenges

The conception of what IEPM-BW should do and which probes it should use has changed over time.

• Started by monitoring with ping, traceroute, and abwe
• Added iperf
• Added file transfers to the tests: bbcp, bbftp, gridftp. Discontinued because:
  • Performance tracked iperf
  • Disk speed is the overriding factor in throughput
  • Monitoring and target hosts are not likely to be equipped with high speed disks
  • Disk latency studies are important, but should not be part of IEPM-BW
• Added Pathload for available bandwidth measurements
• Removed Pathload (suggestion that it was too intense)
• Added Pathchirp
• Added Thrulay to compare with Iperf
• Added Pathload back in, per suggestions from collaborators at the Ultralight meeting

Currently

• Removed abwe, as it was too noisy and did not work well on gigabit networks.
• Evaluating Pathload vs Pathchirp. We may remove Pathchirp, but more likely will run it only to the nodes for which it works well.
• May have to use different types of probes for different types of networks and distances between nodes: ONE SIZE DOES NOT FIT ALL.

All probes, presentation, and analysis are evolving as we understand more about the networking environments, which are themselves evolving.

Analysis and Presentation

Analysis and data presentation ideas change.

• Time series plots were the first plots we had
• Of interest from the plot below:
  • Pathchirp is not very good in some cases: it reports > 1 Gb throughput
  • Pathload is more stable and probably more accurate
  • The RTT change in ping is very clear. It seems to have no effect in this case, but does in others. Note that it correlated with a traceroute change.

Analysis and Presentation

• Added diurnal analysis to see how it might be useful in event detection (bandwidth change) and possibly prediction
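
As a rough illustration of what a diurnal analysis involves, the sketch below buckets throughput samples by hour of day and reports the median for each hour. The function name and the `(unix_time, mbps)` sample format are assumptions made for this example. They are not taken from the IEPM-BW code.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import median


def diurnal_profile(samples):
    """Return {hour_of_day: median throughput} from (unix_time, mbps) samples.

    Hours are computed in UTC here; a real deployment would probably use
    the local time zone of the path being studied (an assumption).
    """
    by_hour = defaultdict(list)
    for ts, mbps in samples:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        by_hour[hour].append(mbps)
    return {hour: median(values) for hour, values in sorted(by_hour.items())}
```

A new measurement can then be compared against the median for its own hour of day, so that a routine afternoon dip is not mistaken for a bandwidth-change event.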

Analysis and Presentation

• Scatterplots: useful for looking at correlations
• Cross-plots: pathchirp and iperf on the Y axis vs Thrulay on the X axis

Analysis and Presentation

• Added histograms to provide frequency distribution and CDF

The histogram shows a possible multimodal distribution of achievable throughput as measured by thrulay, while available bandwidth for the same node (as measured by pathchirp) is stable.
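
The frequency distribution and CDF mentioned above can be computed in a few lines. This is a minimal sketch: the function name and the fixed-width binning are assumptions for illustration, not a description of how IEPM-BW builds its histograms.

```python
def histogram_and_cdf(values, bin_width):
    """Return a list of (bin_start, count, cumulative_fraction) rows.

    Each value is assigned to the bin [bin_start, bin_start + bin_width).
    Empty bins are omitted from the result.
    """
    if not values:
        return []
    counts = {}
    for v in values:
        start = int(v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    total = len(values)
    rows, running = [], 0
    for start in sorted(counts):
        running += counts[start]
        rows.append((start, counts[start], running / total))
    return rows
```

Gaps between populated bins, such as a cluster near 100 Mbits/s and another near 900 Mbits/s with little in between, are what a multimodal distribution looks like in this output.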

Analysis and Presentation

• Packet Loss

Traceroute Visualization

• One compact page per day
• One row per host, one column per hour
• One character per traceroute to indicate pathology or change (usually a period (.) = no change)
• Identify unique routes with a number
• Be able to inspect the route associated with a route number
• Provide for analysis of long term route evolutions

Annotations on the example page:
• Route number at the start of the day gives an idea of route stability
• Multiple route changes (due to GEANT), later restored to the original route
• Period (.) means no change
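
To make the one-character-per-traceroute idea concrete, here is a small sketch that renders one host's row. The specific characters are assumptions chosen for this example: a period for no change (as on the slide), "!" for a failed traceroute, and the last digit of the new route number when the route changes. The real display may use different symbols.

```python
def render_row(host, start_route, hourly_routes):
    """Render one host's row: host name, starting route number, hourly cells.

    hourly_routes is a list with one entry per hour; each entry is the list
    of route numbers observed in that hour (None = traceroute failed).
    """
    cells = []
    current = start_route
    for routes in hourly_routes:
        chars = []
        for route in routes:
            if route is None:
                chars.append("!")              # assumed marker for a failed probe
            elif route == current:
                chars.append(".")              # period means no change
            else:
                chars.append(str(route % 10))  # assumed: last digit of new route
                current = route
        cells.append("".join(chars) or " ")
    return f"{host:<12}{start_route:>4} " + " ".join(cells)
```

With two traceroutes per hour, a stable host produces a row of "..", and any other character stands out immediately when scanning the page.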

Event Detection (throughput drops)

• Must clearly define what you are looking for
  • How much change, and in what time period
  • How to determine if it is time to alert again (don't want repeated alerts for the same drop)
• Use the above to figure out how often you want to probe.
• Do not overprobe. Try to establish the necessary frequency, and if that does the job, that is enough.
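
The questions above (how much change, over what period, and when to alert again) map directly onto the parameters of a simple detector. The sketch below is one possible formulation, not the IEPM-BW algorithm: it compares the mean of the most recent samples against the mean of the samples just before them, and it stays silent while a drop persists. All names and default values are assumptions.

```python
from statistics import mean


def detect_drops(series, baseline_len=5, recent_len=3, drop_fraction=0.3):
    """Return the indices at which a new throughput-drop alert would fire.

    A drop is declared when the mean of the last `recent_len` samples falls
    more than `drop_fraction` below the mean of the `baseline_len` samples
    that precede them. No further alert is raised until the condition clears.
    """
    alerts = []
    in_drop = False
    for end in range(baseline_len + recent_len, len(series) + 1):
        baseline = mean(series[end - recent_len - baseline_len:end - recent_len])
        recent = mean(series[end - recent_len:end])
        dropped = recent < (1 - drop_fraction) * baseline
        if dropped and not in_drop:
            alerts.append(end - 1)   # index of the newest sample
        in_drop = dropped
    return alerts
```

Because the baseline slides forward, a sustained drop eventually becomes the new normal and produces exactly one alert. The window lengths also put a floor on probing frequency: the detector cannot react faster than `recent_len` probe intervals.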

Implementation Challenges

• Functions such as ping require different options and parsing on different OSs.
• When upgrading versions of the probe software, processing code may need to be modified because of output format changes.
• Not only must the probe software on the monitoring host be upgraded, but also the server versions on the target hosts.
• Tracking what is working and what is not, and troubleshooting when code performance changes for the worse.
• Automating distribution and maintenance
• Which versions of gnuplot and drivers, MySQL, and perl are available? Do they meet our needs?
• Keeping the servers alive (target kit)
• Monitoring and target hosts losing disks or having their OSs upgraded
• Maintaining proper TCP buffer sizes
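
The point about OS-dependent parsing can be illustrated with the ping summary line. The sketch below accepts two common forms: the `rtt min/avg/max/mdev` line printed by Linux iputils ping and the `round-trip min/avg/max/stddev` line printed by BSD-style ping. The sample lines in the comments are illustrative, other platforms use other layouts, and the function name is an assumption.

```python
import re

# Summary-line formats handled here (illustrative samples):
#   Linux (iputils): rtt min/avg/max/mdev = 0.045/0.052/0.061/0.006 ms
#   BSD style:       round-trip min/avg/max/stddev = 0.045/0.052/0.061/0.006 ms
RTT_SUMMARY = re.compile(
    r"(?:rtt|round-trip)\s+min/avg/max/(?:mdev|stddev)\s*=\s*"
    r"([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)"
)


def parse_ping_rtt(output):
    """Extract RTT statistics (ms) from ping output, or None if not found."""
    match = RTT_SUMMARY.search(output)
    if match is None:
        return None
    rtt_min, rtt_avg, rtt_max, rtt_dev = (float(x) for x in match.groups())
    return {"min": rtt_min, "avg": rtt_avg, "max": rtt_max, "dev": rtt_dev}
```

Returning `None` for unrecognized output, rather than raising, lets the caller log the raw text and move on, which matters when a probe upgrade silently changes the format.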

Implementation Challenges

• Many probes have to be run one at a time. Do not run iperf, thrulay, and pathload at the same time.
• Do not overload the network with probing activities. This constrains the number and frequency of probes that can be made.
• Currently, high impact probes are short (20 seconds or less) and the code allows at most one probe to run within a minute.
• If a process (probe, script, gnuplot, etc.) "cannot hang", it will hang. Time everything out and watch for hangs so they can be cleaned up automatically.
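
"Time everything out" is straightforward to apply at the point where a probe is launched. The sketch below wraps a command in a hard time limit using Python's standard `subprocess.run`, which kills the child process when the timeout expires. The wrapper name and the status strings are assumptions for illustration, and the "probe" in the usage example is a stand-in command, not a real measurement tool.

```python
import subprocess
import sys


def run_probe(cmd, timeout_s):
    """Run a probe command with a hard time limit.

    Returns (status, stdout), where status is "ok", "failed", or "timeout".
    """
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        return ("timeout", "")
    return ("ok" if done.returncode == 0 else "failed", done.stdout)


if __name__ == "__main__":
    # Stand-in for a hung probe: a one-liner that sleeps past the limit.
    print(run_probe([sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=1))
```

Since the probes themselves are serialized, a caller would invoke `run_probe` for one tool at a time and record the "timeout" status so that repeated hangs to the same node become visible.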

Current Implementation

MySQL tables for all configuration information:
• NODES: contains node definitions and path information for that node; all nodes, both target and monitoring hosts, are defined in this table
• MONHOST: monitoring-host-specific information and plotting specs for all the data
• TOOLSPECS: the specification for each probe, as well as a plotting spec for its data and a 'last run' field
• PLOTSPECS: miscellaneous plotting specifications (scatterplots, time series plots, other plot types)


MySQL tables for data storage

• ABWEDATA: being discontinued (first data table)
• BWDATA: all bandwidth data is stored here. It contains fields for:
  • RTT min, max, average, standard deviation
  • Throughput min, max, average, standard deviation, and final throughput
  • Number of streams, window size
  • Text results from the probe
  • Time of the probe

Not all fields are used for all data types.


Tables for Traceroute data

• ROUTENO: each route seen is given a unique identifier (routeno), and the row contains srcnode, destnode, firstseen, lastseen, and the IP hop list
• ROUTEDATA: routeno (from the ROUTENO table), text of the traceroute output, number of hops, IP hop list, time of probe
• Historical route data may be interesting to analyze for route changes over time, but no one has had the time or interest to do it.
• NEW, coming soon: ASN tables to store ASN info for hops. This is useful as it speeds up interactive drawing and display, and analysis of the traceroutes.
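
The route-numbering scheme described for the ROUTENO table can be sketched in memory as follows. The field names mirror those listed above, but the class itself, and the choice to number routes globally rather than per source/destination pair, are assumptions for illustration. The real implementation stores these rows in MySQL.

```python
class RouteTable:
    """In-memory stand-in for the ROUTENO table (illustrative only)."""

    def __init__(self):
        self._by_key = {}   # (srcnode, destnode, hops) -> row
        self.rows = []      # rows in order of first appearance

    def record(self, srcnode, destnode, hops, when):
        """Return the routeno for this hop list, creating a row if it is new."""
        key = (srcnode, destnode, tuple(hops))
        row = self._by_key.get(key)
        if row is None:
            row = {
                "routeno": len(self.rows) + 1,
                "srcnode": srcnode,
                "destnode": destnode,
                "firstseen": when,
                "lastseen": when,
                "hops": list(hops),
            }
            self._by_key[key] = row
            self.rows.append(row)
        else:
            row["lastseen"] = max(row["lastseen"], when)
        return row["routeno"]
```

A route that disappears and later returns gets its original number back, which is what makes "restored to original route" visible in the daily display.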


The SCHEDULE table holds the scheduling information for each probe and tracks what state it is in. Every probe made (including ping) has a unique schedule ID which identifies the probe and all of its parameters.

The scheduler checks the TOOLSPECS table to ascertain which probes are due to be run and inserts them into the SCHEDULE table.

Scheduled probes are only run if they are within the “current” time period. This prevents a large number of probes from being stacked up and flooding the network for a long time.
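
The two scheduling rules above, "insert what is due" and "run only what is still current", can be sketched as two small functions. The dictionary keys (`interval_s`, `last_run`, `scheduled_at`) and the function names are hypothetical. The slides do not give the actual column names or the length of the "current" period.

```python
def due_probes(toolspecs, now):
    """Return the probe specs whose interval has elapsed since their last run."""
    return [spec for spec in toolspecs if now - spec["last_run"] >= spec["interval_s"]]


def runnable(schedule, now, window_s):
    """Split scheduled entries into (still current, expired).

    An entry is current if it was scheduled no more than window_s seconds
    ago. Expired entries are skipped rather than run late, so a backlog
    cannot flood the network once the scheduler catches up.
    """
    current, expired = [], []
    for entry in schedule:
        if now - entry["scheduled_at"] <= window_s:
            current.append(entry)
        else:
            expired.append(entry)
    return current, expired
```

Counting the expired entries gives a direct measure of how many probes were scheduled but never run.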

Troubleshooting

• Every script has a log file where it records errors and performance information, such as how long it took to make a pass. These log files are rotated nightly and kept for 7 days (easily changed).
• Hanging processes (probes) are a fact of life.
  • Time out all probes
  • Create a cleanup script that looks for processes which have been active longer than they should be and kills them
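
The decision logic of such a cleanup script is simple enough to sketch. In the example below, the process list is passed in as `(pid, name, start_time)` tuples. How that list is gathered (for instance from `ps` output), the per-tool limits, and the function names are all assumptions for illustration.

```python
import os
import signal


def overdue(processes, now, limits, default_limit_s=120):
    """Return the pids of processes that have run longer than allowed.

    processes: iterable of (pid, name, start_time) tuples.
    limits:    {process name: maximum run time in seconds}.
    """
    return [
        pid
        for pid, name, started in processes
        if now - started > limits.get(name, default_limit_s)
    ]


def kill_all(pids):
    """Forcibly kill each pid (Unix only), ignoring ones that already exited."""
    for pid in pids:
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # process ended between the check and the kill
```

Keeping the selection step separate from the kill step makes it possible to log what would be killed, and to test the limits, without touching any running process.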

Troubleshooting

• Lingering tasks report: a report showing scheduled probes that were not run is generated every day. This is important: if many probes are not being run in time, it may mean that too many are being scheduled or that there is a performance problem.
• Logging report: a report showing the number of successful probes made, database write failures, and other failure modes is generated. The information for this report is taken from the data logging log files.

Troubleshooting

• NETFLOW records are a valuable tool
  • Code had been running fine for years
  • TCP orphan sockets messages appeared and the machine crashed
  • Netflow records for some 20 second iperf probes were lasting for > 1 minute (some 4 minutes)
  • This was a change in behavior from the past, when they lasted 20-25 seconds
  • Disabled iperf probes and the system stabilized
• Now need to figure out what is going on with the iperf probes. Not all are troublesome, just those to a few nodes.

Performance Issues

When probes show degradation in network performance:
• Is it the network?
• Is it the monitoring node? (We had a very bad experience with a JAVA program.)
• Is it the target node?

Recommendations:
• Have a local target host as a sanity check. It is also good to use as a target host from other monitoring hosts.
• The monitoring hosts should be dedicated systems.
• Monitor the monitoring host load with Lisa, Ganglia, Nagios, APmon to MonALISA, etc.

Performance Issue Example – Bad JAVA Program

Caltech monitoring host as seen from iepm-bw@slac

CALTECH target host as seen from iepm-bw@slac

SLAC target host as seen from iepm-bw@slac

Problems

• Node name disappears from DNS
• Ports suddenly get blocked
• Disks crash (lost the entire CALTECH database because the backup was on the same physical disk). A separate physical disk is needed for local backup.
• Monitoring and target hosts get OS upgrades without warning
  • Installed code disappears
  • Databases get zapped
  We are now working on backing up databases and source code configuration information to SLAC once a day.
• Utility packages (gnuplot, for example) get silently upgraded
  Discussion about distributing our own

Future Directions

• Automate the installation and configuration process
• Manage code with CVS and distribute via pacman cache
• Deploy IEPM-BW for LHC monitoring to see if it is useful and/or relevant. If so, it can be expanded and developed to meet changing needs.
• Upload monitoring data and alerts to MonALISA
• Implement OWAMP and BWCTL
• Look at Pathneck
• Implement min and max (maybe also average) RTT analysis and integrate it with other change analysis

Summary

• Are you monitoring to determine problems or monitoring for forecasting? They are very different, but both can be done with the same monitoring.
• With respect to real disk to disk transfers, disk latency is the overwhelming factor. The monitoring can tell you how the network is performing, but this is not necessarily related to application performance.
• Bearing this in mind, I do not think we need to perform disk to disk transfers or intensive network testing with the monitoring systems.
• Be prepared to be flexible in your architecture. Networks are constantly evolving, so the probes, analysis, and presentation must also evolve.

What would I have done differently along the way?

In hindsight, not a lot. It has been a constant process of learning. The code adapted fairly well to the research we needed to do. Remember that it started as an exhibit for SC2001 and has been a research and learning tool since then.

More manpower would have been very useful. Had it been available, the code, package structure, and documentation would be more professional, and the change analysis and prediction/forecasting would be more complete.

References:
• http://www-iepm.slac.stanford.edu/
• http://www.slac.stanford.edu/comp/net/iepm-bw.slac.stanford.edu/slac_wan_bw_tests.html
• Papers/web pages on web100, netflow, and active measurement correlation:
  • http://www.slac.stanford.edu/cgi-wrap/getdoc/slac-pub-9641.pdf
  • http://www.slac.stanford.edu/comp/net/bandwidth-tests/web100/
• Recommended monitoring and target host configurations
• IEPM-BW Installation and PLM (being updated and reorganized)

Contributors: Les Cottrell, Jerrod Williams, Mahesh Chhaparia, I-Heng Mei, Manish Bhargava, Jiri Navratil, Yee Ting-Li, all at SLAC now or in the past; Maxim Grigoriev (FNAL); and the developers of the probes we use.

QUESTIONS?

Extra Slides

Installation Challenges - perl

• Perl is not installed in the same location on all machines.
• Tried to settle on /usr/bin/perl
• Needed to install various perl modules
• Some conflict with already installed modules

We have not done this, but one possibility is to have a private version of perl configured for this application.

Installation Challenges - MySQL

• Is MySQL already installed?
• What version? 3.x.y is not compatible with 4.x.y and 5.x.y, so we may have to install it
• Used RPMs. Some RPM releases were buggy (5.0.16 vs 5.0.18). Install from source?
• Need to install the perl bundle for MySQL. Where to install it (/usr/local/bin/perl or /usr/bin/perl or ?)
• Could not install it in /usr/local/bin/perl at SLAC because that had an old version and was in AFS.
• Selected /usr/bin/perl.
• Installation of MySQL and the perl drivers requires attention to detail, and order is important
