Case Study: Debugging Multicast Problems from an Applications
Perspective
Steven Senger, Ph.D.
Dept. of Computer Science
University of Wisconsin - La Crosse
HAVnet Project
• Parvati Dev, PI, Stanford SUMMIT
• National Library of Medicine, NGI & SII programs since 1999.
• Applications of high-performance networks to anatomical and surgical education.
• http://havnet.stanford.edu
• http://visu.uwlax.edu
Immersive Segmentation
Remote Stereo Viewer
Nomadic Anatomy Viewer
Other Apps and Components
• Information Channels
– Multicast based announcement/discovery mechanism.
– Supports other app requirements such as logging.
• Access Grid
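A multicast announcement/discovery channel of the kind listed above can be sketched roughly as follows. This is a minimal illustration and not the HAVnet implementation: the group address, port, JSON message layout, and all function names are assumptions made for the example.

```python
import json
import socket

# Hypothetical values for illustration only (administratively scoped group).
GROUP = "239.255.42.99"
PORT = 5007


def encode_announcement(app, host, port):
    """Serialize a service announcement as UTF-8 JSON (assumed format)."""
    return json.dumps({"app": app, "host": host, "port": port},
                      sort_keys=True).encode("utf-8")


def decode_announcement(payload):
    """Parse an announcement back into (app, host, port)."""
    msg = json.loads(payload.decode("utf-8"))
    return msg["app"], msg["host"], msg["port"]


def announce(payload, group=GROUP, port=PORT, ttl=1):
    """Send one announcement datagram to the multicast group."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                       socket.IPPROTO_UDP) as sock:
        # TTL limits how many router hops the announcement may cross.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(payload, (group, port))


def open_listener(group=GROUP, port=PORT):
    """Return a UDP socket that has joined the group on the default interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # ip_mreq: 4-byte group address followed by 4-byte interface address.
    mreq = socket.inet_aton(group) + socket.inet_aton("0.0.0.0")
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock
```

A receiver would call `open_listener()` and pass each datagram from `recvfrom` to `decode_announcement`; a sender would call `announce(encode_announcement(...))` periodically.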
Testbed
Network/App Monitoring
Potholes Along the Way
• Stanford / CENIC
– Multicast setup delay
• WiscNet
– Conflict between sender and receiver
• Michigan / Merit
– Multicast setup delay
– Inbound flow stops after 209 secs
Stanford / CENIC …
• Longstanding problem (observed in ‘01).
• Large delays (~15 min) in multicast setup.
• Stanford / La Crosse / NLM
– Significant delays except for La Crosse / NLM
• Originally thought to be at Stanford border and RP.
• ‘04 hardware/IOS upgrades at Stanford.
• Situation improved.
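From the application side, a setup delay like the one described above can be measured as the time between joining a group and receiving the first datagram. The following is a rough sketch under stated assumptions: the function name and default timeout are hypothetical, and the socket is assumed to have joined the group immediately before the call.

```python
import socket
import time


def time_to_first_packet(sock, timeout_s=1200.0):
    """Return seconds until the first datagram arrives, or None on timeout.

    `sock` is assumed to be a UDP socket that joined the multicast group
    just before this call, so the elapsed time approximates the multicast
    setup delay as seen by the receiving application.
    """
    start = time.monotonic()
    sock.settimeout(timeout_s)
    try:
        sock.recvfrom(65535)
    except socket.timeout:
        return None
    return time.monotonic() - start
```

Repeating the measurement from several sites against the same sender gives per-path numbers that can be compared, which is how a pattern such as "significant delays except for La Crosse / NLM" becomes visible.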
Stanford / CENIC …
• Only Michigan to Stanford delayed, ~6 mins.
• Oct ‘04: phone calls among Stanford, CENIC, vendor support, and La Crosse. Escalated through 3 layers of vendor support.
• Test/debug every couple of weeks through March ‘05.
• Identified as MSDP propagation delay related to encap/unencap of data received by MSDP.
Stanford / CENIC
• Delay occurred at each CENIC router.
• At some point the problem had been found and resolved internally by the vendor.
• Solution: upgrade OS on CENIC routers.
La Crosse / WiscNet …
• First observed spring ‘05 using AccessGrid.
• La Crosse sender and Stanford receiver OK.
• Starting a La Crosse receiver breaks the flow.
• WiscNet identified problem router.
• Vendor support engaged.
• Discovered rpd restart sufficient to fix.
• Reoccurs every 2 months.
La Crosse / WiscNet …
• When failing
– Upstream interface on router gets set to unreasonable value.
– Sender continues to send data in encapsulated PIM-register messages.
– Router never sends register-stop messages.
La Crosse / WiscNet
• Problem has survived router chassis upgrade.
• No solution as yet.
U. Michigan / Merit …
• Discovered after CENIC problem solved.
• Small delay in setup for Michigan to Stanford.
• Varies between 0 and 60 sec.
• Similar behavior for Milwaukee to Stanford.
• Does not appear to be in CENIC?
U. Michigan / Merit …
• Presence of other receivers seems to change the setup delay.
• Merit engaged in isolating problem.
• No solution as yet.
U. Michigan / Merit
• Discovered Jan ‘06 using AccessGrid.
• Traffic from Stanford to MCBI/Merit starts correctly but stops after 208 seconds.
• When stopped, IPLSng shows as pruned.
• Merit identified problem with a switch in Chicago not allowing streams to set up correctly.
• Problem resolved with OS upgrade.
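A flow that starts correctly and then stops after a fixed interval can be spotted from the application's own packet arrival log. The sketch below is one possible approach and not a tool from the talk: the function name, the gap threshold, and the input format (arrival timestamps in seconds) are assumptions.

```python
from typing import Optional, Sequence


def first_stall_offset(arrivals: Sequence[float],
                       max_gap_s: float,
                       end_time: Optional[float] = None) -> Optional[float]:
    """Return seconds from the first packet to the last packet before a stall.

    A stall is any silence longer than `max_gap_s`, either between two
    consecutive arrivals or between the last arrival and `end_time`
    (the moment observation ended). Returns None if no stall is found.
    """
    if not arrivals:
        return None
    first = arrivals[0]
    for prev, cur in zip(arrivals, arrivals[1:]):
        if cur - prev > max_gap_s:
            return prev - first
    if end_time is not None and end_time - arrivals[-1] > max_gap_s:
        return arrivals[-1] - first
    return None
```

If repeated runs report nearly the same offset each time, that points toward a timer expiring somewhere in the network rather than random loss.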
Diagnostic Help
• Debugging strategies
• Tools
• Monitoring