Sting: a TCP-based Network Measurement Tool
Stefan Savage
Jianxuan Xu
Measurement & Analysis
The Internet is extremely hard to measure:
– Very heterogeneous
– Very large
– Heisenberg effects
The Heisenberg effect describes a system in which observing or measuring an event changes the event itself.
Still… lots of efforts to measure and understand traffic dynamics, routing, user characteristics, etc…
Understanding wide-area network characteristics is critical for evaluating the performance of Internet applications.
Measurement & Analysis
ICMP-based tools (e.g., ping, traceroute)
– Can't measure one-way loss
Measurement infrastructures (e.g., NIMI)
– Require cooperation from remote endpoints
Features
Measures one-way packet loss rates
TCP-based measurement traffic (not filtered)
Relies only on standard TCP behavior
Target only needs to run a TCP service (e.g., a web server); no remote cooperation required
Basic approach
Send selected TCP packets to remote host
Analyze TCP behavior to deduce which packets were lost in each direction
Deducing losses in a TCP transfer
What we know
How many data packets we sent
How many acknowledgements we received
What we need to know
How many data packets were received?
Remote host’s TCP MUST know
How many acknowledgements were sent?
Easy, if one ACK is sent for each data packet (ACK parity)
How TCP reveals packet loss
Data packets ordered by seq#
ACK packets specify next seq# expected
Basic loss deduction algorithm
Forward loss
Data seeding:
– Source sends in-sequence TCP data packets to target, each of which will be a loss sample
Hole-filling:
– Send a TCP data packet with sequence number one greater than the last seeding packet
– If the target ACKs this new packet, no loss occurred
– Otherwise, each ACK reveals a missing packet (a "hole")
– Hole-filling must be reliable: retransmissions are repeated until every hole is acknowledged
Data Seeding phase
    for i := 1 to n
        send packet w/ seq# i
        dataSent++
    wait for a long time

    for each ack received
        acksReceived++

Hole Filling phase
    lastAck := 0
    while lastAck = 0
        send packet w/ seq# n+1
    while lastAck < n+1
        dataLost++
        retransPkt := lastAck
        while lastAck = retransPkt
            send packet w/ seq# retransPkt
    dataReceived := dataSent - dataLost
    acksSent := dataReceived

    for each ack received w/ ack# j
        lastAck := MAX(lastAck, j)
Example
Basic loss deduction algorithm
Reverse loss
Data seeding:
– Skip first sequence number, ensuring out-of-sequence data (Fast Retransmit)
– Receiver will immediately acknowledge each data packet received
– Measure lost ACKs
Hole-filling:
– Transmit the first sequence number
– Continue as before
Guaranteeing ACK parity
How do we know one ACK is sent for each data packet received?
Exploit TCP’s fast retransmit algorithm
TCP must send an immediate ACK for each out-of-order packet it receives
Send all data packets out-of-order
Skip first sequence number
Don’t count first “hole” in hole filling phase
Sending Large Bursts
– Large bursts can overflow the receiver's buffer
– Mitigate by overlapping sequence numbers
Delaying connection termination
Some Web servers/firewalls terminate connections abruptly by sending RST
Solutions:
– Format data packets as a valid HTTP request
– Set the advertised receiver window to 0 bytes
Sting implementation details
Raw sockets to send hand-built TCP segments
Packet filter (libpcap) to get responses
Currently runs on Tru64 and FreeBSD
Last-generation user interface
# sting -c 100 -f poisson -m 0.500 -p 80 www.audiofind.com
Source = 128.95.2.93
Target = 207.138.37.3:80
dataSent = 100
dataReceived = 98
acksSent = 98
acksReceived = 97
Forward drop rate = 0.020000
Reverse drop rate = 0.010204
Forward Loss Results
Reverse Loss Results
"Popular" Web Servers
Random Web Servers
Results
Loss rates increase during business hours and decrease afterwards
Forward and reverse loss rates vary independently
On average, for popular web servers, the reverse loss rate is more than 10 times the forward loss rate
Conclusions
TCP protocol features can be leveraged for non-standard purposes
Packet loss is highly asymmetric
Ongoing work:
Using TCP to estimate one-way queuing delays, bottleneck bandwidths, propagation delay and server load
Useful or Useless
Purpose of network measurement:
– Diagnose current problems
– Design future services
Real-time data needed for network control
Data sampling:
– Event-driven
– Fixed interval
Research Goal
Implement a new TCP congestion control algorithm using fuzzy logic control
Develop, test and debug it in Linux
Performance Evaluation
Traditional protocol hacking
Directly modify the kernel source
Migrate protocol stack and related stuff to user space
Simulate the algorithm with NS-2
Kernel Hacking
Insert and modify the algorithm in kernel source directly
Example:
– Vegas, Westwood+ and BIC implementations in the Linux kernel before version 2.6.13
Kernel Hacking
Pros:
– Welcome to the real world
– Less overhead
Cons:
– Not easy to develop, trace, debug and maintain
– Incompatible across different kernel versions
User space migration
Move the whole protocol stack and related machinery to user space
Gives total control over protocol state and variables
Example–Sting
User space migration
Pros:
– High flexibility in protocol hacking
– Can use general debugging tools, e.g., gdb
Cons:
– A large and thorny project to migrate the protocol stack to user space
– Incompatible across different kernel versions
– Large overhead
Simulation
The algorithm is implemented on a virtual testbed
Virtual experiments can be run easily
NS-2 is the usual simulator
E.g., research on FAST TCP and HighSpeed TCP
Simulation
Pros:
– Quick implementation of the algorithm
– Low experimental cost
– Easy data collection and statistics
Cons:
– Results are too idealistic
– Further development needed for a final product
Traditional methods are not suitable
Source code modification and user-space migration require a good understanding of the kernel architecture
NS-2 is not as realistic as testing on top of PlanetLab
All of them are kernel-version dependent
My new approach
Combine the pluggable congestion control framework with kernel hacking
Implement the new control algorithm within a single kernel module
Pluggable congestion control module
Starting from version 2.6.13, a new way of hooking in TCP congestion control was introduced
New algorithms can be written as modules and inserted into the kernel at run time, just like ordinary device drivers
BIC, CUBIC, HighSpeed, H-TCP, Hybla, Scalable, Vegas and Westwood+ are already implemented as modules
Pluggable congestion control module
A congestion control mechanism is registered through functions in tcp_cong.c
The functions used by the congestion control mechanism are registered by passing a tcp_congestion_ops struct to tcp_register_congestion_control()
At a minimum, name, ssthresh, cong_avoid and min_cwnd must be valid
Pluggable congestion control module
Which congestion control mechanism is used is determined by the sysctl net.ipv4.tcp_congestion_control
The default congestion control is the last one registered (LIFO)
NewReno is built in and always available
A particular default can be set via sysctl
Pluggable congestion control module
The tcp_congestion_ops struct provides the following function entry points:
– init
– release
– ssthresh
– min_cwnd
– cong_avoid
– rtt_sample
– set_state
– cwnd_event
– undo_cwnd
– pkts_acked
– get_info
Pluggable congestion control module
All algorithm-related code is packed into a single module file
A standardized framework can be followed; the code required to implement an algorithm is greatly reduced (e.g., NewReno uses 77 lines where BIC uses 335)
The module stays compatible unless the framework itself changes
Kernel Hacking Still Needed
Raw, accurate, real-time data is needed by the control algorithm:
– Packet loss rate
– Bandwidth estimation
– RTT
– (e.g., TCP Vegas uses RTT; Westwood uses bandwidth estimation)
PLR Calculation in Linux Kernel
tcp_input.c is the core of the TCP protocol implementation; it:
– handles incoming packets and ACKs
– identifies duplicate ACKs and packet losses
– adjusts the congestion window accordingly
PLR Calculation in Linux Kernel
Two types of events are caused by congestion: retransmission timeout (RTO) and packet loss.
The timeout event is checked by tcp_head_timedout()
The packet-loss event is checked by tcp_mark_head_lost()
PLR Calculation in Linux Kernel
TCP's congestion avoidance (CA) phase is decomposed into five states (defined in the ca_state field of the tcp_opt data structure):
– TCP_CA_Open
– TCP_CA_Disorder
– TCP_CA_CWR
– TCP_CA_Recovery
– TCP_CA_Loss
PLR Calculation in Linux Kernel
The state machine is implemented in tcp_fastretrans_alert(), which processes "dubious" ACK events
PLR Calculation in Linux Kernel
tcp_update_scoreboard():
– Marks as lost all packets that were not SACKed (up to the highest SACKed sequence number). Packets that have waited for their ACKs for an interval equivalent to the retransmission timeout are also marked as lost. The accounting for lost, SACKed and left-out packets is also done in this function.
PLR Calculation in Linux Kernel
left_out = sacked_out + lost_out
sacked_out: packets that arrived at the receiver out of order and hence were not cumulatively ACKed. With SACK this is simply the amount of SACKed data; even without SACK, a fairly reliable estimate can be made by counting duplicate ACKs.
lost_out: packets lost by the network. TCP has no explicit "loss notification" feedback from the network (for now), so this number can only be guessed. In fact, it is the heuristics used to predict loss that distinguish the different algorithms.