UDP High Speed

    UDP-based schemes for High Speed Networks

    Presented By: Sumitha Bhandarkar

    Presented On: 03.24.04

    Agenda

    RBUDP
    E. He, J. Leigh, O. Yu, T. A. DeFanti, Reliable Blast UDP: Predictable High Performance Bulk Data Transfer, IEEE Cluster Computing 2002, Chicago, Illinois, Sept 2002.

    Tsunami (no technical resources available)
    http://www.ncne.org/training/techs/2002/0728/presentations/200207-wallace1_files/v3_document.htm

    SABUL/UDT
    H. Sivakumar, R. L. Grossman, M. Mazzucco, Y. Pan, Q. Zhang, Simple Available Bandwidth Utilization Library for High-Speed Wide Area Networks, to appear in Journal of Supercomputing, 2004.
    Y. Gu and R. Grossman, UDT: An Application Level Transport Protocol for Grid Computing, Second International Workshop on Protocols for Fast Long-Distance Networks (PFLDnet 2004), February 2004.
    Y. Gu and R. Grossman, UDT: A Transport Protocol for Data Intensive Applications, IETF draft. http://bebas.vlsm.org/v08/org/rfc-editor/internet-drafts/draft-gg-udt-00.txt

    GTP
    R. X. Wu and A. A. Chien, GTP: Group Transport Protocol for Lambda-Grids, 4th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2004), April 2004.

    TCP-based Schemes: The Problems

    Slow Startup

    Slow loss recovery

    RTT bias

    Burstiness caused by window control

    Large amount of control traffic due to per-packet ack

    RBUDP

    Intended to be aggressive.

    Intended for high-bandwidth dedicated or QoS-enabled networks - not for deployment on the broader Internet.

    Uses UDP for data traffic and TCP for signaling traffic.

    Estimates available bandwidth on the network using Iperf/app_perf (NOTE: this requires user interaction, i.e., it is NOT automated)

    Tries to send just below this rate in blasts to avoid losses (payload =

    RTT * Estimated BW)

    If losses do occur within a blast, TCP is used to exchange loss reports

    Lost packets are recovered by retransmitting the lost packets in smaller

    blasts
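
    To make the blast mechanism concrete, here is a minimal Python sketch: size one blast as RTT * estimated bandwidth, blast over UDP, then keep re-blasting whatever the TCP loss report says is still missing. The udp_blast and tcp_get_losses callables stand in for the UDP data channel and TCP signaling channel and are illustrative, not from the paper.

    def blast_payload_bytes(rtt_s: float, est_bw_bps: float) -> int:
        """payload = RTT * estimated BW, converted from bits to bytes."""
        return int(rtt_s * est_bw_bps / 8)

    def rbudp_send(packets, udp_blast, tcp_get_losses):
        """Blast everything once, then re-blast only the packets the receiver
        reports as missing, until the loss report comes back empty."""
        missing = list(range(len(packets)))
        while missing:
            udp_blast([(seq, packets[seq]) for seq in missing])  # UDP data channel
            missing = tcp_get_losses()                           # loss report over TCP

    # Example: a 1 Gbps path with a 100 ms RTT needs about 12.5 MB in flight per blast.
    print(blast_payload_bytes(0.100, 1e9))   # 12500000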

    RBUDP

    E. He, J. Leigh, O. Yu, T. A. DeFanti, Reliable Blast UDP: Predictable High Performance Bulk Data Transfer, IEEE Cluster Computing 2002, Chicago, Illinois, Sept 2002.

    RBUDP: Sample Results (with network bottleneck)

    E. He, J. Leigh, O. Yu, T. A. DeFanti, Reliable Blast UDP: Predictable High Performance Bulk Data Transfer, IEEE Cluster Computing 2002, Chicago, Illinois, Sept 2002.

    RBUDP: Sample Results (with receiver bottleneck)

    E. He, J. Leigh, O. Yu, T. A. DeFanti, Reliable Blast UDP: Predictable High Performance Bulk Data Transfer, IEEE Cluster Computing 2002, Chicago, Illinois, Sept 2002.

    RBUDP: Conclusions

    Advantages

    Keeps the pipe as full as possible

    Avoids TCP's per-packet ACK interaction

    Paper provides an analytical model, so performance is predictable

    Disadvantages

    Sending rate needs to be adjusted by the user (no means of automatically adjusting the sending rate in response to dynamic network conditions) - thus the solution is good ONLY in dedicated/QoS-supported networks.

    No flow control - a fast sender can flood a slow receiver. The offered solution is to use app_perf (a modified Iperf developed by the authors to take the receiver bottleneck into account) for bandwidth estimation.

    Tsunami

    No technical papers. This information is from a presentation at the July 2002 NLANR/Internet2 Techs Workshop, available for download at http://www.indiana.edu/~anml/anmlresearch.html. The latest version is dated 12/09/02.

    Very simple and primitive scheme - NOT TCP-FRIENDLY

    Application level protocol - uses UDP for data and TCP for signaling

    Receiver keeps track of lost packets and requests for retransmission

    So how is this different from RBUDP?

    SABUL / UDT

    SABUL (Simple Available Bandwidth Utilization Library) uses UDP to

    transfer data and TCP to transfer control information.

    UDT (UDP-based Data Transfer Protocol) uses UDP only for both data

    and control information.

    UDT is the successor to SABUL.

    Both are application-level protocols, available as an open-source C++ library on Linux/BSD/Solaris and as NS-2 simulation modules.

    SABUL / UDT

    Rate control: used to handle dynamic congestion; uses a constant rate-control interval (called SYN, set to 0.01 seconds) to avoid RTT bias.

    Window-based flow control: used in slow start, to ensure that a fast sender does not swamp a slow receiver, and to limit unacknowledged packets.

    Selective positive acknowledgement (one per SYN) and immediate

    negative acknowledgement.

    Uses both packet loss and packet delay for inferring congestion

    TCP friendly - less aggressive than TCP in low-BDP networks; better than TCP in higher-BDP networks.

    PFLDnet 2004 claim: orthogonal design - the UDP-based framework can be used with any congestion control algorithm, and the UDT congestion control algorithm can be ported to any TCP implementation.
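
    As an illustration of that claim, a pluggable congestion-control hook set might look like the Python sketch below: events in, pacing interval and flow window out. This mirrors the general shape of such an interface, not UDT's actual C++ API; all names are assumptions.

    class CongestionControl:
        """Illustrative hook set a UDP framework could expose (not UDT's real API).
        The framework feeds in events; the algorithm adjusts the two outputs."""

        def __init__(self) -> None:
            self.pkt_send_interval = 0.0   # output: seconds between packets (rate control)
            self.flow_window = 2           # output: max unacknowledged packets (flow control)

        def on_ack(self, acked_seq: int, arrival_rate: float, rtt: float) -> None:
            pass                           # e.g. grow rate/window from AS and RTT

        def on_nak(self, lost_seqs: list) -> None:
            pass                           # e.g. cut the rate on a loss report

        def on_timeout(self) -> None:
            pass                           # e.g. treat like a TCP RTO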

    SABUL / UDT

    Y. Gu and R. Grossman, UDT: An Application Level Transport Protocol for Grid Computing, PFLDnet 2004.

    SABUL / UDT: Rate Control (AIMD)

    Y. Gu, X. Hong, M. Mazzucco and R. Grossman, SABUL: A High Performance Data Transfer Protocol, submitted for publication.

    Y. Gu and R. Grossman, UDT: An Application Level Transport Protocol for Grid Computing, PFLDnet 2004.

    Increase

    If the loss rate during the last SYN is less than a threshold (0.1%), the sending rate is increased.

    Old version (SABUL): (increase formula in the SABUL paper)

    New version (UDT): (increase formula in the UDT paper, driven by the estimated bandwidth below)

    Estimated BW is calculated using the packet-pair technique:

    Every 16th data packet and its successor are sent back to back to form a packet pair

    The receiver uses a median filter on the interval between arrival times of each packet pair to estimate link capacity
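
    A small runnable sketch of that receiver-side estimate (the packet size and gap values are illustrative):

    import statistics

    def estimate_capacity_bps(pair_gaps_s, pkt_size_bytes=1500):
        """Packet-pair estimate: the bottleneck spaces back-to-back packets by
        (packet size / capacity), so capacity ~= packet size / gap. Taking the
        median over many pairs filters out gaps distorted by cross traffic."""
        gap = statistics.median(pair_gaps_s)
        return pkt_size_bytes * 8 / gap

    # Back-to-back 1500-byte packets arriving ~12 microseconds apart -> ~1 Gbps link.
    print(estimate_capacity_bps([12e-6, 12.1e-6, 30e-6, 11.9e-6]))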

    SABUL / UDT: Rate Control (AIMD)

    Y. Gu and R. Grossman, UDT: An Application Level Transport Protocol for Grid Computing, PFLDnet 2004.

    SABUL / UDT: Rate Control (AIMD)

    Decrease

    Increase the inter-packet time by 1/8 (or equivalently, decrease the sending rate by 1/9) when one of these conditions holds:

    the largest lost sequence number in the NAK is greater than the largest sent sequence number when the last decrease occurred

    it is the 2^dec_count-th NAK since the last time the above condition was satisfied; dec_count is reset to 4 each time the first condition is satisfied, and incremented by 1 each time the second condition is satisfied

    a delay warning is received

    Loss information carried in a NAK is also compressed for losses of consecutive packets.

    No data is sent in the next SYN time after a decrease.

    The delay warning is generated by the receiver based on the observed RTT trend.
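
    A minimal Python sketch of this decrease bookkeeping. The field names and exact counter handling are a best-effort reading of the rules above, not the UDT implementation:

    import time
    from dataclasses import dataclass

    SYN = 0.01  # fixed rate-control interval, seconds

    @dataclass
    class RateState:               # illustrative sender-side bookkeeping
        pkt_interval: float        # seconds between packets (1 / sending rate)
        largest_sent_seq: int = 0
        last_decrease_max_seq: int = -1
        dec_count: int = 4
        nak_count: int = 0
        freeze_until: float = 0.0  # send no new data until this time

    def on_nak(s: RateState, largest_lost_seq: int) -> bool:
        """Apply the decrease rules above; returns True if the rate was cut."""
        s.nak_count += 1
        if largest_lost_seq > s.last_decrease_max_seq:    # loss beyond the last decrease point
            s.last_decrease_max_seq = s.largest_sent_seq
            s.dec_count, s.nak_count = 4, 0
        elif s.nak_count == 2 ** s.dec_count:             # the 2^dec_count-th NAK since then
            s.dec_count += 1
            s.nak_count = 0
        else:
            return False                                  # NAK recorded, rate unchanged
        s.pkt_interval *= 9 / 8                           # +1/8 inter-packet time == rate * 8/9
        s.freeze_until = time.monotonic() + SYN           # no data in the next SYN period
        return True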

    SABUL / UDT: Rate Control (AIMD)

    Y. Gu and R. Grossman, UDT: An Application Level Transport Protocol for Grid Computing, PFLDnet2004.

    Flow Control

    Receiver calculates the packet arrival rate (AS) using a median filter

    and sends it back with the ACK

    On the sender side, if the AS value in the ACK is greater than 0, the flow window is updated from AS (see the sketch below).

    During congestion, loss reports can be dropped or delayed. If the sender keeps sending new packets, it worsens the congestion. Flow control helps prevent this.

    Flow control is also used in the slow start phase: the flow window starts at 2, similar to TCP. Slow start occurs only at the beginning of a new session.
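
    The window update can be sketched as a smoothed move toward AS * (RTT + SYN), which is the general shape used in the UDT draft; treat the 0.875/0.125 constants as an assumption rather than the slide's exact formula:

    SYN = 0.01  # fixed rate-control interval, seconds

    def update_flow_window(window: float, arrival_rate_pps: float, rtt_s: float) -> float:
        """Move the flow window toward the number of packets the receiver absorbed
        in roughly one RTT + SYN, based on the arrival rate (AS) echoed in the ACK."""
        if arrival_rate_pps <= 0:                 # slide: only update when AS > 0
            return window
        target = arrival_rate_pps * (rtt_s + SYN)
        return 0.875 * window + 0.125 * target    # smoothing constants are an assumption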

    SABUL / UDT: Timers

    SYN timer - triggers a rate control event (fixed at 0.01 s)

    SND timer - schedules data packet sending (updated by the rate control scheme)

    ACK timer - triggers an ACK (same interval as SYN)

    NAK timer - used to trigger a NAK; its interval is updated to the current RTT value each time the SYN timer expires

    EXP timer - used to trigger data packet retransmission and maintain connection status; somewhat similar to the TCP RTO

    SABUL / UDT: Simulation Results

    (Plots: 100 Mbps / 1 ms link and 1 Gbps / 100 ms link)

    Y. Gu and R. Grossman, Using UDP for Reliable Data Transfer over High Bandwidth-Delay Product

    Networks, submitted for publication.

    SABUL / UDT: Simulation Results

    Y. Gu and R. Grossman, Using UDP for Reliable Data Transfer over High Bandwidth-Delay Product

    Networks, submitted for publication.

    (Plot: 7 concurrent flows over a 100 Mbps bottleneck link)

    SABUL / UDT: Simulation Results

    Y. Gu and R. Grossman, Using UDP for Reliable Data Transfer over High Bandwidth-Delay Product

    Networks, submitted for publication.

    SABUL / UDT: Simulation Results

    Y. Gu and R. Grossman, Using UDP for Reliable Data Transfer over High Bandwidth-Delay Product

    Networks, submitted for publication.

    SABUL / UDT: Real Implementation Results

    Y. Gu and R. Grossman, Using UDP for Reliable Data Transfer over High Bandwidth-Delay Product

    Networks, submitted for publication.

    SABUL / UDT: Real Implementation Results

    (Plot: 1 Gbps / 40 us link)

    Y. Gu and R. Grossman, Using UDP for Reliable Data Transfer over High Bandwidth-Delay Product

    Networks, submitted for publication.

    SABUL / UDT: Real Implementation Results

    (Plot: 1 Gbps / 110 ms link)

    I-TCP = TCP with concurrent UDT flows

    S-TCP = TCP without concurrent UDT flows

    Y. Gu and R. Grossman, Using UDP for Reliable Data Transfer over High Bandwidth-Delay Product

    Networks, submitted for publication.

    SABUL / UDT: Real Implementation Results

    Y. Gu and R. Grossman, Using UDP for Reliable Data Transfer over High Bandwidth-Delay Product

    Networks, submitted for publication.

    SABUL / UDT: Conclusions

    From one of the SLAC talks [1] - looks good, BUT 4x the CPU utilization of TCP.

    Reordering robustness is worse than TCP - all out-of-order packets are treated as losses. The suggested solution is to delay NAK reports by a short interval.

    All losses are treated as congestion - bad performance at high link error rates. (Better than TCP, though, since it does not respond to each and every loss event.)

    Router queue size stays smaller than with TCP due to less burstiness.

    The increase algorithm relies on bandwidth estimation - it may not be suitable for links with a large number of concurrent flows.

    [1] http://www.slac.stanford.edu/grp/scs/net/talk03/pfld-feb04.ppt

    GTP: Group Transport Protocol

    Motivated by the following observations about lambda grids:

    Very high speed (1 Gig, 10 Gig, etc.) dedicated links connecting a small number of end points (e.g., 10^3, not 10^8), possibly with long delays (e.g., 60 ms between experimental sites)

    Communication patterns are not necessarily just point-to-point; multipoint-to-point and multipoint-to-multipoint are very likely.

    Aggregate capacity of multiple connections could be far greater than the data handling speed of the end system, so end-point congestion is far more likely than network congestion

    GTP: Overview

    Receiver-driven (dumb sender, very smart receiver)

    Request-response data transfer model

    Rate-based explicit flow control

    Receiver-centric max-min fair allocation across multiple flows (irrespective of individual RTTs)

    UDP for data, TCP for control connection.

    GTP: Framework (cont.)

    Single Flow Controller (SFC): manages sending data packet requests, chooses/requests the sending rate, and manages receiver buffer requirements

    Single Flow Monitor (SFM): measures flow statistics such as allocated rate, achieved rate, packet loss rate, RTT estimate, etc., which are used by both the SFC and the CE

    Capacity Estimator (CE): estimates flow capacity for each individual flow based on statistics from the SFM

    Max-min Fairness Scheduler: estimates the max-min fair share for each individual flow

    GTP: Flow Control and Rate Allocation

    Single Flow Controller (SFC):

    flow rate adjusted per RTT

    loss-proportional decrease and proportional increase for rate adaptation (a sketch follows below)

    Capacity Estimator (CE):

    flow rate adjusted per centralized control interval (default 3*RTTmax)

    exponential increase and loss-proportional decrease

    R.X. Wu and A.A. Chien, GTP: Group Transport Protocol for Lambda-Grids, 4th IEEE/ACM International

    Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004)
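
    A toy illustration of the SFC's "loss-proportional decrease, proportional increase" per-RTT adjustment; the proportionality constants are assumptions, the GTP paper defines its own:

    def sfc_adjust_rate(rate_pps: float, loss_rate: float, increase_factor: float = 0.1) -> float:
        """Per-RTT single-flow adjustment: cut the rate in proportion to the observed
        loss fraction, otherwise grow it by a proportional step."""
        if loss_rate > 0:
            return rate_pps * (1.0 - loss_rate)       # loss-proportional decrease
        return rate_pps * (1.0 + increase_factor)     # proportional increase

    # 10% loss cuts a 1000 pkt/s flow to 900 pkt/s; a clean RTT grows it to 1100 pkt/s.
    print(sfc_adjust_rate(1000.0, 0.10), sfc_adjust_rate(1000.0, 0.0))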

    GTP: Flow Control and Rate Allocation (cont.)

    Target rate for each flow is computed by the Capacity Estimator (formula in the paper)

    The Max-min Fairness Scheduler adjusts the target flow rates to ensure max-min fairness (a sketch of such an allocation follows below)

    R.X. Wu and A.A. Chien, GTP: Group Transport Protocol for Lambda-Grids, 4th IEEE/ACM International

    Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004)
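
    For reference, a small runnable sketch of max-min fair allocation (classic water-filling) given a shared receiver capacity and per-flow capacity estimates. It illustrates the scheduler's goal rather than GTP's exact algorithm; the numbers are made up:

    def max_min_shares(capacity, demands):
        """Water-filling: repeatedly split what is left equally among unsatisfied
        flows; flows whose estimated capacity (demand) is below that share are
        capped at their demand and removed from the split."""
        shares = [0.0] * len(demands)
        active = list(range(len(demands)))
        remaining = capacity
        while active:
            fair = remaining / len(active)
            capped = [i for i in active if demands[i] <= fair]
            if not capped:                      # everyone left can use the equal share
                for i in active:
                    shares[i] = fair
                break
            for i in capped:
                shares[i] = demands[i]
                remaining -= demands[i]
                active.remove(i)
        return shares

    # Receiver can absorb 1000 pkts/s; three flows can individually sustain 100, 400, 900.
    print(max_min_shares(1000.0, [100.0, 400.0, 900.0]))   # [100.0, 400.0, 500.0]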

    GTP: Other Details

    The current implementation expects in-order delivery. It can be augmented in the future to handle out-of-order packets.

    TCP-Friendliness is tunable by allocating a fixed share of the total

    bandwidth for TCP in the CE

    Currently, congestion detection is only loss based. Future work will augment the algorithm to include delay-based congestion detection.

    Transition management ensures max-min fairness is maintained even

    when flows join/leave dynamically.

    R.X. Wu and A.A. Chien, GTP: Group Transport Protocol for Lambda-Grids, 4th IEEE/ACM International

    Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004)

    GTP: Simulation Results

    R.X. Wu and A.A. Chien, GTP: Group Transport Protocol for Lambda-Grids, 4th IEEE/ACM International

    Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004)

    GTP: Simulation Results (Cont.)

    R.X. Wu and A.A. Chien, GTP: Group Transport Protocol for Lambda-Grids, 4th IEEE/ACM International

    Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004)

    GTP: Simulation Results (Cont.)

    R.X. Wu and A.A. Chien, GTP: Group Transport Protocol for Lambda-Grids, 4th IEEE/ACM International

    Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004)

    GTP: Emulation Results

    R.X. Wu and A.A. Chien, GTP: Group Transport Protocol for Lambda-Grids, 4th IEEE/ACM International

    Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004)

    GTP: Emulation Results (Cont.)

    R.X. Wu and A.A. Chien, GTP: Group Transport Protocol for Lambda-Grids, 4th IEEE/ACM International

    Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004)

    GTP: Real Implementation Results

    R.X. Wu and A.A. Chien, GTP: Group Transport Protocol for Lambda-Grids, 4th IEEE/ACM International

    Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004)

    GTP: Real Implementation Results (Cont.)

    R.X. Wu and A.A. Chien, GTP: Group Transport Protocol for Lambda-Grids, 4th IEEE/ACM International

    Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004)

    GTP: Real Implementation Results (Cont.)

    R.X. Wu and A.A. Chien, GTP: Group Transport Protocol for Lambda-Grids, 4th IEEE/ACM International

    Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004)

    Questions ???

    Extra Slides

    Scatter/Gather DMA

    Optimization for improving network stack processing

    Under normal circumstances, data is copied between kernel and app memory

    This is required because the network device drivers read/write contiguous

    memory locations, whereas applications use mapped virtual memory

    When the NIC drivers are capable of scatter/gather DMA, a scatter/gather list is maintained so that the NICs can do direct reads/writes to the final memory location where the data is intended to go. The scatter/gather data structure makes the memory look contiguous to the NIC drivers.

    All protocol processing is done by reference. Eliminating the memory copy has been shown to improve performance dramatically.

    In practice, the process is a little more complicated. At the send side, copy-on-write should be enforced so that packets that have been sent out but not yet acknowledged are not overwritten. At the receive side, page borders should be enforced.
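
    As an application-level analogy to the gather half of this idea (one transfer pulling from several non-contiguous buffers without an extra copy), POSIX writev, exposed through Python's os module, writes multiple buffers in one call. This illustrates gather I/O semantics, not NIC DMA itself; the file path is arbitrary:

    import os

    header = b"HDR:"
    payload = b"application data that was never copied into one contiguous buffer"

    # One system call, two separate buffers, no user-space concatenation (POSIX only).
    fd = os.open("/tmp/sg_demo.bin", os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    written = os.writev(fd, [header, payload])
    os.close(fd)
    print(written == len(header) + len(payload))   # True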

    Packet Pair BW Estimation

    Two packets of the same size (L) are transmitted back to back

    The bottleneck link capacity (C) is smaller than the capacity of all the other links (by definition)

    The packets face transmission delay at the bottleneck link

    As a result, they arrive at the receiver with a larger inter-packet delay than when they were sent

    This delay can be used to compute the bottleneck link capacity

    (Makes a lot of assumptions; also works only with FIFO queuing)
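
    Concretely, the estimate is C = L / delta, where delta is the observed inter-arrival gap; a tiny worked example with illustrative numbers:

    def bottleneck_capacity_bps(pkt_bits, inter_arrival_s):
        """C = L / delta: the bottleneck serializes the second packet right behind the
        first, so their arrival spacing equals one packet transmission time at C."""
        return pkt_bits / inter_arrival_s

    # Two back-to-back 1500-byte packets arriving 120 microseconds apart
    # imply a ~100 Mbps bottleneck (1500 * 8 bits / 120e-6 s).
    print(bottleneck_capacity_bps(1500 * 8, 120e-6))   # 100000000.0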