
  • 4. Behind Traditional TCP: protocols for high-throughput and high-latency networks

    PA191: Advanced Computer Networking

    Eva Hladká

    Slides by: Petr Holub, Tomáš Rebok

    Faculty of Informatics Masaryk University

    Autumn 2014

    Eva Hladká (FI MU) 4. Behind Traditional TCP . Autumn 2014 1 / 66

  • Lecture overview

    1 Traditional TCP and its issues
    2 Improving the traditional TCP
        Multi-stream TCP
        Web100
    3 Conservative Extensions to TCP
        GridDT
        Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP
    4 TCP Extensions with IP Support
        QuickStart, E-TCP, FAST
    5 Approaches Different from TCP
        tsunami
        RBUDP
        XCP
        SCTP, DCCP, STP, Reliable UDP, XTP
    6 Conclusions
    7 Literature



  • Traditional TCP

    Protocols for reliable data transmission

    Protocols for reliable data transmission have to:

    ensure the reliability of the transfer

    retransmissions of lost packets

    FEC might be usefully employed

    a protection from congestion

    network, receiver

    Behavior evaluation:

    aggressiveness – ability to utilise available bandwidth

    responsiveness – ability to recover from a packet loss

    fairness – getting a fair portion of network throughput when more streams/participants use the network


  • Traditional TCP

    Problem statement

    network links with high capacity and high latency

    iGrid 2005: San Diego ↔ Brno, RTT = 205 ms
    SC|05: Seattle ↔ Brno, RTT = 174 ms

    traditional TCP is not suitable for such an environment:

    10 Gb/s, RTT = 100 ms, 1500 B MTU
    =⇒ sending/outstanding window ≈ 83,333 packets
    =⇒ a single packet may be lost at most once per 1:36 hour

    1 terribly slow
    2 if errors are more frequent, the maximum throughput cannot be reached

    How can better network utilization be achieved?

    How can reasonable co-existence with traditional TCP be ensured?

    How can gradual deployment of a new protocol be ensured?


  • Traditional TCP

    Traditional TCP I.

    flow control vs. congestion control

    no control:

    flow control (rwnd):

    congestion control (cwnd):

    (three diagrams: sender → network → receiver under each scheme)


  • Traditional TCP

    Traditional TCP I.


  • Traditional TCP

    Traditional TCP II.

    Flow control

    an explicit feedback from the receiver(s) using rwnd

    deterministic

    Congestion control

    an approximate sender’s estimation of the available throughput (using cwnd)

    the final window used: ownd

    ownd = min{rwnd , cwnd}

    The bandwidth bw can be computed as:

    bw = (8 · MSS · ownd) / RTT    (1)
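The window combination and equation (1) translate directly into code; a minimal sketch with illustrative numbers (the function names are ours, not from the slides):

```python
def effective_window(rwnd, cwnd):
    # the sender keeps at most ownd = min(rwnd, cwnd) segments outstanding
    return min(rwnd, cwnd)

def bandwidth_bps(mss_bytes, ownd_segments, rtt_s):
    # equation (1): bw = 8 * MSS * ownd / RTT, in bits per second
    return 8 * mss_bytes * ownd_segments / rtt_s

# illustrative numbers: 1460 B segments, 64-segment window, RTT = 100 ms
bw = bandwidth_bps(1460, effective_window(rwnd=64, cwnd=100), 0.100)
print(bw)  # about 7.5 Mb/s
```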


  • Traditional TCP

    Traditional TCP II. – Flow Control


  • Traditional TCP

    Traditional TCP – Tahoe and Reno

    Congestion control: traditionally based on the AIMD – Additive Increase Multiplicative Decrease – approach

    Tahoe [1]
    cwnd = cwnd + 1 … per RTT without loss (above ssthresh)
    ssthresh = 0.5 · cwnd, cwnd = 1 … per every loss

    Reno [2] adds fast retransmission:

    a TCP receiver sends an immediate duplicate ACK when an out-of-order segment arrives

    all segments after the dropped one trigger duplicate ACKs
    a loss is indicated by 3 duplicate ACKs (≈ four successive identical ACKs without intervening packets)

    once these are received, TCP performs a fast retransmission without waiting for the retransmission timer to expire

    fast recovery – the slow-start phase is not used any more
    ssthresh = cwnd = 0.5 · cwnd
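The per-RTT dynamics of both variants can be sketched as an idealized simulation (slow start, congestion avoidance, and the differing loss reactions; segment-level details and fast retransmission are omitted, so this is a sketch, not the full algorithm):

```python
def simulate(flavor, rtts, loss_at, ssthresh=16):
    """Idealized per-RTT evolution of cwnd (in MSS) for TCP Tahoe/Reno.
    loss_at is the set of RTT indices at which a loss is detected."""
    cwnd, trace = 1.0, []
    for t in range(rtts):
        trace.append(cwnd)
        if t in loss_at:
            ssthresh = max(cwnd / 2, 2.0)
            # Tahoe restarts from 1 MSS; Reno (fast recovery) continues at half
            cwnd = 1.0 if flavor == "tahoe" else ssthresh
        elif cwnd < ssthresh:
            cwnd *= 2            # slow start: exponential growth
        else:
            cwnd += 1            # congestion avoidance: +1 MSS per RTT
    return trace

tahoe = simulate("tahoe", 30, loss_at={10})
reno = simulate("reno", 30, loss_at={10})
```

After the loss at RTT 10, the Tahoe trace drops back to 1 MSS while the Reno trace continues from half the previous window.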

  • Traditional TCP

    Traditional TCP – Tahoe I.


  • Traditional TCP

    Traditional TCP – Tahoe II.

    Figure: TCP Tahoe – evolution of cwnd and ssthresh (in MSS, 0–30) over time (0–50 RTTs).

  • Traditional TCP

    Traditional TCP – Reno I.


  • Traditional TCP

    Traditional TCP – Reno II.

    Figure: TCP Reno – evolution of cwnd and ssthresh (in MSS, 0–30) over time (0–50 RTTs).

  • Traditional TCP

    TCP Vegas

    Vegas—a concept of congestion control [3]

    when a network is congested, the RTT becomes higher

    RTT is monitored during the transmission

    when an RTT increase is detected, the congestion window size is linearly reduced

    a possibility to measure the available network bandwidth using inter-packet spacing/dispersion


  • Traditional TCP

    Traditional TCP

    a reaction to packet loss—retransmission

    Tahoe: the whole actual window ownd
    Reno: a single segment in the Fast Retransmission mode
    NewReno: more segments in the Fast Retransmission mode
    Selective Acknowledgement (SACK): just the lost packets

    fundamental question:
    How can a sufficient size of cwnd be achieved (under real conditions) in a network with high capacity and high RTT?
    … without affecting or excluding the “common” users of the network?


  • Traditional TCP

    Traditional TCP – Response Function

    the Response Function represents the relation between bw and a steady-state packet loss rate p

    cwnd_average ≈ 1.2 / √p (for MSS-sized segments)

    using (1): bw ≈ 9.6 · MSS / (RTT · √p)

    the responsiveness of traditional TCP

    assuming that a packet is lost whenever cwnd reaches bw · RTT

    ρ = (bw · RTT²) / (2 · MSS)
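Both formulas are easy to evaluate; the sketch below plugs in the 10 Gb/s, 100 ms scenario from the problem statement (bw is taken in bytes per second in the responsiveness formula, matching its omission of the factor 8):

```python
import math

def tcp_throughput_bps(mss_bytes, rtt_s, p):
    # response function: bw ≈ 9.6 * MSS / (RTT * sqrt(p)), in bits per second
    return 9.6 * mss_bytes / (rtt_s * math.sqrt(p))

def recovery_time_s(bw_Bps, rtt_s, mss_bytes):
    # responsiveness: rho = bw * RTT^2 / (2 * MSS), with bw in bytes per second
    return bw_Bps * rtt_s ** 2 / (2 * mss_bytes)

# 10 Gb/s link, RTT = 100 ms, 1500 B segments
print(recovery_time_s(10e9 / 8, 0.100, 1500))  # about 4167 s to regrow the window
```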


  • Traditional TCP

    Traditional TCP – Responsiveness

    Figure: Traditional TCP responsiveness – recovery time (s, up to 18000) as a function of RTT (ms, 0–200) for link capacities C = 622 Mbit/s, C = 2.5 Gbit/s and C = 10 Gbit/s.

  • Traditional TCP

    Traditional TCP – Fairness I.

    fairness at the point of equilibrium

    the fairness is considered for

    streams with different RTT

    streams with different MTU

    The speed of convergence to the point of equilibrium DOES matter!


  • Traditional TCP

    Traditional TCP – Fairness II.

    cwnd += MSS, cwnd *= 0.5 (30 steps)

    Figure: Convergence of two competing flows to the fair share (both axes: throughput 0–100).

  • Traditional TCP

    Traditional TCP – Fairness III.

    cwnd += MSS, cwnd *= 0.83 (30 steps)

    Figure: Convergence of two competing flows under the milder 0.83 backoff (both axes: throughput 0–100).

  • Improving the traditional TCP Multi-stream TCP


  • Improving the traditional TCP Multi-stream TCP

    Multi-stream TCP

    assumes multiple TCP streams transferring a single data flow

    in fact, it improves TCP’s performance/behavior only in cases of isolated packet losses

    a loss of more packets usually affects more TCP streams

    usually available because of a simple implementation

    bbftp, GridFTP, Internet Backplane Protocol, . . .

    drawbacks:

    more complicated than traditional TCP (more threads are necessary)
    the startup is accelerated only linearly
    leads to a synchronous overloading of queues and caches in the routers


  • Improving the traditional TCP Multi-stream TCP

    TCP implementation tuning I.

    cooperation with HW

    Rx/Tx TCP Checksum Offloading

    ordinarily available

    zero copy

    accessing the network usually leads to several data copies:
    user-land ↔ kernel ↔ network card

    page flipping – user-land ↔ kernel data movement

    support for, e.g., sendfile()

    implementations for Linux, FreeBSD, Solaris, . . .
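As a minimal sketch of the zero-copy idea, Python exposes the kernel’s sendfile() via os.sendfile (available on Linux, FreeBSD and Solaris); the helper below is illustrative, not code from the slides:

```python
import os
import socket

def send_file_zero_copy(sock: socket.socket, path: str) -> int:
    """Send a file over a connected socket without copying it through
    user space: os.sendfile() asks the kernel to move the data from
    the page cache straight into the socket buffer."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            # returns the number of bytes the kernel actually transferred
            sent += os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
    return sent
```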


  • Improving the traditional TCP Web100

    TCP implementation tuning II.

    Web100 [4, 5]
    software that implements instruments in the Linux TCP/IP stack – the TCP Kernel Instrumentation Set (TCP-KIS)

    more than 125 “puls/rods”
    information available via /proc

    distributed in two pieces:

    a kernel patch adding the instruments
    a suite of “userland” libraries and tools for accessing the kernel instrumentation (command-line, GUI)

    the Web100 software allows:

    monitoring (extended statistics)
    instruments’ tuning
    support for auto-tuning


  • Conservative Extensions to TCP GridDT


  • Conservative Extensions to TCP GridDT

    GridDT

    a collection of ad-hoc modifications :(

    correction of ssthresh

    faster slowstart

    AIMD’s modification for congestion control:

    cwnd = cwnd + a … per RTT without packet loss
    cwnd = b · cwnd … per packet loss

    just the sender’s side has to be modified


  • Conservative Extensions to TCP GridDT

    GridDT – fairness

    Figure: Fairness of two GridDT flows (both axes: throughput 0–100).

  • Conservative Extensions to TCP GridDT

    GridDT – example

    (Topology: CERN (GVA) ↔ Starlight (CHI) ↔ Sunnyvale – routers connected by POS 2.5 Gb/s and POS 10 Gb/s links, hosts attached via 1 GE and a GbE switch; the 1 GE links form the bottleneck.)

    TCP Reno performance (see slide #8):

    first stream GVA → Sunnyvale: RTT = 181 ms; avg. throughput over a period of 7000 s = 202 Mb/s
    second stream GVA → CHI: RTT = 117 ms; avg. throughput over a period of 7000 s = 514 Mb/s
    link utilization 71.6 %

    GridDT tuning in order to improve fairness between two TCP streams with different RTT:

    first stream GVA → Sunnyvale: RTT = 181 ms, additive increment A = 7; average throughput = 330 Mb/s
    second stream GVA → CHI: RTT = 117 ms, additive increment B = 3; average throughput = 388 Mb/s
    link utilization 71.8 %

    Figure: Throughput of two streams with different RTT sharing a 1 Gb/s bottleneck – throughput (Mbps, 0–1000) over time (0–7000 s) for the A = 7, RTT = 181 ms and B = 3, RTT = 117 ms streams, with averages over the life of each connection.


  • Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP

    Scalable TCP

    proposed by Tom Kelly [1]

    congestion control is not AIMD any more:

    cwnd = cwnd + 0.01 · cwnd … per RTT without packet loss
    cwnd = cwnd + 0.01 … per ACK
    cwnd = 0.875 · cwnd … per packet loss

    =⇒ Multiplicative Increase Multiplicative Decrease (MIMD)

    for smaller window sizes and/or higher loss rates in the network, Scalable TCP switches into AIMD mode
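A quick numeric check of the MIMD property that the recovery time after a loss is independent of the window size (a pure per-RTT sketch of the rules above):

```python
def rtts_to_recover(cwnd_before_loss):
    """RTTs needed to regrow from 0.875 * w back to w under the 1% MIMD
    increase -- the count is independent of the window size w."""
    cwnd, rtts = 0.875 * cwnd_before_loss, 0
    while cwnd < cwnd_before_loss:
        cwnd *= 1.01             # +1% per RTT (cwnd = cwnd + 0.01 * cwnd)
        rtts += 1
    return rtts

print(rtts_to_recover(100), rtts_to_recover(100000))  # the same count for both
```

This is exactly the contrast drawn on the next slide: traditional TCP needs a number of RTTs proportional to cwnd, Scalable TCP a constant number of RTTs.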


  • Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP

    Scalable TCP


    Figure: Packet loss recovery times for the traditional TCP (left) are proportional to cwnd and RTT. A Scalable TCP connection (right) has packet loss recovery times that are proportional to the connection’s RTT only. (Note: link capacity c < C)


  • Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP

    Scalable TCP – fairness I.

    Two concurrent Scalable TCP streams; Scalable control switched on when >30 Mb/s; twice the number of steps in comparison with the previous simulations

    Figure: Two concurrent Scalable TCP streams (both axes: throughput 0–100).

  • Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP

    Scalable TCP – fairness II.

    Scalable TCP and traditional TCP streams; Scalable control switched on when >30 Mb/s; twice the number of steps

    Figure: A Scalable TCP stream competing with a traditional TCP stream (both axes: throughput 0–100).

  • Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP

    Scalable TCP – Response curve

    (Figure: Scalable TCP response curve – achievable bandwidth as a function of the steady-state loss rate.)

  • Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP

    High-Speed TCP (HSTCP)

    Sally Floyd, RFC3649, [2]

    congestion control AIMD/MIMD:

    cwnd = cwnd + a(cwnd) … per RTT without loss
    cwnd = cwnd + a(cwnd)/cwnd … per ACK
    cwnd = (1 − b(cwnd)) · cwnd … per packet loss

    emulates the behavior of traditional TCP for small window sizes and/or higher packet loss rates in the network


  • Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP

    High-Speed TCP (HSTCP)

    proposed MIMD parametrization:

    b(cwnd) = −0.4 · (ln(cwnd) − 3.64) / 7.69 + 0.5

    a(cwnd) = (2 · cwnd² · b(cwnd)) / (12.8 · (2 − b(cwnd)) · cwnd^1.2)

    Figure: HSTCP – growth of cwnd (MSS, up to 80000) over time (0–2000 RTT).

  • Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP

    High-Speed TCP (HSTCP)

    a parametrization equivalent to Scalable TCP is possible ⇒ Linear HSTCP

    a comparison with Multi-stream TCP:

    N(cwnd) ≈ 0.23 · cwnd^0.4

    N(cwnd) – the number of parallel TCP connections emulated by the HighSpeed TCP response function with congestion window cwnd

    Neither Scalable TCP nor HSTCP deals with the slow-start phase in any sophisticated way.


  • Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP

    HSTCP – Response curve

    (Figure: HSTCP response curve – achievable bandwidth as a function of the steady-state loss rate.)

  • Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP

    H-TCP I.

    created by researchers at the Hamilton Institute in Ireland

    a simple change to the cwnd increase function

    increases its aggressiveness (in particular, the rate of additive increase) as the time since the previous loss (backoff) increases

    the increase rate α is a function of the elapsed time since the last backoff
    the AIMD mechanism is retained

    preserves many of the key properties of standard TCP: fairness,responsiveness, relationship to buffering


  • Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP

    H-TCP II.

    ∆ … time elapsed since the last congestion event

    ∆L … for ∆ ≤ ∆L, the standard TCP increase is used
    ∆B … the bandwidth-change threshold above which the TCP backoff is used (for significant bandwidth changes, the 0.5 backoff is used)

    Tmin, Tmax … the minimal resp. maximal RTT measured

    B(k) … the maximum throughput measured during the last interval without packet loss


  • Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP

    H-TCP III.

    cwnd = cwnd + 2(1 − β) · a(∆)/cwnd … per ACK

    cwnd = b(B) · cwnd … per loss

    a(∆) = 1                        for ∆ ≤ ∆L
    a(∆) = max{a′(∆) · Tmin, 1}     for ∆ > ∆L

    b(B) = 0.5                      if |B(k+1) − B(k)| / B(k) > ∆B
    b(B) = min{Tmin/Tmax, 0.8}      otherwise

    a′(∆) = 1 + 10(∆ − ∆L) + 0.5(∆ − ∆L)²   … quadratic increment function
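The quadratic increment function is easy to evaluate; a small sketch with illustrative values for ∆L and Tmin (both are assumptions for the example, not constants from the slides):

```python
def a_htcp(delta, delta_L=1.0, t_min=0.1):
    """H-TCP additive-increase factor as a function of the time delta (s)
    since the last backoff. delta_L = 1 s and Tmin = 0.1 s are
    illustrative values."""
    if delta <= delta_L:
        return 1.0                        # behave like standard TCP at first
    a_prime = 1 + 10 * (delta - delta_L) + 0.5 * (delta - delta_L) ** 2
    return max(a_prime * t_min, 1.0)      # scaled by Tmin, never below 1
```

Shortly after a backoff the flow increases by 1 MSS per RTT like standard TCP; the longer it goes without a loss, the faster the (quadratically growing) additive increase becomes.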


  • Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP

    BIC-TCP

    the default algorithm in Linux kernels (2.6.8 and above)

    uses a binary-search algorithm for the cwnd update [3]

    4 phases:
    (1) reaction to a packet loss
    (2) additive increase
    (3) binary search
    (4) maximum probing

  • Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP

    BIC-TCP

    (1) Packet loss

    BIC-TCP starts with TCP slow start
    when a loss is detected, it performs a multiplicative decrease (as standard TCP does) and records the window sizes just before and after the loss event:

    previous window size → Wmax (the size of cwnd before the loss)
    reduced window size → Wmin (the size of cwnd after the loss)

    =⇒ because the loss occurred when cwnd ≤ Wmax, the equilibrium point of cwnd will be searched for in the range [Wmin, Wmax]

    (2) Additive increase

    starting the search from cwnd = (Wmin + Wmax)/2 might be too challenging for the network
    thus, when (Wmin + Wmax)/2 > Wmin + Smax, additive increase takes place → cwnd = Wmin + Smax

    the window then increases linearly by Smax every RTT


  • Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP

    BIC-TCP

    (3) Binary search

    once the target (cwnd = Wmin+Wmax2 ) is reached, the Wmin = cwnd

    otherwise (a packet loss happened) Wmax = cwnd

    and the searching continues to the new target (using the additiveincrease, if necessary) until the change of cwnd is less than theSminconstant

    here, cwnd = Wmax is set

    The points (2) and (3) lead to linear (additive) increase, which turns intologarithmic one (binary search).
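The interplay of phases (2) and (3) can be sketched as a loss-free growth loop; the Smax/Smin values are illustrative, not BIC’s tuned defaults:

```python
def bic_growth(w_min, w_max, s_max=32, s_min=0.25):
    """cwnd trajectory from a loss (cwnd = w_min) back toward w_max, assuming
    no further losses: each step toward the midpoint is capped at s_max
    (additive increase), and the search stops once the step falls below s_min."""
    cwnd, trace = w_min, [w_min]
    while w_max - cwnd > s_min:
        target = (w_min + w_max) / 2      # midpoint of the search interval
        cwnd = cwnd + s_max if target - cwnd > s_max else target
        w_min = cwnd                      # no loss: raise the lower bound
        trace.append(cwnd)
    trace.append(w_max)                   # change below s_min: settle at w_max
    return trace
```

Printing bic_growth(100, 228) shows fixed 32-MSS steps at first, then steps that halve each RTT as the window closes in on the old maximum.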


  • Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP

    BIC-TCP

    (4) Maximum probing

    inverse process to points (3) and (2)first, the inverse binary search takes place (until the cwnd growth isgreater than Smax)once the cwnd growth is greater than Smax , the linear growth (by areasonably large fixed increment) takes place

    first exponencial growth, then linear growth

    Assumed benefits:

    traditional TCP “friendliness”

    during the “plateau” (3), the TCP flows are able to growAIMD behavior (even though faster) during (2) and (4) phases

    more stable window size ⇒ better network utilizationmost of the time, the BIC-TCP should spend in the “plateau” (3)


  • Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP

    CUBIC-TCP

    even though it is fairly scalable, fair, and stable, BIC’s growth function is still considered too aggressive for TCP

    especially under short RTTs or in low-speed networks

    CUBIC-TCP
    a new release of BIC, which uses a cubic window-growth function
    for the sake of simplicity in protocol analysis, the number of phases was further reduced

    Wcubic = C · (T − K)³ + Wmax

    where C is a scaling constant, T is the time elapsed since the last loss event, Wmax is the window size before the loss event, K = ∛(Wmax · β / C), and β is a constant decrease factor
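A sketch of the cubic window function; C = 0.4 and β = 0.3 are commonly cited defaults used here for illustration, not values fixed by the slides:

```python
def cubic_k(w_max, c=0.4, beta=0.3):
    # K = cbrt(Wmax * beta / C): the time at which the window regrows to Wmax
    return (w_max * beta / c) ** (1 / 3)

def w_cubic(t, w_max, c=0.4, beta=0.3):
    # W_cubic(T) = C * (T - K)^3 + Wmax
    return c * (t - cubic_k(w_max, c, beta)) ** 3 + w_max
```

At t = 0 the window sits at (1 − β) · Wmax, it flattens out (the “plateau”) as it approaches Wmax at t = K, and only then probes beyond the old maximum.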


  • TCP Extensions with IP Support QuickStart, E-TCP, FAST


  • TCP Extensions with IP Support QuickStart, E-TCP, FAST

    Quickstart (QS)/Limited Slowstart I.

    there is a strong assumption that the slow-start phase cannot be improved without an interaction with lower network layers

    a proposal: a 4-byte option in the IP header, consisting of QS TTL and Initial Rate fields

    a sender that wants to use QS sets the QS TTL to an arbitrary (but high enough) value, sets the Initial Rate to the rate at which it wants to start sending, and sends the SYN packet


  • TCP Extensions with IP Support QuickStart, E-TCP, FAST

    Quickstart (QS)/Limited Slowstart II.

    each router on the path that supports QS decreases the QS TTL by one and decreases the Initial Rate, if necessary

    the receiver returns the QS TTL and Initial Rate to the sender in the SYN/ACK packet

    the sender thus knows whether all the routers on the path support QS (by comparing the QS TTL with the TTL)

    sender sets the appropriate cwnd and starts using its congestioncontrol mechanism (e.g., AIMD)

    Requires changes in the IP layer! :-(


  • TCP Extensions with IP Support QuickStart, E-TCP, FAST

    E-TCP I.

    Explicit Congestion Notification (ECN)

    a component of Active Queue Management (AQM)
    a bit set by routers when congestion of a link/buffer/queue is imminent
    the ECN flag has to be mirrored by the receiver
    TCP should react to the ECN bit being set in the same way as to a packet loss
    requires the routers’ administrators to configure AQM/ECN :-(


  • TCP Extensions with IP Support QuickStart, E-TCP, FAST

    E-TCP II.

    E-TCP
    proposes to mirror the ECN bit just once (the first time only)
    freezes cwnd when an ACK with the ECN bit set is received from the receiver
    requires introducing small (synthetic) losses into the network in order to perform multiplicative decrease, for the sake of fairness
    requires a change in the receivers’ reaction to the ECN bit :-(


  • TCP Extensions with IP Support QuickStart, E-TCP, FAST

    FAST

    Fast AQM Scalable TCP (FAST) [5]

    uses end-to-end delay, ECN and packet losses for congestion detection/avoidance

    if too few packets are queued in the routers (detected by RTT monitoring), the sending rate is increased

    differences from TCP Vegas:

    TCP Vegas makes fixed-size adjustments to the rate, independent of how far the current rate is from the target rate
    FAST TCP makes larger steps when the system is further from equilibrium and smaller steps near equilibrium
    if ECN is available in the network, FAST TCP can be extended to use ECN marking to replace/supplement queueing delay and packet loss as the congestion measure


  • Approaches Different from TCP


  • Approaches Different from TCP tsunami

    tsunami

    TCP connection for out-of-band control channel

    connection parameters negotiation
    requirements for retransmissions – uses NACKs instead of ACKs
    connection termination negotiation

    UDP channel for data transmission

    MIMD congestion control

    highly configurable/customizable

    MIMD parameters, loss threshold, maximum size of the retransmission queue, the interval for sending retransmission requests, etc.


  • Approaches Different from TCP RBUDP

    Reliable Blast UDP – RBUDP

    similar to tsunami – out-of-band TCP channel for control, UDP fordata transmission

    proposed for disk-to-disk transmissions, or more generally for transmissions where the complete transmitted data can be held in the sender’s memory

    sends data in a user-defined rate

    app_perf (a clone of iperf) is used for an estimation of the network’s/receiver’s capacity


  • Approaches Different from TCP RBUDP

    Reliable Blast UDP – RBUDP

    Excerpt from the RBUDP paper (He et al., IEEE Cluster Computing 2002; full citation below):

    In RBUDP, the most important input parameter is the sending rate of the UDP blasts. To minimize loss, the sending rate should not be larger than the bandwidth of the bottleneck link (typically a router). Tools such as Iperf [Iperf] and netperf [Netperf] are typically used to measure the bottleneck bandwidth. In theory if one could send data just below this rate, data loss should be near zero.

    In practice however, other factors need to be considered. In our first implementation of RBUDP, we chose a send rate of 5% less than the available network bandwidth predicted by Iperf. Surprisingly this resulted in approximately 33% loss! After further investigation we found that the problem was in the end host rather than the network. Specifically, the receiver was not fast enough to keep up with the network while moving data from the kernel buffer to application buffers. When we used a faster computer as the receiver, the loss rate decreased to less than 2%. The details of this experiment are further discussed in Section 5.

    The chief problem with using Iperf as a measure of possible throughput over a link is that it does not take into account the fact that in a real application, data is not simply streamed to a receiver and discarded. It has to be moved into main memory for the application to use. This has motivated us to produce app_perf (a modified version of iperf) to take into account an extra memory copy that most applications must perform. We can therefore use app_perf as a more realistic bound for how well a transmission scheme should be able to reasonably obtain.

    In the experiments detailed in Section 5, we however include both iperf and app_perf’s prediction of available bandwidth.

    Figure 1. The Time Sequence Diagram of RBUDP

    Three versions of RBUDP were developed:

    1. RBUDP without scatter/gather optimization – this is a naïve implementation of RBUDP where each incoming packet is examined (to determine where it should go in the application’s memory buffer) and then moved there.

    2. RBUDP with scatter/gather optimization – this implementation takes advantage of the fact that most incoming packets are likely to arrive in order, and if transmission rates are below the maximum throughput of the network, packets are unlikely to be lost. The algorithm works by using readv() to directly move the data from kernel memory to its predicted location in the application’s memory. After performing this readv() the packet header is examined to determine if it was placed in the correct location. If it was not (either because it was an out-of-order packet, or an intermediate packet was lost), then the packet is moved to the correct location in the user’s memory buffer.

    3. “Fake” RBUDP – this implementation is the same as the scheme without the scatter/gather optimization except the incoming data is never moved to application memory. This was used to examine the overhead of the RBUDP protocol compared to raw transmission of UDP packets via Iperf.

    (Diagram: UDP data traffic and TCP signaling traffic exchanged between sender and receiver; events labeled A–G.)

    Source: E. He, J. Leigh, O. Yu, T. A. DeFanti, “Reliable Blast UDP: Predictable High Performance BulkData Transfer,” IEEE Cluster Computing 2002, Chicago, Illinois, Sept, 2002.

    A – start of the transmission (using the pre-defined rate)
    B – end of the transmission
    C – sending of the DONE signal via the control channel; the receiver responds with a bitmap of the data that has arrived
    D – re-sending of the missing data
    E–F–G – end of the transmission

    Steps C and D repeat until all the data are delivered.
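The C–D loop above can be sketched as a simulation (no real sockets; the drop function stands in for network loss, and the function name is ours):

```python
def rbudp_transfer(n_packets, drop, max_rounds=10):
    """Sketch of the RBUDP retransmission loop: blast everything over UDP,
    then keep resending whatever the receiver's bitmap reports as missing.
    drop(pkt, rnd) decides whether a packet is lost (simulated)."""
    received = [False] * n_packets
    to_send = list(range(n_packets))
    for rnd in range(max_rounds):
        for pkt in to_send:                 # UDP blast at the pre-defined rate
            if not drop(pkt, rnd):
                received[pkt] = True
        # DONE signal on the TCP channel; receiver answers with a bitmap,
        # from which the sender derives the list of missing packets
        to_send = [i for i, ok in enumerate(received) if not ok]
        if not to_send:
            return rnd + 1                  # number of blast rounds used
    raise RuntimeError("packets still missing after max_rounds")

# 10% of packets lost in the first blast, none afterwards
rounds = rbudp_transfer(1000, drop=lambda pkt, rnd: rnd == 0 and pkt % 10 == 0)
print(rounds)  # 2: one full blast plus one retransmission round
```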


  • Approaches Different from TCP XCP

    eXplicit Control Protocol – XCP

    uses per-packet feedback from routers

    every packet carries a congestion header holding the sender's current Congestion Window, its Round Trip Time, and a Feedback field

    routers along the path rewrite the Feedback field as the packet passes (in the slides' example: Feedback += 0.1 packet at one router, Feedback -= 0.3 packet at another)

    when the feedback is returned to the sender, the window is updated as:

    Congestion Window = Congestion Window + Feedback
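    As a toy sketch of that sender-side rule (the function name is illustrative; real XCP also bounds the window from below, which is omitted here):

    ```python
    def xcp_update(cwnd, feedbacks):
        """Apply the feedback carried back in each packet's congestion
        header: Congestion Window = Congestion Window + Feedback."""
        for fb in feedbacks:
            cwnd += fb  # positive feedback grows the window, negative shrinks it
        return cwnd
    ```

    With the slides' example values, two packets carrying +0.1 and one carrying -0.3 change a window of 10 packets to 9.9.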

  • Approaches Different from TCP SCTP, DCCP, STP, Reliable UDP, XTP

    Different approaches I.

    SCTP

    multi-stream, multi-homed transport (an end node may have several IP addresses)
    message-oriented like UDP; ensures reliable, in-sequence transport of messages with TCP-like congestion control
    http://www.sctp.org/

    DCCP

    an unreliable protocol (like UDP) with congestion control compatible with TCP
    http://www.ietf.org/html.charters/dccp-charter.html
    http://www.icir.org/kohler/dcp/

  • Approaches Different from TCP SCTP, DCCP, STP, Reliable UDP, XTP

    Different approaches II.

    STP

    based on CTS/RTS
    a simple protocol designed for a simple implementation in HW
    without any sophisticated congestion-control mechanism
    http://lwn.net/2001/features/OLS/pdf/pdf/stlinux.pdf

    Reliable UDP

    ensures reliable and in-order delivery (up to a maximum number of retransmissions)
    RFC 908 and RFC 1151
    originally proposed for IP telephony
    connection parameters can be set per connection
    http://www.javvin.com/protocolRUDP.html

    XTP (Xpress Transfer Protocol), . . .
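    Reliable UDP's bounded-retransmission rule can be sketched as follows (a hedged illustration; the function and parameter names are illustrative, not the RFC 908 API):

    ```python
    def send_reliable(segment, transmit, await_ack, max_retransmissions=3):
        """Transmit a segment and retry on missing ACKs, but only up to
        the connection's configured maximum number of retransmissions."""
        for _ in range(1 + max_retransmissions):  # 1 initial try + retries
            transmit(segment)
            if await_ack():
                return True   # delivered and acknowledged
        return False          # reliability guarantee exhausted
    ```

    Because the retry cap is a per-connection parameter, an application such as IP telephony can trade delivery guarantees for bounded delay.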


  • Conclusions

    Conclusions I.

    Current state:

    multi-stream TCP is intensively used (e.g., by Grid applications)
    a way is being sought to allow safe (i.e., backward-compatible) development and deployment of post-TCP protocols
    aggressive protocols are used on private/dedicated networks/circuits (e.g., the λ-networks CzechLight/CESNET2, SurfNet, CA*net 4, . . . )
    an SCTP implementation exists under FreeBSD 7.0
    a DCCP implementation exists under Linux

  • Conclusions

    Conclusions II.

    interaction with L3 (IP)

    interaction with the data link layer

    variable delay and throughput in wireless networks
    optical burst switching

    specific per-flow states in routers:

    e.g., per-flow settings for packet-loss generation (→ E-TCP)
    may help short-term flows with high capacity demands (macro-bursts)
    problem with scalability and cost :-(


  • Literature

    Literature

    Jacobson V. “Congestion Avoidance and Control”, Proceedings of ACM SIGCOMM’88 (Stanford, CA, Aug. 1988), pp. 314–329. ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z

    Allman M., Paxson V., Stevens W. “TCP Congestion Control”, RFC 2581, Apr. 1999. http://www.rfc-editor.org/rfc/rfc2581.txt

    Brakmo L., Peterson L. “TCP Vegas: End to End Congestion Avoidance on a Global Internet”, IEEE Journal of Selected Areas in Communication, Vol. 13, No. 8, pp. 1465–1480, Oct. 1995. ftp://ftp.cs.arizona.edu/xkernel/Papers/jsac.ps

    http://www.web100.org

    Hacker T. J., Athey B. D., Sommerfield J. “Experiences Using Web100 for End-To-End Network Performance Tuning”. http://www.web100.org/docs/ExperiencesUsingWeb100forHostTuning.pdf

  • Literature

    Literature

    Kelly T. “Scalable TCP: Improving Performance in Highspeed Wide Area Networks”, PFLDnet 2003. http://datatag.web.cern.ch/datatag/pfldnet2003/papers/kelly.pdf, http://wwwlce.eng.cam.ac.uk/~ctk21/scalable/

    Floyd S. “HighSpeed TCP for Large Congestion Windows”, 2003. http://www.potaroo.net/ietf/all-ids/draft-floyd-tcp-highspeed-03.txt

    BIC-TCP, http://www.csc.ncsu.edu/faculty/rhee/export/bitcp/

    Floyd S., Allman M., Jain A., Sarolahti P. “Quick-Start for TCP and IP”, 2006. http://www.ietf.org/internet-drafts/draft-ietf-tsvwg-quickstart-02.txt

    Jin C., Wei D., Low S. H., Buhrmaster G., Bunn J., Choe D. H., Cottrell R. L. A., Doyle J. C., Newman H., Paganini F., Ravot S., Singh S. “FAST – Fast AQM Scalable TCP.” http://netlab.caltech.edu/FAST/, http://netlab.caltech.edu/pub/papers/FAST-infocom2004.pdf

    tsunami, http://www.anml.iu.edu/anmlresearch.html

    Source: E. He, J. Leigh, O. Yu, T. A. DeFanti, “Reliable Blast UDP: Predictable High Performance Bulk Data Transfer,” IEEE Cluster Computing 2002, Chicago, Illinois, Sept. 2002.

  • Literature

    Further materials

    Workshops PFLDnet 2003–2010

    http://datatag.web.cern.ch/datatag/pfldnet2003/program.html

    http://www-didc.lbl.gov/PFLDnet2004/

    http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/

    http://www.hpcc.jp/pfldnet2006/

    http://wil.cs.caltech.edu/pfldnet2007/

    prof. Sally Floyd’s pages: http://www.icir.org/floyd/papers.html

    RFC 3426 – “General Architectural and Policy Considerations”

    http://www.hamilton.ie/net/eval/results_HI2005.pdf
