60
Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single bit to transit the link And, for this simplified case, assume no queuing delay or processing delay in routers. End-to-End Delay First, note that there is no queue at each router, so packets are transmitted onto links 2, 3 and 4 with no delay. That is, as soon as last bit of each packet arrives in a router it begins transmission onto next link. For example, the first packet begins transmission onto Link 2 after its end-to-end delay on Link 1 (L/R + d/s), and finishes transmission onto Link 2 at time (L/R + d/s) + L/R. The second packet arrives at the first router at time L/R (wait time on first packet at source host) + L/R + d/s (its end-to-end delay on Link 1), or L/R + L/R + d/s, THE SAME TIME THAT THE PRIOR PACKET HAS COMPLETED TRANSMISSION onto the next link. Link 1 Link 4 Link 3 Link 2 Link 1 Link 4 Link 3 Link 2 You should convince yourself that in this simplified case, since L is the same for each packet and link characteristics R, d and s are the same for each link, that this behavior holds for each packet at each router, regardless of number of packets or links.

Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Embed Size (px)

Citation preview

Page 1: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transmission delay (per packet) = LR = amount of time to transmit each packet onto linkPropagation delay (per bit) = ds = amount of time for a single bit to transit the linkAnd for this simplified case assume no queuing delay or processing delay in routers

End-to-End Delay

First note that there is no queue at each router so packets are transmitted onto links 2 3 and 4 with no delay That is as soon as last bit of each packet arrives in a router it begins transmission onto next link

For example the first packet begins transmission onto Link 2 after its end-to-end delay on Link 1 (LR + ds) and finishes transmission onto Link 2 at time (LR + ds) + LR The second packet arrives at the first router at time LR (wait time on first packet at source host) + LR + ds (its end-to-end delay on Link 1) or LR + LR + ds THE SAME TIME THAT THE PRIOR PACKET HAS COMPLETED TRANSMISSION onto the next link

Link 1 Link 4Link 3Link 2

Link 1 Link 4Link 3Link 2

You should convince yourself that in this simplified case since L is the same for each packet and link characteristics R d and s are the same for each link that this behavior holds for each packet at each router regardless of number of packets or links

Now end-to-end delay for all 4 packets through this network is simply the time that the last bit of last packet packet 4 arrives at Host 2

End-to-End Delay

Note that packet 4 cannot begin transmission until the 3 packets ldquoin front of itrdquo have completed transmission or LR (packet 1) + LR (packet 2) + LR (packet 3) = 3 LR

Link 1 Link 4Link 3Link 2

Link 1 Link 4Link 3Link 2

Note also that by prior argument this packet proceeds through the network unimpeded by any queuing delay at the intervening routers so its total end-to-end delay is LR + ds for each link

Link 1 Link 4Link 3Link 2

Time 3LR

Time 3LR + LR+ds + LR+ds + LR+ds + LR+ds

So Total end-to-end delay = 3LR + 4LR + 4 ds Generalized for n packets and k links

(n-1) LR + k (LR + ds)

Transport Layer 3-3

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-4

TCP Overview RFCs 79311221323 2018 2581

connection-oriented ie requires setup in

end-systems before data can be exchanged

ldquohandshakingrdquo (exchange of control messages) initializes sender amp receiver states (per-connection variables) before data exchange

flow controlled sender will not

overwhelm receiver by sending data too fast

point-to-point one sender one receiver

reliable in-order byte steam no ldquomessage boundaries

rdquo pipelined

TCP congestion and flow control set window size

full duplex data bi-directional data flow

in same connection MSS maximum segment

size

TCP Logical End-to-End Connection

Transport Layer 3-5

socketdoor

TCP send buffer

TCP receive buffer

socketdoor

segment

application processwrites data

application processwrites data

segment

a TCP connection is point-to-point only between a single sender and a single receiver

Multicast with TCP is not possible

a TCP connection is point-to-point only between a single sender and a single receiver

Multicast with TCP is not possible

Transport Layer 3-6

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

URG data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINConnection mgmt

(setup teardowncommands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

32-bit wordsin header

Transport Layer 3-7

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementssequence of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementer

A SACK option possible per RFC 2018

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-8

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-9

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ldquobest practicerdquo uses TCP

timer option per RFC 1323 ignore retransmissions

SampleRTT will vary so we want estimated RTT to be ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 3-10

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-11

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval(RTO) = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-12

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-13

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks single

retransmission timer retransmissions

triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-14

TCP sender eventsdata rcvd from app create segment with

seq seq is byte-

stream number of first data byte in segment

start timer if not already running think of timer as for

oldest unacked segment

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed start timer if there

are still unacked segments

Transport Layer 3-15

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-ACKed segment with smallest seq restart timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-ACKed segments) restart timer else stop timer

ACK received with ACK field value y

Transport Layer 3-16

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-17

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-18

TCP ACK generation [RFC 1122 RFC

2581 5681]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-19

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unACKed segment with smallest sequence

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

Transport Layer 3-20

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 2: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Now end-to-end delay for all 4 packets through this network is simply the time that the last bit of last packet packet 4 arrives at Host 2

End-to-End Delay

Note that packet 4 cannot begin transmission until the 3 packets ldquoin front of itrdquo have completed transmission or LR (packet 1) + LR (packet 2) + LR (packet 3) = 3 LR

Link 1 Link 4Link 3Link 2

Link 1 Link 4Link 3Link 2

Note also that by prior argument this packet proceeds through the network unimpeded by any queuing delay at the intervening routers so its total end-to-end delay is LR + ds for each link

Link 1 Link 4Link 3Link 2

Time 3LR

Time 3LR + LR+ds + LR+ds + LR+ds + LR+ds

So Total end-to-end delay = 3LR + 4LR + 4 ds Generalized for n packets and k links

(n-1) LR + k (LR + ds)

Transport Layer 3-3

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-4

TCP Overview RFCs 79311221323 2018 2581

connection-oriented ie requires setup in

end-systems before data can be exchanged

ldquohandshakingrdquo (exchange of control messages) initializes sender amp receiver states (per-connection variables) before data exchange

flow controlled sender will not

overwhelm receiver by sending data too fast

point-to-point one sender one receiver

reliable in-order byte steam no ldquomessage boundaries

rdquo pipelined

TCP congestion and flow control set window size

full duplex data bi-directional data flow

in same connection MSS maximum segment

size

TCP Logical End-to-End Connection

Transport Layer 3-5

socketdoor

TCP send buffer

TCP receive buffer

socketdoor

segment

application processwrites data

application processwrites data

segment

a TCP connection is point-to-point only between a single sender and a single receiver

Multicast with TCP is not possible

a TCP connection is point-to-point only between a single sender and a single receiver

Multicast with TCP is not possible

Transport Layer 3-6

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

URG data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINConnection mgmt

(setup teardowncommands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

32-bit wordsin header

Transport Layer 3-7

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementssequence of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementer

A SACK option possible per RFC 2018

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-8

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-9

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ldquobest practicerdquo uses TCP

timer option per RFC 1323 ignore retransmissions

SampleRTT will vary so we want estimated RTT to be ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 3-10

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-11

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval(RTO) = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-12

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-13

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks single

retransmission timer retransmissions

triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-14

TCP sender eventsdata rcvd from app create segment with

seq seq is byte-

stream number of first data byte in segment

start timer if not already running think of timer as for

oldest unacked segment

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed start timer if there

are still unacked segments

Transport Layer 3-15

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-ACKed segment with smallest seq restart timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-ACKed segments) restart timer else stop timer

ACK received with ACK field value y

Transport Layer 3-16

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-17

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-18

TCP ACK generation [RFC 1122 RFC

2581 5681]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-19

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unACKed segment with smallest sequence

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

Transport Layer 3-20

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 3: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-3

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-4

TCP Overview RFCs 79311221323 2018 2581

connection-oriented ie requires setup in

end-systems before data can be exchanged

ldquohandshakingrdquo (exchange of control messages) initializes sender amp receiver states (per-connection variables) before data exchange

flow controlled sender will not

overwhelm receiver by sending data too fast

point-to-point one sender one receiver

reliable in-order byte steam no ldquomessage boundaries

rdquo pipelined

TCP congestion and flow control set window size

full duplex data bi-directional data flow

in same connection MSS maximum segment

size

TCP Logical End-to-End Connection

Transport Layer 3-5

socketdoor

TCP send buffer

TCP receive buffer

socketdoor

segment

application processwrites data

application processwrites data

segment

a TCP connection is point-to-point only between a single sender and a single receiver

Multicast with TCP is not possible

a TCP connection is point-to-point only between a single sender and a single receiver

Multicast with TCP is not possible

Transport Layer 3-6

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

URG data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINConnection mgmt

(setup teardowncommands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

32-bit wordsin header

Transport Layer 3-7

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementssequence of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementer

A SACK option possible per RFC 2018

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-8

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-9

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ldquobest practicerdquo uses TCP

timer option per RFC 1323 ignore retransmissions

SampleRTT will vary so we want estimated RTT to be ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 3-10

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-11

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval(RTO) = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-12

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-13

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks single

retransmission timer retransmissions

triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-14

TCP sender eventsdata rcvd from app create segment with

seq seq is byte-

stream number of first data byte in segment

start timer if not already running think of timer as for

oldest unacked segment

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed start timer if there

are still unacked segments

Transport Layer 3-15

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-ACKed segment with smallest seq restart timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-ACKed segments) restart timer else stop timer

ACK received with ACK field value y

Transport Layer 3-16

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-17

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-18

TCP ACK generation [RFC 1122 RFC

2581 5681]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-19

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unACKed segment with smallest sequence

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

Transport Layer 3-20

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 4: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-4

TCP Overview RFCs 79311221323 2018 2581

connection-oriented ie requires setup in

end-systems before data can be exchanged

ldquohandshakingrdquo (exchange of control messages) initializes sender amp receiver states (per-connection variables) before data exchange

flow controlled sender will not

overwhelm receiver by sending data too fast

point-to-point one sender one receiver

reliable in-order byte steam no ldquomessage boundaries

rdquo pipelined

TCP congestion and flow control set window size

full duplex data bi-directional data flow

in same connection MSS maximum segment

size

TCP Logical End-to-End Connection

Transport Layer 3-5

socketdoor

TCP send buffer

TCP receive buffer

socketdoor

segment

application processwrites data

application processwrites data

segment

a TCP connection is point-to-point only between a single sender and a single receiver

Multicast with TCP is not possible

a TCP connection is point-to-point only between a single sender and a single receiver

Multicast with TCP is not possible

Transport Layer 3-6

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

URG data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINConnection mgmt

(setup teardowncommands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

32-bit wordsin header

Transport Layer 3-7

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementssequence of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementer

A SACK option possible per RFC 2018

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-8

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-9

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ldquobest practicerdquo uses TCP

timer option per RFC 1323 ignore retransmissions

SampleRTT will vary so we want estimated RTT to be ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 3-10

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-11

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval(RTO) = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-12

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-13

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks single

retransmission timer retransmissions

triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-14

TCP sender eventsdata rcvd from app create segment with

seq seq is byte-

stream number of first data byte in segment

start timer if not already running think of timer as for

oldest unacked segment

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed start timer if there

are still unacked segments

Transport Layer 3-15

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-ACKed segment with smallest seq restart timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-ACKed segments) restart timer else stop timer

ACK received with ACK field value y

Transport Layer 3-16

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-17

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-18

TCP ACK generation [RFC 1122 RFC

2581 5681]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-19

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unACKed segment with smallest sequence

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

Transport Layer 3-20

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 5: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

TCP Logical End-to-End Connection

Transport Layer 3-5

socketdoor

TCP send buffer

TCP receive buffer

socketdoor

segment

application processwrites data

application processwrites data

segment

a TCP connection is point-to-point only between a single sender and a single receiver

Multicast with TCP is not possible

a TCP connection is point-to-point only between a single sender and a single receiver

Multicast with TCP is not possible

Transport Layer 3-6

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

URG data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINConnection mgmt

(setup teardowncommands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

32-bit wordsin header

Transport Layer 3-7

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementssequence of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementer

A SACK option possible per RFC 2018

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-8

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-9

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ldquobest practicerdquo uses TCP

timer option per RFC 1323 ignore retransmissions

SampleRTT will vary so we want estimated RTT to be ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 3-10

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-11

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval(RTO) = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-12

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-13

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks single

retransmission timer retransmissions

triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-14

TCP sender eventsdata rcvd from app create segment with

seq seq is byte-

stream number of first data byte in segment

start timer if not already running think of timer as for

oldest unacked segment

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed start timer if there

are still unacked segments

Transport Layer 3-15

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-ACKed segment with smallest seq restart timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-ACKed segments) restart timer else stop timer

ACK received with ACK field value y

Transport Layer 3-16

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-17

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-18

TCP ACK generation [RFC 1122 RFC

2581 5681]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-19

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unACKed segment with smallest sequence

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

Transport Layer 3-20

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 6: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-6

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

URG data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINConnection mgmt

(setup teardowncommands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

32-bit wordsin header

Transport Layer 3-7

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementssequence of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementer

A SACK option possible per RFC 2018

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-8

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-9

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ldquobest practicerdquo uses TCP

timer option per RFC 1323 ignore retransmissions

SampleRTT will vary so we want estimated RTT to be ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 3-10

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-11

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval(RTO) = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-12

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-13

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks single

retransmission timer retransmissions

triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-14

TCP sender eventsdata rcvd from app create segment with

seq seq is byte-

stream number of first data byte in segment

start timer if not already running think of timer as for

oldest unacked segment

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed start timer if there

are still unacked segments

Transport Layer 3-15

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-ACKed segment with smallest seq restart timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-ACKed segments) restart timer else stop timer

ACK received with ACK field value y

Transport Layer 3-16

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-17

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-18

TCP ACK generation [RFC 1122 RFC

2581 5681]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-19

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unACKed segment with smallest sequence

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

Transport Layer 3-20

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 7: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-7

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementssequence of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementer

A SACK option possible per RFC 2018

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-8

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-9

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ldquobest practicerdquo uses TCP

timer option per RFC 1323 ignore retransmissions

SampleRTT will vary so we want estimated RTT to be ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 3-10

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-11

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval(RTO) = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-12

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-13

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks single

retransmission timer retransmissions

triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-14

TCP sender eventsdata rcvd from app create segment with

seq seq is byte-

stream number of first data byte in segment

start timer if not already running think of timer as for

oldest unacked segment

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed start timer if there

are still unacked segments

Transport Layer 3-15

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-ACKed segment with smallest seq restart timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-ACKed segments) restart timer else stop timer

ACK received with ACK field value y

Transport Layer 3-16

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-17

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-18

TCP ACK generation [RFC 1122 RFC

2581 5681]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-19

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unACKed segment with smallest sequence

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

Transport Layer 3-20

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 8: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-8

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-9

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ldquobest practicerdquo uses TCP

timer option per RFC 1323 ignore retransmissions

SampleRTT will vary so we want estimated RTT to be ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 3-10

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-11

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval(RTO) = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-12

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-13

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks single

retransmission timer retransmissions

triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-14

TCP sender eventsdata rcvd from app create segment with

seq seq is byte-

stream number of first data byte in segment

start timer if not already running think of timer as for

oldest unacked segment

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed start timer if there

are still unacked segments

Transport Layer 3-15

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-ACKed segment with smallest seq restart timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-ACKed segments) restart timer else stop timer

ACK received with ACK field value y

Transport Layer 3-16

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-17

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-18

TCP ACK generation [RFC 1122 RFC

2581 5681]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-19

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unACKed segment with smallest sequence

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

Transport Layer 3-20

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 9: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-9

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ldquobest practicerdquo uses TCP

timer option per RFC 1323 ignore retransmissions

SampleRTT will vary so we want estimated RTT to be ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 3-10

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-11

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval(RTO) = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-12

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-13

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks single

retransmission timer retransmissions

triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-14

TCP sender eventsdata rcvd from app create segment with

seq seq is byte-

stream number of first data byte in segment

start timer if not already running think of timer as for

oldest unacked segment

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed start timer if there

are still unacked segments

Transport Layer 3-15

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-ACKed segment with smallest seq restart timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-ACKed segments) restart timer else stop timer

ACK received with ACK field value y

Transport Layer 3-16

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-17

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-18

TCP ACK generation [RFC 1122 RFC

2581 5681]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-19

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unACKed segment with smallest sequence

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

Transport Layer 3-20

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 10: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-10

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-11

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval(RTO) = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-12

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-13

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks single

retransmission timer retransmissions

triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-14

TCP sender eventsdata rcvd from app create segment with

seq seq is byte-

stream number of first data byte in segment

start timer if not already running think of timer as for

oldest unacked segment

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed start timer if there

are still unacked segments

Transport Layer 3-15

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-ACKed segment with smallest seq restart timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-ACKed segments) restart timer else stop timer

ACK received with ACK field value y

Transport Layer 3-16

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-17

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-18

TCP ACK generation [RFC 1122 RFC

2581 5681]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-19

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unACKed segment with smallest sequence

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

Transport Layer 3-20

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 11: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-11

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval(RTO) = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-12

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-13

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks single

retransmission timer retransmissions

triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-14

TCP sender eventsdata rcvd from app create segment with

seq seq is byte-

stream number of first data byte in segment

start timer if not already running think of timer as for

oldest unacked segment

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed start timer if there

are still unacked segments

Transport Layer 3-15

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-ACKed segment with smallest seq restart timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-ACKed segments) restart timer else stop timer

ACK received with ACK field value y

Transport Layer 3-16

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-17

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-18

TCP ACK generation [RFC 1122 RFC

2581 5681]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-19

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unACKed segment with smallest sequence

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

Transport Layer 3-20

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 12: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-12

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-13

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks single

retransmission timer retransmissions

triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-14

TCP sender eventsdata rcvd from app create segment with

seq seq is byte-

stream number of first data byte in segment

start timer if not already running think of timer as for

oldest unacked segment

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed start timer if there

are still unacked segments

Transport Layer 3-15

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-ACKed segment with smallest seq restart timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-ACKed segments) restart timer else stop timer

ACK received with ACK field value y

Transport Layer 3-16

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-17

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-18

TCP ACK generation [RFC 1122 RFC

2581 5681]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-19

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unACKed segment with smallest sequence

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

Transport Layer 3-20

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 13: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-13

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks single

retransmission timer retransmissions

triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-14

TCP sender eventsdata rcvd from app create segment with

seq seq is byte-

stream number of first data byte in segment

start timer if not already running think of timer as for

oldest unacked segment

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed start timer if there

are still unacked segments

Transport Layer 3-15

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-ACKed segment with smallest seq restart timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-ACKed segments) restart timer else stop timer

ACK received with ACK field value y

Transport Layer 3-16

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-17

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-18

TCP ACK generation [RFC 1122 RFC

2581 5681]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-19

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unACKed segment with smallest sequence

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

Transport Layer 3-20

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 14: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-14

TCP sender eventsdata rcvd from app create segment with

seq seq is byte-

stream number of first data byte in segment

start timer if not already running think of timer as for

oldest unacked segment

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed start timer if there

are still unacked segments

Transport Layer 3-15

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-ACKed segment with smallest seq restart timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-ACKed segments) restart timer else stop timer

ACK received with ACK field value y

Transport Layer 3-16

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-17

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-18

TCP ACK generation [RFC 1122 RFC

2581 5681]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-19

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unACKed segment with smallest sequence

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

Transport Layer 3-20

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 15: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-15

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-ACKed segment with smallest seq restart timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-ACKed segments) restart timer else stop timer

ACK received with ACK field value y

Transport Layer 3-16

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-17

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-18

TCP ACK generation [RFC 1122 RFC

2581 5681]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-19

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unACKed segment with smallest sequence

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

Transport Layer 3-20

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 16: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-16

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-17

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-18

TCP ACK generation [RFC 1122 RFC

2581 5681]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-19

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unACKed segment with smallest sequence

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

Transport Layer 3-20

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 17: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-17

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-18

TCP ACK generation [RFC 1122 RFC

2581 5681]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-19

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unACKed segment with smallest sequence

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

Transport Layer 3-20

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 18: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-18

TCP ACK generation [RFC 1122 RFC

2581 5681]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-19

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unACKed segment with smallest sequence

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

Transport Layer 3-20

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 19: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-19

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unACKed segment with smallest sequence

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

Transport Layer 3-20

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 20: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-20

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 21: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-21

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 22: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-22

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

receiverrsquos application may remove data from

TCP socket buffer hellip

hellip slower than TCP is delivering

it to the buffer

(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 23: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size is set by

operating system via socket options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer based on available resources

sender limits amount of unACKed (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 24: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-24

TCP flow control receiver OS tracks

rwnd current size of its receive window LastByteReceived bytestream number of last byte placed in

buffer LastByteRead bytestream number of last byte read from

buffer

hellipand informs sender of its available buffer space by setting TCP header field in itrsquos acknowledgment segments as

rwnd = RcvBuffer ndash [LastByteReceived ndash LastByteRead]

sender OS tracks LastByteSent bytestream number of last byte sent to receiver LastByteACKed bytestream number of last byte acknowledged

by receiver

hellipand restricts sending rate such thatLastByteSent ndash LastByteACKed rwnd

Q What happens if receive buffer becomes full so that rwnd = 0

rwnd = 4096 ndash [120000 ndash 118000] = 4096 - 2000 = 2096

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 25: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-25

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 26: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-26

Connection Managementbefore exchanging data sender amp receiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 27: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-27

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 28: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-28

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 29: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-29

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 30: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-30

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 31: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-31

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 32: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-32

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 33: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-33

TCP connection life cycle

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 34: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-34

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 35: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-35

congestion informally ldquotoo many sources sending sending

too much too much data too fast too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queuing in router buffers) another top-10 problem

Principles of congestion control

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 36: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-36

Causescosts of congestion scenario 1

two senders two receivers

Host apps generates data at rate in

one router infinite buffers

output link capacity R no retransmission

flow control etc

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

R

Recall traffic

intensity

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 37: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-37

one router finite buffers sender retransmission of timed-out packet

application-layer input = application-layer outputin

= out

transport-layer input includes retransmissions in in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

lsquo

Causescosts of congestion scenario 2

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 38: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-38

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 39: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-39

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 40: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-40

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 41: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-41

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 42: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-42

R2

out

when sending at R2 some packets are retransmissions including duplicates that are delivered

ldquocostsrdquo of congestion more work (retrans) to compensate for lost

packets unneeded retransmissions link carries multiple

copies of packet

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 43: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-43

four senders multihop paths timeoutretransmit

Q what happens as in and in

rsquo increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host C

Host D

in original data

in original data plus

retransmitted data

A as red inrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 44: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-44

another ldquocostrdquo of congestion when packet dropped any ldquoupstreamrdquo

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

bullbuffers fill toward capacitybullpackets discardeddelayedbullsources re-transmit lost

packetsbullgood packets are resent

(ack lostdelayed)bull routers generate more

traffic to update pathsbullDelaysloads propagate

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 45: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-45

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit send rate for sender

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 46: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-46

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 47: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-47

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 48: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-48

Chapter 3 outline

31 transport-layer services

32 multiplexing and demultiplexing

33 connectionless transport UDP

34 principles of reliable data transfer

35 connection-oriented transport TCP segment structure reliable data transfer flow control connection

management36 principles of

congestion control37 TCP congestion

control

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 49: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-49

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 50: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-50

TCP Congestion Control details

sender limits transmission

cwnd is dynamic and a function of perceived network congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-yet

ACKed(ldquoin-flightrdquo)

last byte sent

cwndsender sequence number space

rate ~~cwnd

RTTbytessec

LastByteSent-LastByteAcked

lt mincwndrwnd

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 51: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-51

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS increment cwnd by 1

MSS for every ACK received

effect is doubling of cwnd size every RTT

result initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 52: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-52

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half (+3 MSS) window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) then slowstart

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 53: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-53

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 54: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-54

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 55: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-55

TCP throughput avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 56: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-56

TCP Futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms RTT

want 10 Gbps throughput requires W = 83333 in-flight segments throughput in terms of segment loss

probability L [Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 or one loss event every 5000000000 segments ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 57: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-57

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 58: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-58

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 59: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-59

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP do not want rate

throttled by congestion control

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open multiple parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

rate R10 new app asks for 11 TCPs gets

R2

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary
Page 60: Transmission delay (per packet) = L/R = amount of time to transmit each packet onto link Propagation delay (per bit) = d/s = amount of time for a single

Transport Layer 3-60

Chapter 3 summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation implementation in the Internet UDP TCP

next leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • End-to-End Delay
  • Slide 2
  • Chapter 3 outline
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP Logical End-to-End Connection
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 8
  • TCP round trip time timeout
  • Slide 10
  • Slide 11
  • Slide 12
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 17
  • TCP ACK generation [RFC 1122 RFC 2581 5681]
  • TCP fast retransmit
  • Slide 20
  • Slide 21
  • TCP flow control
  • Slide 23
  • Slide 24
  • Slide 25
  • Connection Management
  • Agreeing to establish a connection
  • Slide 28
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 32
  • TCP connection life cycle
  • Slide 34
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Causescosts of congestion scenario 3
  • Slide 44
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 47
  • Slide 48
  • TCP congestion control additive increase multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 summary