TRANSPORT LAYER Dr. Nawaporn Wisitpongphan Credit: Prof. Nick McKeown http://www.stanford.edu/~nickm


OUTLINE

The Transport Layer
The UDP Protocol
The TCP Protocol
  TCP Characteristics
  TCP Connection setup
  TCP Segments
  TCP Sequence Numbers
  TCP Sliding Window
  Timeouts and Retransmission
  Congestion Control and Avoidance

REVIEW OF THE TRANSPORT LAYER

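[Figure: Nick at Leland.Stanford.edu and Dave at Athena.MIT.edu, each with an Application Layer, Transport Layer, Network Layer, and Link Layer stack inside the O.S.; the data is carried as Header + Data (H D) units between the layers and across the network.]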

LAYERING: THE OSI MODEL

[Figure: the seven OSI layers on two end hosts (1 Physical, 2 Link, 3 Network, 4 Transport, 5 Session, 6 Presentation, 7 Application), connected through two routers that implement only the Physical, Link, and Network layers. Labels: peer-layer communication, layer-to-layer communication.]

USER DATAGRAM PROTOCOL (UDP) CHARACTERISTICS

UDP is a connectionless datagram service.
  There is no connection establishment: packets may show up at any time.
UDP is unreliable:
  No acknowledgements to indicate delivery of data.
  The checksum is optional (in IPv4); when it is used, it covers the header and the data.
  Contains no mechanism to detect missing or mis-sequenced packets.
  No mechanism for automatic retransmission.
  No mechanism for flow control, and so can over-run the receiver.

USER-DATAGRAM PROTOCOL (UDP)

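[Figure: applications A1 and A2 on one host and applications B1 and B2 on another host, each sitting above UDP, which runs inside the OS on top of IP.]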

UDP uses port number to demultiplex packets

Port     Description
123      Network Time Protocol (NTP)
67, 68   Dynamic Host Configuration Protocol (DHCP)
500      Internet Security Association and Key Management Protocol (ISAKMP)
520      Routing Information Protocol (RIP)

USER-DATAGRAM PROTOCOL (UDP) PACKET FORMAT

  SRC port | DST port
  length   | checksum
  DATA

Checksum: optional in IPv4; when it is computed, it covers the header and the data.

Why do we have UDP? It is used by applications that don’t need reliable delivery, or applications that have their own special needs, such as streaming of real-time audio/video.
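The packet format and the port-based demultiplexing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a network stack: the function names (`build_udp_datagram`, `parse_udp_datagram`, `demultiplex`) and the `handlers` table are hypothetical, and the checksum field is left at 0, which in IPv4 means that no checksum was computed.

```python
import struct

# Four 16-bit fields in network byte order: SRC port, DST port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port, dst_port, payload, checksum=0):
    """Return header + payload; length counts the 8-byte header plus the data."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_datagram(datagram):
    """Split a datagram into its header fields and its data."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
        "data": datagram[UDP_HEADER.size:length],
    }


def demultiplex(datagram, handlers):
    """Hand the data to whichever application registered the destination port."""
    fields = parse_udp_datagram(datagram)
    handler = handlers.get(fields["dst_port"])
    if handler is None:
        return None  # no application is listening on that port
    return handler(fields["data"])
```

A receiver would call `demultiplex(datagram, {123: ntp_handler})`, where `ntp_handler` stands for any callable supplied by the application.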

TCP CHARACTERISTICS

TCP is connection-oriented.
  3-way handshake used for connection setup.
TCP provides a stream-of-bytes service.
TCP is reliable:
  Acknowledgements indicate delivery of data.
  Checksums are used to detect corrupted data.
  Sequence numbers detect missing or mis-sequenced data.
  Corrupted data is retransmitted after a timeout.
  Mis-sequenced data is re-sequenced.
  (Window-based) flow control prevents over-run of receiver.

TCP uses congestion control to share network capacity among users.

HTTP AND TCP

Port    Description
80      HTTP
23      Telnet
20/21   FTP (data/control)
25      Simple Mail Transfer Protocol (SMTP)

TCP IS CONNECTION-ORIENTED

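Connection setup: 3-way handshake
  (Active) Client -> (Passive) Server: Syn
  (Passive) Server -> (Active) Client: Syn + Ack
  (Active) Client -> (Passive) Server: Ack

Connection close/teardown: 2 x 2-way handshake
  (Active) Client -> (Passive) Server: Fin
  (Passive) Server -> (Active) Client: (Data +) Ack
  (Passive) Server -> (Active) Client: Fin
  (Active) Client -> (Passive) Server: Ack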

THE TCP DIAGRAM

Which path does the Active Client or the Passive Server follow?

[Figure: the TCP state diagram, shown next to the Syn, Syn + Ack, Ack exchange between the (Active) Client and the (Passive) Server. Panels: TCP client, TCP server.]

TCP SUPPORTS A “STREAM OF BYTES” SERVICE

[Figure: Host A writes Byte 0, Byte 1, Byte 2, Byte 3, ..., Byte 80 into the connection, and the same Byte 0, Byte 1, Byte 2, Byte 3, ..., Byte 80 come out in order at Host B.]

TCP accepts data as a constant stream from the applications. There are no record markers automatically inserted by TCP. Example:

If the application on one end writes 10 bytes, followed by a write of 20 bytes, followed by a write of 50 bytes, the application at the other end of the connection cannot tell what size the individual writes were. The other end may read the 80 bytes in four reads of 20 bytes at a time.

One end puts a stream of bytes into TCP and the same, identical stream of bytes appears at the other end
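The 10/20/50-byte example can be tried out directly. The sketch below is a hedged illustration: it uses `socket.socketpair()`, which returns a connected pair of stream sockets and stands in here for a real TCP connection (an assumption made so the example runs without a network). The helper names `recv_exact` and `demo_stream` are hypothetical.

```python
import socket


def recv_exact(sock, n):
    """Read exactly n bytes; a single recv() may legally return fewer."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # peer closed the stream
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def demo_stream():
    """Write 10 + 20 + 50 bytes on one end, read 4 x 20 bytes on the other."""
    sender, receiver = socket.socketpair()
    try:
        writes = [b"a" * 10, b"b" * 20, b"c" * 50]
        for data in writes:
            sender.sendall(data)
        reads = [recv_exact(receiver, 20) for _ in range(4)]
    finally:
        sender.close()
        receiver.close()
    return writes, reads
```

The reader sees four 20-byte chunks whose concatenation equals the three writes; nothing in the stream marks where one write ended and the next began.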

…WHICH IS EMULATED USING TCP “SEGMENTS”

[Figure: the byte stream Byte 0, Byte 1, Byte 2, Byte 3, ..., Byte 80 from Host A is carried to Host B inside TCP segments (TCP Data), and the same byte stream is reassembled at Host B.]

Segment sent when:
  1. Segment full (MSS bytes),
  2. Not full, but times out, or
  3. “Pushed” by application.

THE TCP SEGMENT FORMAT

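An IP packet is an IP Hdr followed by IP Data; the IP Data carries the TCP segment, which is a TCP Hdr followed by TCP Data.

TCP header fields, in 32-bit rows (bit positions 0, 15, 31):

  Src port (16 bits)                | Dst port (16 bits)
  Sequence # (32 bits)
  Ack Sequence # (32 bits)
  HLEN (4) | RSVD (6) | Flags (6)   | Window Size (16 bits)
  Checksum (16 bits)                | Urgent Pointer (16 bits)
  (TCP Options)
  TCP Data

Flags: URG, ACK, PSH, RST, SYN, FIN.
The checksum covers the TCP header and data + IP addresses.
Src/dst port numbers and IP addresses uniquely identify the socket.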
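To make the field layout concrete, here is a small sketch that unpacks the fixed 20-byte part of a TCP header with Python's `struct` module. The function name `parse_tcp_header` and the dictionary keys are illustrative choices, not part of the slides; options are skipped rather than decoded.

```python
import struct

# Src port, Dst port, Sequence #, Ack Sequence #,
# HLEN/RSVD/Flags word, Window Size, Checksum, Urgent Pointer.
TCP_FIXED_HEADER = struct.Struct("!HHIIHHHH")

# Flag names ordered from the least significant bit upwards.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]


def parse_tcp_header(segment):
    """Decode the fixed TCP header fields and return them with the data."""
    (src_port, dst_port, seq, ack_seq,
     hlen_flags, window, checksum, urgent) = TCP_FIXED_HEADER.unpack_from(segment)
    hlen_words = hlen_flags >> 12      # top 4 bits: header length in 32-bit words
    flag_bits = hlen_flags & 0x3F      # bottom 6 bits: URG ACK PSH RST SYN FIN
    flags = [name for bit, name in enumerate(FLAG_NAMES) if flag_bits & (1 << bit)]
    header_bytes = hlen_words * 4      # any TCP options sit between byte 20 and here
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack_seq": ack_seq,
        "header_bytes": header_bytes,
        "flags": flags,
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
        "data": segment[header_bytes:],
    }
```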

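SEQUENCE NUMBERS

[Figure: Host A and Host B exchange segments, each a TCP HDR followed by TCP Data; the byte numbering of the stream starts at the ISN (initial sequence number).]

Sequence number = 1st byte of the segment's data.
Ack sequence number = next expected byte.

How does ISN get chosen?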

INITIAL SEQUENCE NUMBERS

Connection setup: 3-way handshake
  (Active) Client -> (Passive) Server: Syn + ISN_A
  (Passive) Server -> (Active) Client: Syn + Ack + ISN_B
  (Active) Client -> (Passive) Server: Ack

Sequence number = 32 bits. What if a message has more than 2^32 bytes?
  Sequence number wrap-around.
  Solution: Timestamp option. The sender places a timestamp in every segment; the receiver copies the timestamp into the ACK it sends for a segment.
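The wrap-around itself is just arithmetic modulo 2^32. The sketch below illustrates that arithmetic only; it does not implement the timestamp option. The helper names are hypothetical, and the "earlier than" test uses the common half-the-space convention, which is an assumption rather than something stated on the slide.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits wide


def seq_add(seq, nbytes):
    """Sequence number after sending nbytes, wrapping past 2^32 - 1 to 0."""
    return (seq + nbytes) % SEQ_SPACE


def seq_before(a, b):
    """True if a comes before b, assuming they are less than 2^31 bytes apart."""
    return a != b and (b - a) % SEQ_SPACE < SEQ_SPACE // 2


def seconds_to_wrap(rate_bits_per_s):
    """Time needed to send 2^32 bytes at a given rate, i.e. to reuse a number."""
    return SEQ_SPACE * 8 / rate_bits_per_s
```

At 1 Gb/s, `seconds_to_wrap(1e9)` is a little over 34 seconds, which is why an old segment delayed in the network could be mistaken for new data unless something else, such as a timestamp, tells them apart.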

TCP SLIDING WINDOW

How much data can a TCP sender have outstanding in the network?

How much data should TCP retransmit when an error occurs? Just selectively repeat the missing data?

How does the TCP sender avoid over-running the receiver’s buffers?

TCP SLIDING WINDOW

[Figure: the sender's byte stream divided into four regions: data ACK’d, outstanding un-ACK’d data, data OK to send, and data not OK to send yet. The window size is marked on the stream.]

Window is meaningful to the sender. Current window size is “advertised” by receiver (usually 4 KB to 8 KB at connection setup).

TCP SLIDING WINDOW

[Figure: timing diagrams between Host A and Host B showing the ACKs, the window size, and the round-trip time for three cases: (1) RTT > window size, (2) RTT = window size, (3) window size ???]
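The cases in the figure compare the round-trip time with the time it takes to transmit one window. One rough way to put numbers on that comparison is sketched below, assuming a fixed window, a link of known bandwidth, and no loss; the function name and its parameters are hypothetical and do not come from the slides.

```python
def link_utilization(window_bytes, bandwidth_bytes_per_s, rtt_s):
    """Fraction of the link a sender can fill when limited to one window per RTT."""
    bytes_needed_per_rtt = bandwidth_bytes_per_s * rtt_s
    return min(1.0, window_bytes / bytes_needed_per_rtt)
```

With an 8 KB window on a 1.5 Mb/s link (187500 bytes/s) and a 100 ms RTT, the sender must stop and wait for ACKs for more than half of each round trip; once the window covers a full round trip of data, the link stays busy.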

TCP: RETRANSMISSION AND TIMEOUTS

[Figure: Host A sends Data1 and Data2 to Host B and receives ACKs. Labels: round-trip time (RTT), estimated RTT, guard band, Retransmission TimeOut (RTO).]

TCP uses an adaptive retransmission timeout value, because the RTT changes frequently:
  Congestion
  Changes in routing

TCP: RETRANSMISSION AND TIMEOUTS

Picking the RTO is important:
  Pick a value that’s too big, and TCP will wait too long to retransmit a packet.
  Pick a value that’s too small, and TCP will unnecessarily retransmit packets.

The original algorithm for picking RTO:
  1. $\text{EstimatedRTT}_k = \alpha \cdot \text{EstimatedRTT}_{k-1} + (1 - \alpha) \cdot \text{SampleRTT}$
  2. $\text{RTO} = 2 \cdot \text{EstimatedRTT}_k$

Characteristics of the original algorithm:
  Variance is assumed to be fixed.
  But in practice, variance increases as congestion increases.

Determined empirically
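As a worked illustration of the original algorithm, the sketch below applies the weighted-average update to a list of RTT samples. The weight of 0.875 and the choice to seed the estimate with the first sample are assumptions made for the example; the slide does not give them.

```python
def original_rto(sample_rtts, alpha=0.875):
    """Return a list of (EstimatedRTT, RTO) pairs, one per sample.

    alpha is the weight kept on the previous estimate (assumed value).
    """
    estimated_rtt = sample_rtts[0]  # assumption: start from the first sample
    results = []
    for sample_rtt in sample_rtts:
        estimated_rtt = alpha * estimated_rtt + (1 - alpha) * sample_rtt
        results.append((estimated_rtt, 2 * estimated_rtt))
    return results
```

A single slow sample moves the estimate only a little: after samples of 100 ms and 200 ms, the estimate is 112.5 ms and the RTO 225 ms.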

TCP: RETRANSMISSION AND TIMEOUTS

There will be some (unknown) distribution of RTTs. We are trying to estimate an RTO to minimize the probability of a false timeout.

[Figure: probability versus RTT, with the mean and the variance of the distribution marked.]

[Figure: average queueing delay versus load (amount of traffic arriving to router). Annotation: variance grows rapidly with load.]

Router queues grow when there is more traffic, until they become unstable. As load grows, variance of delay grows rapidly.

TCP: RETRANSMISSION AND TIMEOUTS

Newer Algorithm includes estimate of variance in RTT:

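  $\text{Difference} = \text{SampleRTT} - \text{EstimatedRTT}_{k-1}$
  $\text{EstimatedRTT}_k = \text{EstimatedRTT}_{k-1} + \delta \cdot \text{Difference}$ (same as before)
  $\text{Deviation} = \text{Deviation} + \delta \cdot (|\text{Difference}| - \text{Deviation})$
  $\text{RTO} = \mu \cdot \text{EstimatedRTT}_k + \phi \cdot \text{Deviation}$, with $\mu \approx 1$ and $\phi \approx 4$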
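The newer algorithm can be sketched as a small class that tracks both the estimate and the deviation. The gain of 0.125, the multipliers 1 and 4 as defaults, the zero starting deviation, and the class name `RttEstimator` are assumptions chosen for the illustration.

```python
class RttEstimator:
    """Track EstimatedRTT and Deviation, and derive RTO from both."""

    def __init__(self, first_sample, gain=0.125, mu=1.0, phi=4.0):
        self.estimated_rtt = first_sample
        self.deviation = 0.0  # assumption: no variation observed yet
        self.gain = gain
        self.mu = mu
        self.phi = phi

    def rto(self):
        return self.mu * self.estimated_rtt + self.phi * self.deviation

    def update(self, sample_rtt):
        """Fold one RTT sample into the estimate and return the new RTO."""
        difference = sample_rtt - self.estimated_rtt
        self.estimated_rtt += self.gain * difference
        self.deviation += self.gain * (abs(difference) - self.deviation)
        return self.rto()
```

Starting from 100 ms, one sample of 180 ms raises the estimate to 110 ms and the deviation to 10 ms, so the RTO becomes 150 ms: the timeout widens when the samples start to vary, instead of staying at a fixed multiple of the mean.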

TCP: RETRANSMISSION AND TIMEOUTS
KARN’S ALGORITHM

[Figure: two Host A / Host B timelines, each with a retransmission that leads to a wrong RTT sample.]

Problem: How can we estimate RTT when packets are retransmitted?
Solution: On retransmission, don’t update estimated RTT (and double RTO).
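Karn's rule fits in a few lines. The sketch below combines it with a simple weighted-average estimator and an RTO of twice the estimate; the class name `KarnTimer`, the weight of 0.875, and the method names are assumptions made for this illustration.

```python
class KarnTimer:
    """RTO bookkeeping that ignores RTT samples from retransmitted packets."""

    def __init__(self, initial_rtt, alpha=0.875):
        self.alpha = alpha
        self.estimated_rtt = initial_rtt
        self.rto = 2 * initial_rtt

    def on_timeout(self):
        """A packet timed out and is retransmitted: back off by doubling RTO."""
        self.rto *= 2
        return self.rto

    def on_ack(self, sample_rtt, was_retransmitted):
        """Update the estimate only when the sample is unambiguous."""
        if was_retransmitted:
            # The ACK could belong to either transmission, so skip the sample.
            return self.rto
        self.estimated_rtt = (self.alpha * self.estimated_rtt
                              + (1 - self.alpha) * sample_rtt)
        self.rto = 2 * self.estimated_rtt
        return self.rto
```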

CONGESTION CONTROL: MAIN POINTS

Congestion is inevitable.

Congestion happens at different scales, from two individual packets colliding to too many users.

TCP senders can detect congestion and reduce their sending rate by reducing the window size.

TCP modifies the rate according to “Additive Increase, Multiplicative Decrease (AIMD)”.

To probe and find the initial rate, TCP uses a restart mechanism called “slow start”.

Routers slow down TCP senders by buffering packets and thus increasing delay.

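CONGESTION

[Figure: hosts H1 and H2 send through router R1 to H3. H1's arrivals A1(t) come in at 10 Mb/s, H2's arrivals A2(t) at 100 Mb/s, and the departures D(t) leave at 1.5 Mb/s. A second plot shows cumulative bytes versus t for A1(t), A2(t), D(t), and X(t).]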

TIME SCALES OF CONGESTION

Too many users using a link during a peak hour (time axis: 7:00, 8:00, 9:00)

TCP flows filling up all available bandwidth (time axis: 1s, 2s, 3s)

Two packets colliding at a router (time axis: 100µs, 200µs, 300µs)

DEALING WITH CONGESTION
EXAMPLE: TWO FLOWS ARRIVING AT A ROUTER

[Figure: flows A1(t) and A2(t) arrive at router R1 at the same time.]

Strategy:
  Drop one of the flows
  Buffer one flow until the other has departed, then send it
  Re-schedule one of the two flows for a later time
  Ask both flows to reduce their rates

CONGESTION IS UNAVOIDABLE
ARGUABLY IT’S GOOD!

We use packet switching because it makes efficient use of the links. Therefore, buffers in the routers are frequently occupied.

If buffers are always empty, delay is low, but our usage of the network is low.

If buffers are always occupied, delay is high, but we are using the network more efficiently.

So how much congestion is too much?

LOAD, DELAY AND POWER

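Typical behavior of queueing systems with random arrivals:

[Figure: average packet delay versus load; the delay rises towards an asymptote as load grows. Annotation: burstiness tends to move the asymptote to the left.]

A simple metric of how well the network is performing:

  $\text{Power} = \dfrac{\text{Load}}{\text{Delay}}$

[Figure: power versus load, with the peak marked as the “optimal load”.]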
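To see why power has a peak, it helps to plug in a concrete delay curve. The sketch below assumes an M/M/1 queue, where the average delay is 1 / (service rate - arrival rate); that queueing model is an assumption chosen for illustration and is not named on the slide, and the function names are hypothetical.

```python
def mm1_delay(load, service_rate=1.0):
    """Average delay of an M/M/1 queue (assumed model); load is the arrival rate."""
    return 1.0 / (service_rate - load)


def power(load, service_rate=1.0):
    """Power = Load / Delay."""
    return load / mm1_delay(load, service_rate)


def optimal_load(service_rate=1.0, steps=100):
    """Scan loads strictly between 0 and the service rate; return the best one."""
    candidates = [service_rate * i / steps for i in range(1, steps)]
    return max(candidates, key=lambda load: power(load, service_rate))
```

Under this assumed model the power peaks when the link is half loaded: pushing the load higher still raises throughput, but the delay grows faster than the throughput does.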

OPTIONS FOR CONGESTION CONTROL

1. Implemented by host versus network
2. Reservation-based versus feedback-based
3. Window-based versus rate-based

TCP CONGESTION CONTROL

TCP implements host-based, feedback-based, window-based congestion control.

TCP sources attempt to determine how much capacity is available.

TCP sends packets, then reacts to observable events (loss).

TCP CONGESTION CONTROL

TCP sources change the sending rate by modifying the window size:
  Window = min{Advertised window, Congestion window}
  The advertised window comes from the receiver; the congestion window (“cwnd”) is kept by the transmitter.

In other words, send at the rate of the slowest component: network or receiver.

“cwnd” follows additive increase/multiplicative decrease:
  On receipt of Ack: cwnd += 1
  On packet loss (timeout): cwnd *= 0.5

ADDITIVE INCREASE/ MULTIPLICATIVE DECREASE

[Figure: timeline of data (D) and ACK (A) packets exchanged between Src and Dest.]

Additive Increase: Every time the source successfully sends a cwnd’s worth of packets (each packet sent out during the last RTT has been ACKed), add the equivalent of 1 packet to the cwnd.

  Increment = MSS × (MSS / CWND), with CWND ≥ MSS
  CWND += Increment
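The per-ACK increment can be checked numerically. The sketch below applies it in bytes, together with the halving on timeout and the min{} rule for the usable window; the MSS of 1460 bytes and the function names are assumptions made for the example.

```python
MSS = 1460  # bytes; an assumed, typical value


def on_ack(cwnd, mss=MSS):
    """Additive increase: each ACK adds MSS * (MSS / CWND) bytes."""
    return cwnd + mss * (mss / cwnd)


def on_timeout(cwnd, mss=MSS):
    """Multiplicative decrease: halve CWND, but keep CWND >= MSS."""
    return max(mss, cwnd / 2)


def send_window(advertised_window, cwnd):
    """The sender may have min(advertised window, congestion window) outstanding."""
    return min(advertised_window, cwnd)
```

With a window of 4 MSS, the four ACKs that arrive in one round trip each add roughly a quarter of an MSS, so the window grows by about one packet per RTT (slightly less, because the increment shrinks as CWND grows).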

LEADS TO THE TCP “SAWTOOTH”

[Figure: window versus t, a sawtooth in which the window is halved at each timeout. Annotation: could take a long time to get started!]

Multiplicative Decrease: For each timeout, the source sets CWND to half of its previous value.

When CWND is large, all the dropped packets will be retransmitted, so congestion gets worse. Need to get out of this state quickly.

“SLOW START”

Designed to find the fair-share rate quickly at startup. How does it work?

1. Increase cwnd exponentially: add one packet to cwnd for each ACK received, so that cwnd doubles every RTT (1, 2, 4, 8), until it reaches SSThreshold.

2. If cwnd < SSThreshold {Do Slow Start}, else {Do Congestion Avoidance}.

3. Initial SSThreshold = large value. After a packet is lost, SSThreshold = cwnd/2.

4. Congestion Avoidance: increase cwnd linearly.

[Figure: Src/Dest timeline of data (D) and ACK (A) packets; the number of packets sent per round trip grows as 1, 2, 4, 8.]
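The steps above can be simulated one round trip at a time. The sketch below counts cwnd in packets; restarting cwnd from 1 packet after a loss is an assumption (the slides only say that SSThreshold becomes cwnd/2), and the function name and parameters are hypothetical.

```python
def simulate_cwnd(rounds, loss_rounds, initial_ssthresh=64):
    """Return cwnd (in packets) at the start of each round trip.

    loss_rounds is the set of round-trip indices in which a packet is lost.
    """
    cwnd = 1
    ssthresh = initial_ssthresh
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in loss_rounds:
            ssthresh = max(cwnd // 2, 1)  # SSThreshold = cwnd / 2
            cwnd = 1                      # assumption: restart with slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: double every RTT
        else:
            cwnd += 1                       # congestion avoidance: linear growth
    return history
```

Calling `simulate_cwnd(8, {4}, initial_ssthresh=8)` shows both phases: doubling up to the threshold, linear growth after it, then a restart with a halved threshold once the loss occurs.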

SLOW START

Why is it called slow-start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole advertised window’s worth of data.

FAST RETRANSMIT AND FAST RECOVERY?

Homework!!

TCP SENDING RATE

What is the sending rate of TCP?
  The acknowledgement for a sent packet is received after one RTT.
  The amount of data sent until the ACK is received is the current window size W.
  Therefore the sending rate is R = W/RTT.

Is the TCP sending rate sawtooth shaped as well?

TCP AND BUFFERS

For TCP with a single flow over a network link with enough buffers, RTT and W are proportional to each other.
  Therefore the sending rate R = W/RTT is constant (and not a sawtooth).

But experiments and theory suggest that with many flows:

  $R \propto \dfrac{1}{RTT \sqrt{p}}$

  where p is the drop probability.

TCP rate can be controlled in two ways:

1. Buffering packets and increasing the RTT

2. Dropping packets to decrease TCP’s window size
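The two levers can be compared numerically. The sketch below assumes the many-flow relation has the form R proportional to 1 / (RTT * sqrt(p)); the proportionality constant is unknown, so the loss-limited function is only meaningful for comparing one setting against another. Both function names are hypothetical.

```python
import math


def window_limited_rate(window_bytes, rtt_s):
    """R = W / RTT, in bytes per second."""
    return window_bytes / rtt_s


def loss_limited_rate(rtt_s, drop_probability, constant=1.0):
    """Relative rate under the assumed relation R ~ constant / (RTT * sqrt(p))."""
    return constant / (rtt_s * math.sqrt(drop_probability))
```

Under that assumed relation, doubling the RTT (more buffering) halves the rate, while the drop probability has to grow by a factor of four to achieve the same reduction.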

CONGESTION CONTROL IN THE INTERNET

Maximum window sizes of most TCP implementations by default are very small:
  Windows XP: 12 packets
  Linux/Mac: 40 packets

Often the buffer of a link is larger than the maximum window size of TCP:
  A typical DSL line has 200 packets worth of buffer.
  For a TCP session, the maximum number of packets outstanding is 40.
  The buffer can never fill up.
  The router will never drop a packet.

CONGESTION AVOIDANCE

TCP reacts to congestion after it takes place. The data rate changes rapidly and the system is barely stable (or is even unstable).

Can we predict when congestion is about to happen and avoid it? E.g. by detecting the knee of the curve.

[Figure: average packet delay versus load.]

CONGESTION AVOIDANCE SCHEMES

Router-based Congestion Avoidance:
  DECbit:
    Routers explicitly notify sources about congestion.
  Random Early Detection (RED):
    Routers implicitly notify sources by dropping packets.
    RED drops packets at random, and as a function of the level of congestion.

Host-based Congestion Avoidance:
  Source monitors changes in RTT to detect onset of congestion.

DECBIT

Each packet has a “Congestion Notification” bit called the DECbit in its header.

If any router on the path is congested, it sets the DECbit.
  Set if average queue length >= 1 packet, averaged since the start of the previous busy cycle.

To notify the source, the destination copies DECbit into ACK packets.

Source adjusts rate to avoid congestion.
  Counts fraction of DECbits set in each window.
  If <50% set, increase rate additively.
  If >=50% set, decrease rate multiplicatively.

[Figure: queue length at router versus time, with the averaging period marked.]
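The source's side of DECbit is a simple threshold rule. In the sketch below, the additive step of 1 packet and the multiplicative factor of 0.875 are assumed values, since the slides only say "additively" and "multiplicatively"; the function name is hypothetical.

```python
def decbit_adjust(window, decbits, increase=1.0, decrease_factor=0.875):
    """Return the new window after one window's worth of ACKs.

    decbits holds one 0/1 value per ACK: the DECbit copied back by the destination.
    """
    fraction_set = sum(decbits) / len(decbits)
    if fraction_set < 0.5:
        return window + increase              # additive increase
    return max(1.0, window * decrease_factor)  # multiplicative decrease
```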

RANDOM EARLY DETECTION (RED)

RED is based on DECbit, and was designed to work well with TCP.
RED implicitly notifies sender by dropping packets.
Drop probability is increased as the average queue length increases.
(Geometric) moving average of the queue length is used so as to detect long term congestion, yet allow short term bursts to arrive.

$$\text{AvgLen}_{n+1} = (1 - \alpha)\,\text{AvgLen}_n + \alpha\,\text{Length}_n$$

i.e.

$$\text{AvgLen}_{n+1} = \sum_{i=1}^{n} \text{Length}_i\,\alpha\,(1 - \alpha)^{n-i}$$
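The moving average and its expanded sum form can be checked against each other. The sketch below assumes the average starts at 0 and uses an arbitrary weight; both function names and the weight value are illustrative.

```python
def update_avg_len(avg_len, length, weight):
    """One step of the geometric moving average of the queue length."""
    return (1 - weight) * avg_len + weight * length


def avg_len_closed_form(lengths, weight):
    """The same average written as a weighted sum over all past samples."""
    n = len(lengths)
    return sum(length * weight * (1 - weight) ** (n - i)
               for i, length in enumerate(lengths, start=1))
```

A small weight makes old samples fade slowly, which is what lets a short burst pass without moving the average much.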

RED DROP PROBABILITIES

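[Figure: a queue with arrivals A(t) and departures D(t), and a plot of drop probability versus AvgLen, with minTh and maxTh marked on the horizontal axis and maxP and 1 on the vertical axis.]

If $\text{minTh} \le \text{AvgLen} \le \text{maxTh}$:

$$\hat{p} = \text{maxP} \cdot \frac{\text{AvgLen} - \text{minTh}}{\text{maxTh} - \text{minTh}}$$

$$\Pr(\text{Drop Packet}) = \frac{\hat{p}}{1 - \text{count} \cdot \hat{p}}$$

count counts how long we've been in $\text{minTh} \le \text{AvgLen} \le \text{maxTh}$ since we last dropped a packet, i.e. drops are spaced out in time, reducing likelihood of re-entering slow-start.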
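The drop rule can be written as one function. In the sketch below, never dropping below minTh and always dropping above maxTh are assumptions read off the shape of the plot; the function name and the clamping of the result to 1 are also choices made for the illustration.

```python
def red_drop_probability(avg_len, count, min_th, max_th, max_p):
    """Probability of dropping an arriving packet, given the average queue length.

    count is the number of packets accepted since the last drop while the
    average stayed between the two thresholds.
    """
    if avg_len < min_th:
        return 0.0  # assumption: no early drops below minTh
    if avg_len > max_th:
        return 1.0  # assumption: every packet is dropped above maxTh
    p_hat = max_p * (avg_len - min_th) / (max_th - min_th)
    denominator = 1 - count * p_hat
    if denominator <= 0:
        return 1.0  # a drop is overdue
    return min(1.0, p_hat / denominator)
```

With minTh = 5, maxTh = 15, and maxP = 0.1, an average of 10 packets gives a base probability of 0.05; after 10 packets without a drop the probability has doubled, which is how the count term spaces the drops out.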

PROPERTIES OF RED

Drops packets before queue is full, in the hope of reducing the rates of some flows.

Drops packets for each flow roughly in proportion to its rate.

Drops are spaced out in time.

Because it uses average queue length, RED is tolerant of bursts.

Random drops hopefully desynchronize TCP sources.

SYNCHRONIZATION OF SOURCES

[Figure: sources A, B, C, and D plotted over time, from RTT to N RTT.]

SYNCHRONIZATION OF SOURCES

[Figure: aggregate flow f(RTT) of sources A, B, C, and D over time, with the average (Avg) marked.]

DESYNCHRONIZED SOURCES

[Figure: sources A, B, C, and D plotted over time, from RTT to N RTT.]

DESYNCHRONIZED SOURCES

[Figure: aggregate flow of sources A, B, C, and D over time, from RTT to N RTT, with the average (Avg) marked.]