CSCI 547 Transport Layer 3-1

Chapter 3: Transport Layer

Computer Networking: A Top Down Approach, 4th edition. Jim Kurose, Keith Ross. Addison-Wesley, July 2007.

A note on the use of these ppt slides: We’re making these slides freely available to all (faculty, students, readers). They’re in PowerPoint form so you can add, modify, and delete slides (including this one) and slide content to suit your needs. They obviously represent a lot of work on our part. In return for use, we only ask the following:
- If you use these slides (e.g., in a class) in substantially unaltered form, that you mention their source (after all, we’d like people to use our book!)
- If you post any slides in substantially unaltered form on a www site, that you note that they are adapted from (or perhaps identical to) our slides, and note our copyright of this material.

Thanks and enjoy! JFK/KWR

All material copyright 1996-2007 J.F Kurose and K.W. Ross, All Rights Reserved


OSI Protocol Suite

Read http://en.wikipedia.org/wiki/OSI_model

TCP/IP Protocol Suite

[Figure: TCP/IP protocol suite, marking which layers are defined in the suite and which are left undefined]

Chapter 3: Transport Layer

Our goals:

understand principles behind transport layer services:
- multiplexing/demultiplexing
- reliable data transfer
- flow control
- congestion control

learn about transport layer protocols (TCP & UDP):
- UDP: connectionless transport
- TCP: connection-oriented transport
- TCP congestion control

Chapter 3 outline

3.1 Transport-layer services

3.2 Multiplexing and demultiplexing

3.3 Connectionless transport: UDP

3.4 Principles of reliable data transfer

3.5 Connection-oriented transport: TCP
- segment structure
- reliable data transfer
- flow control
- connection management

3.6 Principles of congestion control

3.7 TCP congestion control

Transport services and protocols

provide end-to-end logical (virtual) communication between app processes running on different hosts

transport protocols run in end systems
- send side: breaks app messages into segments, passes them to network layer
- rcv side: reassembles segments into messages, passes them to app layer

more than one transport protocol available to apps
- Internet: TCP and UDP

[Figure: logical end-end transport. The two end hosts run all five layers (application, transport, network, data link, physical); the devices between them run only network, data link, physical]

TCP (or UDP) is an end-to-end protocol, whereas the lower-layer protocols are hop-by-hop protocols

Transport vs. network layer

network layer: logical communication between hosts

transport layer: logical communication between processes
- uses the services of the network layer and enhances them

Remember the concept of “service provider” & “service user”?

Household analogy: 12 kids sending letters to 12 kids
- processes = kids
- app messages = letters in envelopes
- hosts = houses
- transport protocol = Ann and Bill
- network-layer protocol = postal service

Internet transport-layer protocols

reliable, in-order delivery: TCP
- congestion control
- flow control
- error recovery
- connection setup

unreliable, unordered delivery: UDP
- no-frills extension of “best-effort” IP

services not available:
- delay guarantees
- bandwidth guarantees
- Why not available?

[Figure: logical end-end transport between two end hosts across intermediate devices that run only network, data link, physical]


A layer’s Services

In a layered model of network protocols, a layer provides a predefined set of services, e.g. connection management, message delivery, etc.

A higher layer uses the services of a lower layer

Hence the concepts of “service user” and “service provider”: the higher layer is the service user, the lower layer is the service provider

Layer service: service user, service provider

[Figure: layered structure of protocols. A layer N entity sits between layer N + 1 and layer N - 1. It provides services to layer N + 1, uses services from layer N - 1, and communicates with its peer layer N entity using the Nth layer protocol]

Connection services of any layer except the physical layer

3 possible services provided to the upper layer:

1. Unacknowledged Connectionless Service
- Also called Datagram (DG) service, or simply Connectionless
- Frames can get lost
- No flow control
- No error control
- Like regular mail

2. Acknowledged Connectionless Service
- No connection is established prior to data transmission
- A message will be answered by a message (= Acknowledged)
- Like a return receipt
- Usually not used in networking

3. Connection-oriented Service
- Also called Virtual Circuit (VC) service
- Connection management
- Error recovery
- Flow control
- Ensures correct delivery of frames

Connection-oriented (VC) vs Connectionless (DG)

ISSUES                    Connection-oriented                       Connectionless
----------------------------------------------------------------------------------------------------
Initial setup and         Required                                  No
termination
Routing decisions         Routing only done on initial VC setup     Each packet routed independently
Connection state          Routers keep state info. for each         Routers do not hold state info.
                          connection
Need for full address     Needed during initial setup;              Full address always needed
                          afterwards only VC# needed
Packet sequencing         Guaranteed                                Not guaranteed
Error recovery            Handles error conditions                  Left to a higher layer

Connection-oriented (VC) vs Connectionless (DG), cont’d

ISSUES                    Connection-oriented                       Connectionless
----------------------------------------------------------------------------------------------------
Congestion control        Easy                                      Difficult
QOS                       Easy                                      Difficult
Flow control              Handled                                   Not done
Overhead                  High                                      Low
Examples                  TCP                                       UDP, IP, IPX, ISO-IP


Multiplexing/demultiplexing

Multiplexing at send host: gets data from multiple sockets, envelops the data with a header (later used for demultiplexing), and passes the segments to the network layer

Demultiplexing at rcv host: delivers received segments to the correct socket

[Figure: host 1, host 2 and host 3, each with an application, transport, network, link, physical stack. Processes (P1 to P5) sit in the application layer and attach to the transport layer through sockets]

How demultiplexing works

host receives IP datagrams
- each datagram has source IP address, destination IP address
- each datagram carries 1 transport-layer segment
- each segment has source, destination port number

host uses IP addresses & port numbers to direct segment to appropriate socket

[Figure: TCP/UDP segment format. The IP header (20 bytes or more) is followed by the UDP or TCP header, whose first 32-bit row holds source port # and destination port #, and then by the data]

Connectionless demultiplexing

Create sockets with port numbers:
DatagramSocket mySocket1 = new DatagramSocket(12534);
DatagramSocket mySocket2 = new DatagramSocket(12535);

UDP socket identified by two-tuple: (dest IP address, dest port number)

When host receives UDP segment:
- checks destination port number in segment
- directs UDP segment to the socket with that port number

IP datagrams with different source IP addresses and/or source port numbers are directed to the same socket
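The two-tuple rule can be sketched as a lookup table keyed by destination port only. This is a minimal model for illustration, not a real socket API: the class `UdpDemuxSketch` and its methods are hypothetical names, and the host's own IP address is assumed to be implied.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of connectionless demultiplexing on one host.
final class UdpDemuxSketch {
    // One queue of received payloads per bound destination port.
    private final Map<Integer, List<String>> socketsByPort = new HashMap<>();

    // Bind a "socket" to a local port, in the spirit of new DatagramSocket(port).
    void bind(int port) {
        socketsByPort.put(port, new ArrayList<>());
    }

    // Only the destination port selects the socket. The source IP and
    // source port are recorded purely as the "return address".
    boolean deliver(String srcIp, int srcPort, int dstPort, String payload) {
        List<String> queue = socketsByPort.get(dstPort);
        if (queue == null) {
            return false; // no socket bound to this port: segment is dropped
        }
        queue.add(srcIp + ":" + srcPort + " -> " + payload);
        return true;
    }

    List<String> received(int port) {
        return socketsByPort.get(port);
    }

    public static void main(String[] args) {
        UdpDemuxSketch host = new UdpDemuxSketch();
        host.bind(6428);
        host.deliver("A", 9157, 6428, "hello");
        host.deliver("B", 5775, 6428, "hi");
        System.out.println(host.received(6428)); // both segments reach the same socket
    }
}
```

Segments from two different senders land in the same queue because the source fields never take part in the lookup.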

Connectionless demux (cont)

DatagramSocket serverSocket = new DatagramSocket(6428);

[Figure: two clients (IP A and IP B) exchange UDP segments with a server (IP C) whose socket is bound to port 6428. One client sends SP: 9157, DP: 6428 and receives SP: 6428, DP: 9157; the other sends SP: 5775, DP: 6428 and receives SP: 6428, DP: 5775]

SP (Source Port #) provides “return address”

Connection-oriented demux

TCP socket connection identified by 4-tuple (unique):
- source IP address
- source port number
- dest IP address
- dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP socket connections:
- each socket connection identified by its own 4-tuple

Web servers have different sockets for each connecting client
- non-persistent HTTP will have different socket connections for each request
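As a rough sketch of 4-tuple demultiplexing, the table below is keyed by all four values, so two clients that happen to pick the same source port still reach different sockets. `TcpDemuxSketch`, `FourTuple` and the socket labels are hypothetical names chosen for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical model of connection-oriented demultiplexing at a server.
final class TcpDemuxSketch {
    // The 4-tuple that identifies one TCP connection.
    static final class FourTuple {
        final String srcIp;
        final int srcPort;
        final String dstIp;
        final int dstPort;

        FourTuple(String srcIp, int srcPort, String dstIp, int dstPort) {
            this.srcIp = srcIp;
            this.srcPort = srcPort;
            this.dstIp = dstIp;
            this.dstPort = dstPort;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FourTuple)) {
                return false;
            }
            FourTuple t = (FourTuple) o;
            return srcPort == t.srcPort && dstPort == t.dstPort
                    && srcIp.equals(t.srcIp) && dstIp.equals(t.dstIp);
        }

        @Override
        public int hashCode() {
            return Objects.hash(srcIp, srcPort, dstIp, dstPort);
        }
    }

    private final Map<FourTuple, String> connections = new HashMap<>();

    // Record a new connection and the socket that serves it.
    void accept(FourTuple tuple, String socketLabel) {
        connections.put(tuple, socketLabel);
    }

    // All four values are used to find the socket; null means no such connection.
    String demux(FourTuple tuple) {
        return connections.get(tuple);
    }

    public static void main(String[] args) {
        TcpDemuxSketch server = new TcpDemuxSketch();
        server.accept(new FourTuple("A", 9157, "C", 80), "socket-1");
        server.accept(new FourTuple("B", 9157, "C", 80), "socket-2");
        System.out.println(server.demux(new FourTuple("B", 9157, "C", 80)));
    }
}
```

Contrast this with UDP, where only the destination side of the tuple would matter.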

Connection-oriented demux (cont)

[Figure: clients at IP A and IP B open TCP connections to a server at IP C, port 80; each connection is handled by its own TCP socket and process. Segments shown:
- S-IP: A, SP: 9157, D-IP: C, DP: 80
- S-IP: B, SP: 9157, D-IP: C, DP: 80
- S-IP: B, SP: 5775, D-IP: C, DP: 80]

4-tuples are unique

Pn denotes processes, not ports

Connection-oriented demux: Concurrent Web Server

[Figure: clients at IP A and IP B open TCP connections to a web server at IP C, port 80; the server hands each connection to a child process. Segments shown:
- S-IP: A, SP: 9157, D-IP: C, DP: 80
- S-IP: B, SP: 9157, D-IP: C, DP: 80
- S-IP: B, SP: 5775, D-IP: C, DP: 80]


UDP: User Datagram Protocol [RFC 768]

“no frills,” “bare bones” Internet transport protocol

“best effort” service, UDP segments may be:
- lost
- delivered out of order to app

connectionless:
- no handshaking between UDP sender, receiver
- each UDP segment handled independently of others

Why is there a UDP?
- no connection establishment, so no setup delay
- simple: no connection state at sender or receiver
- small segment header
- no congestion control: UDP can blast away as fast as desired, but the receiver may not keep pace with the sender!

UDP: more

often used for streaming multimedia apps
- loss tolerant
- rate sensitive

other apps that use UDP
- DNS
- SNMP

achieve reliable transfer over UDP? We need to add reliability at the application layer
- application-level error recovery!

[Figure: UDP segment format, 32 bits wide. Row 1: source port #, dest port #. Row 2: length, checksum. Then application data (message). Length is in bytes of the UDP segment, including header]
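The segment layout can be sketched with a `ByteBuffer`: four 16-bit header fields followed by the payload. This is a minimal illustration under stated assumptions: `UdpHeaderSketch` and its helpers are hypothetical names, and the checksum field is simply left at 0 rather than computed.

```java
import java.nio.ByteBuffer;

// Hypothetical helper that lays out a UDP segment as four 16-bit fields plus data.
final class UdpHeaderSketch {
    static final int HEADER_BYTES = 8; // 4 fields x 2 bytes

    static byte[] buildSegment(int srcPort, int dstPort, byte[] payload) {
        // ByteBuffer is big-endian by default, which matches network byte order.
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + payload.length);
        buf.putShort((short) srcPort);
        buf.putShort((short) dstPort);
        buf.putShort((short) (HEADER_BYTES + payload.length)); // length includes header
        buf.putShort((short) 0); // checksum not computed in this sketch
        buf.put(payload);
        return buf.array();
    }

    // Read back an unsigned 16-bit field at the given byte offset.
    private static int field(byte[] segment, int offset) {
        return ByteBuffer.wrap(segment).getShort(offset) & 0xFFFF;
    }

    static int sourcePort(byte[] segment) {
        return field(segment, 0);
    }

    static int destPort(byte[] segment) {
        return field(segment, 2);
    }

    static int lengthField(byte[] segment) {
        return field(segment, 4);
    }

    public static void main(String[] args) {
        byte[] segment = buildSegment(9157, 6428, new byte[5]);
        System.out.println(sourcePort(segment) + " -> " + destPort(segment)
                + ", length " + lengthField(segment));
    }
}
```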

Apps use TCP or UDP?

UDP checksum

Goal: detect “errors” in transmitted segment

Sender:
- treat segment contents as sequence of 16-bit integers
- checksum: addition (1’s complement sum) of segment contents, then 1’s complement of the result
- sender puts checksum value into UDP checksum field

Receiver:
- compute checksum of received segment
- check if computed checksum equals checksum field value:
  - NO: error detected
  - YES: no error detected. But maybe errors nonetheless? More later ….

Internet Checksum Example

Note: When adding numbers, a carryout from the most significant bit needs to be added to the result

Example: add two 16-bit integers

                            1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                            1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
                          ---------------------------------
wraparound                1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

sum                         1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum (1’s complement)   0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
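The wraparound addition can be sketched in a few lines. This is an illustrative sketch, not the code of any particular stack; `InternetChecksumSketch` and its methods are hypothetical names, and the input is assumed to be already split into 16-bit words.

```java
// Hypothetical sketch of the 16-bit Internet checksum over a list of words.
final class InternetChecksumSketch {
    // 1's complement sum: add 16-bit words, wrapping any carry out of
    // the most significant bit back into the low end.
    static int onesComplementSum(int[] words) {
        int sum = 0;
        for (int word : words) {
            sum += word & 0xFFFF;
            if ((sum & 0x10000) != 0) {       // carry out of bit 15
                sum = (sum & 0xFFFF) + 1;     // wraparound
            }
        }
        return sum;
    }

    // Checksum = 1's complement (bitwise inversion) of the sum.
    static int checksum(int[] words) {
        return ~onesComplementSum(words) & 0xFFFF;
    }

    // Receiver side, as on the slide: recompute and compare with the field.
    static boolean matches(int[] receivedWords, int checksumField) {
        return checksum(receivedWords) == checksumField;
    }

    public static void main(String[] args) {
        int[] words = {0xE666, 0xD555}; // 1110011001100110 and 1101010101010101
        System.out.println(Integer.toBinaryString(onesComplementSum(words)));
        System.out.println(Integer.toHexString(checksum(words)));
    }
}
```

With the two example words, the sum is 0xBBBC (1011101110111100) and the checksum is 0x4443 (0100010001000011).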


Principles of Reliable data transfer

important in app., transport, link layers

top-10 list of important networking topics!

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Reliable data transfer: getting started

send side:
- rdt_send(): called from above (e.g., by app.). Passed data to deliver to receiver’s upper layer
- udt_send(): called by rdt, to transfer packet over unreliable channel to receiver

receive side:
- rdt_rcv(): called when packet arrives on rcv-side of channel
- deliver_data(): called by rdt to deliver data to upper layer

Reliable data transfer: getting started

We’ll design a protocol:
- incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
- consider only unidirectional data transfer
  - but control info will flow in both directions!
- use finite state machines (FSM) to specify sender, receiver

[Figure: FSM notation. An arrow from state 1 to state 2 is labeled with the event causing the state transition (above the line) and the actions taken on the state transition (below the line)]

state: when in this “state”, next state is uniquely determined by next event

Rdt1.0: reliable transfer over a reliable channel

underlying channel perfectly reliable
- no bit errors
- no loss of packets

separate FSMs for sender, receiver:
- sender sends data into underlying channel
- receiver reads data from underlying channel

sender
  state:  Wait for call from above
    event:  rdt_send(data)
    action: packet = make_pkt(data)
            udt_send(packet)
    next:   Wait for call from above

receiver
  state:  Wait for call from below
    event:  rdt_rcv(packet)
    action: extract(packet,data)
            deliver_data(data)
    next:   Wait for call from below

FSM legend: each circle is a state name; each arrow is a state transition, labeled with its input event and its output action

Rdt2.0: channel with bit errors

underlying channel may get hit by noise
- using checksum to detect bit errors

the question: how to recover from errors:
- acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
- negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
- sender retransmits pkt on receipt of NAK

new mechanisms in rdt2.0 (beyond rdt1.0):
- error detection
- receiver feedback: control msgs (ACK, NAK) from rcvr to sender

rdt2.0: FSM specification

sender
  state:  Wait for call from above
    event:  rdt_send(data)
    action: sndpkt = make_pkt(data, checksum)
            udt_send(sndpkt)
    next:   Wait for ACK or NAK

  state:  Wait for ACK or NAK
    event:  rdt_rcv(rcvpkt) && isNAK(rcvpkt)
    action: udt_send(sndpkt)
    next:   Wait for ACK or NAK

    event:  rdt_rcv(rcvpkt) && isACK(rcvpkt)
    action: do nothing
    next:   Wait for call from above

receiver
  state:  Wait for call from below
    event:  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    action: udt_send(NAK)
    next:   Wait for call from below

    event:  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
    action: extract(rcvpkt,data)
            deliver_data(data)
            udt_send(ACK)
    next:   Wait for call from below

&& = and

rdt2.0: operation with no errors

[Figure: the rdt2.0 FSM with the steps of an error-free exchange numbered 1 to 4. The sender takes rdt_send(data), builds sndpkt = make_pkt(data, checksum) and calls udt_send(sndpkt); the receiver sees rdt_rcv(rcvpkt) && notcorrupt(rcvpkt), extracts and delivers the data, and calls udt_send(ACK); the sender sees rdt_rcv(rcvpkt) && isACK(rcvpkt) and returns to Wait for call from above]

rdt2.0: error scenario

[Figure: the rdt2.0 FSM with the steps of an exchange with one corrupted packet numbered 1 to 8. The sender sends sndpkt; the receiver sees rdt_rcv(rcvpkt) && corrupt(rcvpkt) and calls udt_send(NAK); the sender sees rdt_rcv(rcvpkt) && isNAK(rcvpkt) and calls udt_send(sndpkt) again; the receiver sees rdt_rcv(rcvpkt) && notcorrupt(rcvpkt), delivers the data and calls udt_send(ACK); the sender sees rdt_rcv(rcvpkt) && isACK(rcvpkt) and returns to Wait for call from above]

rdt2.0 has a fatal flaw! What is it?

rdt2.0 has a fatal flaw!

What happens if ACK/NAK corrupted?
- sender doesn’t know what happened at receiver!
- can’t just retransmit: possible duplicate

Handling duplicates:
- sender retransmits current pkt if ACK/NAK garbled
- sender adds sequence number to each pkt
- receiver discards (doesn’t deliver up) duplicate pkt

stop and wait: Sender sends one packet, then waits for receiver response

rdt2.1: sender, handles garbled ACK/NAKs

  state:  Wait for call 0 from above
    event:  rdt_send(data)
    action: sndpkt = make_pkt(0, data, checksum)
            udt_send(sndpkt)
    next:   Wait for ACK or NAK 0

  state:  Wait for ACK or NAK 0
    event:  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) )
    action: udt_send(sndpkt)
    next:   Wait for ACK or NAK 0

    event:  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt)
    action: do nothing
    next:   Wait for call 1 from above

  state:  Wait for call 1 from above
    event:  rdt_send(data)
    action: sndpkt = make_pkt(1, data, checksum)
            udt_send(sndpkt)
    next:   Wait for ACK or NAK 1

  state:  Wait for ACK or NAK 1
    event:  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) )
    action: udt_send(sndpkt)
    next:   Wait for ACK or NAK 1

    event:  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt)
    action: do nothing
    next:   Wait for call 0 from above

&& = “and”, || = “or”

rdt2.1: receiver, handles garbled ACK/NAKs

  state:  Wait for 0 from below
    event:  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
    action: extract(rcvpkt,data)
            deliver_data(data)
            sndpkt = make_pkt(ACK, chksum)
            udt_send(sndpkt)
    next:   Wait for 1 from below

    event:  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    action: sndpkt = make_pkt(NAK, chksum)
            udt_send(sndpkt)
    next:   Wait for 0 from below

    event:  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
    action: sndpkt = make_pkt(ACK, chksum)
            udt_send(sndpkt)
    next:   Wait for 0 from below

  state:  Wait for 1 from below
    event:  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
    action: extract(rcvpkt,data)
            deliver_data(data)
            sndpkt = make_pkt(ACK, chksum)
            udt_send(sndpkt)
    next:   Wait for 0 from below

    event:  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    action: sndpkt = make_pkt(NAK, chksum)
            udt_send(sndpkt)
    next:   Wait for 1 from below

    event:  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
    action: sndpkt = make_pkt(ACK, chksum)
            udt_send(sndpkt)
    next:   Wait for 1 from below

rdt2.1: discussion

Sender:
- seq # added to pkt
- two seq. #’s (0,1) will suffice. Why?
- must check if received ACK/NAK is corrupted
- twice as many states
  - state must “remember” whether “current” pkt has 0 or 1 seq. #

Receiver:
- must check if received packet is duplicate
  - state indicates whether 0 or 1 is expected pkt seq #
- note: receiver can not know if its last ACK/NAK was received correctly at sender
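The receiver's duplicate check can be sketched with a single bit of state. This is a simplified model for illustration: `Rdt21ReceiverSketch` is a hypothetical name, corruption is passed in as a flag instead of being detected by a checksum, and feedback is returned as a plain string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the rdt2.1 receiver: one bit of state plus ACK/NAK feedback.
final class Rdt21ReceiverSketch {
    private int expectedSeq = 0; // 0 = "Wait for 0 from below", 1 = "Wait for 1 from below"
    private final List<String> delivered = new ArrayList<>();

    // Handle one arriving packet and return the feedback sent to the sender.
    String onPacket(int seq, String data, boolean corrupt) {
        if (corrupt) {
            return "NAK"; // ask for a retransmission
        }
        if (seq == expectedSeq) {
            delivered.add(data);           // deliver_data(data)
            expectedSeq = 1 - expectedSeq; // move to the other state
        }
        // A duplicate is not delivered again, but it is still ACKed so the
        // sender, whose earlier ACK may have been garbled, can move on.
        return "ACK";
    }

    List<String> delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        Rdt21ReceiverSketch receiver = new Rdt21ReceiverSketch();
        receiver.onPacket(0, "a", false);
        receiver.onPacket(0, "a", false); // retransmission after a garbled ACK
        receiver.onPacket(1, "b", false);
        System.out.println(receiver.delivered());
    }
}
```

Two sequence numbers are enough here because, with stop-and-wait, the receiver only ever has to tell the current packet apart from a repeat of the previous one.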

rdt2.2: a NAK-free protocol

same functionality as rdt2.1, but using ACKs only
- instead of NAK, receiver sends ACK for last pkt received OK
  - receiver must explicitly include seq # of pkt being ACKed
- duplicate ACK at sender results in same action as NAK: retransmit current pkt

rdt2.2: sender, receiver fragments (NAK-free)

sender FSM fragment
  state:  Wait for call 0 from above
    event:  rdt_send(data)
    action: sndpkt = make_pkt(0, data, checksum)
            udt_send(sndpkt)
    next:   Wait for ACK 0

  state:  Wait for ACK 0
    event:  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
    action: udt_send(sndpkt)
    next:   Wait for ACK 0

    event:  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
    action: do nothing
    next:   leaves Wait for ACK 0

receiver FSM fragment
  state:  Wait for 0 from below
    entered on:
    event:  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
    action: extract(rcvpkt,data)
            deliver_data(data)
            sndpkt = make_pkt(ACK1, chksum)
            udt_send(sndpkt)

    while in this state:
    event:  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq1(rcvpkt))
    action: udt_send(sndpkt)
    next:   Wait for 0 from below

rdt3.0: channels with errors and loss

New assumption: underlying channel can also lose packets (data or ACKs); a lost packet never arrives at its destination
- this can happen when an entire packet is hit by noise, or when the beginning of a packet is not recognized by the receiver (synchronization bits lost)
- checksum, seq. #, ACKs, retransmissions will be of help, but not enough

Approach: sender waits “reasonable” amount of time for ACK
- retransmits if no ACK received within the time limit (timeout)
- if pkt (or ACK) just delayed (not lost):
  - retransmission will be duplicate, but use of seq. #’s already handles this
  - receiver must specify seq # of pkt being ACKed
- requires countdown timer
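The timeout-and-retransmit approach can be sketched as a small deterministic simulation. Everything here is an assumption made for illustration: `StopAndWaitSketch` is a hypothetical name, the channel drops exactly the data transmissions and ACKs whose 1-based counts are listed by the caller, and a "timeout" is modeled as simply looping to retransmit rather than with a real timer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of stop-and-wait with retransmission on timeout.
final class StopAndWaitSketch {
    static final class Result {
        final List<String> delivered = new ArrayList<>();
        int transmissions; // data packets put on the channel, including retransmissions
    }

    // lostData: which data transmissions (1st, 2nd, ...) the channel drops.
    // lostAcks: which ACKs (1st, 2nd, ...) the channel drops.
    static Result transfer(List<String> messages, Set<Integer> lostData, Set<Integer> lostAcks) {
        Result result = new Result();
        int seq = 0;         // sender state: sequence number of the current packet
        int expectedSeq = 0; // receiver state: sequence number it is waiting for
        int acksSent = 0;
        for (String message : messages) {
            boolean acked = false;
            while (!acked) {
                result.transmissions++;          // udt_send(sndpkt), start_timer
                if (lostData.contains(result.transmissions)) {
                    continue;                    // data lost: timeout, retransmit
                }
                if (seq == expectedSeq) {        // new packet: deliver it once
                    result.delivered.add(message);
                    expectedSeq = 1 - expectedSeq;
                }                                // otherwise a duplicate: only re-ACK
                acksSent++;
                if (lostAcks.contains(acksSent)) {
                    continue;                    // ACK lost: timeout, retransmit
                }
                acked = true;                    // stop_timer
            }
            seq = 1 - seq;                       // alternate 0, 1, 0, 1, ...
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = transfer(Arrays.asList("m0", "m1", "m2"),
                new HashSet<>(Arrays.asList(2)), new HashSet<>(Arrays.asList(1)));
        System.out.println(r.delivered + " in " + r.transmissions + " transmissions");
    }
}
```

In the `main` run, the first ACK and the second data transmission are dropped, so the first message goes out three times yet is delivered only once.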

Ethernet frame format

[Figure: Ethernet frame format, with the bits at the start of the frame highlighted]

If the highlighted bits are hit by noise, a receiver will not recognize the frame, so the frame is completely lost!

rdt3.0 sender

  state:  Wait for call 0 from above
    event:  rdt_send(data)
    action: sndpkt = make_pkt(0, data, checksum)
            udt_send(sndpkt)
            start_timer
    next:   Wait for ACK0

    event:  rdt_rcv(rcvpkt)
    action: do nothing
    next:   Wait for call 0 from above

  state:  Wait for ACK0
    event:  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
    action: do nothing
    next:   Wait for ACK0

    event:  timeout
    action: udt_send(sndpkt)
            start_timer
    next:   Wait for ACK0

    event:  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
    action: stop_timer
    next:   Wait for call 1 from above

  state:  Wait for call 1 from above
    event:  rdt_send(data)
    action: sndpkt = make_pkt(1, data, checksum)
            udt_send(sndpkt)
            start_timer
    next:   Wait for ACK1

    event:  rdt_rcv(rcvpkt)
    action: do nothing
    next:   Wait for call 1 from above

  state:  Wait for ACK1
    event:  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) )
    action: do nothing
    next:   Wait for ACK1

    event:  timeout
    action: udt_send(sndpkt)
            start_timer
    next:   Wait for ACK1

    event:  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1)
    action: stop_timer
    next:   Wait for call 0 from above


rdt3.0 in action

Performance of rdt3.0

rdt3.0 works, but performance stinks

example: 1 Gbps link, 15 ms end-end prop. delay, 1KB packet:

$$T_{transmit} = \frac{L \text{ (packet length in bits)}}{R \text{ (transmission rate, bps)}} = \frac{8 \text{ kb/pkt}}{10^9 \text{ b/sec}} = 8 \text{ microsec}$$

U sender: utilization, the fraction of time sender busy sending

$$U_{sender} = \frac{\text{time taken for transmission of data}}{\text{total time}} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027 = 0.027\%$$

(times in the last fraction are in milliseconds; RTT = 2 x 15 ms = 30 ms)

1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link

network protocol limits use of physical resources!
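The arithmetic can be checked with a short calculation. This is a sketch under the slide's assumptions (1 Gbps link, 30 ms RTT, 8000-bit packet, ACK transmission time ignored); `StopAndWaitUtilization` and its method names are hypothetical.

```java
// Hypothetical calculator for stop-and-wait sender utilization and throughput.
final class StopAndWaitUtilization {
    // U = (L / R) / (RTT + L / R)
    static double utilization(double packetBits, double rateBps, double rttSeconds) {
        double transmitSeconds = packetBits / rateBps;
        return transmitSeconds / (rttSeconds + transmitSeconds);
    }

    // One packet is delivered per (RTT + L / R) seconds.
    static double throughputBytesPerSecond(double packetBits, double rateBps, double rttSeconds) {
        double cycleSeconds = rttSeconds + packetBits / rateBps;
        return (packetBits / 8.0) / cycleSeconds;
    }

    public static void main(String[] args) {
        double bits = 8000;   // 1 KB packet
        double rate = 1e9;    // 1 Gbps
        double rtt = 0.030;   // 2 x 15 ms propagation delay
        System.out.println("utilization = " + utilization(bits, rate, rtt));
        System.out.println("throughput  = " + throughputBytesPerSecond(bits, rate, rtt) + " bytes/sec");
    }
}
```

The utilization comes out at about 0.00027, that is, roughly 0.027% of the link, and the throughput at roughly 33 kB/sec.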

rdt3.0: stop-and-wait protocol

[Figure: sender/receiver timeline]
- first packet bit transmitted, t = 0
- last packet bit transmitted, t = L / R
- first packet bit arrives
- last packet bit arrives, send ACK
- ACK arrives, send next packet, t = RTT + L / R

Utilization:

$$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$$

(times in milliseconds)

Assumption: The size of ACK is very small, so ignore the time taken to transmit ACK

To increase Utilization?

$$\text{utilization} = \frac{\text{time taken for transmission of data}}{\text{total time}} = \frac{L/R}{RTT + L/R} = \frac{L/R}{D/S + L/R} = \frac{1}{1 + \dfrac{D/S}{L/R}}$$

where

$$RTT = \frac{D}{S} = \frac{\text{total distance}}{\text{speed of signal}}$$

We want utilization as high as possible. Maximum utilization possible is 1 (100%)

To get closer to the max, we need to make

$$\frac{D/S}{L/R}$$

as small as possible, but D, S, R are fixed. Therefore, what should we do?


To increase utilization?

We should increase “L”, the length of the packet!

What is the disadvantage of having a long packet?
- Higher probability of error!
- Also, increased buffer size


Dilemma

- To achieve high utilization -> longer packet
- Longer packet -> higher probability of error
- Two conflicting aspects!
- Solution is “windowing (pipelining)”
- Main idea of windowing: rather than sending one long packet, send many small packets


3 ARQ (Automatic Repeat Request) protocols

To cope with lost packets and corrupted packets (either data packets or ACK packets), 3 popular protocols exist:

1) Stop-and-Wait ARQ (rdt 3.0)
2) Go-Back-N ARQ
3) Selective Repeat ARQ

2) and 3) are sometimes called “pipelined protocols” or “windowing”.

1) can be classified as windowing with a window size of 1.


Pipelined protocols

Pipelining: the sender allows multiple “in-flight”, yet-to-be-acknowledged pkts. It sends several numbered packets: P1, P2, P3, …
- range of sequence numbers must be bigger than [0, 1] (as in stop-and-wait)
- buffering required at sender and/or receiver

Two generic forms of pipelined protocols: Go-Back-N, selective repeat


Pipelining: increased utilization

Example: 1 Gbps link, 15 ms end-to-end propagation delay, 1 KB packet, 3 packets in flight

Timeline (sender on the left, receiver on the right):
- t = 0: first packet bit transmitted
- t = L / R: last bit of first packet transmitted
- first packet bit arrives at the receiver
- last packet bit arrives, receiver sends ACK
- last bit of 2nd packet arrives, receiver sends ACK
- last bit of 3rd packet arrives, receiver sends ACK
- t = RTT + L / R: ACK arrives at the sender, sender sends next packet

$$U_{sender} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$$

(both times in milliseconds)

Utilization increases by a factor of 3!
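To see how utilization scales with the number of packets in flight, the slide's formula can be generalized to a window of N packets. The sketch below is illustrative only: the function names are hypothetical, and capping the result at 1.0 is an assumption about what happens once the pipe is full.

```python
import math

def pipelined_utilization(window, packet_bits, rate_bps, rtt_s):
    """Utilization with `window` packets in flight: N * (L/R) / (RTT + L/R), at most 1."""
    t_transmit = packet_bits / rate_bps
    return min(1.0, window * t_transmit / (rtt_s + t_transmit))

def window_to_fill_pipe(packet_bits, rate_bps, rtt_s):
    """Smallest window (in packets) that keeps the sender busy all the time."""
    t_transmit = packet_bits / rate_bps
    return math.ceil((rtt_s + t_transmit) / t_transmit)

# Slide example: 1 Gbps link, 30 ms RTT, packet of 8000 bits.
u1 = pipelined_utilization(1, 8000, 1e9, 0.030)
u3 = pipelined_utilization(3, 8000, 1e9, 0.030)
full = window_to_fill_pipe(8000, 1e9, 0.030)

print(f"window 1: {u1:.5f}, window 3: {u3:.5f}")
print(f"packets in flight needed for full utilization: about {full}")
```

With a window of 3 the utilization is three times the stop-and-wait value, and several thousand packets would have to be in flight to keep this particular link busy.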


Go-Back-N

Sender:
- k-bit seq # in pkt header
- “window” of up to N consecutive unACKed pkts allowed
- ACK(n): ACKs all pkts up to and including seq # n (“cumulative ACK”)
  - may receive duplicate ACKs (see receiver)
- timer for each in-flight pkt
- timeout(n): retransmit pkt n and all higher seq # pkts in window

In actual protocols, ACK(n) carries “the next expected packet #”. In other words, ACK(n) says “I received all packets up to n-1 and I am expecting packet n next.”


Frame format of HDLC

- N(S): sequence number
- N(R): ACK = next expected packet number

[Figure: frame exchange labelled N(S)=0, N(R)=0; N(S)=0, N(R)=1; N(S)=1, N(R)=1]


TCP Packet format

Byte numbers


Go-Back-N: sender extended FSM

Single state: Wait

Initially:
    base = 1
    nextseqnum = 1

Event: rdt_send(data)
    if (nextseqnum < base+N) {
        sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
        udt_send(sndpkt[nextseqnum])
        if (base == nextseqnum)
            start_timer
        nextseqnum++
    }
    else
        refuse_data(data)

Event: timeout
    start_timer
    udt_send(sndpkt[base])
    udt_send(sndpkt[base+1])
    …
    udt_send(sndpkt[nextseqnum-1])

Event: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
    base = getacknum(rcvpkt)+1
    if (base == nextseqnum)
        stop_timer
    else
        start_timer

Event: rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    (no action)
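The sender FSM above can be sketched as a small Python class. This is a simplified illustration rather than a full implementation: the timer is reduced to a boolean flag, the `send` callback stands in for udt_send, checksums are omitted, and the guard against old ACKs is an addition that the FSM on the slide does not show.

```python
class GBNSender:
    """Go-Back-N sender sketch; names follow the FSM where possible."""

    def __init__(self, window_size, send):
        self.N = window_size
        self.base = 1
        self.nextseqnum = 1
        self.sndpkt = {}            # seq # -> data, kept until cumulatively ACKed
        self.timer_running = False  # stand-in for start_timer / stop_timer
        self.send = send            # hypothetical callback playing the role of udt_send

    def rdt_send(self, data):
        if self.nextseqnum >= self.base + self.N:
            return False            # window full: refuse_data
        self.sndpkt[self.nextseqnum] = data
        self.send(self.nextseqnum, data)
        if self.base == self.nextseqnum:
            self.timer_running = True
        self.nextseqnum += 1
        return True

    def on_ack(self, acknum):
        if acknum < self.base:
            return                  # old ACK: ignored (assumption, not on the slide)
        for seq in range(self.base, acknum + 1):
            self.sndpkt.pop(seq, None)
        self.base = acknum + 1
        self.timer_running = self.base != self.nextseqnum

    def on_timeout(self):
        self.timer_running = True
        for seq in range(self.base, self.nextseqnum):
            self.send(seq, self.sndpkt[seq])   # go back N: resend everything unACKed


sent = []
sender = GBNSender(window_size=4, send=lambda seq, data: sent.append(seq))
accepted = [sender.rdt_send(ch) for ch in "abcde"]   # fifth packet is refused
sender.on_ack(2)        # cumulative ACK for packets 1 and 2
sender.on_timeout()     # packets 3 and 4 are resent
print(accepted, sent, sender.base)
```

The usage lines show the two behaviours that characterize Go-Back-N: data is refused when the window is full, and a timeout resends every packet from base onward.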


GBN: receiver extended FSM

Single state: Wait

Initially:
    expectedseqnum = 1
    sndpkt = make_pkt(expectedseqnum,ACK,chksum)

Event: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(expectedseqnum,ACK,chksum)
    udt_send(sndpkt)
    expectedseqnum++

Event: default
    udt_send(sndpkt)

ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
- may generate duplicate ACKs
- need only remember expectedseqnum

When out-of-order pkt received:
- discard (don’t buffer) -> no receiver buffering!
- re-ACK pkt with highest in-order seq #
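A matching receiver can be sketched in a few lines. Again this is only an illustration: corruption is passed in as a flag instead of being computed from a checksum, and returning ACK 0 before any packet has arrived is an assumption of this sketch.

```python
class GBNReceiver:
    """Go-Back-N receiver sketch: the only state is expectedseqnum."""

    def __init__(self):
        self.expectedseqnum = 1
        self.delivered = []     # stands in for deliver_data to the upper layer

    def rdt_rcv(self, seq, data, corrupt=False):
        """Handle one packet and return the ACK number sent back."""
        if not corrupt and seq == self.expectedseqnum:
            self.delivered.append(data)
            self.expectedseqnum += 1
        # Otherwise the packet is discarded and the last ACK is repeated.
        return self.expectedseqnum - 1   # highest in-order seq # received so far


receiver = GBNReceiver()
acks = [
    receiver.rdt_rcv(1, "a"),   # in order
    receiver.rdt_rcv(3, "c"),   # out of order: discarded, duplicate ACK
    receiver.rdt_rcv(2, "b"),   # in order
    receiver.rdt_rcv(3, "c"),   # retransmitted copy, now in order
]
print(acks, receiver.delivered)
```

Because out-of-order packets are dropped, the receiver needs no buffer beyond the single counter.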


Go-Back-N in action

What are buffer sizes needed?

For sender = ? For receiver = ?


Selective Repeat

- receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
- sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
- sender window
  - N consecutive seq #’s = window size
  - limits # of sent but unACKed pkts


Selective repeat: sender & receiver windows


Selective repeat

Sender:
- data from above: if next available seq # is in window, send pkt and start a timer for that packet
- timeout(n): resend pkt n, restart timer
- ACK(n) came in and n is in the range [sendbase, sendbase+N-1]:
  - mark pkt n as “done”
  - if n is smallest unACKed pkt, advance window base to next unACKed seq #

Receiver:
- pkt n came in and n is in the range [rcvbase, rcvbase+N-1]:
  - send ACK(n)
  - out-of-order: buffer it
  - in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
- pkt n came in and n is in the range [rcvbase-N, rcvbase-1]:
  - send ACK(n)
- otherwise: ignore
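The receiver rules above can be sketched as follows. The class and method names are illustrative, and for simplicity the sketch uses unbounded sequence numbers starting at 0, so wraparound of a finite sequence space is not modelled.

```python
class SRReceiver:
    """Selective repeat receiver sketch (no sequence-number wraparound)."""

    def __init__(self, window_size):
        self.N = window_size
        self.rcvbase = 0
        self.buffer = {}        # out-of-order packets waiting for delivery
        self.delivered = []     # data handed to the upper layer, in order

    def on_packet(self, n, data):
        """Return n if ACK(n) should be sent, or None if the packet is ignored."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer.setdefault(n, data)
            # Deliver every packet that is now in order and advance the window.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n            # already delivered: re-ACK so the sender can move on
        return None             # otherwise: ignore


rx = SRReceiver(window_size=4)
acks = [
    rx.on_packet(0, "a"),   # in order: delivered at once
    rx.on_packet(2, "c"),   # out of order: buffered
    rx.on_packet(3, "d"),   # out of order: buffered
    rx.on_packet(1, "b"),   # fills the gap: b, c, d are delivered
    rx.on_packet(0, "a"),   # old duplicate inside [rcvbase-N, rcvbase-1]: re-ACKed
    rx.on_packet(9, "x"),   # outside both ranges: ignored
]
print(acks, rx.delivered, rx.rcvbase)
```

The re-ACK for packet 0 matters: without it, a sender whose first ACK was lost would never advance its window.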


Selective repeat in action (window size = 4)

[Figure: selective repeat timeline; labels mark the points where the window advances]


Selective repeat: dilemma

Example:
- seq #’s: 0, 1, 2, 3 (using 2 bits for seq #)
- window size = 3

Result:
- receiver sees no difference in the two scenarios!
- incorrectly passes duplicate data as new in (a)

Q: What is the relationship between seq # size and window size?

A: Window size <= N/2, where N is the size of the sequence number space (e.g. for 0, 1, 2, 3, N = 4, so the window may be at most 2)
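The window-size rule can be checked by modelling the worst case from the dilemma: the receiver has accepted a full window and moved on, but every ACK was lost, so the sender retransmits its old window. The helper below is a hypothetical sketch of that check, not part of any protocol implementation.

```python
def sr_window_is_safe(seq_space, window):
    """True if old retransmissions can never be mistaken for new packets."""
    # Sender still holds the old window: seq #s 0 .. window-1.
    sender_window = {i % seq_space for i in range(window)}
    # Receiver has advanced by a full window: seq #s window .. 2*window-1 (mod seq_space).
    receiver_window = {i % seq_space for i in range(window, 2 * window)}
    # Any overlap means a retransmitted old packet would be accepted as new data.
    return sender_window.isdisjoint(receiver_window)


slide_case = sr_window_is_safe(seq_space=4, window=3)   # the slide's example
half_case = sr_window_is_safe(seq_space=4, window=2)
print(slide_case, half_case)
```

With 4 sequence numbers, a window of 3 fails the check while a window of 2 passes, which agrees with the N/2 bound.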


TCP implementations: Go-Back-N or Selective-Repeat?

- Hybrid of Go-Back-N and Selective-Repeat
- Originally, TCP was Go-Back-N, but the SACK option was added later (see RFC 2018)

TCP initialization packet captured using Ethereal:

Transmission Control Protocol, Src Port: 1459 (1459), Dst Port: ftp (21), Seq: 276805644, Len: 0
    Source port: 1459 (1459)
    Destination port: ftp (21)
    Sequence number: 276805644
    Header length: 28 bytes
    Flags: 0x0002 (SYN)
    Window size: 65535
    Checksum: 0x0637 [correct]
    Options: (8 bytes)
        Maximum segment size: 1260 bytes
        NOP
        NOP
        SACK permitted


Chapter 3 outline

3.1 Transport-layer services

3.2 Multiplexing and demultiplexing

3.3 Connectionless transport: UDP

3.4 Principles of reliable data transfer

3.5 Connection-oriented transport: TCP
- segment structure
- reliable data transfer
- flow control
- connection management

3.6 Principles of congestion control

3.7 TCP congestion control


TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)

- point-to-point: one sender, one receiver (unicast)
- reliable, in-order byte stream: no “message boundaries”
- pipelined: TCP congestion and flow control set window size
- send & receive buffers: needed for windowing
- full duplex data: bi-directional data flow in a connection
  - MSS: maximum segment size, set by MTU of link layer
- connection-oriented: initialization, maintenance, termination
- flow controlled: receiver controls the flow so that the sender will not overwhelm the receiver

[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]

Note (Seung Bae Im): Read textbook page 230

TCP segment structure

Header layout (each row is 32 bits wide):
- source port #, dest port #
- sequence number
- acknowledgement number
- head len, not used, flags U A P R S F, receive window
- checksum, urg data ptr
- options (variable length)
- application data (variable length)

Notes on the fields:
- sequence number, acknowledgement number: byte numbering of data (not packets or segments!). TCP is called a “byte-transfer protocol”
- URG: urgent data (generally not used)
- ACK: whether the ACK # is valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: used for connection management (setup, teardown, or reset connection)
- receive window: # of bytes rcvr is willing to accept
- checksum: covers header and data (as in UDP)
- urg data ptr: when the U bit is set, the byte position where the urgent message starts
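As an illustration of the layout, the fixed 20-byte part of the header can be unpacked with Python's standard `struct` module. The function name is hypothetical, and the sample bytes are built here from the values in the Ethereal capture shown earlier (ports 1459 and 21, SYN flag, 28-byte header); they are not a real capture.

```python
import struct

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]   # bit 0 .. bit 5

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (options are not decoded)."""
    (src_port, dst_port, seq, ack,
     offset_and_flags, window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_and_flags >> 12) * 4      # head len is counted in 32-bit words
    flag_bits = offset_and_flags & 0x3F            # low six bits: U A P R S F
    flags = [name for bit, name in enumerate(FLAG_NAMES) if flag_bits & (1 << bit)]
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags,
        "window": window, "checksum": checksum, "urg_ptr": urg_ptr,
    }


# Sample SYN segment: 20 fixed bytes plus 8 bytes of (zeroed) options.
sample = struct.pack("!HHIIHHHH", 1459, 21, 276805644, 0,
                     (7 << 12) | 0x02, 65535, 0x0637, 0) + bytes(8)
header = parse_tcp_header(sample)
print(header)
```

The format string `!HHIIHHHH` selects network byte order with two 16-bit ports, two 32-bit numbers, and four 16-bit fields, which adds up to the 20 bytes of the fixed header.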


TCP seq. #’s and ACKs

Seq. #’s:
- byte stream “number” of first byte in segment’s data

ACKs (piggybacked ACKs):
- seq # of next byte expected from other side
- cumulative ACK

Q: How does the receiver handle out-of-order segments?
A: The TCP spec doesn’t say; it is up to the implementer (most buffer them, as in Selective-Repeat mode)

Simple telnet scenario (time runs downward):
1. User types ‘C’. Host A -> Host B: Seq=42, ACK=79, data = ‘C’
2. Host B ACKs receipt of ‘C’, echoes back ‘C’. Host B -> Host A: Seq=79, ACK=43, data = ‘C’
3. Host A ACKs receipt of echoed ‘C’. Host A -> Host B: Seq=43, ACK=80

Actual sequence numbers look something like 1522036564. For each connection, a pseudorandom number is generated for the initial sequence number. Here we use relative sequence numbers.


TCP Round Trip Time and Timeout

Q: How to set the TCP timeout value?
- longer than RTT, but RTT changes dynamically
- if too short: premature timeout, unnecessary retransmissions
- if too long: slow reaction to segment loss, less efficient

Q: How to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions: the RTT for a retransmitted packet is not considered in the calculation of SampleRTT. Why?
- SampleRTT will vary; we want the estimated RTT to be “smoother”
  - average several recent measurements, not just the current SampleRTT


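Timeout value

Timeout too short (A sends pkt0; the timeout, shorter than the RTT, fires before ack1 arrives; A resends pkt0 and B ignores the duplicate):
- premature timeout
- unnecessary transmission
- wasted bandwidth

Timeout too long (A sends pkt0; a loss occurs, marked X in the figure; A resends pkt0 only after a timeout much longer than the RTT):
- inefficient usage of bandwidth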


TCP Round Trip Time and Timeout

EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT

- SampleRTT is the current sampled value
- with α = 0.125, 87.5% of the weight goes to the current EstimatedRTT and 12.5% to the current SampleRTT
- in statistics, this is called an “exponentially weighted moving average”
- the influence of a past sample decreases exponentially fast
- 0 <= α < 1; typical value: α = 0.125

Example: EstimatedRTT = 250 ms, SampleRTT = 70 ms, α = 0.125

EstimatedRTT = (1 - 0.125)*250 + 0.125*70 = 218.75 + 8.75 = 227.5 ms


Example RTT estimation

RTT: gaia.cs.umass.edu to fantasia.eurecom.fr

[Figure: SampleRTT and EstimatedRTT plotted as RTT (milliseconds, 100 to 350) against time (seconds, 1 to 106)]


TCP Round Trip Time and Timeout

Setting the timeout:
- EstimatedRTT plus a “safety margin”, a measure of variability
  - large variation in EstimatedRTT -> larger safety margin
- first estimate how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT|

(typically, β = 0.25)

Then set the timeout interval:

TimeoutInterval = EstimatedRTT + 4*DevRTT

For more detail: ftp://ftp.rfc-editor.org/in-notes/rfc2988.txt
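The two moving averages and the timeout formula fit in a short class. This is a sketch under stated assumptions: the class name is hypothetical, DevRTT starts at 0, and EstimatedRTT is updated before DevRTT (the slides do not fix the order, and RFC 2988 specifies its own initialization and update order).

```python
class RttEstimator:
    """EWMA-based RTT estimator following the slide formulas (times in ms)."""

    def __init__(self, estimated_rtt, dev_rtt=0.0, alpha=0.125, beta=0.25):
        self.estimated_rtt = estimated_rtt
        self.dev_rtt = dev_rtt          # assumption: no deviation observed yet
        self.alpha = alpha
        self.beta = beta

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * abs(sample_rtt - self.estimated_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


# Slide example: EstimatedRTT = 250 ms, SampleRTT = 70 ms.
est = RttEstimator(estimated_rtt=250.0)
timeout = est.update(70.0)
print(est.estimated_rtt, est.dev_rtt, timeout)
```

After one sample the estimate moves from 250 ms to 227.5 ms, as in the worked example, and the large gap between sample and estimate pushes the timeout well above the estimate.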


TCP reliable data transfer

- TCP creates rdt service on top of IP’s unreliable service
- Pipelined segments
- Cumulative ACKs: on Windows systems, an ACK is sent for every other received packet to reduce the number of packets on the network
- TCP implementations use a single retransmission timer rather than one timer per packet as assumed in previous slides
- Retransmissions are triggered by:
  - timeout events
  - duplicate ACKs
- Let’s initially consider a simplified TCP sender:
  - ignore duplicate ACKs
  - ignore flow control, congestion control

Note (Seung Bae Im), copied from http://support.microsoft.com/kb/328890/:
As specified in RFC 1122, TCP uses delayed acknowledgments to reduce the number of packets that are sent on the media. Instead of sending an acknowledgment for each TCP segment received, TCP in Windows 2000 and later takes a common approach to implementing delayed acknowledgments. As data is received by TCP on a particular connection, it sends an acknowledgment back only if one of the following conditions is true:
- No acknowledgment was sent for the previous segment received.
- A segment is received, but no other segment arrives within 200 milliseconds for that connection.
Typically, an acknowledgment is sent for every other TCP segment that is received on a connection unless the delayed ACK timer (200 milliseconds) expires. You can adjust the delayed ACK timer by editing the following registry entry.

From RFC 2988

The following is the RECOMMENDED algorithm for managing the retransmission timer:

(5.1) Every time a packet containing data is sent (including a retransmission), if the timer is not running, start it running so that it will expire after RTO seconds (for the current value of RTO).

(5.2) When all outstanding data has been acknowledged, turn off the retransmission timer.

(5.3) When an ACK is received that acknowledges new data, restart the retransmission timer so that it will expire after RTO seconds (for the current value of RTO).

When the retransmission timer expires, do the following:

(5.4) Retransmit the earliest segment that has not been acknowledged by the TCP receiver.

(5.5) The host MUST set RTO <- RTO * 2 ("back off the timer"). The maximum value discussed in (2.5) above may be used to provide an upper bound to this doubling operation.

(5.6) Start the retransmission timer, such that it expires after RTO seconds (for the value of RTO after the doubling operation outlined in 5.5).

In summary:
- When an ACK is received, calculate RTO using the earlier formula (TimeoutInterval = EstimatedRTT + 4*DevRTT) and restart the timer
- When a timeout occurs, set RTO <- 2*RTO and restart the timer
- Here, we have only one timer, which is managed by TCP
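Rules (5.1) to (5.6) can be sketched as a single-timer object driven by explicit timestamps. The class and method names are illustrative, the 60-second cap on RTO is an assumption chosen for the example, and the actual retransmission in (5.4) is left to the caller.

```python
class RetransmissionTimer:
    """Single retransmission timer with exponential backoff (times in seconds)."""

    def __init__(self, rto, max_rto=60.0):
        self.rto = rto
        self.max_rto = max_rto      # assumed upper bound for the doubling in (5.5)
        self.expires_at = None      # None means the timer is not running

    def on_data_sent(self, now):
        # (5.1) start the timer only if it is not already running
        if self.expires_at is None:
            self.expires_at = now + self.rto

    def on_new_ack(self, now, all_acked, new_rto=None):
        if new_rto is not None:
            self.rto = new_rto      # e.g. EstimatedRTT + 4*DevRTT
        # (5.2) everything ACKed: stop; (5.3) otherwise restart with current RTO
        self.expires_at = None if all_acked else now + self.rto

    def on_expiry(self, now):
        # (5.4) the caller retransmits the earliest unacknowledged segment
        self.rto = min(self.rto * 2, self.max_rto)   # (5.5) back off the timer
        self.expires_at = now + self.rto             # (5.6) restart


timer = RetransmissionTimer(rto=1.0)
timer.on_data_sent(now=0.0)
timer.on_data_sent(now=0.2)           # timer already running: unchanged
first_expiry = timer.expires_at
timer.on_expiry(now=1.0)              # RTO doubles to 2.0
after_backoff = (timer.rto, timer.expires_at)
timer.on_new_ack(now=1.5, all_acked=False)
after_ack = timer.expires_at
timer.on_new_ack(now=2.0, all_acked=True)
print(first_expiry, after_backoff, after_ack, timer.expires_at)
```

Passing the clock in as `now` keeps the sketch deterministic and easy to trace by hand.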


TCP sender events:

data rcvd from app:
- create segment with seq #
- seq # is byte-stream number of first data byte in segment
- start timer if not already running (think of timer as for oldest unACKed segment)
- expiration interval: TimeOutInterval

timeout:
- retransmit segment that caused timeout
- restart timer

ACK rcvd:
- if it acknowledges previously unACKed segments:
  - update what is known to be ACKed
  - start timer if there are outstanding segments


TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
    switch(event)

    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y                 /* advances window */
            if (there are currently not-yet-acknowledged segments)
                start timer              /* restarts timer */
        }

} /* end of loop forever */

Comment:
- SendBase-1: last cumulatively ACKed byte

Example:
- SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so new data is ACKed
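The simplified sender translates fairly directly into Python. The sketch below keeps the three events as methods; the class name, the `send` callback (standing in for "pass segment to IP"), and the boolean timer flag are illustrative simplifications, and stopping the timer when nothing is outstanding is implied rather than stated in the pseudocode.

```python
class SimplifiedTcpSender:
    """Simplified TCP sender: no flow control, congestion control, or dup-ACK handling."""

    def __init__(self, initial_seq_num, send):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = []               # (seq, data) pairs, oldest first
        self.timer_running = False
        self.send = send                # hypothetical callback: pass segment to IP

    def on_app_data(self, data):
        seq = self.next_seq_num
        self.unacked.append((seq, data))
        self.timer_running = True       # start timer if not already running
        self.send(seq, data)
        self.next_seq_num += len(data)

    def on_timeout(self):
        if self.unacked:
            self.send(*self.unacked[0])  # smallest not-yet-acknowledged seq #
            self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y           # advances window
            # Keep only segments with bytes at or beyond y.
            self.unacked = [(s, d) for s, d in self.unacked if s + len(d) > y]
            self.timer_running = bool(self.unacked)   # restart only if data is outstanding


log = []
tcp = SimplifiedTcpSender(92, send=lambda seq, data: log.append((seq, len(data))))
tcp.on_app_data(b"x" * 8)      # Seq=92, 8 bytes
tcp.on_app_data(b"y" * 20)     # Seq=100, 20 bytes
tcp.on_timeout()               # resends Seq=92 only
tcp.on_ack(120)                # cumulative ACK covers both segments
tcp.on_ack(100)                # old ACK: ignored
print(log, tcp.send_base, tcp.timer_running)
```

Note that a timeout resends only the oldest segment, unlike Go-Back-N, which resends the whole window.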


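TCP: retransmission scenarios

Lost ACK scenario:
1. Host A -> Host B: Seq=92, 8 bytes data
2. Host B -> Host A: ACK=100 (lost, marked X)
3. Timeout for Seq=92 at Host A
4. Host A -> Host B: Seq=92, 8 bytes data (retransmission)
5. Host B -> Host A: ACK=100; SendBase = 100

Premature timeout scenario:
1. Host A -> Host B: Seq=92, 8 bytes data
2. Host A -> Host B: Seq=100, 20 bytes data
3. Host B -> Host A: ACK=100, then ACK=120
4. Timeout for Seq=92 at Host A before the ACKs arrive
5. Host A -> Host B: Seq=92, 8 bytes data (retransmission)
6. ACK=100 and ACK=120 arrive; SendBase = 100, then SendBase = 120
7. Host B -> Host A: ACK=120 (for the retransmission); SendBase stays at 120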


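TCP retransmission scenarios (more)

Cumulative ACK scenario:
1. Host A -> Host B: Seq=92, 8 bytes data
2. Host A -> Host B: Seq=100, 20 bytes data
3. Host B -> Host A: ACK=100 (lost, marked X)
4. Host B -> Host A: ACK=120 arrives before the timeout; SendBase = 120, nothing is retransmitted

On Windows systems:
1. Host A -> Host B: Seq=92, 8 bytes data
2. Host A -> Host B: Seq=100, 20 bytes data
3. Host B -> Host A: a single ACK=120 covers both segments; SendBase = 120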


TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver -> TCP receiver action

1. Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed.
   -> Delayed ACK. Wait up to 500 ms (200 ms for Windows systems) for next segment. If no next segment, send ACK.
2. Arrival of in-order segment with expected seq #. One other segment has ACK pending.
   -> Immediately send single cumulative ACK, ACKing both in-order segments.
3. Arrival of out-of-order segment with higher-than-expected seq #. Gap detected.
   -> Immediately send duplicate ACK, indicating seq # of next expected byte.
4. Arrival of segment that partially or completely fills gap.
   -> Immediately send ACK, provided that segment starts at lower end of gap.


Fast Retransmit

- Time-out period often relatively long:
  - long delay before resending lost packet
- Detect lost segments via duplicate ACKs
  - sender often sends many segments back-to-back
  - if a segment is lost, there will likely be many duplicate ACKs
- If the sender receives 3 duplicate ACKs for the same data, it assumes that the segment after the ACKed data was lost:
  - fast retransmit: resend segment before timer expires

Look at http://www.speedguide.net/read_articles.php?id=157 for adjusting TCP/IP parameters on Windows systems


Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {                               /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y = 3) {
            resend segment with sequence number y    /* fast retransmit */
        }
    }

For information about TCP implementation on Windows systems, visit http://support.microsoft.com/kb/224829/EN-US/
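The duplicate-ACK counting can be sketched on its own, separate from timers and buffers. The class name and the `resend` callback are hypothetical, and resetting the counter when new data is ACKed is an assumption that the pseudocode leaves implicit.

```python
class FastRetransmit:
    """ACK handling with fast retransmit on the third duplicate ACK."""

    def __init__(self, send_base, resend):
        self.send_base = send_base
        self.dup_acks = 0
        self.resend = resend            # hypothetical callback: resend segment with seq # y

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y          # new data ACKed
            self.dup_acks = 0           # assumption: counting restarts for the new value
        else:
            self.dup_acks += 1          # a duplicate ACK for already ACKed data
            if self.dup_acks == 3:
                self.resend(y)          # fast retransmit, before the timer expires


resent = []
fr = FastRetransmit(send_base=92, resend=resent.append)
for ack in [100, 100, 100, 100, 100]:   # one new ACK followed by four duplicates
    fr.on_ack(ack)
print(resent, fr.send_base, fr.dup_acks)
```

Only the third duplicate triggers a resend; the fourth is counted but causes no further retransmission in this sketch.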


TCP Flow Control

- receive side of TCP connection has a receive buffer
- app process may be slow at reading from buffer
- speed-matching service: matching the send rate to the receiving app’s drain rate

Flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast. It is usually controlled by the receiver.


TCP Flow control: how it works

[Figure: receive buffer showing spare room and data read by application]

(Suppose the TCP receiver discards out-of-order segments in the picture. In practice, buffered out-of-order segments must also be subtracted from the spare room.)

- spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
- Rcvr advertises spare room by including value of RcvWindow in segments
- Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn’t overflow
- Receiver throttles sender by advertising a window size no larger than the amount it can buffer


TCP Flow control: how it works (credit scheme)

[Figure: TCP segment structure, as on the earlier slide]

- TCP rcvr advertises spare room: RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
  - for example: 65535 (64 kbytes)
- Sender is limited to having no more than RcvWindow bytes of unACKed data at any time
- TCP flow control is sometimes called a “credit” scheme, since the receiver gives “credit” to the sender (to “spend”)
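The credit calculation can be illustrated with two small functions. The receiver-side formula is the one on the slide; the sender-side check uses the names last_byte_sent and last_byte_acked, which are assumptions introduced here for the sketch, and all byte counts in the usage lines are made-up example values.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent, last_byte_acked, advertised_window, nbytes):
    """True if sending nbytes more keeps unACKed data within the advertised credit."""
    unacked = last_byte_sent - last_byte_acked
    return unacked + nbytes <= advertised_window


# Receiver: 4096-byte buffer, 3000 bytes received so far, application has read 1000.
credit = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)

# Sender: 1000 bytes already in flight.
ok_small = sender_may_send(5000, 4000, credit, nbytes=1000)
ok_large = sender_may_send(5000, 4000, credit, nbytes=1200)
print(credit, ok_small, ok_large)
```

As the application reads more data, LastByteRead grows, the advertised credit grows with it, and the sender is allowed to send again.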


Credit=Window size


TCP Connection Management

Recall: TCP sender, receiver establish “connection” before exchanging data segments
- initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
- client: connection initiator
    Socket clientSocket = new Socket("hostname","port number");
- server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();

Three way handshake:

Step 1: client host sends TCP SYN segment to server
- specifies initial seq #
- no data sent here

Step 2: server host receives SYN, replies with SYNACK segment
- server allocates buffers
- specifies server initial seq. #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data


3-way handshake for TCP connection initialization

1. Host A -> Host B: SYN, Seq=92, Window 4096, <mss 1024>
   - Connection request (SYN=1)
   - Gives credit (window 4096) of 4096 bytes, also sets mss (max segment size) to 1024
2. Host B -> Host A: SYN, Seq=235, ACK93, window 65536, <mss 500>
   - Connection request (SYN=1) for reverse direction
   - Acknowledges connection request by ACK93
   - Gives credit (window 65536) of 65536 bytes, also sets mss (max segment size) to 500
3. Host A -> Host B: ACK236, Window 4096
   - Acknowledges connection request for reverse direction by ACK236
   - Tells “you still have credit of 4096 bytes”
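The sequence and ACK numbers in this exchange follow one rule: a SYN consumes one sequence number, so each side acknowledges the other's initial sequence number plus one. The sketch below only models that arithmetic with plain dictionaries; it does not open a real connection, and the function name and field names are illustrative.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three segments of a handshake as simple dictionaries."""
    syn = {"from": "A", "flags": {"SYN"}, "seq": client_isn}
    syn_ack = {"from": "B", "flags": {"SYN", "ACK"}, "seq": server_isn,
               "ack": syn["seq"] + 1}          # next byte expected from A
    ack = {"from": "A", "flags": {"ACK"}, "seq": client_isn + 1,
           "ack": syn_ack["seq"] + 1}          # next byte expected from B
    return [syn, syn_ack, ack]


# Initial sequence numbers taken from the slide: 92 for Host A, 235 for Host B.
segments = three_way_handshake(client_isn=92, server_isn=235)
for seg in segments:
    print(seg)
```

With the slide's initial sequence numbers, the second segment carries ACK 93 and the third carries ACK 236.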


Why 3-way handshake?

[Figure: 3-way handshake to initialize connection. Host A -> Host B: SYN, Seq=92, Window 4096, <mss 1024>; Host B -> Host A: SYN, Seq=235, ACK 93, window 65536, <mss 1024>; Host A -> Host B: ACK=236, Window 4096]

First: we need to establish full-duplex connections (both ways)

Second: to avoid duplicated connections, as shown on the next slide


Why 3-way handshake? Recovery from old duplicated SYN

[Figure: timeline between Host A and Host B. Segments shown: SYN, Seq=92, Window 4096, <mss 1024>; SYN, Seq=93, Window 4096, <mss 1024>; SYN, Seq=235, ACK 93, window 65536, <mss 1024>; SYN, Seq=236, ACK 94, window 65536, <mss 1024>; RST, Seq=92]

- Host B keeps the connection open and waits for data
- Host A reopens a (new) connection, but Host B thinks it is a new connection and accepts it
- RST, Seq=92: Host B aborts the connection


TCP connection termination

Goal: both sides agree to close the connection

Two-army problem: “Two blue armies are separated by a valley where the white army is. The two blue armies must attack simultaneously to defeat the white army. The only communication is sending birds, which can be lost.”

Can you design a protocol that ensures the attack by the blue armies?

[Figure: blue army, white army, blue army]


Two army protocol

[Figure: the two blue armies, on either side of the white army, exchange the messages “Let’s attack”, “Yes”, “Sure?”, “Sure!”]

What is the problem?


TCP connection termination: 4-way handshake

Segments exchanged:
1. Client -> Server: FIN
2. Server -> Client: ACK
3. Server -> Client: FIN
4. Client -> Server: ACK

Client states:
- ESTABLISHED: receive close signal from app, send FIN
- FIN-WAIT-1: wait for ACK and FIN from server; receive ACK
- FIN-WAIT-2: wait for server FIN; receive FIN, send ACK
- TIME-WAIT: wait for double Maximum Segment Life (MSL) time
- CLOSED

Server states:
- ESTABLISHED: normal operation; receive FIN, send ACK, tell app to close
- CLOSE-WAIT: (wait for app); app is ready to close, send FIN
- LAST-ACK: wait for ACK to FIN; receive ACK
- CLOSED

Even when the 2nd ACK is lost, the connection will be closed
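The two state sequences can be written as lookup tables keyed by (state, event). This is a sketch of the normal close path only: the event names are made up for the example, and simultaneous close, resets, and retransmissions are not modelled.

```python
# (current state, event) -> (next state, segment sent or None)
CLIENT_CLOSE = {
    ("ESTABLISHED", "app_close"): ("FIN-WAIT-1", "FIN"),
    ("FIN-WAIT-1", "rcv_ack"): ("FIN-WAIT-2", None),
    ("FIN-WAIT-2", "rcv_fin"): ("TIME-WAIT", "ACK"),
    ("TIME-WAIT", "2msl_timeout"): ("CLOSED", None),
}

SERVER_CLOSE = {
    ("ESTABLISHED", "rcv_fin"): ("CLOSE-WAIT", "ACK"),
    ("CLOSE-WAIT", "app_close"): ("LAST-ACK", "FIN"),
    ("LAST-ACK", "rcv_ack"): ("CLOSED", None),
}

def run(table, state, events):
    """Apply events in order; return the final state and the segments sent."""
    sent = []
    for event in events:
        state, segment = table[(state, event)]
        if segment is not None:
            sent.append(segment)
    return state, sent


client_end, client_sent = run(
    CLIENT_CLOSE, "ESTABLISHED", ["app_close", "rcv_ack", "rcv_fin", "2msl_timeout"])
server_end, server_sent = run(
    SERVER_CLOSE, "ESTABLISHED", ["rcv_fin", "app_close", "rcv_ack"])
print(client_end, client_sent, server_end, server_sent)
```

The client leaves TIME-WAIT on a timer rather than on a received segment, so it reaches CLOSED on its own after 2 MSL.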


TCP state transition diagram

[Figure: TCP state transition diagram. States: CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED (data transfer state), FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT, LAST_ACK. Each transition is labelled "event / action", for example "Appl: active open / send: SYN", "rcv: SYN / send: SYN, ACK", "rcv: FIN / send: ACK", "Appl: close / send: FIN", and "2MSL timeout". The legend marks the normal transitions for the client (active open, active close) and for the server (passive open, passive close).]

- Appl: state transition taken when the application issues an operation
- MSL (Maximum Segment Lifetime): 2 minutes recommended, typically 30 seconds used
- TIME_WAIT times out after 2 MSL

CSCI 547 Transport Layer 3-104

Chapter 3 outline

3.1 Transport-layer services

3.2 Multiplexing and demultiplexing

3.3 Connectionless transport: UDP

3.4 Principles of reliable data transfer

3.5 Connection-oriented transport: TCP
  o segment structure
  o reliable data transfer
  o flow control
  o connection management

3.6 Principles of congestion control

3.7 TCP congestion control

CSCI 547 Transport Layer 3-105

Congestion Control: Congestion control vs. Flow control

Congestion, informally: “too many sources sending too much data too fast for the network to handle”

In flow control, the sender adjusts its transmission rate so as not to overwhelm the receiver
  o One end is sending data too fast for the receiving end to handle

In congestion control, the sender(s) adjust their transmission rate so as not to overwhelm routers in the network
  o Many sources independently work to avoid sending too much data too fast for the network to handle

Symptoms of congestion:
  o Lost packets (buffer overflow at routers)
  o Long delays (queuing in router buffers)

One of the top problems of networking!

CSCI 547 Transport Layer 3-106

Causes & Effects of Congestion: scenario 1: Two equal-rate senders share a single link

Two sources send as fast as possible (each at rate $\lambda_{in}$) to two receivers across a shared link with capacity $R$
  o Data is delivered to the application at the receiver at rate $\lambda_{out}$

Packets queue at the router
  o Assume the router has infinite storage capacity (thus no packets are lost and there are no retransmissions)

CSCI 547 Transport Layer 3-107

Causes & Effects of Congestion: scenario 1: Two equal-rate senders share a single link

The maximum achievable per-connection throughput is limited to 1/2 the capacity of the shared link ($R/2$)

Very large delays are experienced when the router becomes congested: the delay grows without bound as $\lambda_{in}$ approaches $R/2$
  o The queue grows without bound; packets are not lost, but they are delivered with long delay

[Figure: packets queued at the router]
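
The throughput limit in this scenario can be written as a one-line function. This is a minimal sketch under the slide's assumptions (two equal-rate senders, infinite buffers, no retransmissions); the function name and the numeric values are made up for illustration.

```python
def per_connection_throughput(lambda_in, link_capacity):
    """Delivered rate for one of two equal-rate senders sharing one link.

    With no loss and no retransmissions, each connection receives what it
    sends until its share of the link (half the capacity) is reached.
    """
    return min(lambda_in, link_capacity / 2)


# Illustrative numbers only: a link of capacity 10 units/sec.
print(per_connection_throughput(3.0, 10.0))  # below the limit
print(per_connection_throughput(7.0, 10.0))  # capped at half the capacity
```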

CSCI 547 Transport Layer 3-108

Causes & Effects of Congestion: scenario 2: Finite-capacity router queue

Packets can now be lost
  o Sender retransmits upon detection of loss

Define the offered load $\lambda'_{in}$ (load coming in) as the original transmissions plus retransmissions (for each sender):

  $\lambda'_{in} = \lambda_{in} + \lambda_{retransmit}$

CSCI 547 Transport Layer 3-109

Causes & Effects of Congestion: scenario 2: Finite-capacity router queue: Throughput analysis

[Figure: throughput $\lambda_{out}$ plotted against offered load $\lambda'_{in}$ up to $R/2$, with curves reaching $R/2$, $R/3$, and $R/4$ for the cases below]

Ideal throughput ($\lambda_{in} = \lambda'_{in}$): assume the host is able to somehow (magically) determine whether or not a buffer is free in the router, and thus sends a packet only when a buffer is free. Therefore, there is no retransmission. Throughput reaches $R/2$.

Perfect retransmissions ($\lambda_{in} < \lambda'_{in}$): the sender retransmits only when it is sure that a packet is lost, by setting the timeout to a large enough value. Throughput reaches $R/3$.

Premature retransmissions (each segment transmitted twice): the sender may time out prematurely and retransmit a packet that has been delayed in the queue but not lost. In this case, both the original packet and the retransmission may reach the receiver. In effect, each packet is sent twice. Throughput reaches $R/4$.

Premature retransmissions plus loss

“Effects” of congestion:
  o Sender must retransmit to compensate for dropped packets
  o Unneeded retransmissions: the link carries multiple copies of a packet

CSCI 547 Transport Layer 3-110

Causes & Effects of Congestion: scenario 3: Four equal-rate senders share multiple hops

Assuming:
  o Each source’s traffic transits two routers
  o Routers have the same finite number of buffers
  o All links have the same capacity
  o Senders time out and retransmit lost packets

Q: What happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?

[Figure: multi-hop topology; $\lambda_{in}$: original data, $\lambda'_{in}$: original plus retransmitted data]

CSCI 547 Transport Layer 3-111

Causes & Effects of Congestion: scenario 3: Four equal-rate senders share multiple hops

  o Throughput increases linearly while the network remains underloaded
  o At the saturation point, loss starts to occur
  o Once loss occurs, the offered load increases (original plus retransmitted traffic)
  o Loss rates increase further, until all packets are retransmitted ones: a spiraling effect

CSCI 547 Transport Layer 3-112

Causes & Effects of Congestion: scenario 3: Four equal-rate senders share multiple hops: Throughput analysis

[Figure: throughput $\lambda_{out}$ plotted against offered load $\lambda'_{in}$; both axes are marked at $R/2$]

Congestion collapse: all links are fully utilized but no data is delivered. All traffic will be retransmissions of retransmissions!

Why $R/2$? $\lambda_{in}$ from the host + $\lambda_{in}$ in transit

CSCI 547 Transport Layer 3-113

Causes & Effects of Congestion: Summary

Uncontrolled, congestion can lead to dropped packets
  o This means that the bandwidth used to deliver those packets to the point of congestion was wasted

In the limit, it can lead to network collapse
  o The network is fully busy but no work gets done
  o All packets in the network are retransmissions

CSCI 547 Transport Layer 3-114

Approaches towards congestion control: End-to-end vs. Hop-by-hop

Two broad approaches towards congestion control:

End-to-end congestion control:
  o End systems receive no explicit feedback from the network
  o Congestion is inferred by end systems from observed loss and/or delay
  o Approach taken by TCP

Network-assisted congestion control (hop-by-hop):
  o Routers provide feedback to end systems
  o The network signals congestion by setting a single bit in a packet (SNA, DECbit, TCP/IP ECN, ATM)
  o Or the network determines an explicit rate at which the sender should transmit

CSCI 547 Transport Layer 3-115

Explicit Congestion Notification (ECN)

From Wikipedia, the free encyclopedia

Explicit Congestion Notification (ECN) is an extension to the Internet Protocol and is defined in RFC 3168 (2001). ECN allows end-to-end notification of network congestion without dropping packets. It is an optional feature, and is only used when both endpoints signal that they want to use it.

Traditionally, TCP/IP networks signal congestion by dropping packets. When ECN is successfully negotiated, an ECN-aware router may set a bit in the IP header instead of dropping a packet in order to signal the beginning of congestion. The receiver of the packet echoes the congestion indication to the sender, which must react as though a packet drop were detected.

ECN uses the two least significant bits of the byte that was formerly the IPv4 TOS (Type of Service) byte, or of the IPv6 Traffic Class octet; the upper six bits of that byte form the Differentiated Services field. These two bits encode one of four codepoints: ECN-unaware transport (Not-ECT), two equivalent ECN-aware transport values (ECT(0) and ECT(1)), or congestion experienced (CE).
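
As a rough illustration of the two ECN bits, the sketch below decodes them from the low-order bits of the IPv4 TOS byte (or IPv6 Traffic Class octet). The codepoint values are the ones defined in RFC 3168; the function names `ecn_codepoint` and `mark_congestion` are hypothetical and chosen only for this example.

```python
# ECN codepoints from RFC 3168 (the two low-order bits of the TOS / Traffic Class byte).
ECN_CODEPOINTS = {
    0b00: "Not-ECT",  # ECN-unaware transport
    0b10: "ECT(0)",   # ECN-aware transport
    0b01: "ECT(1)",   # ECN-aware transport
    0b11: "CE",       # congestion experienced
}


def ecn_codepoint(tos_byte):
    """Return the ECN codepoint name carried in a TOS / Traffic Class byte."""
    return ECN_CODEPOINTS[tos_byte & 0b11]


def mark_congestion(tos_byte):
    """Sketch of what an ECN-aware router does instead of dropping a packet.

    If the packet is not ECN-capable the byte is returned unchanged; the
    router would have to fall back to dropping the packet in that case.
    """
    if tos_byte & 0b11 == 0b00:
        return tos_byte
    return tos_byte | 0b11  # set both bits: congestion experienced


print(ecn_codepoint(0x02))                   # ECN-aware sender
print(ecn_codepoint(mark_congestion(0x02)))  # after a congested router
```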

CSCI 547 Transport Layer 3-116

Example of Congestion Control. Digression: Asynchronous Transfer Mode (ATM) networks

ATM is a standard for B-ISDN (Broadband Integrated Services Digital Network) networks
  o Operates at speeds from 155 Mbps to multi-Gbps
  o Employs packet switching of fixed-length packets (“cells”) using virtual circuits: 53-byte cells (5-byte header plus 48-byte payload)

B-ISDN provides integrated end-to-end transport of data, real-time digital voice, and video
  o Designed to meet the “quality-of-service” requirements of voice/video applications

ATM is scalable so that it can be used for LANs, MANs, or WANs

ATM is used in the Internet today by ISPs as a WAN (Wide Area Network) technology, but it has not been very successful in LAN deployment

CSCI 547 Transport Layer 3-117

A sample of Internet backbone using ATM

From: http://www.nthelp.com/maps.htm

CSCI 547 Transport Layer 3-118

ATM Protocol Architecture

CSCI 547 Transport Layer 3-119

Hop-by-Hop Congestion Control of Asynchronous Transfer Mode (ATM) networks

“Integrated services” implies multiple service models
  o As opposed to the Internet’s single “best-effort” model

ATM service models (levels of QoS):
  o Constant bit-rate (CBR): guaranteed throughput, end-to-end delay, delay-variation, and loss rate bounds
  o Variable bit-rate (VBR): just like CBR except the sender is assumed to generate irregular traffic
  o Available bit-rate (ABR): minimum guaranteed transmission rate, congestion notification
  o Unspecified bit-rate (UBR): guaranteed in-order delivery

CSCI 547 Transport Layer 3-120

Hop-by-Hop Congestion Control. Example: ATM ABR congestion control

ABR is an “elastic service”
  o If the sender’s path is “underloaded”, the sender can use the available bandwidth
  o If the sender’s path is congested, the sender is throttled back to its minimum guaranteed rate

An ABR sender periodically generates Resource Management (RM) packets

Bits in RM packets are set by switches depending on the level of congestion (“network-assisted”)
  o NI bit: no increase in rate (mild congestion)
  o CI bit: congestion indication

RM packets are returned to the sender by the receiver with the bits intact

CSCI 547 Transport Layer 3-121

Hop-by-Hop Congestion Control: ATM ABR congestion control

RM packets contain a two-byte explicit rate (ER) field
  o Congested switches may lower the ER value
  o RM packets arriving at the receiver therefore carry the minimum of the rates supportable by the switches on the path

Data packets contain an “EFCI” (Explicit Forward Congestion Indication) bit, which can be set by a congested switch
  o If the data packet preceding an RM packet has EFCI set, then the receiver sets the CI bit in the RM packet before returning it

But these kinds of congestion control are possible only on a connection-oriented network service
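
The explicit rate mechanism amounts to taking a running minimum along the path. The sketch below is an illustration only, not an ATM implementation: the function name, the per-switch "supportable rate" list, and the numbers are all assumptions made for the example.

```python
def explicit_rate_at_receiver(initial_er, supportable_rates):
    """Follow one RM packet through the switches on a path.

    initial_er: the ER value the sender writes into the RM packet.
    supportable_rates: the rate each switch on the path can support, in
        path order (hypothetical values; a switch only ever lowers ER).
    """
    er = initial_er
    for rate in supportable_rates:
        er = min(er, rate)
    return er


# Illustrative numbers in Mbps: the second switch is the bottleneck.
print(explicit_rate_at_receiver(155.0, [100.0, 40.0, 80.0]))
```

The value that comes back to the sender is therefore the rate of the most constrained switch, which is the highest rate the whole path can sustain.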

CSCI 547 Transport Layer 3-122

Chapter 3 outline

3.1 Transport-layer services

3.2 Multiplexing and demultiplexing

3.3 Connectionless transport: UDP

3.4 Principles of reliable data transfer

3.5 Connection-oriented transport: TCP
  o segment structure
  o reliable data transfer
  o flow control
  o connection management

3.6 Principles of congestion control

3.7 TCP congestion control

CSCI 547 Transport Layer 3-123

TCP Congestion Control

TCP must use end-to-end congestion control since IP provides no explicit feedback about network congestion

Approach taken by TCP: each sender limits its sending rate as a function of perceived network congestion

3 questions:
  o When and how to limit the rate?
  o How to perceive congestion?
  o What algorithm to use for changing the rate?

CSCI 547 Transport Layer 3-124

TCP Congestion Control: details

The sender limits transmission so that the following condition holds:

  $LastByteSent - LastByteAcked \le \min(CongWin, RcvWindow)$

Roughly,

  $\text{Max rate} = \frac{CongWin}{RTT}$ bytes/sec

$CongWin = w \times MSS$ bytes, where $w$ is a variable
  o CongWin is dynamic, a function of perceived network congestion

[Figure: data flow from sender to receiver]
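
The window condition and the rough rate bound on this slide can be checked with a few lines of code. This is a minimal sketch: the variable names mirror the slide, while the function names and the byte counts are assumptions made for illustration.

```python
def can_send(last_byte_sent, last_byte_acked, cong_win, rcv_window, nbytes):
    """True if sending nbytes more keeps unacknowledged data within the window.

    Implements LastByteSent - LastByteAcked <= min(CongWin, RcvWindow),
    evaluated after the new bytes would be sent. All sizes are in bytes.
    """
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= min(cong_win, rcv_window)


def max_rate_bytes_per_sec(cong_win, rtt_sec):
    """Rough upper bound on the sending rate: one window per round-trip time."""
    return cong_win / rtt_sec


# 2000 bytes in flight, congestion window 4000 bytes, receive window 8000 bytes.
print(can_send(3000, 1000, 4000, 8000, 1000))  # the congestion window is the limit
print(can_send(3000, 1000, 4000, 2500, 1000))  # here the receive window is the limit
print(max_rate_bytes_per_sec(4000, 0.2))
```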

CSCI 547 Transport Layer 3-125

TCP Congestion Control: details

How does the sender perceive congestion?

In TCP, when a loss event is detected, TCP perceives “congestion”
  o Loss event = timeout or 3 duplicate ACKs

The TCP sender reduces its rate (CongWin) after a loss event
  o In flow control, the receiver does the control
  o In congestion control, the sender does the control

Reduces by how much? It depends on the version: TCP Tahoe or TCP Reno

CSCI 547 Transport Layer 3-126

TCP Congestion Control: details

TCP congestion control algorithms:
  o AIMD (Additive-Increase, Multiplicative-Decrease)
  o Slow start
  o Reaction to loss events

CSCI 547 Transport Layer 3-127

TCP congestion control: AIMD (Additive Increase, Multiplicative Decrease)

Approach: increase the transmission rate (congestion window size) gradually (linearly), probing for usable bandwidth, until loss occurs
  o Additive increase: increase CongWin by 1 MSS every RTT until loss is detected or CongWin reaches RcvWindow
  o Multiplicative decrease: cut CongWin in half after loss. Why so much? Because the effect of congestion is exponential (see slide 113)

[Figure: congestion window over time showing sawtooth behavior: probing for bandwidth]
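
The sawtooth can be reproduced with a very small simulation. This is a sketch under a simplifying assumption that is not in the slides: a loss is detected in exactly those round trips where the window exceeds a fixed path capacity. The function name and the capacity value are hypothetical; the window is measured in MSS units.

```python
def aimd(capacity_mss, rounds, start=1):
    """Return the congestion window (in MSS) at the start of each RTT.

    Assumed loss model: a loss is detected in any RTT in which the window
    exceeds capacity_mss. Otherwise the window grows by 1 MSS per RTT.
    """
    cong_win = start
    history = []
    for _ in range(rounds):
        history.append(cong_win)
        if cong_win > capacity_mss:
            cong_win = max(1, cong_win // 2)  # multiplicative decrease
        else:
            cong_win += 1                     # additive increase
    return history


print(aimd(capacity_mss=8, rounds=12, start=4))
# prints [4, 5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9]
```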

CSCI 547 Transport Layer 3-128

TCP congestion control:

The congestion window grows in two phases:

  o Slow start: start with a small window but ramp up the transmission rate until loss occurs
  o Congestion avoidance: after the congestion window grows over a threshold, increase the congestion window cautiously to avoid congestion

CSCI 547 Transport Layer 3-129

TCP Slow Start: TCP starts slow (1 MSS)

When the connection begins, CongWin is typically initialized to 1 MSS (hence “slow start”)
  o Example: MSS = 500 bytes and RTT = 200 msec give an initial rate of 20 kbps (500 bytes × 8 bits / 0.2 sec)

  $\text{Max rate} = \frac{CongWin}{RTT}$ bytes/sec

The available bandwidth may be much larger than MSS/RTT
  o It is desirable to quickly ramp up to a respectable rate, so we need a faster-than-linear increase

Slow start actually means “slow start with quick (exponential) increase”
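
Working through the MSS = 500 bytes, RTT = 200 msec example with CongWin = 1 MSS gives an initial rate of 20 kbps. The helper name below is made up for this illustration.

```python
def initial_rate_bps(mss_bytes, rtt_ms):
    """Initial sending rate in bits/sec when CongWin is 1 MSS."""
    bits_per_window = mss_bytes * 8
    return bits_per_window * 1000 / rtt_ms  # 1000 ms per second


rate = initial_rate_bps(mss_bytes=500, rtt_ms=200)
print(rate / 1000, "kbps")  # prints 20.0 kbps
```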

CSCI 547 Transport Layer 3-130

TCP Slow Start (more)

When the connection begins, start with 1 MSS and increase the rate exponentially until the first loss event:
  o Double CongWin every RTT
  o Done by incrementing CongWin by 1 MSS for every ACK received

Continue until a loss event (timeout or 3 duplicate ACKs) or until CongWin reaches RcvWindow, then go to AIMD (Additive Increase, Multiplicative Decrease)

Summary: the initial rate is slow (“slow start”) but ramps up exponentially fast

[Figure: timeline between Host A and Host B: one segment in the first RTT, then two segments, then four segments]
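
The doubling per RTT follows directly from the per-ACK increment, as the sketch below shows. It assumes one ACK per segment (no delayed ACKs) and no loss, and it measures the window in MSS units; the function name is hypothetical.

```python
def slow_start_windows(rounds, mss=1):
    """Return CongWin at the start of each RTT during slow start.

    Assumes every segment sent in an RTT is acknowledged individually and
    that no loss occurs; CongWin grows by 1 MSS per ACK received.
    """
    cong_win = mss
    windows = []
    for _ in range(rounds):
        windows.append(cong_win)
        acks_this_rtt = cong_win // mss  # one ACK per segment in flight
        for _ in range(acks_this_rtt):
            cong_win += mss
    return windows


print(slow_start_windows(4))  # prints [1, 2, 4, 8]
```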

CSCI 547 Transport Layer 3-131

TCP congestion control: Actual implementation

For a loss detected by receiving 3 duplicate ACKs or by reaching RcvWindow, CongWin is cut in half

For a timeout, TCP goes back to “slow start” mode (CongWin = 1 MSS), then grows exponentially until reaching ½ of the previous CongWin, then grows linearly

Note: TCP implementations handle loss differently
  o TCP “Tahoe”: timeout and 3 duplicate ACKs are treated the same (older version of TCP)
  o TCP “Reno”: timeout and 3 duplicate ACKs are treated differently (most current implementations)

For more details, see Table 3.3

CSCI 547 Transport Layer 3-132

Refinement

Q: When should the exponential increase switch to linear?

A: When CongWin gets to 1/2 of its value before a loss.

Implementation:
  o Variable Threshold
  o At a loss event, Threshold is set to 1/2 of CongWin just before the loss event

[Figure: congestion window evolution; Threshold is set to 8 initially; the diagram shows the case where 3 duplicate ACKs are received]

For Tahoe, CongWin is set to 1 MSS; Tahoe does not distinguish between a timeout and 3 duplicate ACKs

For Reno, CongWin is set to half in the case of 3 duplicate ACKs, but to 1 MSS in the case of a timeout

CSCI 547 Transport Layer 3-133

Refinement: inferring loss (actual implementation)

After 3 duplicate ACKs:
  o CongWin is cut in half (Reno only) and the packet is sent again (“fast retransmit”)
  o The window then grows linearly

But after a timeout event:
  o CongWin is set to 1 MSS (both Tahoe and Reno)
  o The window then grows exponentially until it reaches a threshold (the half point), then grows linearly

Philosophy:
  o 3 duplicate ACKs indicate that the network is capable of delivering some segments
  o A timeout indicates a “more alarming” congestion scenario

CSCI 547 Transport Layer 3-134

Summary: TCP Congestion Control

Note: TCP implementations react to loss differently
  o TCP “Tahoe”: timeout and 3 duplicate ACKs are treated the same (older version of TCP)
  o TCP “Reno”: timeout and 3 duplicate ACKs are treated differently (current)

When CongWin is below Threshold, the sender is in the slow-start phase and the window grows exponentially.

When CongWin is above Threshold, the sender is in the congestion-avoidance phase and the window grows linearly.

When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold (Reno only).

When a timeout occurs, Threshold is set to CongWin/2 and then CongWin is set to 1 MSS (both Tahoe and Reno).

CSCI 547 Transport Layer 3-135

TCP Reno

[Figure: TCP Reno congestion window behavior, with a timeout marked]

Used by most implementations currently

CSCI 547 Transport Layer 3-136

TCP sender congestion control (Table 3.3)

State: Slow Start (SS)
  Event: ACK receipt for previously unacked data
  TCP sender action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to “Congestion Avoidance”
  Commentary: results in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
  Event: ACK receipt for previously unacked data
  TCP sender action: CongWin = CongWin + MSS * (MSS/CongWin)
  Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
  Event: loss event detected by triple duplicate ACK
  TCP sender action: Threshold = CongWin/2; CongWin = Threshold; set state to “Congestion Avoidance”
  Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA
  Event: timeout
  TCP sender action: Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start”
  Commentary: enter slow start

State: SS or CA
  Event: duplicate ACK
  TCP sender action: increment duplicate ACK count for the segment being acked
  Commentary: CongWin and Threshold not changed
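
The rows of Table 3.3 map almost directly onto a small sender object. The sketch below is illustrative only: the class and method names are made up, the window is kept in MSS units as a float, and the initial Threshold of 8 MSS is borrowed from the earlier "Refinement" slide. It models the Reno behavior described in the table and omits retransmission itself.

```python
class RenoSender:
    """Hypothetical model of the congestion-control actions in Table 3.3."""

    def __init__(self, mss=1.0, threshold=8.0):
        self.mss = mss
        self.cong_win = mss          # start in slow start with 1 MSS
        self.threshold = threshold   # assumed initial value
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * (self.mss / self.cong_win)

    def on_dup_ack(self):
        """Duplicate ACK: count it; the third one is treated as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            # CongWin will not drop below 1 MSS.
            self.cong_win = max(self.threshold, self.mss)
            self.state = "CA"
            self.dup_acks = 0

    def on_timeout(self):
        """Timeout: halve the threshold and re-enter slow start."""
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0


sender = RenoSender()
for _ in range(8):
    sender.on_new_ack()
print(sender.state, sender.cong_win)  # CA 9.0
for _ in range(3):
    sender.on_dup_ack()
print(sender.state, sender.cong_win, sender.threshold)  # CA 4.5 4.5
sender.on_timeout()
print(sender.state, sender.cong_win, sender.threshold)  # SS 1.0 2.25
```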

CSCI 547 Transport Layer 3-137

TCP throughput

What is the average throughput of TCP as a function of congestion window size and RTT?
  o Ignore slow start

Let W be the window size when loss occurs.
  o When the window is W, throughput is $W/RTT$ (see slide 125)
  o Just after loss, the window drops to $W/2$ and throughput to $W/(2 \cdot RTT)$
  o Average throughput:

  $\frac{1}{2}\left(\frac{W}{RTT} + \frac{W}{2 \cdot RTT}\right) = 0.75 \cdot \frac{W}{RTT}$
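
A quick numerical check of the 0.75 factor: because the window grows linearly between W/2 and W, the average throughput is the midpoint of the two extremes. The function name and the sample values are assumptions for illustration.

```python
def average_throughput(w, rtt):
    """Average of the throughput just before loss and just after loss."""
    before_loss = w / rtt        # window is W
    after_loss = w / (2 * rtt)   # window has dropped to W/2
    return (before_loss + after_loss) / 2


# Illustrative values: W = 10 segments, RTT = 1 second.
print(average_throughput(10, 1.0))  # prints 7.5, i.e. 0.75 * W / RTT
```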

CSCI 547 Transport Layer 3-138

TCP Fairness

Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 pass through a bottleneck router of capacity R]

CSCI 547 Transport Layer 3-139

Why is TCP fair?

Two competing sessions with the same MSS and RTT:
  o Additive increase gives a slope of 1 as throughput increases
  o Multiplicative decrease decreases throughput proportionally

[Figure: Connection 2’s throughput plotted against Connection 1’s throughput, both axes running up to R, with the “equal bandwidth share” line and the “full bandwidth utilization” line. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2), moving toward the throughput goal.]
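
The convergence argument can be checked numerically. The sketch below assumes a simplified synchronized-loss model that is not stated in the slides: both connections see a loss in any round where their combined rate exceeds the link capacity, and both halve at once. Additive increase leaves the gap between the two rates unchanged, while each halving cuts the gap in half. Names and numbers are hypothetical.

```python
def share_bottleneck(x1, x2, capacity, rounds):
    """Evolve two AIMD flows sharing one bottleneck; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2  # loss: both decrease by a factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1  # congestion avoidance: both add 1
    return x1, x2


# Start far from fair: one flow at 1, the other at 41, on a link of capacity 50.
final_x1, final_x2 = share_bottleneck(1.0, 41.0, capacity=50.0, rounds=200)
print(abs(final_x1 - final_x2))  # the gap has shrunk to well under 1
```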

CSCI 547 Transport Layer 3-140

Fairness (more)

Fairness and UDP
  o Multimedia apps often do not use TCP
      - They do not want their rate throttled by congestion control, and they also avoid TCP’s overhead
  o Instead they use UDP: pump audio/video at a constant rate, tolerate packet loss
  o Research area: how to prevent UDP traffic (no flow or congestion control) from bringing down the Internet, i.e., making it TCP friendly

Fairness and parallel TCP connections
  o Nothing prevents an app from opening parallel connections between 2 hosts
  o Web browsers do this
  o Example: a link of rate R supporting 9 connections
      - A new app asks for 1 TCP connection and gets rate R/10
      - A new app asks for 11 TCP connections and gets more than R/2 (11 of the 20 connections)!
  o Is this fair?
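
The parallel-connection example is simple arithmetic once we assume, as the fairness goal does, that every TCP connection gets an equal share of the link. The function name below is made up for the illustration.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening new_connections.

    Assumes every connection on the link receives an equal share.
    """
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


print(app_share(9, 1))   # prints 1/10
print(app_share(9, 11))  # prints 11/20, slightly more than half of R
```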

CSCI 547 Transport Layer 3-141

Chapter 3: Summary

Principles behind transport layer services:
  o multiplexing, demultiplexing
  o reliable data transfer
  o flow control
  o congestion control

Instantiation and implementation in the Internet:
  o UDP
  o TCP

Next:
  o Leaving the network “edge” (application, transport layers)
  o Into the network “core”

CSCI 547 Transport Layer 3-142

Lab (hand in next class)

Using Wireshark, capture packets for a TCP session and:
  o Identify TCP’s three-way handshake for connection initialization; also show the window size negotiation
  o Identify the change in window size during the connection; in other words, show the receiver controlling the flow by shrinking/expanding the window size
  o Identify the RTT adjustments on the captured packets
  o Identify the four-way handshake for TCP connection termination
  o Cumulative ACKs: on Windows systems, an ACK is sent for every other received packet to reduce the number of packets on the network. Prove this
  o Try to identify a case where selective repeat is visible. This may not be easy; after trying, state whatever you found out
  o Look for 3 duplicate ACKs to a segment; look for “[TCP previous segment lost]” in the info field. What is your TCP’s reaction to this? What is this feature called?

Show the packets and mark them clearly to illustrate the concepts