2
Announcements
Lab 4 (5-7) due next week before your lab slot
Prelab 5 due next week. There will be Lab 5 next week.
Midterm (March 10th, duration ~1.5 hours)
Assignment 2 issues
  aslookup compilation?
  ISP name: nslookup or whois for IP address
Lab 4 (count-to-infinity issues)
3
Agenda
Autonomous Systems (AS)
Policy vs. distance based routing
Border gateway protocol (BGP)
Transmission control protocol (TCP)
4
Autonomous Systems Terminology
local traffic = traffic with source or destination in the AS
transit traffic = traffic that passes through the AS
Stub AS = has a connection to only one AS and carries only local traffic
Multihomed AS = has connections to >1 AS, but does not carry transit traffic
Transit AS = has connections to >1 AS and carries transit traffic
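The three AS types above differ only in how many neighboring ASes they connect to and whether they carry transit traffic. A minimal sketch of that classification follows; the function name and its parameters are hypothetical and chosen for illustration only.

```python
def classify_as(num_neighbor_ases: int, carries_transit: bool) -> str:
    """Classify an AS using the terminology from the slide.

    num_neighbor_ases: number of other ASes this AS is connected to.
    carries_transit:   True if the AS forwards traffic that neither
                       originates nor terminates inside it.
    """
    if num_neighbor_ases < 1:
        raise ValueError("an AS on the Internet connects to at least one other AS")
    if num_neighbor_ases == 1:
        # A single connection cannot carry transit traffic.
        return "stub"
    return "transit" if carries_transit else "multihomed"


print(classify_as(1, False))
print(classify_as(2, False))
print(classify_as(3, True))
```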
5
Stub and Transit Networks
AS 1, AS 2, and AS 5 are stub networks
AS 2 is a multi-homed stub network
AS 3 and AS 4 are transit networks
[Figure: topology of AS 1, AS 2, AS 3, AS 4, and AS 5]
6
Selective Transit
Example: Transit AS 3 carries traffic between AS 1 and AS 4 and between AS 2 and AS 4
But AS 3 does not carry traffic between AS 1 and AS 2
The example shows a routing policy.
[Figure: AS 1, AS 2, AS 3, and AS 4]
7
Customer/Provider
A stub network typically obtains access to the Internet through a transit network.
A transit network that is a provider may itself be a customer of another network
Customer pays provider for service
[Figure: customer/provider links among AS 2, AS 4, AS 5, and AS 6]
8
Customer/Provider and Peers
Transit networks can have a peer relationship
Peers provide transit between their respective customers
Peers do not provide transit between peers
Peers normally do not pay each other for service
[Figure: AS 1 through AS 6 connected by customer/provider links and peer links]
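The peering rules above can be read as an export filter: whether a route is passed on depends on who it was learned from and who it would be sent to. The sketch below is one way to write that down. The function name and role strings are hypothetical, and the provider case (a customer does not give transit between its providers) is an assumption that matches the multihomed, non-transit AS described earlier rather than something stated on this slide.

```python
ROLES = {"customer", "peer", "provider"}


def may_export(learned_from: str, send_to: str) -> bool:
    """Return True if a route learned from one neighbor type may be
    announced to another neighbor type.

    Routes learned from customers are announced to everyone.
    Routes learned from peers or providers are announced only to customers.
    """
    if learned_from not in ROLES or send_to not in ROLES:
        raise ValueError("role must be 'customer', 'peer', or 'provider'")
    return learned_from == "customer" or send_to == "customer"


# Peers provide transit between their respective customers ...
print(may_export("customer", "peer"))
print(may_export("peer", "customer"))
# ... but not between peers.
print(may_export("peer", "peer"))
```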
9
Shortcuts through peering
Note that peering reduces upstream traffic
Delays can be reduced through peering
But: Peering may not generate revenue
[Figure: AS 1 through AS 6 connected by customer/provider links and peer links, with an additional peer link acting as a shortcut]
13
Autonomous Routing Domains Don’t Always Need BGP or an ASN
Qwest
Yale University
Nail up default routes 0.0.0.0/0 pointing to Qwest
Nail up routes 130.132.0.0/16 pointing to Yale
130.132.0.0/16
Static routing is the most common way of connecting an autonomous routing domain to the Internet. This helps explain why BGP is a mystery to many …
ARDs versus ASes
14
ASNs Can Be “Shared” (RFC 2270)
AS 701 UUNet
ASN 7046 is assigned to UUNet. It is used by customers single homed to UUNet, but needing BGP for some reason (load balancing, etc.) [RFC 2270]
AS 7046 Crestar Bank
AS 7046 NJIT
AS 7046 Hood College
128.235.0.0/16
15
ARDs and ASes: Summary
Most ARDs have no ASN (statically routed at Internet edge)
Some unrelated ARDs share the same ASN (RFC 2270)
Some ARDs are implemented with multiple ASNs (example: Worldcom)
ASes are just an implementation detail of inter-domain routing
16
Agenda
Autonomous Systems (AS)
Policy vs. distance based routing
Border gateway protocol (BGP)
Transmission control protocol (TCP)
17
Why not minimize “AS hop count”?
[Figure: National ISP1, National ISP2, Regional ISP1, Regional ISP2, Regional ISP3, and customers Cust1, Cust2, Cust3; one path is marked YES and another NO]
Shortest path routing is not compatible with commercial relations
18
Customer versus Provider
Customer pays provider for access to the Internet
[Figure: provider and customer, with IP traffic flowing between them]
19
The “Peering” Relationship
Peers provide transit between their respective customers
Peers do not provide transit between peers
Peers (often) do not exchange $$$
[Figure: peer/peer and provider/customer links, with paths marked "traffic allowed" and "traffic NOT allowed"]
20
Peering Provides Shortcuts
Peering also allows connectivity between the customers of “Tier 1” providers.
[Figure: peer/peer and provider/customer links]
21
Peering Wars
Peer:
  Reduces upstream transit costs
  Can increase end-to-end performance
  May be the only way to connect your customers to some part of the Internet (“Tier 1”)
Don’t Peer:
  You would rather have customers
  Peers are usually your competition
  Peering relationships may require periodic renegotiation
Peering struggles are by far the most contentious issues in the ISP world!
Peering agreements are often confidential.
22
Agenda
Autonomous Systems (AS)
Policy vs. distance based routing
Border gateway protocol (BGP)
Transmission control protocol (TCP)
24
BGP Overview
BGP = Border Gateway Protocol v4, RFC 1771 (~60 pages)
Note: In the context of BGP, a gateway is nothing else but an IP router that connects autonomous systems.
Interdomain routing protocol for routing between autonomous systems.
Uses TCP to establish a BGP session and to send routing messages over the BGP session.
Updates are sent only for new routes.
BGP is a path vector protocol. Routing messages in BGP contain complete routes.
Network administrators can specify routing policies.
25
BGP Policy-based Routing
Each node is assigned an AS number (ASN)
BGP’s goal is to find any AS-path (not an optimal one). Since the internals of the AS are never revealed, finding an optimal path is not feasible.
Network administrator sets BGP’s policies to determine the best path to reach a destination network.
26
The Border Gateway Protocol (BGP)
BGP = RFC 1771
+ “optional” extensions: RFC 1997 (communities), RFC 2439 (damping), RFC 2796 (reflection), RFC 3065 (confederation), …
+ routing policy configuration languages (vendor-specific)
+ Current Best Practices in management of interdomain routing
BGP was not DESIGNED. It EVOLVED.
27
BGP Route Processing
1. Receive BGP updates
2. Apply import policies (apply policy = filter routes & tweak attributes)
3. Best route selection, based on attribute values
4. Best route table; install forwarding entries for best routes in the IP forwarding table
5. Apply export policies (apply policy = filter routes & tweak attributes)
6. Transmit BGP updates
Policy is open-ended programming, constrained only by the vendor configuration language.
28
BGP Attributes

Value  Code                     Reference
-----  -----------------------  ---------
1      ORIGIN                   [RFC1771]
2      AS_PATH                  [RFC1771]
3      NEXT_HOP                 [RFC1771]
4      MULTI_EXIT_DISC          [RFC1771]
5      LOCAL_PREF               [RFC1771]
6      ATOMIC_AGGREGATE         [RFC1771]
7      AGGREGATOR               [RFC1771]
8      COMMUNITY                [RFC1997]
9      ORIGINATOR_ID            [RFC2796]
10     CLUSTER_LIST             [RFC2796]
11     DPA                      [Chen]
12     ADVERTISER               [RFC1863]
13     RCID_PATH / CLUSTER_ID   [RFC1863]
14     MP_REACH_NLRI            [RFC2283]
15     MP_UNREACH_NLRI          [RFC2283]
16     EXTENDED COMMUNITIES     [Rosen]
...
255    reserved for development

From IANA: http://www.iana.org/assignments/bgp-parameters
The slide highlights a subset of these as the most important attributes.
Not all attributes need to be present in every announcement
30
NEXT_HOP Attribute
EGP: IP address used to reach the advertising router
IGP: next-hop address is carried into local AS
32
Shedding Inbound Traffic with ASPATH Prepending
[Figure: customer AS 2 (192.0.2.0/24) is connected to provider AS 1 by a primary link and a backup link]
Primary link: 192.0.2.0/24, ASPATH = 2
Backup link: 192.0.2.0/24, ASPATH = 2 2 2
Prepending will (usually) force inbound traffic from AS 1 to take the primary link
Yes, this is a Glorious Hack …
33
… But Padding Does Not Always Work
[Figure: customer AS 2 (192.0.2.0/24) is connected to provider AS 1 by the primary link and to provider AS 3 by the backup link]
Primary link: 192.0.2.0/24, ASPATH = 2
Backup link: 192.0.2.0/24, ASPATH = 2 2 2 2 2 2 2 2 2 2 2 2 2
AS 3 will send traffic on the “backup” link because it prefers customer routes, and local preference is considered before ASPATH length!
Padding in this way is often used as a form of load balancing
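To see why padding fails here, it helps to write the comparison down: local preference is compared first, and AS path length only breaks ties. The sketch below models just those two steps of route selection. The `Route` class, its field names, and the local preference values 100 and 90 are assumptions for illustration; real BGP selection has more tie-breaking steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple      # sequence of AS numbers, e.g. (2, 2, 2)
    local_pref: int     # set by the import policy of the receiving AS
    learned_via: str    # free-form label, for illustration only


def best_route(routes):
    """Pick the route with the highest local preference; among equals,
    pick the one with the shortest AS path."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


# AS 3's view of 192.0.2.0/24 (assumed values):
padded_customer_route = Route("192.0.2.0/24", (2,) * 13, 100, "customer AS 2, backup link")
route_via_as1 = Route("192.0.2.0/24", (1, 2), 90, "AS 1")

chosen = best_route([padded_customer_route, route_via_as1])
print(chosen.learned_via)  # the padded customer route still wins
```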
34
COMMUNITY Attribute to the Rescue!
[Figure: customer AS 2 (192.0.2.0/24) is connected to provider AS 1 by the primary link and to provider AS 3 by the backup link]
Primary link: 192.0.2.0/24, ASPATH = 2
Backup link: 192.0.2.0/24, ASPATH = 2, COMMUNITY = 3:70
Customer import policy at AS 3:
  If 3:90 in COMMUNITY then set local preference to 90
  If 3:80 in COMMUNITY then set local preference to 80
  If 3:70 in COMMUNITY then set local preference to 70
AS 3: normal customer local pref is 100, peer local pref is 90
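The customer import policy at AS 3 can be sketched as a small function that maps community values to a local preference. The function and constant names are hypothetical; the community values and the defaults of 100 (customer) and 90 (peer) come from the slide. The rules are applied in the order listed, so if several of the communities were present, the last matching rule would win.

```python
CUSTOMER_DEFAULT_LOCAL_PREF = 100
PEER_LOCAL_PREF = 90

# (community, local preference) pairs in the order the slide lists them.
COMMUNITY_RULES = [("3:90", 90), ("3:80", 80), ("3:70", 70)]


def customer_import_local_pref(communities):
    """Return the local preference AS 3 assigns to a customer route."""
    local_pref = CUSTOMER_DEFAULT_LOCAL_PREF
    for community, value in COMMUNITY_RULES:
        if community in communities:
            local_pref = value
    return local_pref


backup_pref = customer_import_local_pref({"3:70"})
print(backup_pref)
# 70 is below the peer preference of 90, so AS 3 now prefers a route
# learned from a peer over the customer's backup link.
print(backup_pref < PEER_LOCAL_PREF)
```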
35
BGP Issues - What is a BGP Wedgie?
BGP policies make sense locally
Interaction of local policies allows multiple stable routings
Some routings are consistent with intended policies, and some are not
If an unintended routing is installed (BGP is “wedged”), then manual intervention is needed to change to an intended routing
When an unintended routing is installed, no single group of network operators has enough knowledge to debug the problem
[Figure: ¾ wedgie and full wedgie]
36
YouTube blocking
Pakistan blocks YouTube
How? (according to BBC)
  Advertise a shorter route to reach YouTube
  The incorrect short route gets propagated
  Seen by two thirds of the Internet
  Traffic to YouTube goes through Pakistan
  Since Pakistan blocked YouTube, all traffic reaches a dead end!
37
Dynamic Routing Protocols: Summary
Dynamic routing protocols: RIP, OSPF, BGP
RIP uses a distance vector algorithm and converges slowly (the count-to-infinity problem)
OSPF uses a link state algorithm and converges fast, but it is more complicated than RIP.
Both RIP and OSPF find the lowest-cost path.
BGP uses a path vector algorithm; its path selection algorithm is complicated and is influenced by policies.
BGP has its own problems; see WIDGI by Tim Griffin
38
More Readings (Optional)
BGP Wedgies: Bad Routing Policy Interactions that Cannot be Debugged
JI’s Intro to interdomain routing.
"Interdomain Setting of PlanetLab Nodes." PlanetLab Meeting, May 14, 2004.
Understanding the Border Gateway Protocol (BGP) ICNP 2002 Tutorial Session
39
Agenda
Autonomous Systems (AS)
Policy vs. distance based routing
Border gateway protocol (BGP)
Transmission control protocol (TCP)
40
Transmission Control Protocol (RFC)
Reliable and in-order byte-stream service
  TCP format
  Connection establishment
  Flow control
  Reaction to congestion
  Packet corruption
41
TCP Format
TCP segments have a 20 byte header with >= 0 bytes of data.
Encapsulation: IP header (20 bytes) | TCP header (20 bytes) | TCP data
TCP header layout, one 32-bit word per row (bits 0-15 | bits 16-31):
  Source Port Number | Destination Port Number
  Sequence number (32 bits)
  Acknowledgement number (32 bits)
  header length, 0, Flags | window size
  TCP checksum | urgent pointer
  Options (if any)
  DATA
42
TCP header fields
Sequence Number (SeqNo):
  Sequence number is 32 bits long. So the range of SeqNo is 0 <= SeqNo <= 2^32 - 1 (about 4.3 Gbyte)
  Each sequence number identifies a byte in the byte stream
  Initial Sequence Number (ISN) of a connection is set during connection establishment
  Q: What are possible requirements for ISN?
43
TCP header fields
Acknowledgement Number (AckNo):
  Acknowledgements are piggybacked, i.e., a segment from A -> B can contain an acknowledgement for data sent in the B -> A direction
  Q: Why is piggybacking good?
  A host uses the AckNo field to send acknowledgements. (If a host sends an AckNo in a segment it sets the “ACK flag”)
  The AckNo contains the next SeqNo that a host wants to receive
  Example: The acknowledgement for a segment with sequence numbers 0-1500 is AckNo=1501
44
TCP header fields
Acknowledgement Number (cont’d)
  TCP uses the sliding window flow control protocol (see CS 457) to regulate the flow of traffic from sender to receiver
  TCP uses the following variation of sliding window:
    no NACKs (Negative ACKnowledgement)
    only cumulative ACKs
  Example: Assume the sender sends two segments with “1..1500” and “1501..3000”, but the receiver only gets the second segment.
  In this case, the receiver cannot acknowledge the second packet. It can only send AckNo=1
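The cumulative ACK rule can be checked with a few lines of code. The sketch below computes the AckNo a receiver would send given the byte ranges it has received so far; the function name and the use of inclusive `(first_byte, last_byte)` tuples are assumptions made for this illustration.

```python
def cumulative_ack(first_expected, received_ranges):
    """Return the next sequence number the receiver wants.

    first_expected:  first byte of the stream not yet acknowledged.
    received_ranges: list of (first_byte, last_byte) tuples, inclusive.
    Only data that is contiguous from first_expected can be acknowledged.
    """
    next_expected = first_expected
    for first_byte, last_byte in sorted(received_ranges):
        if first_byte > next_expected:
            break  # gap: later data cannot be acknowledged cumulatively
        next_expected = max(next_expected, last_byte + 1)
    return next_expected


# Only the second segment arrived: the receiver can only send AckNo=1.
print(cumulative_ack(1, [(1501, 3000)]))
# Both segments arrived.
print(cumulative_ack(1, [(1, 1500), (1501, 3000)]))
```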
45
TCP header fields
Header Length (4 bits):
  Length of header in 32-bit words
  Note that the TCP header has variable length (with minimum 20 bytes)
46
TCP header fields
Flag bits:
  URG: Urgent pointer is valid
    If the bit is set, the following bytes contain an urgent message in the range: SeqNo <= urgent message <= SeqNo + urgent pointer
  ACK: Acknowledgement Number is valid
  PSH: PUSH Flag
    Notification from the sender to the receiver that the receiver should pass all data that it has to the application.
    Normally set by the sender when the sender’s buffer is empty
47
TCP header fields
Flag bits (cont’d):
  RST: Reset the connection
    The flag causes the receiver to reset the connection
    The receiver of a RST terminates the connection and notifies the higher-layer application about the reset
  SYN: Synchronize sequence numbers
    Sent in the first packet when initiating a connection
  FIN: Sender is finished with sending
    Used for closing a connection
    Both sides of a connection must send a FIN
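The fixed 20-byte header described on the preceding slides can be decoded with Python's `struct` module. The sketch below is a minimal illustration, not a full TCP parser: it ignores options and does not verify the checksum. The function name, the dictionary keys, and the sample port and sequence numbers are assumptions chosen for the example.

```python
import struct

# Flag bits in the low 6 bits of the 16-bit "header length / flags" word.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (network byte order)."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    (src_port, dst_port, seq_no, ack_no,
     offset_and_flags, window, checksum, urgent) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq_no": seq_no,
        "ack_no": ack_no,
        # The top 4 bits give the header length in 32-bit words.
        "header_len_bytes": (offset_and_flags >> 12) * 4,
        "flags": {name for name, bit in FLAG_BITS.items()
                  if offset_and_flags & bit},
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }


# Build a sample SYN segment (illustrative values) and parse it again.
sample_syn = struct.pack("!HHIIHHHH", 1121, 23, 1031880193, 0,
                         (5 << 12) | FLAG_BITS["SYN"], 16384, 0, 0)
header = parse_tcp_header(sample_syn)
print(header["flags"], header["header_len_bytes"], header["seq_no"])
```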
48
TCP header fields
Window Size:
  Each side of the connection advertises the window size
  Window size is the maximum number of bytes that a receiver can accept.
  Maximum window size is 2^16 - 1 = 65535 bytes
TCP Checksum:
  The TCP checksum covers both the TCP header and the TCP data (and also some parts of the IP header)
  16-bit one’s complement
Urgent Pointer:
  Only valid if URG flag is set
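The 16-bit one's complement checksum can be sketched as follows. This shows only the arithmetic; assembling the pseudo-header from the IP fields that TCP also covers is left out, and the function name is hypothetical. The sample bytes are arbitrary.

```python
import struct


def ones_complement_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


payload = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
checksum = ones_complement_checksum(payload)
print(hex(checksum))

# A receiver that sums the data together with the checksum gets zero.
print(ones_complement_checksum(payload + struct.pack("!H", checksum)))
```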
49
TCP header fields
Options:

Option                 kind (1 byte)  len (1 byte)  Data
---------------------  -------------  ------------  ------------------------------------------------------------
End of Options         0              (none)        (none)
NOP (no operation)     1              (none)        (none)
Maximum Segment Size   2              4             maximum segment size (2 bytes)
Window Scale Factor    3              3             shift count (1 byte)
Timestamp              8              10            timestamp value (4 bytes), timestamp echo reply (4 bytes)
50
TCP header fields
Options:
  NOP is used to pad the TCP header to multiples of 4 bytes
  Maximum Segment Size
  Window Scale Options
    Increases the TCP window from 16 to 32 bits, i.e., the window size is interpreted differently
    Q: What is the different interpretation?
    This option can only be used in the SYN segment (first segment) during connection establishment time
  Timestamp Option
    Can be used for roundtrip measurements
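One way to read the window scale option: the 16-bit window field is shifted left by the shift count negotiated in the SYN segments. The sketch below assumes that interpretation and the upper limit of 14 for the shift count defined in the TCP window scale specification; the function name is hypothetical.

```python
MAX_SHIFT_COUNT = 14  # upper limit defined for the window scale option


def effective_window(window_field: int, shift_count: int) -> int:
    """Return the window in bytes implied by the 16-bit header field
    and the negotiated shift count."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("the window field is 16 bits wide")
    if not 0 <= shift_count <= MAX_SHIFT_COUNT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift_count


print(effective_window(65535, 0))   # no scaling
print(effective_window(65535, 14))  # largest possible window
```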
51
Three-Way Handshake
aida.poly.edu -> mng.poly.edu: S 1031880193:1031880193(0) win 16384 <mss 1460, ...>
mng.poly.edu -> aida.poly.edu: S 172488586:172488586(0) ack 1031880194 win 8760 <mss 1460>
aida.poly.edu -> mng.poly.edu: ack 172488587 win 17520
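The acknowledgement numbers in the trace follow directly from the two initial sequence numbers: each SYN consumes one sequence number, so each side acknowledges the other's ISN plus one. A small sketch that reproduces the numbers above (the function name and dictionary layout are hypothetical):

```python
SEQ_SPACE = 2 ** 32  # sequence numbers wrap around at 2^32


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three handshake segments as dictionaries."""
    syn = {"flags": "SYN", "seq": client_isn}
    syn_ack = {"flags": "SYN,ACK", "seq": server_isn,
               "ack": (client_isn + 1) % SEQ_SPACE}
    ack = {"flags": "ACK", "seq": (client_isn + 1) % SEQ_SPACE,
           "ack": (server_isn + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]


for segment in three_way_handshake(1031880193, 172488586):
    print(segment)
```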
52
Why is a Two-Way Handshake not enough?
aida.poly.edu mng.poly.edu
S 15322112354:15322112354(0) win 16384 <mss 1460, ...>
S 172488586:172488586(0) win 8760 <mss 1460>
S 1031880193:1031880193(0) win 16384 <mss 1460, ...>
The red line is a delayed duplicate packet.
When aida initiates the data transfer (starting with SeqNo=15322112355), mng will reject all data.
Will be discarded as a duplicate
SYN
53
TCP Connection Termination
aida.poly.edu mng.poly.edu
F 172488734:172488734(0)
ack 1031880221 win 8733
. ack 172488735 win 17484
. ack 1031880222 win 8733
F 1031880221:1031880221(0) ack 172488735 win 17520
54
Connection termination with tcpdump
aida issues a "telnet mng"
1 mng.poly.edu.telnet > aida.poly.edu.1121: F 172488734:172488734(0) ack 1031880221 win 8733
2 aida.poly.edu.1121 > mng.poly.edu.telnet: . ack 172488735 win 17484
3 aida.poly.edu.1121 > mng.poly.edu.telnet: F 1031880221:1031880221(0) ack 172488735 win 17520
4 mng.poly.edu.telnet > aida.poly.edu.1121: . ack 1031880222 win 8733
55
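TCP States in “Normal” Connection Lifetime
Opening:
  Active opener: SYN_SENT (active open), sends SYN (SeqNo = x)
  Passive opener: LISTEN (passive open) -> SYN_RCVD, sends SYN (SeqNo = y, AckNo = x + 1)
  Active opener: ESTABLISHED, sends (AckNo = y + 1)
  Passive opener: ESTABLISHED
Closing:
  Active closer: FIN_WAIT_1 (active close), sends FIN (SeqNo = m)
  Passive closer: CLOSE_WAIT (passive close), sends (AckNo = m + 1)
  Active closer: FIN_WAIT_2
  Passive closer: LAST_ACK, sends FIN (SeqNo = n)
  Active closer: TIME_WAIT, sends (AckNo = n + 1)
  Both sides: CLOSED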
56
TCP State Transition Diagram: Opening A Connection
[Figure: state transition diagram with states CLOSED, LISTEN, SYN RCVD, SYN SENT, ESTABLISHED. Transition labels in the figure:]
  active open / send: SYN
  passive open / send: . / .
  recv: SYN / send: SYN, ACK
  recv: SYN, ACK / send: ACK
  recvd: ACK / send: . / .
  recv: RST
  Application sends data / send: SYN
  simultaneous open, recv: SYN / send: SYN, ACK
  close or timeout
  recvd: FIN / send: FIN
  send: FIN
57
TCP State Transition Diagram: Closing A Connection
[Figure: state transition diagram with states ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT, LAST_ACK, CLOSED. Transition labels in the figure:]
  active close (issue close()) / send: FIN
  recv: ACK / send: . / .
  recv: FIN / send: ACK
  recv: FIN, ACK / send: ACK
  recvd: ACK / send: . / .
  Timeout (2 MSL)
  passive close, recv: FIN / send: ACK
  application closes / send: FIN
58
2MSL Wait State
2MSL Wait State = TIME_WAIT
When TCP does an active close and sends the final ACK, the connection must stay in the TIME_WAIT state for twice the maximum segment lifetime.
  2MSL = 2 * Maximum Segment Lifetime
Why? TCP is given a chance to resend the final ACK. (The server will time out after sending the FIN segment and resend the FIN.)
The MSL is set to 2 minutes, 1 minute, or 30 seconds.
59
Rules for sending Acknowledgments
TCP has rules that influence the transmission of acknowledgments
Rule 1: Delayed Acknowledgments
  Goal: Avoid sending ACK segments that do not carry data
  Implementation: Delay the transmission of (some) ACKs
Rule 2: Nagle’s rule
  Goal: Reduce transmission of small segments
  Implementation: A sender cannot send multiple segments with a 1-byte payload (i.e., it must wait for an ACK)
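Nagle's rule can be sketched as a single send decision. The version below is a simplification that looks only at the amount of queued data and whether earlier data is still unacknowledged; it ignores the receiver's advertised window. The function and parameter names are hypothetical.

```python
def nagle_can_send(queued_bytes: int, mss: int, unacked_bytes: int) -> bool:
    """Decide whether the sender may transmit a segment now.

    queued_bytes:  application data waiting to be sent.
    mss:           maximum segment size.
    unacked_bytes: data already sent but not yet acknowledged.
    """
    if queued_bytes <= 0:
        return False             # nothing to send
    if queued_bytes >= mss:
        return True              # a full-sized segment may always go out
    return unacked_bytes == 0    # a small segment must wait for the ACK


print(nagle_can_send(1, 1460, 0))     # first typed character: send
print(nagle_can_send(1, 1460, 1))     # next character, ACK outstanding: wait
print(nagle_can_send(1460, 1460, 1))  # full segment: send
```

Characters typed while an ACK is outstanding accumulate in the send buffer and leave together once the ACK arrives.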
60
Delayed Acknowledgement
TCP delays transmission of ACKs for up to 200 ms
Goal: Avoid sending ACK packets that do not carry data.
  The hope is that, within the delay, the receiver will have data ready to be sent to the sender. Then the ACK can be piggybacked on a data segment.
In the example:
  Delayed ACK explains why the “ACK of character” and the “echo of character” are sent in the same segment
  The duration of delayed ACKs can be observed in the example when Argon sends ACKs
Exceptions:
  An ACK should be sent for every second full-sized segment
  Delayed ACK is not used when packets arrive out of order
61
Observing Delayed Acknowledgements
• Remote terminal applications (e.g., Telnet) send characters to a server. The server interprets the character and sends the output at the server to the client.
• For each character typed, you see three packets:
1. Client -> Server: Send typed character
2. Server -> Client: Echo of character (or user output) and acknowledgement for first packet
3. Client -> Server: Acknowledgement for second packet
[Figure: host with Telnet client and host with Telnet server; 1. send character, 2. interpret character, 3. send echo of character and/or output]
62
Observing Delayed Acknowledgements
Telnet session from Argon to Neon
This is the output of typing 3 (three) characters:
Time 44.062449: Argon -> Neon: Push, SeqNo 0:1(1), AckNo 1
Time 44.063317: Neon -> Argon: Push, SeqNo 1:2(1), AckNo 1
Time 44.182705: Argon -> Neon: No Data, AckNo 2

Time 48.946471: Argon -> Neon: Push, SeqNo 1:2(1), AckNo 2
Time 48.947326: Neon -> Argon: Push, SeqNo 2:3(1), AckNo 2
Time 48.982786: Argon -> Neon: No Data, AckNo 3

Time 55.116581: Argon -> Neon: Push, SeqNo 2:3(1), AckNo 3
Time 55.117497: Neon -> Argon: Push, SeqNo 3:4(1), AckNo 3
Time 55.183694: Argon -> Neon: No Data, AckNo 4
63
Why 3 segments per character?
We would expect four segments per character:
  character
  ACK of character
  echo of character
  ACK of echoed character
But we only see three segments per character:
  character
  ACK and echo of character
  ACK of echoed character
This is due to delayed acknowledgements
64
Observing Nagle’s Rule
Telnet session between argon.cs.virginia.edu and tenet.cs.berkeley.edu (3000 miles)
This is the output of typing 7 characters:
Time 16.401963: Argon -> Tenet: Push, SeqNo 1:2(1), AckNo 2
Time 16.481929: Tenet -> Argon: Push, SeqNo 2:3(1), AckNo 2

Time 16.482154: Argon -> Tenet: Push, SeqNo 2:3(1), AckNo 3
Time 16.559447: Tenet -> Argon: Push, SeqNo 3:4(1), AckNo 3

Time 16.559684: Argon -> Tenet: Push, SeqNo 3:4(1), AckNo 4
Time 16.640508: Tenet -> Argon: Push, SeqNo 4:5(1), AckNo 4

Time 16.640761: Argon -> Tenet: Push, SeqNo 4:8(4), AckNo 5
Time 16.728402: Tenet -> Argon: Push, SeqNo 5:9(4), AckNo 8
65
Observing Nagle’s Rule
Observation: Transmission of segments follows a different pattern, i.e., there are only two segments per character typed
Delayed acknowledgment does not kick in at Argon
The reason is that there is always data at Argon ready to be sent when the ACK arrives
Why is Argon not sending the data (typed character) as soon as it is available?
[Figure: Argon sends char1, then ACK + char2, ACK + char3, ACK + char4-7]
66
Resetting Connections
Resetting connections is done by setting the RST flag
When is the RST flag set?
  A connection request arrives and no server process is waiting on the destination port
  Abort (terminate) a connection
    Causes the receiver to throw away buffered data. The receiver does not acknowledge the RST segment
67
TCP Congestion Control
TCP has a mechanism for congestion control. The mechanism is implemented at the sender
The window size at the sender is set as follows:
  Send Window = MIN (flow control window, congestion window)
where
  flow control window is advertised by the receiver
  congestion window is adjusted based on feedback from the network
68
TCP Congestion Control
TCP congestion control is governed by two parameters:
  Congestion Window (cwnd)
  Slow-start threshold value (ssthresh); initial value is 2^16 - 1
Congestion control works in two modes:
  slow start (cwnd < ssthresh)
  congestion avoidance (cwnd ≥ ssthresh)
69
Slow Start
Initial value: Set cwnd = 1
  Note: The unit is a segment size. TCP is actually based on bytes and increments by 1 MSS (maximum segment size)
The receiver sends an acknowledgement (ACK) for each segment
  Note: Generally, a TCP receiver sends an ACK for every other segment.
Each time an ACK is received by the sender, the congestion window is increased by 1 segment: cwnd = cwnd + 1
  If an ACK acknowledges two segments, cwnd is still increased by only 1 segment.
  Even if an ACK acknowledges a segment that is smaller than MSS bytes long, cwnd is increased by 1.
Does Slow Start increment slowly? Not really. In fact, the increase of cwnd is exponential
70
Slow Start Example
The congestion window size grows very rapidly
  For every ACK, we increase cwnd by 1 irrespective of the number of segments ACK’ed
TCP slows down the increase of cwnd when cwnd > ssthresh
[Figure: cwnd = 1, cwnd = 2, cwnd = 4, cwnd = 7 in successive round trips]
71
Congestion Avoidance
Congestion avoidance phase is started if cwnd has reached the slow-start threshold value
If cwnd ≥ ssthresh then each time an ACK is received, increment cwnd as follows:
  cwnd = cwnd + 1/cwnd
So cwnd is increased by one only if all cwnd segments have been acknowledged.
72
Example of Slow Start/Congestion Avoidance
Assume that ssthresh = 8
cwnd = 1, 2, 4, 8, 9, 10 in successive round-trip times
[Figure: cwnd (in segments) plotted against round-trip times, with ssthresh marked]
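The sequence in this example can be reproduced with a short simulation. The sketch below works at the granularity of one round-trip time and assumes every segment in a window is acknowledged individually: cwnd doubles per round trip during slow start (capped at ssthresh) and grows by one segment per round trip in congestion avoidance. That is an approximation of the per-ACK rules, and the function name is hypothetical.

```python
def cwnd_per_round_trip(ssthresh: int, rounds: int, initial_cwnd: int = 1) -> list:
    """Return cwnd (in segments) at the start of each round trip."""
    cwnd = initial_cwnd
    history = [cwnd]
    for _ in range(rounds):
        if cwnd < ssthresh:
            # Slow start: one increment per ACK doubles cwnd per round trip.
            cwnd = min(2 * cwnd, ssthresh)
        else:
            # Congestion avoidance: cwnd ACKs of 1/cwnd each add up to about 1.
            cwnd += 1
        history.append(cwnd)
    return history


print(cwnd_per_round_trip(ssthresh=8, rounds=5))
```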
73
Responses to Congestion
TCP assumes there is congestion if it detects a packet loss
A TCP sender can detect lost packets via:
  Timeout of a retransmission timer
  Receipt of a duplicate ACK
TCP interprets a timeout as a binary congestion signal. When a timeout occurs, the sender performs:
  ssthresh is set to half the current size of the congestion window: ssthresh = cwnd / 2
  cwnd is reset to one: cwnd = 1
  and slow start is entered
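The reaction to a timeout is small enough to write out. The point worth noticing is the order: ssthresh must be computed from the old cwnd before cwnd is reset. The sketch works in whole segments with integer division; the function name and the lower bound of 2 segments on ssthresh are assumptions added here, not something the slide states.

```python
def on_timeout(cwnd: int) -> tuple:
    """Return (new_cwnd, new_ssthresh) after a retransmission timeout."""
    new_ssthresh = max(cwnd // 2, 2)  # half the current window (assumed floor of 2)
    new_cwnd = 1                      # restart in slow start
    return new_cwnd, new_ssthresh


print(on_timeout(12))
print(on_timeout(10))
```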
74
Fast Retransmit
If three or more duplicate ACKs are received in a row, the TCP sender believes that a segment has been lost.
Then TCP performs a retransmission of what seems to be the missing segment, without waiting for a timeout to happen.
Enter slow start:
  ssthresh = cwnd/2
  cwnd = 1
[Figure: the sender receives a 1st, 2nd, and 3rd duplicate ACK]
75
Fast Recovery
Fast recovery avoids slow start after a fast retransmit
Intuition: Duplicate ACKs indicate that data is getting through
After three duplicate ACKs:
  Retransmit the packet that is presumed lost
  ssthresh = cwnd/2
  cwnd = cwnd+3 (note the order of operations)
  Increment cwnd by one for each additional duplicate ACK
When an ACK arrives that acknowledges “new data” (here: AckNo=6148), set:
  cwnd = ssthresh
  enter congestion avoidance
[Figure: the sender transmits 1K segments with SeqNo=0, 1024, 2048, 3072, 4096, and 5120. The receiver answers with AckNo=1024 repeatedly (1st, 2nd, and 3rd duplicate). While the duplicates arrive, cwnd=12 and ssthresh=5. After the 3rd duplicate, the segment with SeqNo=1024 is retransmitted and cwnd=15, ssthresh=6. The ACK for new data (AckNo=6148) sets cwnd=6, ssthresh=6.]
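The window updates in this example can be traced with three small functions that follow the rules exactly as this slide states them, in units of segments. The function names are hypothetical. Note that the TCP congestion control standard (RFC 5681) sets cwnd to ssthresh + 3 segments at this point rather than cwnd + 3; the sketch keeps the slide's version so that the numbers match the figure.

```python
def on_third_duplicate_ack(cwnd: int) -> tuple:
    """Return (new_cwnd, new_ssthresh) after the third duplicate ACK."""
    new_ssthresh = cwnd // 2  # computed from the old cwnd
    new_cwnd = cwnd + 3       # as on the slide; RFC 5681 uses ssthresh + 3
    return new_cwnd, new_ssthresh


def on_additional_duplicate_ack(cwnd: int) -> int:
    """Each further duplicate ACK inflates cwnd by one segment."""
    return cwnd + 1


def on_new_data_ack(ssthresh: int) -> int:
    """An ACK for new data deflates cwnd to ssthresh and ends fast recovery."""
    return ssthresh


cwnd, ssthresh = on_third_duplicate_ack(12)
print(cwnd, ssthresh)
cwnd = on_new_data_ack(ssthresh)
print(cwnd, ssthresh)
```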
76
Flavors of TCP Congestion Control
TCP Tahoe (1988, 4.3BSD Tahoe)
  Slow Start
  Congestion Avoidance
  Fast Retransmit
TCP Reno (1990, 4.3BSD Reno)
  Fast Recovery
New Reno (1996)
SACK (1996)
RED (Floyd and Jacobson 1993)
77
SACK
SACK = Selective acknowledgment
Issue: Reno and New Reno retransmit at most 1 lost packet per round trip time
Selective acknowledgments: The receiver can acknowledge non-contiguous blocks of data (SACK 0-1023, 1024-2047)
Multiple blocks can be sent in a single segment.
TCP SACK:
  Enters fast recovery upon 3 duplicate ACKs
  Sender keeps track of SACKs and infers if segments are lost.
  Sender retransmits the next segment from the list of segments that are deemed lost.
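How a sender might infer missing data from SACK information can be sketched as a gap search between the cumulative AckNo and the selectively acknowledged blocks. The function name and the block representation, `(left_edge, right_edge)` with the right edge exclusive, are assumptions made for this illustration; real SACK senders also track what has already been retransmitted.

```python
def missing_ranges(cumulative_ack: int, sack_blocks: list) -> list:
    """Return the byte ranges (start, end), end exclusive, that lie
    between the cumulative ACK and the SACKed blocks and are therefore
    candidates for retransmission."""
    holes = []
    cursor = cumulative_ack
    for left_edge, right_edge in sorted(sack_blocks):
        if left_edge > cursor:
            holes.append((cursor, left_edge))
        cursor = max(cursor, right_edge)
    return holes


# The receiver has everything up to byte 1024 and also bytes 2048..6143.
print(missing_ranges(1024, [(2048, 6144)]))
# Two separate holes can be reported within one round trip.
print(missing_ranges(0, [(1024, 2048), (3072, 4096)]))
```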
78
TCP in Linux
Congestion control algorithm is pluggable
  /proc/sys/net/ipv4/tcp_congestion_control
TCP read and write buffer sizes
  /proc/sys/net/ipv4/tcp_rmem and /proc/sys/net/ipv4/tcp_wmem
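These settings are plain text files, so they can be inspected from a script. The sketch below reads such a file and returns its whitespace-separated fields, or `None` if the file cannot be read (for example on a non-Linux system). The helper name is hypothetical; only the paths come from the slide.

```python
from pathlib import Path


def read_proc_setting(path: str):
    """Return the whitespace-separated fields of a /proc setting file,
    or None if the file cannot be read."""
    try:
        return Path(path).read_text().split()
    except OSError:
        return None


for setting in ("tcp_congestion_control", "tcp_rmem", "tcp_wmem"):
    print(setting, read_proc_setting("/proc/sys/net/ipv4/" + setting))
```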