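Transport Layer Technology - TCP; Transmission Control Protocol -
The Job of the Transport Layer
• Provide reliable data exchange between computers.
– Without errors
• Retransmission
• Automatic recovery using parity information (FEC; Forward Error Correction)
– Without dropping data
• Additional functions that become desirable
– Parallel data transfer
– Being "gentle" to the network
• So that the paths do not become congested
• Keep the network simple and make the end hosts smart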
Internet Architecture - TCP : Transmission Control Protocol -
• TCP (Transmission Control Protocol) ; end-to-end
– Flow control
– Error control / retransmission control
– Connection management
– Session multiplexing
[Figure: protocol stacks. Application and TCP appear only at the two end hosts; IP, Network Interface, and Physical appear at the end hosts and at the intermediate nodes.]
TCP Features
・ "Stream"-oriented data transmission
→ Connection establishment (three-way handshake)
・ Connection ("Stream") identifier = "Socket"
{dst_IP_addr, dst_port, src_IP_addr, src_port}
・ "Sequence Number" ; 32 bits
→ Byte number : 0 to (2^32 - 1) → the Sequence Number wraps around at 2^32
・ "Full-Duplex" communication
・ Acknowledgement (ACK) ;
→ Announces the byte number (SN) expected next
・ Error recovery : segment retransmission
triggered by time-out or duplicated ACK
・ Data transfer control using "Sliding Window Control"
(*) Window_size ≦ 65,535 bytes
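The 32-bit sequence number wrap-around mentioned on this slide can be illustrated with modular arithmetic. The following is only a sketch: the helper names `seq_add` and `seq_lt` are hypothetical, and the comparison assumes the two numbers are less than 2^31 bytes apart.

```python
SEQ_MOD = 2 ** 32  # sequence numbers are 32-bit byte counters


def seq_add(sn: int, nbytes: int) -> int:
    """Advance a sequence number by nbytes, wrapping at 2^32."""
    return (sn + nbytes) % SEQ_MOD


def seq_lt(a: int, b: int) -> bool:
    """True if a precedes b, assuming they are less than 2^31 apart."""
    return a != b and (b - a) % SEQ_MOD < SEQ_MOD // 2


# A segment of 20 bytes that starts 10 bytes before the wrap point
# ends at sequence number 10.
print(seq_add(2 ** 32 - 10, 20))
print(seq_lt(2 ** 32 - 10, 10))
```

A plain `a < b` comparison would give the wrong answer across the wrap point, which is why the difference is taken modulo 2^32 first.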
TCP Header Format
UR : Urgent Pointer Field Significant (URG)
AK : Acknowledgement Field Significant (ACK)
PH : Push Function (PSH)
RT : Reset the Connection (RST)
SY : Synchronize Sequence Numbers (SYN)
FN : No More Data From Sender (FIN)
TCP Port Allocation (RFC1700)
1. Well-Known Ports ; 0 - 1,023
2. Registered Ports ; 1,024 - 49,151
3. Dynamic and/or Private Ports ; 49,152 - 65,535
Latest information : ftp://ftp.isi.edu/in-notes/iana/assignments/port-numbers
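The three port ranges above can be expressed as a small lookup. This is a minimal sketch; the function name `port_class` and the returned labels are chosen here for illustration.

```python
def port_class(port: int) -> str:
    """Classify a TCP port number into the three ranges listed on the slide."""
    if not 0 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic/private"


for p in (80, 1037, 49152):
    print(p, port_class(p))
```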
TCP Well-Known Ports
Port Number   Keyword         Application
5             rje             Remote Job Entry
20            ftp-data        File Transfer [Default data]
21            ftp             File Transfer [Control]
23            telnet          Telnet
25            smtp            Simple Mail Transfer Protocol
39            rlp             Resource Location Protocol
53            domain          Domain Name Server
63            whois++         Whois++
67            bootp           Bootstrap Protocol Server
69            tftp            Trivial File Transfer
70            gopher          Gopher
79            finger          Finger
80            http            World Wide Web HTTP
110           pop3            Post Office Protocol - Version 3
111           sunrpc          SUN Remote Procedure Call
119           nntp            Network News Transfer Protocol
123           ntp             Network Time Protocol
137           netbios-ns      NetBIOS Name Service
138           netbios-dgm     NetBIOS Datagram Service
139           netbios-ssn     NetBIOS Session Service
179           bgp             Border Gateway Protocol (BGP)
202           at-nbp          AppleTalk Name Binding Protocol
213           ipx             IPX
220           imap3           IMAP3 (Interactive Mail Access Protocol)
396           netware-ip      Novell Netware over IP
540           uucp            uucp daemon
546           dhcpv6-client   DHCPv6 Client
547           dhcpv6-server   DHCPv6 Server
560           rmonitor        remote monitor daemon
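TCP Connection Establishment/Release
[Figure: segment exchange between svr4.1037 (client) and bsdi.discard (server).
Establishment: the client performs an "Active open" (application open: telnet) and the server a "Passive open"; the segment shown is SYN_ACK(a+1,b), after which both sides are "open".
Release: the client performs an "Active Close" (application close: quit) and the server a "Passive Close" (application close), with EOF delivered to the application; the segments shown are FIN (m,s), FIN_ACK (m+1,s), ACK (m+1), and ACK (s+1). The connection goes from "half close" to full close.]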
TCP Connection Establishment/Release
[Figure: the same exchange annotated with TCP states.
Client : SYN_SENT (Active open) → ESTABLISHED → FIN_WAIT_1 (Active close) → FIN_WAIT_2 → TIME_WAIT → (2-MSL) → CLOSED
Server : LISTEN (Passive open) → SYN_RCVD → ESTABLISHED → CLOSE_WAIT (Passive close) → LAST_ACK → CLOSED
Segments shown : SYN_ACK(a+1,b), FIN (m,s), FIN_ACK (m+1,s), ACK (m+1), ACK (s+1)]
TCP Connection Establishment/Release
Log on the console;
svr4% telnet bsdi discard
(port = "9" ; the server discards the packets)
Trying 140.252.13.35
Connected to bsdi.
Escape character is '^]'.
^]
telnet> quit
Connection closed.
tcpdump output
1 0.0 svr4.1037 > bsdi.discard: S 14155:14155(0) win 4096 <mss 1024>
2 0.002 (0.0024) bsdi.discard > svr4.1037: S 18239:18239(0) ack 14156 win 4096 <mss 1024>
3 0.007 (0.0048) svr4.1037 > bsdi.discard: . ack 18240 win 4096
4 4.155 (4.1482) svr4.1037 > bsdi.discard: F 14156:14156(0) ack 18240 win 4096
5 4.158 (0.0013) bsdi.discard > svr4.1037: . ack 14157 win 4096
6 4.159 (0.0014) bsdi.discard > svr4.1037: F 18240:18240(0) ack 14157 win 4096
7 4.189 (0.0225) svr4.1037 > bsdi.discard: . ack 18241 win 4096
[Meaning]
source.port > destination.port : flags SN_begin:SN_end(data_size)
flags :
S = SYN ; Synchronize sequence_number (SN)
F = FIN ; Finish data transmission
R = RST ; Reset connection
P = PSH ; Push data to the receiving process as soon as possible
. = none of the above four flags is on
SN_end = SN_begin + data_size
win 4096 ; window size is 4096
mss 1024 ; maximum segment size is 1024 bytes
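To make the field layout above concrete, here is a rough parser for lines in this abbreviated tcpdump format. It is a sketch only: the regular expression and the name `parse_segment` are assumptions made for this example, it expects the `SN_begin:SN_end(data_size)` form with a colon, and it does not handle real tcpdump output in general.

```python
import re

# Pattern for "src > dst: flags [begin:end(size)] [ack N] [win N] [<mss N>]"
LINE_RE = re.compile(
    r"(?P<src>\S+) > (?P<dst>\S+): (?P<flags>[SFRP.]+)"
    r"(?: (?P<begin>\d+):(?P<end>\d+)\((?P<size>\d+)\))?"
    r"(?: ack (?P<ack>\d+))?"
    r"(?: win (?P<win>\d+))?"
    r"(?: <mss (?P<mss>\d+)>)?"
)


def parse_segment(line: str) -> dict:
    """Extract the fields of one trace line; absent fields are None."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("unrecognized line: " + line)
    fields = m.groupdict()
    for key in ("begin", "end", "size", "ack", "win", "mss"):
        if fields[key] is not None:
            fields[key] = int(fields[key])
    return fields


seg = parse_segment(
    "2 0.002 (0.0024) bsdi.discard > svr4.1037: "
    "S 18239:18239(0) ack 14156 win 4096 <mss 1024>"
)
print(seg["src"], seg["flags"], seg["ack"], seg["mss"])
```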
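[Figure: TCP state transition diagram]
States : CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED, CLOSE_WAIT, LAST_ACK, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT
Transitions (event ; action) :
CLOSED → LISTEN ; appl: passive open ; send: <nothing>
CLOSED → SYN_SENT ; appl: active open ; send: SYN
LISTEN → SYN_SENT ; appl: send data ; send: SYN
LISTEN → SYN_RCVD ; recv: SYN ; send: SYN, ACK
SYN_SENT → SYN_RCVD ; recv: SYN ; send: SYN, ACK (simultaneous open)
SYN_SENT → ESTABLISHED ; recv: SYN, ACK ; send: ACK
SYN_SENT → CLOSED ; appl: close or timeout
SYN_RCVD → ESTABLISHED ; recv: ACK ; send: <nothing>
SYN_RCVD → FIN_WAIT_1 ; appl: close ; send: FIN
ESTABLISHED → FIN_WAIT_1 ; appl: close ; send: FIN
ESTABLISHED → CLOSE_WAIT ; recv: FIN ; send: ACK
CLOSE_WAIT → LAST_ACK ; appl: close ; send: FIN
LAST_ACK → CLOSED ; recv: ACK ; send: <nothing>
FIN_WAIT_1 → FIN_WAIT_2 ; recv: ACK ; send: <nothing>
FIN_WAIT_1 → CLOSING ; recv: FIN ; send: ACK (simultaneous close)
FIN_WAIT_1 → TIME_WAIT ; recv: FIN, ACK ; send: ACK
FIN_WAIT_2 → TIME_WAIT ; recv: FIN ; send: ACK
CLOSING → TIME_WAIT ; recv: ACK ; send: <nothing>
TIME_WAIT → CLOSED ; 2 MSL timeout
The figure also carries the label "send: RST"; its source and target states are not recoverable from the extracted text.
Groups marked in the figure : Active open, Passive open, Active close, Passive close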
[Figure: the same state transition diagram, << Client >> view. Client path : CLOSED → SYN_SENT (Active open) → ESTABLISHED → FIN_WAIT_1 (Active close) → FIN_WAIT_2 → TIME_WAIT → (2 MSL timeout) → CLOSED]
[Figure: the same state transition diagram, << Server >> view. Server path : CLOSED → LISTEN (Passive open) → SYN_RCVD → ESTABLISHED → CLOSE_WAIT (Passive close) → LAST_ACK → CLOSED]
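The client and server paths through the state diagram can be traced with a small transition table. This is a sketch under assumptions: only the transitions needed for a normal open and close are included, and the names `TRANSITIONS` and `run` are hypothetical.

```python
# (state, event) -> (action, next state); a subset of the diagram only
TRANSITIONS = {
    ("CLOSED", "appl: active open"): ("send: SYN", "SYN_SENT"),
    ("CLOSED", "appl: passive open"): (None, "LISTEN"),
    ("LISTEN", "recv: SYN"): ("send: SYN, ACK", "SYN_RCVD"),
    ("SYN_SENT", "recv: SYN, ACK"): ("send: ACK", "ESTABLISHED"),
    ("SYN_RCVD", "recv: ACK"): (None, "ESTABLISHED"),
    ("ESTABLISHED", "appl: close"): ("send: FIN", "FIN_WAIT_1"),
    ("ESTABLISHED", "recv: FIN"): ("send: ACK", "CLOSE_WAIT"),
    ("FIN_WAIT_1", "recv: ACK"): (None, "FIN_WAIT_2"),
    ("FIN_WAIT_2", "recv: FIN"): ("send: ACK", "TIME_WAIT"),
    ("TIME_WAIT", "2 MSL timeout"): (None, "CLOSED"),
    ("CLOSE_WAIT", "appl: close"): ("send: FIN", "LAST_ACK"),
    ("LAST_ACK", "recv: ACK"): (None, "CLOSED"),
}


def run(state: str, events: list) -> list:
    """Return the list of states visited while consuming the events."""
    path = [state]
    for event in events:
        _action, state = TRANSITIONS[(state, event)]
        path.append(state)
    return path


client = run("CLOSED", ["appl: active open", "recv: SYN, ACK", "appl: close",
                        "recv: ACK", "recv: FIN", "2 MSL timeout"])
server = run("CLOSED", ["appl: passive open", "recv: SYN", "recv: ACK",
                        "recv: FIN", "appl: close", "recv: ACK"])
print(" -> ".join(client))
print(" -> ".join(server))
```

An event that is not valid in the current state raises `KeyError`, which is a deliberate simplification for this example.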
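Error-Free Data Transfer
• When a packet is lost or corrupted
– Resend it to restore the data.
• The receiver sends a confirmation message (ACK; Acknowledge) that the data arrived correctly (from dst to src)
– With a very primitive procedure, throughput stays low.
– Two improvements
• Larger packet length : at most 1/3 of the bandwidth
• Transfer packets in a pipeline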
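TCP Bulk Data Transmission - Sliding Window -
・ Packet transfer using window control
① Sliding Window (set by the Receiver)
② Congestion Window (set by the Sender)
(1) Send a window's worth of packets without waiting for ACKs
(2) ACK aggregation (fewer ACK packets)
(3) The Receiver controls the window width
(4) The window slides when an ACK is received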
TCP Sliding Window
[Figure: segments 1 2 3 4 5 6 7 8 9 10 11 … divided into regions: sent and ACKed; sent but not ACKed; can send ASAP (unsent window); cannot send until the window slides. The offered window is advertised by the receiver.]
TCP Sliding Window
[Figure: the same segment sequence after "3" and "4" are sent and ack "5" is received from the receiver. The right edge of the offered window moves from 3 + window = 9 to 5 + window = 11.]
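The window arithmetic in the figure (3 + window = 9, then 5 + window = 11, so window = 6) can be reproduced as follows. This is a sketch; the function names are hypothetical and segments are counted in whole units as on the slide.

```python
def window_edges(ack: int, window: int) -> tuple:
    """Left and right edge of the offered window for a given ACK number."""
    return ack, ack + window


def sendable(next_to_send: int, ack: int, window: int) -> list:
    """Segments that may still be sent without waiting for another ACK."""
    _left, right = window_edges(ack, window)
    return list(range(max(next_to_send, ack), right))


# Before the ACK: ack=3, window=6; segments 3 and 4 are already sent.
print(window_edges(3, 6), sendable(5, 3, 6))
# After ack "5" arrives the window slides two segments to the right.
print(window_edges(5, 6), sendable(5, 5, 6))
```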
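TCP Sliding Window
[Figure: movement of the window edges. The left edge closes on ACK reception (= ACKed SN); the right edge opens on ACK reception (= ack + window); the window can shrink or enlarge according to the window advertised by the receiver. The window slides with each ACK from the receiver.]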
TCP Congestion Window
[Figure: segments 1 2 3 4 5 6 7 8 9 10 11 … with the offered window (advertised by receiver) and the congestion window ("cwnd" = 1). Regions: sent and ACKed; sent but not ACKed; shall not send ASAP (unsent window); cannot send until the window slides.]
TCP Congestion Window
[Figure: the same sequence after "3" is sent and ack "4" is received from the receiver. The right edge of the offered window moves from 3 + window = 9 to 4 + window = 10, and cwnd = 2 (cwnd ← cwnd*2): two segments shall be sent ASAP without waiting for an ACK; the rest of the window shall not be sent yet.]
TCP Congestion Window
・ Slow Start Policy (cwnd ; exponential increase)
cwnd = 1 ;
for (segment transmission) {
  for (not congestion) {
    if (ACK received for a transmitted segment) { cwnd = cwnd + 1 }
  }
  cwnd = 1
}
(*) Note : Congestion Avoidance behaves slightly differently. Because the Sender controls this locally, the policy is easy to change.
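Why "cwnd + 1 per ACK" amounts to exponential growth can be seen by stepping through round trips. The sketch below assumes no loss, one ACK per segment (no delayed ACKs), and counts cwnd in segments; `slow_start_rounds` is a name invented for this example.

```python
def slow_start_rounds(rounds: int, advertised_window: int) -> list:
    """cwnd at the start of each round trip during slow start."""
    cwnd = 1
    history = [cwnd]
    for _ in range(rounds):
        acks = cwnd  # every segment sent in this round trip is ACKed once
        for _ in range(acks):
            # +1 per ACK, but never beyond the receiver's advertised window
            cwnd = min(cwnd + 1, advertised_window)
        history.append(cwnd)
    return history


print(slow_start_rounds(4, 64))
print(slow_start_rounds(5, 10))
```

Each round trip delivers cwnd ACKs, and each ACK adds one segment, so cwnd doubles per round trip until it reaches the advertised window.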
TCP Congestion Window
[Figure: two plots of cwnd versus time. < Without congestion > cwnd grows up to advertised_window. < With congestion > cwnd drops when congestion occurs and then grows again toward advertised_window.]
(*) Duplicated ACK is not used
TCP Congestion Window (1) - (4)
[Figures: four snapshots of the segments in flight between the sender and the receiver. (1) segment 1; (2) segments 2 and 3; (3) segments 4 to 7; (4) segments 8 to 15.]
Required window size ≧ BW x RTT
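The bound above is a simple product once the units are aligned. A minimal sketch, assuming bandwidth in bits per second and RTT in milliseconds; the function name is hypothetical.

```python
def required_window_bytes(bandwidth_bps: int, rtt_ms: int) -> float:
    """Bandwidth-delay product in bytes: bits/s * ms / (8 bits * 1000 ms/s)."""
    return bandwidth_bps * rtt_ms / 8000


# A 1.544 Mbps T1 link with a 60 ms round-trip time
print(required_window_bytes(1_544_000, 60))
# A 10 Mbps Ethernet with a 3 ms round-trip time
print(required_window_bytes(10_000_000, 3))
```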
RTO Expired Retransmission
[Figure: segment exchange between bsdi.1023 and svr4.discard.
1 SYN 0:0(0) win 4096 <mss 1024>
2 SYN 3:3(0) ack 1 win 4096 <mss 1024>
3 ack 4 win 4096
4 PSH 1:15(14) ack 4 win 4096
5 ack 15 win 4096
Later segments (6, 7, 8, 9, ... 17, 18) are spaced by the retransmission interval : 1.5 sec, 3 sec, 6 sec, ... 64 sec.]
Retransmission attempts (RTO; retransmission timer)
RTO = 1.5 sec /* configurable */
for (9 minutes) {
  if (RTO expired) { retransmission; RTO = RTO x 2; RTO = min{64 sec, RTO}; }
}
end /* give up */
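The doubling timer in the pseudocode can be unrolled into a concrete schedule. This sketch takes the slide's values as defaults (1.5 sec initial RTO, 64 sec cap, give up after 9 minutes) and assumes that a retransmission is attempted only if its timer expires within the 9 minutes; `rto_schedule` is a hypothetical name.

```python
def rto_schedule(initial: float = 1.5, cap: float = 64.0,
                 give_up_after: float = 9 * 60) -> list:
    """Intervals (in seconds) that precede each retransmission."""
    elapsed = 0.0
    rto = initial
    intervals = []
    while elapsed + rto <= give_up_after:
        elapsed += rto
        intervals.append(rto)
        rto = min(cap, rto * 2)  # exponential backoff, capped at 64 sec
    return intervals


schedule = rto_schedule()
print(schedule)
print(len(schedule), "retransmissions before giving up")
```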
Retransmission by Duplicated ACK
(2) Reception of Duplicated ACK - Fast Retransmission / Fast Recovery
Segment loss pattern ; → either a "single (or a few) segment(s)" or many consecutive segments.
→ When the same ACK is received multiple times (3 times) while data is still unACKed, retransmit.
Fast Retransmission by Duplicated ACK
[Figure: time-sequence trace of data segments and returning ACKs]
Data segments (in the order shown) :
6401:6657(256) ack 1
6657:6913(256) ack 1
6913:7169(256) ack 1
7169:7425(256) ack 1
7425:7681(256) ack 1
7681:7937(256) ack 1
7937:8193(256) ack 1
8193:8449(256) ack 1
6657:6913(256) ack 1 ; "Fast Retransmission"
8449:8705(256) ack 1
8705:8961(256) ack 1
8961:9217(256) ack 1
ACKs (in the order shown) :
ack 5889
ack 6145
ack 6401
ack 6657
ack 6657 ①
ack 6657 ②
ack 6657 ③
ack 6657
ack 6657
ack 6657
ack 8449 win 5888
ack 8705 win 5888
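The trigger condition (the third duplicate of the same ACK) can be checked against the ACK numbers from the trace above. This is a sketch of the counting logic only, not of a full TCP sender; `fast_retransmit_points` is a hypothetical name.

```python
def fast_retransmit_points(acks: list, threshold: int = 3) -> list:
    """Return (index, ack) pairs where the threshold-th duplicate ACK arrives."""
    last = None
    duplicates = 0
    points = []
    for index, ack in enumerate(acks):
        if ack == last:
            duplicates += 1
            if duplicates == threshold:
                # The segment starting at this ACK number should be resent
                points.append((index, ack))
        else:
            last = ack
            duplicates = 0
    return points


trace = [5889, 6145, 6401, 6657, 6657, 6657, 6657,
         6657, 6657, 6657, 8449, 8705]
print(fast_retransmit_points(trace))
```

The retransmission is reported once, at the third duplicate; the later duplicates of 6657 do not trigger it again.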
Congestion Window Control
cwnd = 1; ssthresh = 65KB;
for () {
  if ("Timeout") { ssthresh = cwnd / 2; cwnd = 1; }
  if ("duplicated ACK") { ssthresh = cwnd / 2; cwnd = ssthresh; }
  if (cwnd ≦ ssthresh) { slow_start; /* exponential */ }
  else { congestion_avoidance; /* linear */ }
}
[Purpose] Prevent large oscillations of cwnd and operate at an appropriate cwnd
[1] Control of cwnd
(i) cwnd size at or below ssthresh → Exponential increase (slow start)
(ii) cwnd size above ssthresh → Linear increase (congestion avoidance)
[2] Control of ssthresh
(i) Timeout ; back to "1"
(ii) Duplicated-ACK ; 1/2
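The control loop above can be sketched as a small class that reacts to the three events. Assumptions made for this example: cwnd and ssthresh are counted in segments rather than bytes, congestion avoidance adds 1/cwnd per ACK (about one segment per round trip), and ssthresh is never set below 2 segments. The class and method names are hypothetical.

```python
class CongestionControl:
    """Minimal cwnd/ssthresh bookkeeping, in units of segments."""

    def __init__(self, ssthresh: float = 64.0):
        self.cwnd = 1.0
        self.ssthresh = ssthresh

    def on_ack(self) -> None:
        if self.cwnd <= self.ssthresh:
            self.cwnd += 1.0              # slow start: exponential per RTT
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance: linear per RTT

    def on_timeout(self) -> None:
        # Remember half of the current window, then restart from 1
        self.ssthresh = max(self.cwnd / 2.0, 2.0)
        self.cwnd = 1.0

    def on_duplicate_acks(self) -> None:
        # Halve the window and continue from there
        self.ssthresh = max(self.cwnd / 2.0, 2.0)
        self.cwnd = self.ssthresh


cc = CongestionControl(ssthresh=8.0)
for _ in range(8):
    cc.on_ack()
print(cc.cwnd)                 # grown by slow start
cc.on_duplicate_acks()
print(cc.cwnd, cc.ssthresh)    # halved
cc.on_timeout()
print(cc.cwnd, cc.ssthresh)    # back to 1
```

Note that ssthresh is computed from cwnd before cwnd is reset on a timeout; doing it in the other order would always yield the minimum threshold.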
Congestion Window Control (continued)
・ ICMP control messages
(1) ICMP Source Quench → cwnd = 1 ; ssthresh = as is ;
(2) Host unreachable → No Action ;
[Figure: cwnd versus time, with the target cwnd and "ssthresh" marked. Phases labeled in the plot: slow-start, congestion avoidance, Timeout, slow-start, congestion avoidance, Fast Recovery. Levels marked on the cwnd axis: cwnd_1 and (cwnd_1) / 2, cwnd_3 and (cwnd_3) / 2.]
Window Scaling for Long Fat Pipe - RFC1323 -
Network                   Bandwidth (bps)   RTT (ms)   BWxRTT (B)
Ethernet                  10.000 M          3          3,750
T1 (intercontinental)     1.544 M           60         11,580
T1 (satellite)            1.544 M           500        96,500
T3 (intercontinental)     45.000 M          60         337,500
OC12 (intercontinental)   2,400,000 M       60         7,500,000
(*) The BWxRTT value in the last row corresponds to 1,000 Mbps at 60 ms.
・ Max. Window Size ; 2^(16) Bytes = 64KB
→ Window Scaling ; "wscale"
wscale = n → window size of 64KB x 2^(n)
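Given a required window, the smallest usable scale factor follows directly from the rule above. This sketch assumes the unscaled maximum is 65,535 bytes and that the shift count is limited to 14, as in RFC 1323; `min_wscale` is a hypothetical helper name.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit window field
MAX_WSCALE = 14              # upper limit of the shift count in RFC 1323


def min_wscale(required_bytes: int) -> int:
    """Smallest n such that 65535 * 2^n covers the required window."""
    for n in range(MAX_WSCALE + 1):
        if MAX_UNSCALED_WINDOW * (2 ** n) >= required_bytes:
            return n
    raise ValueError("window too large even with the maximum scale factor")


# BWxRTT values taken from the table on this slide
for name, bdp in (("Ethernet", 3750), ("T1 (satellite)", 96500),
                  ("last row", 7500000)):
    print(name, min_wscale(bdp))
```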
RFC 1379 ; T/TCP - Transaction TCP -
[Purpose] Speed up the TCP connection establishment and release procedures
[Method]
・ CC (Connection Count) Option
・ Piggy-back on SYN ; "half-synchronization"
(1) SYN, Data, FIN, CC
(2) SYN, SYN-ACK, Data, FIN, FIN-ACK, CC, CC-Echo
(3) FIN-ACK
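RFC 1379 ; T/TCP
[Figure: two Client/Server exchanges side by side. Conventional TCP, segments shown : SYN_ACK(a+1,b), Data_ACK(a+2,b+1), FIN (m,s), FIN_ACK (m+1,s), ACK (m+1), ACK (s+1). T/TCP : a single segment carries SYN, S-ack, Data, F, F-ack.]
9 segments → 3 segments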
ECN (Explicit Congestion Notification) Control
TOS field (bits 0 1 2 3 4 5 6 7) : PHB (bits 0-5) | CU (bits 6-7)
TOS for Differentiated Service
- PHB (Per-Hop-Behavior)
- CU (Currently Unused) => for ECN (Explicit Congestion Notification) ?
PHB:
000000 DE (Default Service)
101110 EF (Expedited Forwarding)
Others AF (Assured Forwarding)
xxxxx0 Standard Purpose
xxxx11 Experimental Purpose
xxxx01 Experimental Purpose
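Explicit Congestion Notification (ECN) Control
[Figure: packets travel between Source Node 1 and Destination Node 1 through a Congestion Node that sets the ECN bit. Steps labeled in the figure : (1) ECN=00, (2) ECN=01, (3) ECN=01, (4) ECN=10, (5) ECN=11, (6) ECN=11, (7) ECN=10, (8) ECN=11, (9) ECN=11. "Reduce Speed" is marked twice.]
Prevents congestion before it occurs ; preventive control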
RTP
・ RTP ; Real-time Transport Protocol
・ RTP is applied only at the End-Hosts
(*) Communication quality at routers is out of scope
・ Base specifications ; RFC1889, RFC1890
・ Reproduction of the playback timing
- Payload Type
- Sequence Number
- Time-Stamp
・ Uses two UDP ports
- User Data
- Control Data
・ An RTP Payload Format is defined for each content type
RTP
・ RTP Payload Format specifications
RFC2029 ; CellB Video Encoding (for SUN)
RFC2032 ; H.261 Video Stream
RFC2035 ; JPEG-compressed Video
RFC2250 ; MPEG1/MPEG2 Video
・ Control Protocol
RTCP ; RTP Control Protocol
・ Communication quality monitoring
- Sending/receiving nodes
- Quality monitoring nodes
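RTP
・ The job of RTP ; "At the receiving node, reproduce the output timing of the data sent by the sender."
[Figure: sending node → network (Generate Delay-Jitter) → receive buffer with timing control → Application]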
RTP
・ Sender timing ; t1 t2 t3 t4 t5
・ Receiver input timing ; t1 t2 t3 t4 t5, each shifted by its own network delay d1 d2 d3 d4 ...
・ Receiver output timing ; T+t1 T+t2 T+t3 T+t4 T+t5 (T ; Off-set)
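The timing reconstruction above (output at T + t_i regardless of the individual delay d_i) can be sketched as follows. The function name, the millisecond units, and the choice to mark late packets with `None` are assumptions for this example; real receivers choose and adapt the offset in more elaborate ways.

```python
def playout_times(send_times: list, arrival_times: list, offset: float) -> list:
    """Scheduled output time for each packet, or None if it arrived too late."""
    schedule = []
    for sent, arrived in zip(send_times, arrival_times):
        output = offset + sent  # T + t_i: the sender's spacing is preserved
        schedule.append(output if arrived <= output else None)
    return schedule


send = [0, 20, 40, 60]      # sender timestamps (ms)
arrive = [35, 48, 90, 95]   # arrival times with jitter (ms)
print(playout_times(send, arrive, offset=50))
print(playout_times(send, arrive, offset=40))
```

A larger offset T absorbs more jitter at the cost of more latency; with too small an offset, packets that arrive after their slot cannot be played.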
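NAT (Network Address Translation)
・ Holds a translation table for the IP address (src_IP) and port number (src_port) of received packets and rewrites the IP header. (RFC1631)
(1) Private → Global
- DNS : resolves to the IP address of the NAT router.
- Received packet (dst_IP) → rewrite (src_IP, src_port) of the transmitted packet
(2) Global → Private
- Received packet (src_IP, src_port) → rewrite (dst_IP) of the transmitted packet
(*) Role of the port number (src_port)
(i) Multiplexing of src_IP
(ii) Mapping of dst_IP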
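NAT
[Figure: host A and host C connected through a NAT with address N. Packets shown as (source, destination) : (A, C) and (C, A) on the A side ; (N, C) and (C, N) on the C side.]
Translation table : input (address, port ; source / destination) → output (address, port ; source / destination)
Source address : A → N (A→N translation)
Destination address : N → A (N→A translation)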
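Traditional NAT
[Figure: host A inside the organization, NAT (address N), host C on the Internet]
Basic NAT
Outbound (A→N translation) :
source address A, destination address C, source port 100, destination port 200
→ source address N, destination address C, source port 100, destination port 200
Inbound (N→A translation) :
source address C, destination address N, source port 200, destination port 100
→ source address C, destination address A, source port 200, destination port 100
NAPT
Outbound (A→N, 100→150 translation) :
source address A, destination address C, source port 100, destination port 200
→ source address N, destination address C, source port 150, destination port 200
Inbound (N→A, 150→100 translation) :
source address C, destination address N, source port 200, destination port 150
→ source address C, destination address A, source port 200, destination port 100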
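The NAPT rewriting in the figure (A:100 becomes N:150 outbound, and N:150 maps back to A:100 inbound) can be sketched with two dictionaries. The class `Napt`, its method names, and the sequential port allocation are assumptions made for this example; timeouts and protocol handling are left out.

```python
class Napt:
    """Toy NAPT table: (private IP, private port) <-> public port."""

    def __init__(self, public_ip: str, first_port: int):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out = {}   # (private_ip, private_port) -> public_port
        self.back = {}  # public_port -> (private_ip, private_port)

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite the source of a packet leaving the organization."""
        key = (src_ip, src_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.out[key], dst_ip, dst_port)

    def inbound(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite the destination of a reply; None if no mapping exists."""
        if dst_ip != self.public_ip or dst_port not in self.back:
            return None
        private_ip, private_port = self.back[dst_port]
        return (src_ip, src_port, private_ip, private_port)


nat = Napt("N", first_port=150)
print(nat.outbound("A", 100, "C", 200))
print(nat.inbound("C", 200, "N", 150))
```

Because the public port identifies the mapping, several inside hosts can share the single address N, which is the "multiplexing of src_IP" role of the port number.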
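Bi-directional NAT
[Figure: host A inside the organization, NAT (address N), host C on the Internet, and a DNS]
(1) DNS query : What is the address of host A?
(2) DNS answer : The address is N
(3), (4) Packet exchange through the NAT :
A→N translation :
source address A, destination address C, source port 100, destination port 200
→ source address N, destination address C, source port 100, destination port 200
N→A translation :
source address C, destination address N, source port 200, destination port 100
→ source address C, destination address A, source port 200, destination port 100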
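Twice NAT
[Figure: host A inside the organization, NAT, host C on the Internet, and a DNS]
(1) DNS query : What is the address of host C?
(2) DNS answer : The address is Nl1
(3), (4) Packet exchange through the NAT :
Outbound (A→Ng, Nl1→C translation) :
source address A, destination address Nl1 → source address Ng1, destination address C
Inbound (Ng→A, C→Nl1 translation) :
source address C, destination address Ng1 → source address Nl1, destination address A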
Example of NAT Operation
[Figure: host 192.168.3.5 in network 192.168.0.0/16 and host 192.20.2.24 (bill.whitehouse.gov) in network 192.20.0.0/16, connected through NAT-R1 and NAT-R2. Addresses shown at NAT-R1 : 192.168.32.1, 198.29.10.23. Addresses shown at NAT-R2 : 198.30.40.50, 192.20.61.1.]
Packet headers along the path :
dst=198.30.40.50 src=192.168.3.5
dst=198.30.40.50 src=198.29.10.23
dst=192.20.2.24 src=198.29.10.23
<Translation Table in NAT-R2>
input                                        output                                       output
source        port   destination   port     source        port   destination   port     port
198.29.10.23  2012   198.30.40.50  n/a      198.29.10.23  n/a    192.20.2.24   n/a      #1
192.20.2.24   n/a    198.29.10.23  n/a      198.30.40.50  2122   198.29.10.23  n/a      #2
src=198.29.10.23, port=2012 → dst=192.20.2.24
(*) DNS Address resolution : bill.whitehouse.gov → 198.30.40.50