2
Connecting devices
[Figure: connecting devices, grouped into networking devices and internetworking devices, by layer: hub/repeater (L1), bridge/switch (L2), router (L3), application gateway (L4-L7)]
3
IEEE 802 vs IPv4 addresses
IEEE 802 address (48 bits): vendor code, then vendor-assigned part. The vendor code includes the Group/Individual bit and the Global/Local bit. Example: 00:0E:35:64:E9:E7
IPv4 address (32 bits): netid, then hostid. Example: 192.36.125.18
IP addresses have a hierarchical structure: the netid identifies the network (subnet) and the hostid identifies the host within it.
MAC addresses have a flat structure.
4
Routing vs bridging
•Bridging - forwarding on layer 2
–A MAC address/ID has a flat structure
• Many nodes -> large forwarding tables
• Broadcast reaches all nodes
–Simple to configure and manage, cheaper
–Loops detected by spanning tree protocol
•Routing - forwarding on layer 3
–The netid of the IP addresses can be aggregated
• Many nodes -> smaller forwarding tables than bridging
• Routers partition broadcast domains
–Routing is more difficult to configure
–Loops detected by routing protocols and TTL decrementation
5
What does a router do?
• Packet forwarding
• Not only IPv4:
–IPv6, MPLS, Bridging/VLAN, Tunneling,...
• Filter packets - Access lists
• Classification
• Metering/Policing/Shaping
• Compute routes: build forwarding table
• In the background: routing
• In real-time: forwarding
[Figure: packet pipeline with the stages Classifier, Lookup, Metering, Shaping]
ACL - Access lists for dropping packets.
Metering - Measure traffic characteristics of a 'flow': its rate in packets per second and bits per second.
Policing - If the rate is higher than a threshold, drop packets. Alternatively, packets can be 'marked' as out-of-norm and potentially dropped at a later stage. For example, packets marked as out-of-profile at the edge may be dropped inside the network if congestion is detected.
Shaping - Actively changing the rate of traffic to make it comply with a specific rate. Often uses token-bucket / leaky-bucket mechanisms or, most simply, a queue. Shaping is more complex to do than policing.
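The token-bucket mechanism mentioned above can be sketched as a policer. This is a minimal illustration, not the mechanism of any particular router: the names `token_bucket`, `tb_init` and `tb_conforms` are hypothetical, and time is passed in as seconds in a `double` to keep the arithmetic simple.

```c
#include <stdbool.h>

/* Hypothetical token-bucket policer (names are illustrative only). */
struct token_bucket {
    double rate;    /* token fill rate, bytes per second */
    double burst;   /* bucket depth, bytes */
    double tokens;  /* bytes currently available */
    double last;    /* time of the last update, seconds */
};

void tb_init(struct token_bucket *tb, double rate, double burst, double now)
{
    tb->rate = rate;
    tb->burst = burst;
    tb->tokens = burst;   /* start with a full bucket */
    tb->last = now;
}

/* Returns true if the packet is in profile (forward it),
 * false if it is out of profile (drop it or mark it). */
bool tb_conforms(struct token_bucket *tb, double now, unsigned pkt_len)
{
    /* Refill according to the elapsed time, capped at the bucket depth. */
    tb->tokens += (now - tb->last) * tb->rate;
    if (tb->tokens > tb->burst)
        tb->tokens = tb->burst;
    tb->last = now;

    if (tb->tokens < (double)pkt_len)
        return false;             /* not enough tokens: out of profile */
    tb->tokens -= (double)pkt_len;
    return true;
}
```

A policer drops or marks the packet when `tb_conforms` returns false. A shaper would instead queue the packet and send it once enough tokens have accumulated, which is why shaping needs buffering and timers on top of this check.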
6
Inside a software-based router
•This is a regular computer architecture (e.g. a PC)
•Every packet goes twice over the shared bus
•Constrained by bus and memory bandwidth (per-byte cost)
•And CPU cycles (per-packet cost)
[Figure: three line cards, buffer memory, and a CPU holding the RIB, all attached to a shared bus backplane]
First generation routers were in fact regular computers
7
Inside a hardware-based router
•Multiple simultaneous transfers over the backplane
•Specialized hardware: ASICs (Application-Specific ICs)
•Wirespeed at 100 Gb/s and beyond
[Figure: four line cards, each with buffer memory and a forwarder, connected through a switched backplane to a CPU card holding the CPU and the RIB]
8
Fast path, slow path
•Fast path
–If line cards can determine outgoing port
•Slow path
–Control processor must determine outgoing port
[Figure: four line cards and a control processor (CPU, routing table, memory); the fast path goes directly between line cards, the slow path goes via the control processor]
9
Routing table lookup
•Longest prefix first
•Divide table in 33 "buckets" - one for each netmask length (0 to 32)
•Match destination with longest prefixes first
•SW algorithms: trees, binary trees, tries (different data structures)
•HW support: TCAMs (Ternary Content-Addressable Memory)
[Figure: the destination IP address is compared against buckets of netids, one bucket per mask length from 32 down to 0]
The shift to classless addressing led to changes to the routing table organization and routing algorithms since the netid length is variable. You may have several matches and you need to select the longest prefix.
How much time can you spend on lookups? Example: at 10 Gb/s line rate and 40-byte packets, 31.25 million packets arrive per second, which leaves 32 ns per lookup (comparable with SRAM memory speed).
The example above shows a 'bucket' approach to lookup: organize the routing table in buckets based on prefix length and check the buckets with the longest prefixes first. This is only slightly better than a linear search.
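As a baseline for comparison, longest prefix match can be written as a plain linear scan that remembers the longest matching entry. This is a sketch under assumptions: `struct route`, `masklen_to_mask` and `lpm_linear` are hypothetical names, and addresses are assumed to be in host byte order already.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routing table entry (illustration only). */
struct route {
    uint32_t    prefix;   /* network prefix, host byte order */
    int         masklen;  /* 0..32 */
    const char *ifname;   /* outgoing interface name */
};

/* Netmask for a prefix length. Length 0 is handled separately because
 * shifting a 32-bit value by 32 bits is undefined in C. */
uint32_t masklen_to_mask(int masklen)
{
    return masklen == 0 ? 0 : 0xFFFFFFFFu << (32 - masklen);
}

/* Linear longest prefix match: test every entry, keep the longest hit.
 * Returns the interface name, or NULL if no entry matches. */
const char *lpm_linear(const struct route *tab, size_t n, uint32_t dst)
{
    const struct route *best = NULL;

    for (size_t i = 0; i < n; i++) {
        uint32_t mask = masklen_to_mask(tab[i].masklen);
        if ((dst & mask) == (tab[i].prefix & mask) &&
            (best == NULL || tab[i].masklen > best->masklen))
            best = &tab[i];
    }
    return best ? best->ifname : NULL;
}
```

The cost grows with the number of routes, so the 32 ns budget computed above is quickly exceeded for large tables. The bucket scheme only changes the order in which entries are tried, which is why tries or hardware support are used instead.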
10
Using a Trie for lookup
•Binary tree
–Nodes are prefixes
–Left branch represents '0' in the string
–Right branch represents '1'
Prefix table:
a *
b 10*
c 01*
d 110*
e 0010
f 0110
g 0111
[Figure: binary trie with the prefixes a-g placed at the nodes reached by following their bit strings from the root]
A much better algorithm is to organize the routing table into a binary trie. There are at most four lookups required in the example above.
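A binary trie lookup of this kind can be sketched as follows. The code is an illustration under assumptions, not the slide author's implementation: `trie_node`, `trie_new`, `trie_insert` and `trie_lookup` are hypothetical names, prefixes are left-aligned in a 32-bit word, and a one-character label stands in for the next hop (0 means "no route stored here").

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical binary trie node (illustration only). */
struct trie_node {
    struct trie_node *child[2];  /* child[0]: next bit is 0, child[1]: 1 */
    char label;                  /* route label, 0 if none at this node */
};

struct trie_node *trie_new(void)
{
    return calloc(1, sizeof(struct trie_node));
}

/* Insert a prefix of 'len' bits (left-aligned in 'prefix').
 * Returns 0 on success, -1 if memory allocation fails. */
int trie_insert(struct trie_node *root, uint32_t prefix, int len, char label)
{
    struct trie_node *n = root;

    for (int i = 0; i < len; i++) {
        int bit = (prefix >> (31 - i)) & 1;
        if (n->child[bit] == NULL) {
            n->child[bit] = trie_new();
            if (n->child[bit] == NULL)
                return -1;
        }
        n = n->child[bit];
    }
    n->label = label;
    return 0;
}

/* Walk down the trie bit by bit and remember the last label seen:
 * that is the longest matching prefix. Returns 0 if nothing matches. */
char trie_lookup(const struct trie_node *root, uint32_t addr)
{
    char best = 0;
    const struct trie_node *n = root;

    for (int i = 0; n != NULL; i++) {
        if (n->label)
            best = n->label;
        if (i == 32)
            break;
        n = n->child[(addr >> (31 - i)) & 1];
    }
    return best;
}
```

With the prefix table from the slide, an address starting with bits 0010 walks four levels and returns e, while an address starting with 1110 falls off the trie after two levels and returns the default a. The sketch never frees nodes; a real table would also need deletion.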
11
Elimination of Internal Prefixes
•No overlapping prefixes
•Prefix expansion with ”leaf pushing”
•Simplifies lookup at expense of larger memory
Prefix table:
a *
b 10*
c 01*
d 110*
e 0010
f 0110
g 0111
Expanded table (first two bits select the row, next two bits select the entry):
00*: a a e a
01*: c c f g
10*: b b b b
11*: d d a a
Tries are often modified for optimal lookup using compression and expansion. The above is one example. Only one lookup required.
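The expansion above can be reproduced with a small sketch that flattens the prefixes into one 16-entry table indexed by the first four address bits. This is a simplification of the figure (one level instead of two) and the names `short_prefix`, `expand_prefixes` and `expanded_lookup` are hypothetical.

```c
#include <stdint.h>

#define STRIDE 4                 /* number of address bits used as index */
#define SLOTS  (1u << STRIDE)    /* 16 table entries */

/* Hypothetical short prefix: 'bits' holds the prefix right-aligned,
 * e.g. 110* is bits = 6 (binary 110), len = 3. */
struct short_prefix {
    uint8_t bits;
    int     len;     /* 0..STRIDE */
    char    label;
};

/* Expand every prefix to all STRIDE-bit values it covers. A slot is
 * overwritten only by a longer prefix, so longest match is preserved. */
void expand_prefixes(const struct short_prefix *p, int n, char table[SLOTS])
{
    int owner_len[SLOTS];

    for (unsigned s = 0; s < SLOTS; s++) {
        table[s] = 0;
        owner_len[s] = -1;
    }
    for (int i = 0; i < n; i++) {
        unsigned first = (unsigned)p[i].bits << (STRIDE - p[i].len);
        unsigned count = 1u << (STRIDE - p[i].len);
        for (unsigned s = first; s < first + count; s++) {
            if (p[i].len > owner_len[s]) {
                table[s] = p[i].label;
                owner_len[s] = p[i].len;
            }
        }
    }
}

/* A lookup is a single array access on the top STRIDE bits. */
char expanded_lookup(const char table[SLOTS], uint32_t addr)
{
    return table[addr >> (32 - STRIDE)];
}
```

For the seven prefixes a-g the table comes out as a a e a, c c f g, b b b b, d d a a, which matches the slide. The price is memory: a table with stride k has 2^k entries regardless of how many prefixes are stored.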
12
TCAM
Linear Search on Values—TCAM
•Ternary Content-Addressable Memory
–Fully associative memory
•Three values for each bit—’0’, ’1’, and ’x’ (don’t care)
•Compare input with all words in parallel
–First match gives the result
•Up to 100 million searches per second
Prefix table:
a *
b 10*
c 01*
d 110*
e 0010
f 0110
g 0111
[Figure: the input 0010 is compared in parallel with the stored words 0111 (g), 0110 (f), 0010 (e), 110x (d), 01xx (c), 10xx (b) and xxxx (a)]
Example of hardware supported lookups - not a software-based method.
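Although a TCAM is hardware, its matching rule is easy to emulate in software, which may help to see why entry order matters. The sketch below assumes 4-bit keys and hypothetical names (`tcam_entry`, `tcam_search`); the loop is sequential, whereas a real TCAM performs all comparisons in parallel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TCAM word: a bit is compared only where 'care' is 1.
 * Example: 110x is value = 0xC (1100), care = 0xE (1110). */
struct tcam_entry {
    uint8_t value;
    uint8_t care;
    char    result;
};

/* Software emulation of a TCAM search: the first matching entry wins,
 * so entries must be stored with the longest prefixes first.
 * Returns 0 if no entry matches. */
char tcam_search(const struct tcam_entry *e, size_t n, uint8_t key)
{
    for (size_t i = 0; i < n; i++) {
        if (((key ^ e[i].value) & e[i].care) == 0)
            return e[i].result;
    }
    return 0;
}
```

Storing the words in the order g, f, e, d, c, b, a (longest prefix first) makes the first match also the longest match; the all-don't-care entry a at the end acts as the default route.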
13
Packet classification
•Map a packet to a class
•Class defined by filters, usually a 5-tuple:
–<source IP, destination IP, source port, destination port, protocol>
•For example, all packets:
–From subnet N
–To TCP port 80 on web-server S
–From subnet N to port 666 on subnet M
•Applications:
–Firewall & NAT
–Blocking
–Accounting
–Policy routing
–QoS—metering, policing, DiffServ marking, ...
In MPLS terminology, such a class is called a Forwarding Equivalence Class (FEC).
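Classification on the 5-tuple can be sketched as a first-match search over a list of filters. The example is illustrative only: `five_tuple`, `filter`, `filter_matches` and `classify` are hypothetical names, only the destination port is filtered, and the value 0 is used as a wildcard for port and protocol, which is a simplification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The fields a classifier looks at (addresses in host byte order). */
struct five_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

/* Hypothetical filter: IP fields match under a mask, and 0 means
 * "any" for dst_port and proto (a simplification for this sketch). */
struct filter {
    uint32_t src_ip, src_mask;
    uint32_t dst_ip, dst_mask;
    uint16_t dst_port;
    uint8_t  proto;
    int      class_id;
};

bool filter_matches(const struct filter *f, const struct five_tuple *p)
{
    return ((p->src_ip ^ f->src_ip) & f->src_mask) == 0 &&
           ((p->dst_ip ^ f->dst_ip) & f->dst_mask) == 0 &&
           (f->dst_port == 0 || f->dst_port == p->dst_port) &&
           (f->proto == 0 || f->proto == p->proto);
}

/* First matching filter decides the class; -1 means unclassified. */
int classify(const struct filter *rules, size_t n,
             const struct five_tuple *p)
{
    for (size_t i = 0; i < n; i++) {
        if (filter_matches(&rules[i], p))
            return rules[i].class_id;
    }
    return -1;
}
```

A filter for "to TCP port 80 on web-server S" sets `dst_ip` to S with an all-ones mask, `dst_port` to 80 and `proto` to 6, while "from subnet N" sets only `src_ip` and `src_mask`. Rule order matters when filters overlap.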
14
Examples of commercial routers
Cisco 12816: capacity 1.28 Tb/s, power 4.7 kW, size 6ft x 19" x 2ft
Juniper M160: capacity 80 Gb/s, power 2.6 kW, size 3ft x 19" x 2.5ft
This slide is somewhat dated.
17
Open source routing
•Linux and BSD platforms
•Most routing protocols exist as open source projects (e.g. Quagga)
•But PC hardware has traditionally been a limiting factor
•But now up to
–4x12 core CPUs,
–inter-processor buses (HT, QPI),
–non-uniform memory (NUMA),
–I/O buses (PCI-E),
–10 Gb/s NICs
enable forwarding speeds of tens of gigabits per second.
•Example: the Bifrost open source router (UU/KTH)
A router like the one above routes all Uppsala University IPv4 and IPv6 traffic at 2x10 Gb/s.
18
Example: PC routing architecture
[Figure: CPU cores 0-7 with DDR3 memory banks, linked by QPI to two I/O handlers (north bridges); each I/O handler has PCI-E x16 and x4 slots carrying 10 Gb/s network ports, 16 ports in total]
• Multi-core CPUs: (Intel Nehalem) 8 cores, 16 with 'Hyper-threading'
• Multi-channel: each network card has 8 DMA queues
• NUMA: non-uniform memory access (non-local memory, many memory banks)
• Inter-processor bus: QPI 2.4 GHz, ~76 Gb/s
• Memory: DDR3-1066, 68 Gb/s x 3 channels
• I/O bus: PCI-E gen2, x1 ~4 Gb/s, x16 ~64 Gb/s
Newer Intel (Sandy Bridge) and AMD (Bulldozer) processors provide even more CPU cores.
19
Homework 4
•A programming assignment in C
•Part 1: Print out the IPv4 destination address
•Part 2: Make an IPv4 forwarding lookup
–Mandatory: 3 bonus points
•Part 3: Same as part 2 with non-trivial lookup (time limit)
–Optional: 2 bonus points
20
Homework 4: Part 1
You should read an Ethernet frame, identify it as an IPv4 packet, and print the IPv4 destination address.
Input: Ethernet packet. Example:
0200 0000 0011 0200 000c 0001 0800 4500 0026 17d4 0000 ff01 8ffc 0a01 0002 0a02 0002 0000 e802 c04b 0004 3e89 339a 0786 d0ff 0009
Output: IPv4 address. Example:
10.2.0.2
Errors:
Error: packet too short: length of frame in bytes
Error: Not ipv4 payload: payload type
21
Homework 4: Part 2
•The program should read a forwarding table and an Ethernet packet, extract the destination IP address, make a lookup in the forwarding table, and write the outgoing interface name.
•The assignment is a step towards a full forwarder but lacks several sanity checks, MAC address lookups and ARP. It is intended to illustrate how to inspect packet headers, the use of pointers and buffers, and IP longest prefix match.
•The program should do the following:
Read a routing table from stdin. The routing table consists of a list of prefix, nexthop interface triples.
Read a single Ethernet (RFC894) packet from stdin.
Verify that the packet is long enough to contain an Ethernet and IPv4 header.
Verify that the Ethernet payload type is 0x0800 (IPv4).
Verify that the IP version field is 4.
Extract the destination address from the IPv4 header, make a longest prefix match lookup, and return the outgoing interface name.
•Example input:
fib 10.1.0.0/24 e1
fib 10.2.0.0/24 e2
fib 10.3.0.0/24 e3
fib 0.0.0.0/0 e1
input 0200 0000 0001 0200 0000 0011 0800 4500 0026 17d4 0000 ff01 8ffc 0a01 0002 0a02 0002 0000 e802 c04b 0004 3e89 339a 0786 d0ff 0009
Example output:
e2
22
Homework 4: Part 3
•Same as part 2 with two changes:
1) Read not only one but several packets
2) Time limit on lookup
The time-limit is set so that you cannot just have a simple linear lookup. The test-case used
Part 3 is optional
23
Homework 4: Kattis
•If you have registered, you will get a Kattis account
•Use the link on the homework page and log in
•Submit by selecting
language: C
Select problem: forwarding (part 1), forwarding2 (part 2), forwarding3 (part 3)
upload the file
Submit
•You can see the status on the web page
Compile error
Runtime error
Wrong output
OK
•You will also get a mail
• Submit the solution electronically, or on paper to the lab assistants or course leader, before the deadline.
• Append a receipt showing that you passed both the forwarding and forwarding2 (& forwarding3) tests of Kattis.
Read the instructions for the Kattis assignment thoroughly. Kattis is a machine and is very picky about details: extra spaces, capital letters, etc., are significant.
24
Extracting correct info
•The Ethernet header is 14 bytes
–payload type is in bytes 13-14
–IPv4 is 0x0800
•The IP header is 20 bytes (without options)
–The destination IP address is in bytes 17-20
#include <stdint.h>

struct ethhdr{
  uint8_t da[6], sa[6];       /* destination and source MAC address */
  uint16_t pt;                /* payload type, network byte order */
};
struct iphdr{
  /* Bit-field order is compiler dependent; this order fits big-endian hosts */
  unsigned int ip_v:4,ip_hl:4; /* version, header length */
  uint8_t ip_tos;             /* type of service */
  uint16_t ip_len;            /* total length */
  uint16_t ip_id;             /* identification */
  uint16_t ip_off;            /* fragment offset field */
  uint8_t ip_ttl;             /* time to live */
  uint8_t ip_p;               /* protocol */
  uint16_t ip_sum;            /* checksum */
  uint32_t ip_src, ip_dst;    /* source and dest address */
};
25
Byte ordering / Endianness
CPUs differ in how they lay out the bytes of a number when loading from and storing to memory
–Most significant byte in first byte: Big-endian (Big end first)
–Most significant byte in last byte: Little-endian
–There is also middle-endian and bi-endian
•Intel PCs are little-endian.
•The Kattis machine is big-endian (Sun Sparc)
Register value 0A0B0C0D stored in memory at address n:
Address  Big-Endian  Little-Endian
n        0A          0D
n+1      0B          0C
n+2      0C          0B
n+3      0D          0A
26
Network byte order
•The way the CPU stores/loads numbers from memory is called host byte order
•But in communication systems, we sometimes have to transfer numbers in binary format (character arrays are not a problem)
•We have to agree on a format to encode numbers
•This is called network byte order
–In IP, network byte order is big-endian
•Therefore, in portable code, if you transfer binary numbers between nodes, always translate between host byte order and network byte order.
•BSD has the following help functions:
–htonl, ntohl (4-byte numbers)
–htons, ntohs (2-byte numbers)
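A short sketch of how these help functions are typically used when writing numbers to and reading them from a packet buffer. The wrapper names `put_net32`, `get_net32` and `get_net16` are hypothetical; `htonl`, `ntohl` and `ntohs` are the standard functions declared in `<arpa/inet.h>` on POSIX systems.

```c
#include <arpa/inet.h>   /* htonl, ntohl, ntohs */
#include <stdint.h>
#include <string.h>

/* Store a 32-bit host-order value in a buffer in network byte order. */
void put_net32(unsigned char *buf, uint32_t host_value)
{
    uint32_t net_value = htonl(host_value);
    memcpy(buf, &net_value, sizeof net_value);
}

/* Read a 32-bit network-order value from a buffer into host order. */
uint32_t get_net32(const unsigned char *buf)
{
    uint32_t net_value;
    memcpy(&net_value, buf, sizeof net_value);
    return ntohl(net_value);
}

/* Read a 16-bit network-order value, e.g. the Ethernet payload type. */
uint16_t get_net16(const unsigned char *buf)
{
    uint16_t net_value;
    memcpy(&net_value, buf, sizeof net_value);
    return ntohs(net_value);
}
```

On a big-endian host the conversion functions do nothing, on a little-endian host they swap the bytes. Code that always calls them therefore gives the same result on an Intel PC and on a big-endian machine.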
27
Alignment
•Data structures must be aligned in memory when accessed as several bytes.
•In particular 2-byte, 4-byte, 8-byte numbers must be aligned on word boundaries
–Otherwise a bus error occurs (in serious cases, e.g. SPARC)
–Or a performance degradation (as in x86)
•Typically,
–2-byte numbers must be 2-byte aligned
–4-byte numbers must be 4-byte aligned
–Etc
•In Eth+IP, the Eth header is 14 bytes, so if the frame starts on a 4-byte boundary the IP header starts at offset 14 and its 4-byte fields (such as the addresses) are misaligned
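One common way around the misalignment is to copy the bytes out with `memcpy` instead of dereferencing a pointer into the frame. The sketch below is illustrative: `frame_dst_ip`, the offset macros and the return codes are hypothetical, and it does not produce the exact output strings that Kattis expects.

```c
#include <arpa/inet.h>   /* ntohl, ntohs */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HLEN      14   /* Ethernet header length */
#define ETH_PT_OFFSET 12   /* payload type: bytes 13-14 of the frame */
#define IP_MIN_HLEN   20   /* IPv4 header without options */
#define IP_DST_OFFSET 16   /* destination: bytes 17-20 of the IP header */

/* Extract the IPv4 destination address (host byte order) from a raw
 * Ethernet frame. Hypothetical return codes: 0 on success, -1 if the
 * frame is too short, -2 if the payload is not IPv4. */
int frame_dst_ip(const unsigned char *frame, size_t len, uint32_t *dst)
{
    uint16_t pt;
    uint32_t addr;

    if (len < ETH_HLEN + IP_MIN_HLEN)
        return -1;

    /* memcpy works at any address, so alignment is not an issue. */
    memcpy(&pt, frame + ETH_PT_OFFSET, sizeof pt);
    if (ntohs(pt) != 0x0800)
        return -2;
    if ((frame[ETH_HLEN] >> 4) != 4)     /* IP version field */
        return -2;

    /* The address sits at frame offset 30, which is not a multiple of
     * 4; a direct uint32_t load here could raise a bus error on SPARC. */
    memcpy(&addr, frame + ETH_HLEN + IP_DST_OFFSET, sizeof addr);
    *dst = ntohl(addr);
    return 0;
}
```

Casting `frame + 14` to a `struct iphdr *` and reading `ip_dst` would compile, but the 4-byte load would be unaligned whenever the frame buffer itself is 4-byte aligned. Copying into a local variable avoids that, and compilers usually turn small fixed-size `memcpy` calls into plain loads where the hardware allows it.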
28
Alignment example
[Figure: the 4-byte value 0A0B0C0D stored in memory at addresses 100-103 (4-byte aligned) is OK; the same value stored starting at an address that is not a multiple of 4 gives BUS ERROR!]