View
217
Download
0
Tags:
Embed Size (px)
Citation preview
Revisiting Ethernet:Plug-and-play made scalable
and efficient
Changhoon Kim, and Jennifer Rexfordhttp://www.cs.princeton.edu/~chkim
Princeton University
2
An “All Ethernet” Enterprise Network? “All Ethernet” makes network management easier
Zero-configuration of end-hosts and network due to Flat addressing Self-learning
Location independent and permanent addresses also simplify Host mobility Network troubleshooting Access-control policies
3
But, Ethernet bridging does not scale Flooding-based delivery
Frames to unknown destinations are flooded
Broadcasting for basic serviceBootstrapping relies on broadcasting
Vulnerable to resource exhaustion attacks
Inefficient forwarding pathsLoops are fatal due to broadcast storms; use the STPForwarding along a single tree leads to
inefficiency
4
State of the Practice: A Hybrid ArchitectureEnterprise networks comprised of Ethernet-based IP
subnets interconnected by routers
R
R
R
R
Ethernet Bridging - Flat addressing - Self-learning - Flooding - Forwarding along a tree
IP Routing - Hierarchical addressing - Subnet configuration - Host configuration - Forwarding along shortest paths
R
5
Motivation
Neither bridging nor routing is satisfactory.
Can’t we take only the best of each?
ArchitecturesFeatures
EthernetBridging
IPRouting
Ease of configuration Optimality in addressing Mobility support Path efficiency Load distribution Convergence speed Tolerance to loop
SEIZE (Scalable and Efficient Zero-config Enterprise)
SEIZE
6
Overview
Objectives SEIZE architecture Evaluation Conclusions
7
Overview: Objectives
ObjectivesAvoiding floodingRestraining broadcastingKeeping forwarding tables smallEnsuring path efficiency
SEIZE architecture Evaluation Conclusions
8
Avoiding Flooding
Bridging uses flooding as a routing schemeUnicast frames to unknown destinations are flooded
Does not scale to a large network
Objective #1: Unicast unicast trafficNeed a control-plane mechanism to discover and
disseminate hosts’ location information
“Send it everywhere! At least, they’ll learn where the source is.”
“Don’t know where destination is.”
9
Restraining Broadcasting
Liberal use of broadcasting for bootstrapping(DHCP and ARP)Broadcasting is a vestige of
shared-medium EthernetVery serious overhead in
switched networks
Objective #2: Support unicast-based bootstrapping Need a directory service
Sub-objective #2.1: Support general broadcastHowever, handling broadcast should be more scalable
10
Keeping Forwarding Tables Small Flooding and self-learning lead to unnecessarily
large forwarding tablesLarge tables are not only inefficient, but also dangerous
Objective #3: Install hosts’ location information only when and where it is neededNeed a reactive resolution schemeEnterprise traffic patterns are better-suited to reactive
resolution
11
Ensuring Optimal Forwarding Paths Spanning tree avoids broadcast storms.
But, forwarding along a single tree is inefficient.Poor load balancing and longer pathsMultiple spanning trees are insufficient
and expensive
Objective #4: Utilize shortest pathsNeed a routing protocol
Sub-objective #4.1: Prevent broadcast stormsNeed an alternative measure to prevent broadcast
storms
12
Backwards Compatibility Objective #5: Do not modify end-hosts
From end-hosts’ view, network must work the same way
End hosts should Use the same protocol stacks and applications Not be forced to run an additional protocol
13
Overview: Architecture
Objectives SEIZE architecture
Hash-based location managementShortest-path forwardingResponding to network dynamics
Evaluation Conclusions
14
SEIZE in a Slide Flat addressing of end-hosts
Switches use hosts’ MAC addresses for routing Ensures zero-configuration and backwards-compatibility (Obj # 5)
Automated host discovery at the edge Switches detect the arrival/departure of hosts Obviates flooding and ensures scalability (Obj #1, 5)
Hash-based on-demand resolution Hash deterministically maps a host to a switch Switches resolve end-hosts’ location and address via hashing Ensures scalability (Obj #1, 2, 3)
Shortest-path forwarding between switches Switches run link-state routing with only their own connectivity info Ensures data-plane efficiency (Obj #4)
15
How does it work?
Host discovery or registration
B
D
x y
Hash(F(x) = B)
Store<x, A> at B
Traffic to x
Hash(F(x) = B)
Tunnel to egress node, A
Deliver to x
Switches
End-hosts
Control flowData flow
Notifying<x, A> to D
Entire enterprise(A large single IP subnet) LS core
E
Optimized forwarding directly from D to AC
A
Tunnel to relay switch, B
16
Terminology
Ingress
Relay (for x)
Egress
xy
B
A
DstSrc< x, A >
< x, A >
< x, A >
D
Ingress appliesa cache eviction policyto this entry
cut-through forwarding
17
Responding to Topology Changes
Consistent Hash [Karger et al.,STOC’97] minimizes re-registration
A
B
CD
E
F
hh
h
h
hh
h
h
h
h
18
Single Hop Look-up
A
B
CD
F(x)
xy
y sends traffic to x
E
Every switch on a ring islogically one hop away
19
Responding to Host Mobility
Relay (for x)
xy
B
A
Src< x, A >
< x, A >
< x, A >
D
when cut-throughforwarding is used
G
< x, G >
Old Dst
New Dst
< x, G >
< x, G >
< x, G >
20
Unicast-based Bootstrapping
ARP Ethernet: Broadcast requests SEIZE: Hash-based on-demand address resolution
Exactly the same mechanism as location resolution Proxy resolution by ingress switches via unicasting
DHCPEthernet: Broadcast requests and repliesSEIZE: Utilize DHCP relay agent (RFC 2131)
Proxy resolution by ingress switches via unicasting
21
Overview: Evaluation
Objectives SEIZE architecture Evaluation
Scalability and efficiencySimple and flexible network management
Conclusions
22
Control-Plane Scalability When Using Relays Minimal overhead for disseminating host-location
informationEach host’s location is advertised to only two switches
Small forwarding tablesThe number of host information entries over all switches
leads to O(H), not O(SH)
Simple and robust mobility supportWhen a host moves, updating only its relay sufficesNo forwarding loop created since update is atomic
23
Data-Plane Efficiency w/o Compromise Price for path optimization
Additional control messages for on-demand resolutionLarger forwarding tablesControl overhead for updating stale info of mobile hosts
The gain is much bigger than the costBecause most hosts maintain a small, static
communities of interest (COIs) [Aiello et al., PAM’05]
Classical analogy: COI ↔ Working Set (WS);Caching is effective when a WS is small and static
24
Evaluation: Prototype Implementation
Link-state routing: eXtensible Open Router Platform [Handley et al., NSDI’05]
Host information management and traffic forwarding: The Click modular router [Kohler et al., TOCS’00]
Host info. registrationand notification messages
Click
XORP
OSPFDaemon
RingManager
Host InfoManager
SeizeSwitch
Link-state advertisementsfrom other switches
Data Frames Data Frames
RoutingTable
NetworkMap
ClickInterface
25
Evaluation: Set-up and Models
Emulation on Emulab
Test Network Configuration
Test Traffic LBNL internal packet traces [Pang et al., IMC’05]
17.8M packets from 5,128 hosts across 22 subnets Real-time replay
Models tested Ethernet w/ STP, SEIZE w/o path opt., and SEIZE w/ path opt. Inactive timeout-based eviction: 5 min ltout, 1 ~ 60 sec rtout
SW2
SW1SW0
SW3
N0
N2 N3
N1
26
Overall Comparison
Data-planeEfficiency
Control-planeScalability
Low Cost
27
Sensitivity to Cache Eviction Policy
Effect of Cache Entry Timeout
0.000
0.200
0.400
0.600
0.800
1.000
1 5 10 30 60
Timeout Values for Cached Entries (sec)
Ratio to Eth-STP
0
20,000
40,000
60,000
80,000
100,000
Counts
stretch (left)
# control pkts (right)
table size (right)
28
Some Unique Benefits
Optimal load balancing via relayed deliveryFlows sharing the same ingress and egress switches
are spread over multiple indirect pathsFor any valid traffic matrix, this practice guarantees
100% throughput with minimal link usage[Zhang-Shen et al., HotNets’04/IWQoS’05]
Simple and robust access controlEnforcing access-control policies at relays makes policy
management simple and robustWhy? Because routing changes and host mobility do not
change policy enforcement points
29
Conclusions SEIZE is a plug-and-playable enterprise
architecture ensuring both scalability and efficiency
Enabling design choicesHash-based location managementReactive location resolution and cachingShortest-path forwarding
LessonsTrading a little data-plane efficiency for huge control-
plane scalability makes a qualitatively different systemTraffic patterns (small static COIs, and short flow
interarrival times) are our friends
30
Future Work
Enriching evaluationVarious topologiesDynamic set-ups (topology changes, and host mobility)
Applying reactive location resolution to other networksThere are some routing systems that need to be slimmer
GeneralizationHow aggressively can we optimize control-plane without
losing data-plane efficiency?
31
Thank you.
Full paper is available athttp://www.cs.princeton.edu/~chkim
Backup Slides
33
Group-based Broadcasting
SEIZE uses per-group multicast tree
34
Group-based Access Control
Relay switches enforce inter-group access policies The idea
Allow resolution only when the access policy between a resolving host’s group and a resolved host’s group permits access
35
Simple and Flexible Management Using only a number of powerful switches as relays?
Yes, a pre-hash can generate a set of identifiers for a switch
Applying cut-through forwarding selectively? Yes, ingress switches can adaptively decide which policy to use
(E.g., no cut-through forwarding for DNS look-ups)
Controlling (or predicting) a switch’s table size? Yes, pre-hashing can determine the number of hosts for which a
switch provides relay service The number of directly connected hosts to a switch is also usually
known ahead of time
Traffic engineering? Yes, adjusting link weights works effectively
36
Control Overhead
Number of Control Packets
89.5
34.615.55.4
335.3
0
100
200
300
Eth-STP SEIZE/no-opt SEIZE/opt(1) SEIZE/opt(10) SEIZE/opt(60)
Thousands ofPackets
37
Host Information Replication Factor
Size of Forwarding Tables
15,492
6,945 6,939
5,284 5,275
466
0
10,000
20,000
30,000
Eth-STP SEIZE/no-opt SEIZE/opt(10)
Num. of Entries
SEIZE/Remote-Cache
SEIZE/Remote-Auth
SEIZE/Local
Eth/Regular
Max = NH
min = H
RF 2.23
RF 1.762H RF 1.83
38
Path Efficiency
Number of Packets Forwarded
22,5M
17,8M
22,2M
0
5
10
15
20
25
Eth-STP SEIZE/no-opt SEIZE/opt(10)
Millions ofPackets
+ 2%
+ 29%+ 27%
Optimum
39
Understanding Traffic Patterns
40
Understanding Traffic Patterns - cont’d
41
Evaluation: Prototype Implementation
Click
XORP
OSPFDaemon
RingMgr
HostInfoMgr
SeizeSwitch
Link State Advertisementsfrom other switches
Host info. registrationand optimization msgs
Data Frames Data Frames
IP Forwarding
RIBDClickFEA
42
Prototype: Inside a Click Process
Strip(14)
FromDevice(em0) FromDevice(em1)
ToDevice(em0) ToDevice(em1)
CheckIPHeader(…)
ProcessIPMisc(…) ProcessIPMisc(…)
ARPQuerier(…) ARPQuerier(…)
LookupIPRoute(…)
to ARPResponder or ARPQuerier
Classifier(…)ARP IP
SeizeSwitch(…)
FromDevice(eth0) FromDevice(eth1)
ToDevice(eth0) ToDevice(eth1)
IPClassifier(…)ARP
Classifier(…)ARP IP
to ARPResponder or ARPQuerier
to upper layer
IP
Strip(20)
OthersClassifier(…)
IP Proto SEIZE
Others
Strip(14)
43
store or update<srcmac, in-port>
in host-tablec-hash srcmac,
get a relay node rn
notify <srcmac, my-IP>
to rn
to
Source Learning
Inside a SeizeSwitch Element
IP Forwarding
look uprouting table
send downto L2
send up toL4
stripIP header
c-hash dstmac,get a relay node rn
encapsuate with<my-IP, rn>,
set proto to SEIZE
encapsuate with<my-IP, egress-IP>,set proto to SEIZE
send outto interface
to to
to
inform ingress of<dstmac, egress-IP>
Layer 2 Layer 3
stripEthernet header
is dstmacon host-table?
yesno
yesis dstmaclocal?
no
proto == SEIZE ?
yesno
is dstIP me? yesno
get egress-IPof dstmac,
from host table
yes nocontrolmessage?
is dstmac meor broadcast?
yesnoapply tohost table
L2 Data Forwarding
L2 Control
to
EthFrame<srcmac, dstmac> departs
EthFrame<srcmac, dstmac> arrives
44
Control Plane: Single Hop DHT
1 2
3
45
I
H
GD
C
A
B
K
E
F
LJ
6
E, K, L
A, J, IB, G
C, H D, F
1’s LOCAL
1’s REMOTE_AUTH
3 Registers L
1 Forgets L 2 Registers F
45
Temporal Traffic Locality
46
Spatial Traffic Locality
47
Failover Performance
Time/Sequence GraphSequenceNum. [KB]
50 150 250 350 450 550 650 Time (s)
SWdown
SWup
New STbuilt
New STbuilt
100,000
50,000
Time/Sequence GraphSequenceNum. [KB]
100,000
50,000
50 100 150 Time (s)
Relaydown
Relayup
OSPF cnvg &host registration
OSPF cnvg &host registration