Eytan ModianoSlide 1
Packet Multiple Access
Eytan Modiano, Massachusetts Institute of Technology
Multiple Access
• Shared transmission medium
– A receiver can hear multiple transmitters
– A transmitter can be heard by multiple receivers
• The major problem with multi-access is allocating the channel between the users; the nodes do not know when the other nodes have data to send
– Need to coordinate transmissions
Examples of Multiple Access Channels
• Local area networks (LANs)
– Traditional Ethernet
– Recent trend to non-multi-access LANs
• Satellite channels
• Multi-drop telephone
• Wireless radio
[Figure: protocol layers NET, DLC, PHY, with the DLC layer divided into the LLC and MAC sublayers]
• Medium Access Control (MAC)
– Regulates access to channel
• Logical Link Control (LLC)
– All other DLC functions
Approaches to Multiple Access
• Fixed assignment (TDMA, FDMA, CDMA)
– Each node is allocated a fixed fraction of bandwidth
– Equivalent to circuit switching
– Very inefficient for low duty factor traffic
• Contention systems
– Polling
– Reservations and scheduling
– Random access
Aloha
Single receiver, many transmitters
[Figure: many transmitters sending to a single receiver]
E.g., satellite system, wireless
Slotted Aloha
• Time is divided into “slots” of one packet duration
– E.g., fixed size packets
• When a node has a packet to send, it waits until the start of the next slot to send it
– Requires synchronization
• If no other nodes attempt transmission during that slot, the transmission is successful
– Otherwise “collision”
– Collided packets are retransmitted after a random delay
Slotted Aloha Assumptions
• Poisson external arrivals
• No capture
– Packets involved in a collision are lost
– Capture models are also possible
• Immediate feedback
– Idle (0), Success (1), Collision (e)
• If a new packet arrives during a slot, transmit in next slot
• If a transmission has a collision, node becomes backlogged
– While backlogged, transmit in each slot with probability $q_r$ until successful
• Infinite nodes where each arriving packet arrives at a new node
– Equivalent to no buffering at a node (queue size = 1)
– Pessimistic assumption gives a lower bound on Aloha performance
Markov chain for slotted Aloha
[Figure: Markov chain over backlog states 0, 1, 2, 3, ... with transition probabilities labeled P10, P03, P13, P34]
• The state (n) of the system is the number of backlogged nodes
– $p_{i,i-1}$ = prob. of one backlogged attempt and no new arrival
– $p_{i,i}$ = prob. of one new arrival and no backlogged attempts, or no new arrival and no success
– $p_{i,i+1}$ = prob. of one new arrival and one or more backlogged attempts
– $p_{i,i+j}$, $j \ge 2$ = prob. of j new arrivals, whatever the backlogged nodes do (with two or more new arrivals the slot is a collision and every new arrival joins the backlog)
• Steady state probabilities do not exist
– Backlog tends to infinity => system unstable
– More later
slotted aloha
• Let g(n) be the attempt rate (the expected number of packets transmitted in a slot) in state n
  $g(n) = \lambda + n q_r$
• The number of attempted packets per slot in state n is approximately a Poisson random variable of mean g(n)
– P(m attempts) = $g(n)^m e^{-g(n)} / m!$
– P(idle) = probability of no attempts in a slot = $e^{-g(n)}$
– P(success) = probability of one attempt in a slot = $g(n) e^{-g(n)}$
– P(collision) = P(two or more attempts) = 1 - P(idle) - P(success)
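The Poisson approximation above can be evaluated directly. The following is a minimal sketch; the function name `slot_probabilities` and the sample values (λ = 0.2, n = 8, q_r = 0.1) are illustrative assumptions, not taken from the slides.

```python
import math

def slot_probabilities(lam, n, q_r):
    """Poisson approximation to the outcome of one slot with backlog n.

    lam is the new-arrival rate per slot and q_r the retransmission
    probability (hypothetical parameter names).
    """
    g = lam + n * q_r                      # attempt rate g(n)
    p_idle = math.exp(-g)                  # no attempts
    p_success = g * math.exp(-g)           # exactly one attempt
    p_collision = 1.0 - p_idle - p_success # two or more attempts
    return g, p_idle, p_success, p_collision

# Sample values chosen so that g(n) = 1.
g, p_idle, p_success, p_collision = slot_probabilities(lam=0.2, n=8, q_r=0.1)
print(f"g={g:.2f} idle={p_idle:.3f} success={p_success:.3f} collision={p_collision:.3f}")
```

With g(n) = 1 the idle and success probabilities coincide at 1/e, and collisions take the remaining 1 - 2/e.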
Throughput of Slotted Aloha
• The throughput is the fraction of slots that contain a successful transmission = P(success) = $g(n) e^{-g(n)}$
– When the system is stable, throughput must also equal the external arrival rate (λ)
– What value of g(n) maximizes throughput?
  $$\frac{d}{dg(n)}\, g(n) e^{-g(n)} = e^{-g(n)} - g(n) e^{-g(n)} = 0 \;\Rightarrow\; g(n) = 1 \;\Rightarrow\; P(\text{success}) = g(n) e^{-g(n)} = 1/e \approx 0.368$$
– g(n) < 1 => too many idle slots
– g(n) > 1 => too many collisions
– If g(n) can be kept close to 1, an external arrival rate of 1/e packets per slot can be sustained
Instability of slotted aloha
• If the backlog increases beyond the unstable point (bad luck), it tends to increase without limit and the departure rate drops to 0
• The drift in state n, D(n), is the expected change in backlog over one time slot
– D(n) = λ - P(success) = $\lambda - g(n) e^{-g(n)}$
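To see where the instability comes from, the drift can be tabulated against the backlog. This is a sketch under assumed values (λ = 0.3, q_r = 0.05, both chosen only for illustration); `drift` is a hypothetical helper name.

```python
import math

def drift(lam, n, q_r):
    """Expected change in backlog over one slot: D(n) = lam - g(n) e^{-g(n)}."""
    g = lam + n * q_r
    return lam - g * math.exp(-g)

lam, q_r = 0.3, 0.05  # assumed values, for illustration only
for n in (0, 5, 10, 20, 40, 80):
    print(f"n={n:3d}  D(n)={drift(lam, n, q_r):+.3f}")
```

For these values the drift is negative at moderate backlogs (the backlog tends to shrink) and positive at large backlogs (the backlog tends to keep growing), which is the unstable behavior described above.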
Stabilizing slotted aloha
• Choosing $q_r$ small increases the backlog at which instability occurs (since $g(n) = \lambda + n q_r$), but also increases delay (since the mean retry time is $1/q_r$)
• Solution: estimate the backlog (n) from past feedback
– Given the backlog estimate, choose $q_r$ to keep g(n) = 1
– Assume all arrivals are immediately backlogged: $g(n) = n q_r$, P(success) = $n q_r (1-q_r)^{n-1}$
– To maximize P(success) choose $q_r = \min\{1, 1/n\}$
– When the estimate of n is perfect: idles occur with probability 1/e, successes with probability 1/e, and collisions with probability 1 - 2/e
– When the estimate is too large, too many idle slots occur
– When the estimate is too small, too many collisions occur
• Nodes can use feedback information (0, 1, e) to make estimates
– A good rule is to increase the estimate of n on each collision, and to decrease it on each idle slot or successful slot
– Note that the increase on a collision should be $(e-2)^{-1}$ times as large as the decrease on an idle slot
stabilized slotted aloha
• Assume all arrivals are immediately backlogged
– $g(n) = n q_r$ = attempt rate
– P(success) = $n q_r (1-q_r)^{n-1}$
– For max throughput set g(n) = 1 => $q_r = \min\{1, 1/n'\}$, where n' is the estimate of n
– Let $n_k$ = estimate of backlog after the kth slot
  $$n_{k+1} = \begin{cases} \max\{\lambda,\; n_k + \lambda - 1\} & \text{idle or success} \\ n_k + \lambda + (e-2)^{-1} & \text{collision} \end{cases}$$
– Can be shown to be stable for λ < 1/e
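The estimate update can be tried out in a small simulation. This is a sketch, not the lecture's own code: the function names, the seed, the starting estimate, and the choice λ = 0.3 are assumptions, and Poisson arrivals are drawn with Knuth's multiplication method to avoid external libraries.

```python
import math
import random

def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_stabilized_aloha(lam, slots, seed=1):
    """Slotted Aloha where all arrivals are immediately backlogged and
    q_r = min(1, 1/n_hat), with n_hat updated from (0, 1, e) feedback."""
    rng = random.Random(seed)
    backlog, n_hat, successes = 0, lam, 0   # n_hat starts at lam (assumed)
    for _ in range(slots):
        q_r = min(1.0, 1.0 / n_hat)
        attempts = sum(rng.random() < q_r for _ in range(backlog))
        if attempts == 1:                   # success: one packet leaves
            backlog -= 1
            successes += 1
        if attempts >= 2:                   # collision: raise the estimate
            n_hat = n_hat + lam + 1.0 / (math.e - 2.0)
        else:                               # idle or success: lower it
            n_hat = max(lam, n_hat + lam - 1.0)
        backlog += poisson(rng, lam)        # arrivals join the backlog
    return successes / slots, backlog

throughput, final_backlog = simulate_stabilized_aloha(lam=0.3, slots=20000)
print(f"throughput={throughput:.3f} packets/slot, final backlog={final_backlog}")
```

For an arrival rate below 1/e the measured throughput should track λ, since a stable system delivers packets as fast as they arrive.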
TDM vs. slotted aloha
• Aloha achieves lower delays when arrival rates are low
• TDM results in very large delays with a large number of users, while Aloha is independent of the number of users
[Figure: delay versus arrival rate (0 to 0.8) for Aloha and for TDM with m = 8 and m = 16]
Pure (unslotted) Aloha
• New arrivals are transmitted immediately (no slots)
– No need for synchronization
– No need for fixed length packets
• A backlogged packet is retried after an exponentially distributed random delay with some mean 1/x
• The total arrival process is a time varying Poisson process of rate $g(n) = \lambda + n x$ (n = backlog, 1/x = average time between retransmissions)
• Note that an attempt suffers a collision if the previous attempt is not yet finished ($t_i - t_{i-1} < 1$) or the next attempt starts too soon ($t_{i+1} - t_i < 1$)
[Figure: timeline of attempts at times t1 through t5, marking new arrivals, a retransmission, and a collision]
Throughput of Unslotted Aloha
• An attempt is successful if the inter-attempt intervals on both sides exceed 1 (for unit duration packets)
– P(success) = $e^{-g(n)} e^{-g(n)} = e^{-2g(n)}$
– Throughput (success rate) = $g(n) e^{-2g(n)}$
– Max throughput is at g(n) = 1/2, where throughput = 1/(2e) ≈ 0.18
– Stabilization issues are similar to slotted Aloha
– Advantages of unslotted Aloha are simplicity and the possibility of unequal length packets
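The two throughput curves can be compared numerically. The sketch below scans a grid of attempt rates; the helper names and the grid spacing are illustrative assumptions.

```python
import math

def slotted_throughput(g):
    """Slotted Aloha: success needs exactly one attempt in the slot."""
    return g * math.exp(-g)

def pure_throughput(g):
    """Pure Aloha: success needs clear intervals on both sides."""
    return g * math.exp(-2.0 * g)

grid = [i / 1000 for i in range(1, 3001)]   # attempt rates 0.001 .. 3.000
g_slotted = max(grid, key=slotted_throughput)
g_pure = max(grid, key=pure_throughput)
print(f"slotted: best g={g_slotted:.3f}, throughput={slotted_throughput(g_slotted):.4f}")
print(f"pure:    best g={g_pure:.3f}, throughput={pure_throughput(g_pure):.4f}")
```

The scan recovers the maximizers g = 1 and g = 1/2, so pure Aloha peaks at half the throughput of slotted Aloha.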
Splitting Algorithms
• More efficient approach to resolving collisions
– Simple feedback (0, 1, e)
– Basic idea: assume only two packets are involved in a collision
  Suppose all other nodes remain quiet until the collision is resolved, and the nodes in the collision each transmit with probability 1/2 until one is successful
  On the next slot after this success, the other node transmits
  The expected number of slots for the first success is 2, so the expected number of slots to transmit 2 packets is 3
  Throughput over the 3 slots = 2/3
– In practice the above algorithm cannot really work
  Cannot assume only two users are involved in a collision
  A practical algorithm must allow for collisions involving an unknown number of users
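The 3-slot figure can be checked by simulating the idealized two-packet case. This is a sketch; the function name, seed, and trial count are assumptions made for the example.

```python
import random

def slots_to_resolve_two(rng):
    """Slots needed to send both packets after a two-packet collision.

    Each node transmits with probability 1/2 per slot until exactly one
    of them transmits (a success); the other node then uses the next slot.
    """
    slots = 0
    while True:
        slots += 1
        a_sends = rng.random() < 0.5
        b_sends = rng.random() < 0.5
        if a_sends != b_sends:      # exactly one transmitted: success
            return slots + 1        # plus one slot for the remaining packet

rng = random.Random(0)
trials = 100_000
mean_slots = sum(slots_to_resolve_two(rng) for _ in range(trials)) / trials
print(f"mean slots = {mean_slots:.3f}, throughput = {2 / mean_slots:.3f}")
```

Exactly one of the two nodes transmits with probability 1/2 in each slot, so the wait for the first success is geometric with mean 2, and the sample mean should sit close to 3.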
Tree algorithms
• After a collision, all new arrivals and all backlogged packets not in the collision wait
• Each colliding packet randomly joins one of two groups (Left and Right groups)
– Toss of a fair coin
– Left group transmits during the next slot while the Right group waits
  If a collision occurs, the Left group splits again (stack algorithm)
  The Right group waits until the Left collision is resolved
– When the Left group is done, the Right group transmits
[Figure: splitting tree. (1,2,3,4) collides and splits into (1,2,3), which collides, and 4, which succeeds; (1,2,3) splits into 1, which succeeds, and (2,3), which collides; (2,3) splits into an idle slot and (2,3), which collides again and splits into 2 and 3, both successes]
• Notice that after the idle slot, the collision between (2,3) was sure to happen and could have been avoided
• Many variations and improvements on the original tree splitting algorithm
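The basic splitting rule can be written as a short recursion that counts slots. This is a sketch of the unimproved algorithm only (it does not skip the collision that is certain after an idle slot); `tree_resolve`, the seed, and the trial count are assumptions for illustration.

```python
import random

def tree_resolve(packets, rng):
    """Number of slots the basic tree algorithm uses on a group of packets.

    The group transmits in one slot. With 0 or 1 packets that slot is an
    idle or a success. Otherwise it is a collision and the group splits by
    fair coin tosses into Left (served first) and Right.
    """
    if len(packets) <= 1:
        return 1
    left, right = [], []
    for p in packets:
        (left if rng.random() < 0.5 else right).append(p)
    return 1 + tree_resolve(left, rng) + tree_resolve(right, rng)

rng = random.Random(0)
trials = 50_000
mean_two = sum(tree_resolve([1, 2], rng) for _ in range(trials)) / trials
mean_four = sum(tree_resolve([1, 2, 3, 4], rng) for _ in range(trials)) / trials
print(f"mean slots for 2 packets: {mean_two:.2f}")
print(f"mean slots for 4 packets: {mean_four:.2f}")
```

For two packets the count includes the initial collision slot, and each failed split costs one idle slot plus one repeated collision, giving an expected total of 5 slots.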
Throughput comparison
• Stabilized pure Aloha T = 0.184 = 1/(2e)
• Stabilized slotted Aloha T = 0.368 = 1/e
• Basic tree algorithm T = 0.434
• Best known variation on tree algorithm T = 0.4878
• Upper bound on any collision resolution algorithm with (0, 1, e) feedback T <= 0.568
• TDM achieves throughputs up to 1 packet per slot, but the delay increases linearly with the number of nodes
Carrier Sense Multiple Access (CSMA)
• In certain situations nodes can hear each other by listening to the channel: “carrier sensing”
• CSMA: polite version of Aloha
– Nodes listen to the channel before they start transmission
  Channel idle => transmit
  Channel busy => wait (join backlog)
– When do backlogged nodes transmit?
  When the channel becomes idle, backlogged nodes attempt transmission with probability $q_r$
  Persistent protocol: $q_r = 1$
  Non-persistent protocol: $q_r < 1$
CSMA
• Let τ = the maximum propagation delay on the channel
– When a node starts/stops transmitting, it will take this long for all nodes to detect channel busy/idle
• For initial understanding, view the system as slotted with "mini-slots" of duration equal to the maximum propagation delay
– Normalize the mini-slot duration to $\beta = \tau / D_{tp}$ ($D_{tp}$ = packet transmission time) and the packet duration to 1
• Actual systems are not slotted, but this hypothetical system simplifies the analysis and understanding of CSMA
[Figure: timeline of mini-slots of duration β and a packet of duration 1]
Rules for slotted CSMA
• When a new packet arrives
– If the current mini-slot is idle, start transmitting in the next mini-slot
– If the current mini-slot is busy, the node joins the backlog
– If a collision occurs, nodes involved in the collision become backlogged
• Backlogged nodes attempt transmission after an idle mini-slot with probability $q_r < 1$ (non-persistent)
– Transmission attempts only follow an idle mini-slot
– Each “busy period” (success or collision) is followed by an idle slot before a new transmission can begin
• Time can be divided into epochs:
– A successful packet followed by an idle mini-slot (duration = β + 1)
– A collision followed by an idle mini-slot (duration = β + 1)
– An idle mini-slot (duration = β)
Analysis of CSMA
• Let the state of the system be the number of backlogged nodes
• Let the state transition times be the ends of idle slots
– Let T(n) = average amount of time between state transitions when the system is in state n
  $T(n) = \beta + (1 - e^{-\lambda\beta} (1-q_r)^n)$
  When $q_r$ is small, $(1-q_r)^n \approx e^{-q_r n}$ => $T(n) = \beta + (1 - e^{-\lambda\beta - n q_r})$
• At the beginning of each epoch, each backlogged node transmits with probability $q_r$
• New arrivals during the previous idle slot are also transmitted
• With backlog n, the number of packets that attempt transmission at the beginning of an epoch is approximately Poisson with rate
  $g(n) = \lambda\beta + n q_r$
Analysis of CSMA
• The probability of success (per epoch) is
  $P_s = g(n) e^{-g(n)}$
• The expected duration of an epoch is approximately
  $T(n) \approx \beta + (1 - e^{-g(n)})$
• Thus the success rate per unit time is
  $$\lambda < \text{departure rate} = \frac{g(n) e^{-g(n)}}{\beta + 1 - e^{-g(n)}}$$
Maximum Throughput for CSMA
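• The optimal value of g(n) can again be obtained:
  $$g(n) \approx \sqrt{2\beta}, \qquad \lambda < \frac{1}{1 + \sqrt{2\beta}}$$
• Tradeoff between idle slots and time wasted on collisions
• High throughput when β is small
• Stability issues similar to Aloha (less critical)
[Figure: departure rate and arrival rate versus $g(n) = \lambda\beta + n q_r$; the departure rate peaks near $g(n) = \sqrt{2\beta}$]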
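The CSMA departure rate, g(n) e^{-g(n)} / (β + 1 - e^{-g(n)}), can be maximized numerically and compared with the small-β approximation g(n) ≈ √(2β), λ < 1/(1 + √(2β)). This is a sketch; β = 0.01, the grid, and the function name are assumptions for illustration.

```python
import math

def departure_rate(g, beta):
    """Successes per unit time: P(success per epoch) / expected epoch length."""
    return g * math.exp(-g) / (beta + 1.0 - math.exp(-g))

beta = 0.01                                   # assumed normalized mini-slot length
grid = [i / 10000 for i in range(1, 20001)]   # attempt rates 0.0001 .. 2.0
g_best = max(grid, key=lambda g: departure_rate(g, beta))
best_rate = departure_rate(g_best, beta)

g_approx = math.sqrt(2.0 * beta)
rate_approx = 1.0 / (1.0 + math.sqrt(2.0 * beta))
print(f"numeric: g={g_best:.4f}, rate={best_rate:.4f}")
print(f"approx:  g={g_approx:.4f}, rate={rate_approx:.4f}")
```

For small β the numerical optimum lies close to the approximation, and both show the throughput approaching 1 as β shrinks.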
Unslotted CSMA
• Slotted CSMA is not practical
– Difficult to maintain synchronization
– Mini-slots are useful for understanding but not critical to the performance of CSMA
• Unslotted CSMA will have slightly lower throughput due to increased probability of collision
• Unslotted CSMA has a smaller effective value of β than slotted CSMA
– Essentially β becomes the average instead of the maximum propagation delay
CSMA/CD
• CSMA with Collision Detection (CD) capability
– Nodes are able to detect collisions
– Upon detection of a collision, nodes stop transmission
  Reduces the amount of time wasted on collisions
• Protocol:
– All nodes listen to transmissions on the channel
– When a node has a packet to send:
  Channel idle => transmit
  Channel busy => wait a random delay (binary exponential backoff)
– If a transmitting node detects a collision, it stops transmission
  Waits a random delay and tries again
[Figure: workstations (WS) attached to a two-way cable]
Time to detect collisions
• A collision can occur while the signal propagates between the two nodes
• It would take an additional propagation delay for both users to detect the collision and stop transmitting
• If τ is the maximum propagation delay on the cable, then if a collision occurs, it can take up to 2τ seconds for all nodes involved in the collision to detect it and stop transmission
[Figure: two workstations (WS) separated by propagation delay τ]
Approximate model for CSMA/CD
• Simplified approximation for added insight
• Consider a slotted system with “mini-slots” of duration 2τ
• If a node starts transmission at the beginning of a mini-slot, by the end of the mini-slot either
– No collision occurred and the rest of the transmission will be uninterrupted
– A collision occurred, but by the end of the mini-slot the channel would be idle again
• Hence a collision affects at most one mini-slot
[Figure: timeline of mini-slots of duration 2τ and a packet of duration 1]
Analysis of CSMA/CD
• Assume N users and that each attempts transmission during a free “mini-slot” with probability p
– p includes new arrivals and retransmissions
  $$P(i \text{ users attempt}) = \binom{N}{i} p^i (1-p)^{N-i}$$
  $$P(\text{exactly 1 attempt}) = P(\text{success}) = N p (1-p)^{N-1}$$
  To maximize P(success),
  $$\frac{d}{dp}\left[N p (1-p)^{N-1}\right] = N(1-p)^{N-1} - N(N-1)\, p\, (1-p)^{N-2} = 0 \;\Rightarrow\; p_{opt} = \frac{1}{N}$$
  => Average attempt rate of one per slot
  => Notice the similarity to slotted Aloha
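The optimum p = 1/N can be confirmed by a direct scan of P(success) = N p (1-p)^(N-1). This is a sketch; N = 20, the grid, and the function name are assumptions chosen for the example.

```python
import math

def p_success(n_users, p):
    """Probability that exactly one of n_users attempts in a mini-slot."""
    return n_users * p * (1.0 - p) ** (n_users - 1)

n_users = 20                                  # assumed population size
grid = [i / 1000 for i in range(1, 1000)]     # p = 0.001 .. 0.999
p_best = max(grid, key=lambda p: p_success(n_users, p))
print(f"best p = {p_best:.3f} (1/N = {1 / n_users:.3f})")
print(f"P(success) at 1/N, N=20:    {p_success(20, 1 / 20):.4f}")
print(f"P(success) at 1/N, N=10000: {p_success(10000, 1 / 10000):.4f}")
print(f"1/e = {1 / math.e:.4f}")
```

The success probability at p = 1/N equals (1 - 1/N)^(N-1), which decreases toward 1/e as N grows.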
Analysis of CSMA/CD, continued
• Once a mini-slot has been successfully captured, transmission continues without interruption
• New transmission attempts will begin at the next mini-slot after the end of the current packet transmission
  $$P(\text{success}) = N p (1-p)^{N-1} = \left(1 - \frac{1}{N}\right)^{N-1} \text{ at } p = 1/N$$
  $$P_s = \lim_{N \to \infty} P(\text{success}) = \frac{1}{e}$$
  Let X = number of slots per successful transmission
  $$P(X = i) = (1 - P_s)^{i-1} P_s \;\Rightarrow\; E[X] = \frac{1}{P_s} = e$$
Analysis of CSMA/CD, continued
• Let S = average amount of time between successful packet transmissions
  $S = (e-1)\,2\tau + D_{tp} + \tau$
– $(e-1)\,2\tau$: idle/collision mini-slots
– $D_{tp}$: packet transmission time
– $\tau$: average time until the start of the next mini-slot
• Efficiency = $D_{tp}/S = D_{tp} / (D_{tp} + \tau + 2\tau(e-1))$
• Let $\beta = \tau / D_{tp}$ => Efficiency ≈ 1/(1 + 4.4β), so λ < 1/(1 + 4.4β)
• Compare to CSMA without CD, where
  $$\lambda < \frac{1}{1 + \sqrt{2\beta}}$$
Notes on CSMA/CD
• Can be viewed as a reservation system where the mini-slots are used for making reservations for data slots
• In this case, Aloha is used for making reservations during the mini-slots
• Once a user captures a mini-slot, it continues to transmit without interruption
• In practice, of course, there are no mini-slots
– Minimal impact on performance but analysis is more complex
CSMA/CD examples
• Example (Ethernet)
– Transmission rate = 10 Mbps
– Packet length = 1000 bits, $D_{tp} = 10^{-4}$ sec
– Cable distance = 1 mile, $\tau = 5 \times 10^{-6}$ sec
– => $\beta = 5 \times 10^{-2}$ and E = 80%
• Example (GEO satellite): propagation delay 1/4 second
– β = 2,500 and E ≈ 0%
• CSMA/CD is only suitable for short propagation scenarios!
• How is Ethernet extended to 100 Mbps?
• How is Ethernet extended to 1 Gbps?
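The two examples can be reproduced from the approximate CSMA/CD model, in which the time between successful packets is (e-1)·2τ + D_tp + τ. This is a sketch; the function name is an assumption, and the inputs are the values quoted in the examples.

```python
import math

def csma_cd_efficiency(tau, d_tp):
    """Fraction of time spent on successful packet transmission.

    tau  : maximum propagation delay in seconds
    d_tp : packet transmission time in seconds
    """
    s = (math.e - 1.0) * 2.0 * tau + d_tp + tau   # time between successes
    return d_tp / s

d_tp = 1000 / 10e6                       # 1000 bits at 10 Mbps = 1e-4 s
ethernet = csma_cd_efficiency(tau=5e-6, d_tp=d_tp)
geo = csma_cd_efficiency(tau=0.25, d_tp=d_tp)
print(f"Ethernet: beta={5e-6 / d_tp:.2f}, efficiency={ethernet:.1%}")
print(f"GEO:      beta={0.25 / d_tp:.0f}, efficiency={geo:.4%}")
```

The Ethernet case comes out at roughly 80%, while the satellite case is effectively zero, which is why CSMA/CD only suits short propagation delays.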
Token rings
• Token rings were developed by IBM in the early 1980s
• Token: a bit sequence
– Token circulates around the ring
  Busy token: 01111111
  Free token: 01111110
• When a node wants to transmit
– Wait for free token
– Remove token from ring (replace with busy token)
– Transmit message
– When done transmitting, replace free token on ring
– Nodes must buffer 1 bit of data so that a free token can be changed to a busy token
• Token ring is basically a polling system
– Token does the polling
[Figure: token ring]
TOKEN BUSES
• Special control packet serves as a token
• Nodes must have the token to transmit
• Token is passed from node to node in some order
– Conceptually, a token bus is the same as a token ring
– When one node finishes transmission, it sends an idle token to the next node (by addressing the control packet properly)
– Similar to a polling system
• Issues
– Efficiency lower than token rings due to longer transmission delay for the packets and longer propagation delays
– Need protocol for joining and leaving the bus
[Figure: workstations (WS) attached to a bus]
Large propagation delay (satellite networks)
• Satellite reservation system
– Use mini-slots to make reservations for longer data slots
– Mini-slot access can be inefficient (Aloha, TDMA, etc.)
• A crude approximation: delay is 3/2 times the propagation delay plus ideal queueing delay
[Figure: frame structure of alternating reservation intervals (labeled A = mv) and data intervals; timeline for a packet: arrival, wait for reservation interval, propagation delay, wait for assigned data slot, transmit]
Satellite Reservations
• Frame length must exceed round-trip delay
– Reservation slots during frame j are used to reserve data slots in frame j+1
– Variable length: serve all requests from frame j in frame j+1
  Difficult to maintain synchronization
  Difficult to provide QoS (e.g., support voice traffic)
– Fixed length: maintain a virtual queue of requests
• Reservation mechanism
– Scheduler on board satellite
– Scheduler on ground
– Distributed queue algorithm
  All nodes keep track of reservation requests and use the same algorithm to make reservations
• Control channel access
– TDMA: simple but difficult to add more users
– Aloha: can support a large number of users, but collision resolution can be difficult and add enormous delay
Aloha Reservations
• Use Aloha to capture a slot
• After capturing a slot, the user keeps the slot until done
– Other users observe the slot busy and don’t attempt
• When done, other users can go after the slot
– Other users observe the slot idle and attempt using Aloha
• Method useful for long data transfers or for mixed voice and data
Packet multiple access summary
• Latency: ratio of propagation delay to packet transmission time
– GEO satellite example: Dp = 0.5 sec, packet length = 1000 bits, R = 1 Mbps
  Latency = 500 => very high
– LEO satellite example: Dp = 0.1 sec
  Latency = 100 => still very high
– Over satellite channels the data rate must be very low to be in a low latency environment
• Low latency protocols
– CSMA, polling, token rings, etc.
– Throughput ~ 1/(1 + aα), α = latency, a = constant
• High latency protocols
– Aloha is insensitive to latency, but generally has low throughput
  Very little delay
– Reservation systems can achieve high throughput
  Delays for making reservations
– Protocols can be designed to be a hybrid of Aloha and reservations
  Aloha at low loads, reservations at high loads
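The latency figures in the satellite examples follow from the definition: propagation delay divided by packet transmission time. This is a small sketch; the function and parameter names are assumptions.

```python
def latency(prop_delay_s, packet_bits, rate_bps):
    """Propagation delay measured in packet transmission times."""
    return prop_delay_s * rate_bps / packet_bits

geo = latency(prop_delay_s=0.5, packet_bits=1000, rate_bps=1e6)
leo = latency(prop_delay_s=0.1, packet_bits=1000, rate_bps=1e6)
print(f"GEO latency = {geo:.0f}, LEO latency = {leo:.0f}")
```

Because latency scales with the data rate, the same links would need a rate on the order of kilobits per second to bring the ratio near 1.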
MIT
Switches, Routers and Networks
Overview
• Introduction
• Routing and switching:
– Switch fabrics:
– Basics of switching
– Blocking
– Interconnection examples
– Complexity
– Recursive constructions
• Interconnection routing
• Buffering - input and output
• Local area networks (LANs)
• Metropolitan area networks (MANs)
• Wide area networks (WANs)
• Trends
Introduction
• Data networks generally evolve fairly independently for different applications and are then patched together: telephony, a variety of computer applications, wireless applications
• IP is a large portion of the traffic, but it is carried by a variety of protocols throughout the network
• Voice is still the application that has determined many of the implementation issues, but its share is decreasing and voice is increasingly carried over IP (voice over IP)
• Voice-oriented networks are not very flexible, but are very robust
• IP is very successful because it is very flexible, but increasingly there is a drive towards enhancing the reliability of services
• How do all of these network types and requirements fit together?
Networks
[Figure: a WAN connecting MANs, which in turn connect LANs and a SAN]
• LANs serve a wide variety of services and attach to MANs or maybe directly to WANs
• The two main purposes of a network are:
– Transmission across some distance: this involves amplification or regeneration (generally code-assisted)
– The establishment of variable flows: switching and routing
Switching and Routing
• Switching is generally the establishment of connections on a circuit basis
• Routing is generally the forwarding of traffic on a datagram basis
• Routing requires switching but not vice versa: routing uses connections which are permanently or temporarily set up in order to forward datagrams (those datagrams may be in circuit form, for instance VPs and VCs)
Packet routers
• A packet switch consists of a routing engine (table look-up), a switch scheduler, and a switch fabric
• The routing engine looks up the packet address in a routing table and determines which output port to send the packet to
– Packet is tagged with port number
– The switch uses the tag to send the packet to the proper output port
Switch fabrics
• Simplest switch fabric is simply a shared bus
– Most of the processing is done in line cards (LCs)
  Route table look-up
  Line cards buffer the packets
  Line cards send packets to the proper output
– Bus bandwidth must be N times LC speed (N ports)
• In general a switch fabric replaces the bus
• Switch fabrics are created from certain building blocks of smaller switches arranged in stages
• Simplest switch is a 2x2 switch, which can be either in the through or crossed position
[Figure: a computer and four line cards (LC) attached to a shared bus]
Definitions
• A connection state is a mapping from the array of inputs to that of outputs; connections are either point-to-point or multicast
• Basic switch building blocks are:
– the distributor
– the concentrator
– the 2x2 2-state point-to-point switch (switching cell)
[Figure: connection states of the distributor, the concentrator, and the 2x2 switching cell, with inputs and outputs labeled 0 and 1]
Building up
• Interconnection network: finite collection of nodes together with a set of interconnection lines such that
– every node is an object with an array of inputs and an array of outputs
– an interconnection line leads from an output of one node to an input of another node
– every I/O of a node is incident with at most one interconnection line
– an I/O is called external if it is not incident with any interconnection line
• A route from an external input to an external output is a chain of distinct $(a_0, b_0, a_1, b_1, \ldots, a_k, b_k)$ where $a_0$ and $b_k$ are external and $b_{j-1}$ is interconnected to $a_j$
Building up
• An interconnection network is called a switching network when:
– every node qualifies to be a switch through proper specification of connection states
– the network is routable (there exists a route from every external input to every external output)
– an ordering is specified on external inputs and on external outputs
• Unique routing interconnection networks: all routes from an external input to an external output are parallel, that is, $(a_0, b_0, a_1, b_1, \ldots, a_k, b_k)$ and $(a_0, b'_0, a'_1, b'_1, \ldots, a'_k, b_k)$ are such that $a_j, a'_j$ reside on the same node and $b_j, b'_j$ reside on the same node
• Otherwise: alternate routing
Blocking
• An mxn unique routing network is called a nonblocking network if, for any integer k < min(m,n)+1, any k external inputs, any k external outputs, and any pairing between these external I/Os, there exist k disjoint routes for the matched pairs
• For a routable network, the same property is that of a rearrangeably nonblocking, or rearrangeable, network
• An interconnection network is strictly non-blocking if requests for routes are always granted under the rule of arbitrary route selection, and wide-sense non-blocking if there exists an algorithm for route selection that grants all requests
[Figure: nested classes, with non-blocking inside wide-sense (WS) non-blocking inside rearrangeable]
Blocking, Multi-stage networks
• Main connection between rearrangeability and the non-blocking property is given by the following theorem:
  A switching network composed of non-blocking switches is rearrangeable iff it constructs a non-blocking switch
• A common means of building interconnection networks is to use a multi-stage architecture:
– every interconnection line is between two stages
– every external input is on a first-stage node
– every external output is on a final-stage node
– nodes within each stage are linearly ordered
Interconnection networks
• N inputs, log(N) stages with N/2 modules per stage
  Example: Omega (shuffle exchange network)
• Notice the order of inputs into a stage is a shuffle of the outputs from the previous stage: (0,4,1,5,2,6,3,7)
• Easily extended to more stages
• Any output can be reached from any input by proper switch settings
– Not all routes can be done simultaneously
– Exactly one route between each OD pair
Interconnection networks
• Another example of a multi-stage interconnection network
• Built using the basic 2x2 switch module
• Recursive construction
– Construct an N by N switch using two N/2 by N/2 switches and a new stage of N/2 basic (2x2) modules
– N by N switch has $\log_2(N)$ stages, each with N/2 basic (2x2) modules
Complexity issues
• There are many different parameters that are used to consider the complexity of an interconnection network
• Line complexity: number of interconnection lines
• Node (cell) complexity: number of small nodes (mxn where m < 3 and n < 3)
• Depth: maximum number of nodes on a route (assuming an acyclic interconnection network)
• Entropy of a switch: log of the number of connection states
• What relations exist between complexity and the capabilities of a switch?
Complexity
• The depth of an mxn routable interconnection network is at least max(log(m), log(n))
• Proof: for a depth d, there are at most $2^d$ external outputs. Since we have routability, $n < 2^d + 1$ and $m < 2^d + 1$
• When a switching network is composed of 2-state switches, the component complexity of the network is at least the entropy of the switch
• Proof: for E the number of switches, there are $2^E$ ways to form a combination of one connection state in every node. Each combination corresponds to at most one connection state in the node
Complexity
• When an nxn rearrangeable network is composed of small nodes, its component complexity is at least log(n!)
• Proof: if we take every small node to be replaced by a 2-state point-to-point switch, then we have a non-blocking switch. Thus, there is a different connection state for every one of the n! one-to-one mappings between the n inputs and the n outputs. We now use the relation for networks composed of 2-state switches
• Note: using Stirling’s formula, we can obtain an approximate simple bound for component complexity
Complexity
• Component complexity:
• Relation between line and component complexity: component complexity + mn = line complexity + m + n
Complexity
• If an mxn nonblocking network is composed of $n_{12}$ 1x2 nodes, $n_{21}$ 2x1 nodes, $n_{22}$ cells, plus possibly crosspoints (edges), then
  $n_{12} + n_{21} + 4 n_{22} = 2mn - m - n$
• Corollary: an nxn non-blocking network composed of small nodes has component complexity at least $0.5(n^2 - n)$
• Note: directed acyclic graphs can be seen as a special case of a network: a crosspoint network
• We have basic complexity properties, but how do we build networks?
Recursive 2-stage construction
• 2-stage interconnection with parameters m and n is composed of n mxm input nodes and m nxn output nodes interconnected by a coordinate interchange (static)
• Constructions using trees (divide and conquer):
[Figure: construction trees. 16x16 is built from 4x4 and 4x4, each built from 2x2 and 2x2. 60x60 is built from 6x6 and 10x10, with 6x6 built from 2x2 and 3x3 and 10x10 built from 5x5 and 2x2]
• Basic blocks need not be 2x2, trees need not be balanced
Benes approach
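• A three stage approach in which we use as the middle stage two networks of size $2^{n-1} \times 2^{n-1}$ to build a network of size $2^n \times 2^n$
[Figure: an input stage of $2^{n-1}$ cells and an output stage of $2^{n-1}$ cells around two middle networks of size $2^{n-1} \times 2^{n-1}$]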
Generalized 3-stage approach
• We denote by [nxm, rxp, mxq] the 3-stage network with r nxm input nodes, m rxp middle nodes, and p mxq output nodes such that
– output y of input node x is linked to input x of middle node y
– output u of middle node y is linked to input y of output node u
• Rearrangeability theorem: the 3-stage network is rearrangeable iff
• It is strictly non-blocking iff
Maximum matchings
• Algorithms for finding maximum matching exist
• The best known algorithm takes $O(N^{2.5})$ operations
– Too long for large N
• Alternatives
– Sub-optimal solutions
– Maximal matching: a matching that cannot be made any larger for a given backlog matrix
– For previous example:
  (1-1, 3-3) is maximal
  (2-1, 1-2, 3-3) is maximum
• Fact: the number of edges in a maximal matching ≥ 1/2 the number of edges in a maximum matching
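The difference between maximal and maximum can be seen on a small case. The backlog matrix below is a hypothetical one, constructed only so that it contains the edges named in the example (1-1, 1-2, 2-1, 3-3, written 0-indexed in the code); the greedy scan and the brute-force search are illustrative sketches, not the scheduler described in the lecture.

```python
from itertools import permutations

def greedy_maximal_matching(backlog):
    """Scan inputs in order and give each the first free output it has
    traffic for. The result is maximal but not necessarily maximum."""
    n = len(backlog)
    used_outputs = set()
    matching = []
    for i in range(n):
        for j in range(n):
            if backlog[i][j] > 0 and j not in used_outputs:
                matching.append((i, j))
                used_outputs.add(j)
                break
    return matching

def maximum_matching_size(backlog):
    """Brute force over all input-to-output assignments (small N only)."""
    n = len(backlog)
    return max(
        sum(1 for i, j in enumerate(perm) if backlog[i][j] > 0)
        for perm in permutations(range(n))
    )

# Hypothetical backlog matrix: backlog[i][j] > 0 means input i holds
# packets for output j.
backlog = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
greedy = greedy_maximal_matching(backlog)
best = maximum_matching_size(backlog)
print("greedy maximal matching:", greedy)
print("maximum matching size:", best)
```

Here the greedy scan stops at two edges while three are possible, consistent with the bound that a maximal matching has at least half the edges of a maximum one.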
Self-routing
• Use the switch fabric for packet routing
• Use a tag: n bit sequence with one bit per stage of the network
– E.g., Tag = $b_3 b_2 b_1$
• Module at stage i looks at bit i of the tag ($b_i$), and sends the packet up if $b_i = 0$ and down if $b_i = 1$
• In omega network, for destination port with binary address abc the tag is cba
– Example: output 100 => tag = 001
– Notice that regardless of input port, tag 001 will get you to output 100
• What happens when packets cannot be forwarded to the right output for the given setting of the switching fabric?
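Tag-based self-routing through an omega network can be traced with a few bit operations. This sketch assumes the usual model of the network: each stage applies a perfect shuffle (a left rotation of the line address) and the 2x2 module then selects its upper (0) or lower (1) output. The function names are hypothetical.

```python
def routing_tag(dst, n_bits):
    """Tag b_n ... b_1 for destination dst: the address bits reversed."""
    return format(dst, f"0{n_bits}b")[::-1]

def omega_route(src, dst, n_bits):
    """Follow a packet from input src; return the output line it reaches.

    Stage i uses tag bit b_i, which is the i-th most significant bit of
    the destination address.
    """
    mask = (1 << n_bits) - 1
    pos = src
    for stage in range(n_bits):
        # Perfect shuffle: rotate the line address left by one bit.
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask
        # The module sends the packet up (0) or down (1).
        bit = (dst >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | bit
    return pos

print("tag for output 100:", routing_tag(0b100, 3))
print("outputs reached from every input:",
      [omega_route(src, 0b100, 3) for src in range(8)])
```

Every input reaches line 4 (binary 100) with the same tag, since each stage overwrites one address bit with a destination bit.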
Interconnection analysis for routing
• Assume no buffering at the switches
• If two packets want to use the same port, one of them is dropped
• Suppose the switch has m stages
• Packet transmit time = 1 slot (between stages)
• New packet arrival at the inputs every slot
– Saturation analysis (for maximum throughput)
– Uniform destination distribution, independent from packet to packet
Interconnection throughput
• Let P(m) be the probability that a packet is transmitted on a stage m link, P(0) = 1
• P(m+1) = 1 - P(no packet on stage m+1 link (link c)) = 1 - P(neither input to stage m+1 chooses this output)
• Each input has a packet with probability P(m) and that packet will choose the link with probability 1/2. Hence,
  $$P(m+1) = 1 - \left(1 - \tfrac{1}{2} P(m)\right)^2$$
• We can now solve for P(m) recursively
• For an m stage network, throughput (per output link) is P(m), which is the probability that there is a packet at the output
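The recursion P(m+1) = 1 - (1 - P(m)/2)^2 with P(0) = 1 is easy to iterate. This is a minimal sketch; the function name is an assumption.

```python
def link_throughput(stages):
    """Probability that a stage-m link carries a packet in an unbuffered
    network of 2x2 modules under saturated uniform traffic."""
    p = 1.0                              # P(0) = 1: inputs always have a packet
    for _ in range(stages):
        p = 1.0 - (1.0 - p / 2.0) ** 2   # P(m+1) from P(m)
    return p

for m in (1, 2, 3, 5, 10, 20):
    print(f"stages={m:2d}  throughput={link_throughput(m):.4f}")
```

The first step gives P(1) = 1 - (1/2)^2 = 0.75, and the throughput keeps falling as stages are added, which motivates buffering inside or around the fabric.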
Distributed buffer
• Modular Architecture
• Switch buffers: none, at input, or at output of each module
• Switch fabric consists of many 2x2 modules
Contention and buffering
• Two packets may want to use the same link at the same time (same output port of a module): hot spot effect
• Solution: buffering
[Figure: throughput of interconnect network versus number of stages (1 to 20); throughput axis from 0 to 1.2]
Multi-stage architecture
• Throughput is significantly improved by buffers at the stages
– Buffers increase delay
– Tradeoff between delay and throughput
• Advantages: modular, scalable, bus (links) only needs to be as fast as the line cards
• Disadvantages
– Delays for going through the stages
  Cut-through possible when buffers are empty
– Decreased throughput due to internal blocking
• Alternatives: buffers that are external to the switch fabric
– Output buffers
– Input buffers
Output buffer architecture
• As soon as a packet arrives, it is transferred to the appropriate output buffer
• Assume slotted system (cell switch)
• During each slot the switch fabric transfers one packet from each input (if available) to the appropriate output
– Must be able to transfer N packets per slot
– Bus speed must be N times the line rate
– No queueing at the inputs
  Buffer at most one packet at the input for one slot
Queueing Analysis
• If external arrivals to each input are Poisson (average rate λ), each output queue behaves as an M/D/1 queue
– Packet duration equaling one slot, so $\bar{X} = \overline{X^2} = 1$
• The average number of packets at each output is given by (M/G/1 formula):
  $$N = \lambda \bar{X} + \frac{\lambda^2 \overline{X^2}}{2(1 - \lambda \bar{X})} = \lambda + \frac{\lambda^2}{2(1-\lambda)}$$
• Note that the only delay is due to the queueing at the outputs and none is due to the switch fabric
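For a numerical feel, the M/D/1 occupancy can be evaluated from the standard M/G/1 (Pollaczek-Khinchine) result, N = λX̄ + λ²E[X²] / (2(1 - λX̄)), with unit-length packets so that X̄ = E[X²] = 1. The slide's own formula did not survive extraction, so treating this as the formula meant there is an assumption, as are the function name and the symbol λ for the per-output arrival rate.

```python
def md1_mean_occupancy(lam):
    """Mean number of packets at an output (in queue plus in service)
    for an M/D/1 queue with unit service time and arrival rate lam < 1."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("arrival rate must satisfy 0 <= lam < 1")
    mean_x = 1.0            # packet duration = one slot
    mean_x2 = 1.0           # second moment of a deterministic unit service
    rho = lam * mean_x
    return rho + lam ** 2 * mean_x2 / (2.0 * (1.0 - rho))

for lam in (0.2, 0.5, 0.8, 0.95):
    print(f"lam={lam:.2f}  mean packets at output={md1_mean_occupancy(lam):.3f}")
```

The occupancy stays small at moderate loads and grows sharply as the load approaches 1.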
Advantages/Disadvantages of Output buffer architecture
• Advantages: No delay or blocking inside switch
• Disadvantages:
– Bus speed must be N times line speed
    • Imposes practical limit on size and capacity of switch
• Shared output buffers: output buffers are implemented in shared memory using a linked list
– Requires less memory (due to statistical multiplexing)
– Memory must be fast
Input buffer
• Packets buffered at input rather than output, so switch fabric does not need to be as fast
• During each slot, the scheduler establishes the crossbar connections to transfer packets from the inputs to the outputs
– Maximum of one packet from each input
– Maximum of one packet to each output
Throughput analysis of input queued switches
• Head of line (HOL) blocking – when the packets at the head of two or more input queues are destined to the same output, only one can be transferred and the others are blocked
• HOL blocking limits throughput because some inputs (and consequently outputs) are kept idle during a slot even when they have other packets to send in their queues
• Consider an NxN switch and again assume that inputs are saturated (always have a packet to send)
• Uniform traffic => each packet is destined to each output with equal probability (1/N)
• Now, consider only those packets at the head of their queues (there are N of them!)
Throughput analysis, continued
• Let $Q_i^m$ be the number of HOL packets destined to node i at the end of the mth slot
$$Q_i^m = \max(0,\; Q_i^{m-1} + A_i^m - 1)$$
• Where $A_i^m$ = number of new HOL messages addressed to node i that arrive to the HOL during slot m. Now,
$$P(A_i^m = l) = \binom{C^{m-1}}{l} (1/N)^l (1 - 1/N)^{C^{m-1} - l}$$
• Where $C^{m-1}$ = number of HOL messages that departed during the m-1 slot = number of new HOL arrivals
• As N approaches infinity, $A_i^m$ becomes Poisson of rate C/N where C is the average number of departures per slot
Throughput analysis, continued
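• In steady-state, $Q_i$ behaves as an M/D/1 queue of rate $\bar{A}$ and, as before,
$$\bar{Q}_i = \frac{2\bar{A} - \bar{A}^2}{2(1 - \bar{A})}$$
• Notice however that the total number of packets addressed to the outputs is N (number of HOL packets). Hence,
$$\sum_{i=1}^{N} Q_i = N \;\Rightarrow\; \bar{Q}_i = 1$$
• We can now solve, using the quadratic equation, to obtain:
$$\bar{A} = \text{utilization} = 2 - \sqrt{2} \approx 0.58$$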
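The 2 − √2 figure is a large-N limit, so it can be checked against a direct simulation of the saturated HOL process. The sketch below is a simplified model under the same assumptions as the analysis (saturated inputs, uniform destinations, FCFS); the random choice of winner among contending inputs and the function name `hol_throughput` are assumptions made for illustration.

```python
import random


def hol_throughput(n_ports, n_slots, seed=0):
    """Estimate per-output throughput of a saturated FIFO input-queued switch."""
    rng = random.Random(seed)
    # hol[i] is the output that input i's head-of-line packet wants.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    departures = 0
    for _ in range(n_slots):
        # Group inputs by the output their HOL packet is destined to.
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        # Each requested output serves exactly one contender; the rest stay blocked.
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            hol[winner] = rng.randrange(n_ports)  # next packet moves to the HOL
            departures += 1
    return departures / (n_ports * n_slots)


if __name__ == "__main__":
    for n in (2, 4, 8, 32):
        print(f"N={n:3d}  throughput ~ {hol_throughput(n, 5000):.3f}")
```

For N = 2 the exact value is 0.75, and the estimate drifts down towards roughly 0.59 as N grows, in line with the asymptotic result.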
Summary of input queued switches
• The maximum throughput of an input queued switch is limited by HOL blocking to 58% (for large N)
– Assuming uniform traffic and FCFS service
• Advantages of input queues:
– Simple
– Bus rate = line rate
• Disadvantages: Throughput limitation
Overcoming HOL blocking
• If inputs are allowed to transfer packets that are not at the head of their queues, throughput can be substantially improved (not FCFS)
Example:
How does the scheduler decide which input to transfer to which output?
Backlog matrix
• Each entry in the backlog matrix represents the number of packets in input i’s queue that are destined to output j
• During each slot the scheduler can transfer at most one packet from each input and at most one packet to each output
– The scheduler must choose one packet (at most) from each row and column of the backlog matrix
– This can be done by solving a bi-partite graph matching problem
– The bi-partite graph consists of N nodes representing the inputs and N nodes representing the outputs
Bi-partite graph representation
• There is an edge in the graph from an input to an output if there is a packet in the backlog matrix from that input to that output
• For previous backlog matrix, the bi-partite graph is:
• A matching is a set of edges, such that no two edges share a node: a matching in the bi-partite graph is equivalent to a set of packets such that no two packets share a row or column in the backlog matrix
• A maximum matching is a matching with the maximum possible number of edges: a maximum matching is equivalent to the largest set of packets that can be transferred simultaneously
Maximum matchings
• Algorithms for finding maximum matching exist
• The best known algorithm takes $O(N^{2.5})$ operations
– Too long for large N
• Alternatives
– Sub-optimal solutions
– Maximal matching: A matching that cannot be made any larger by adding edges, for a given backlog matrix
– For previous example:
    (1-1,3-3) is maximal
    (2-1,1-2,3-3) is maximum
• Fact: The number of edges in a maximal matching ≥ 1/2 the number of edges in a maximum matching
Achieving 100% throughput in an input queued switch
• Finding a maximum matching during each time slot does not eliminate the effects of HOL blocking
– Must look beyond one slot at a time in making scheduling decisions
• Definition: A weighted bi-partite graph is a bi-partite graph with costs associated with the edges
• Definition: A maximum weighted matching is a matching with the maximum total edge weight
• Theorem: A scheduler that chooses during each time slot the maximum weighted matching, where the weight of link (i,j) is equal to the length of queue (i,j), achieves full utilization (100% throughput)
– Proof: see “Achieving 100% throughput in an input queued switch” by N. McKeown et al., IEEE Transactions on Communications, Aug. 1999.
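As an illustration of the scheduling rule in the theorem, the sketch below picks a maximum weighted matching by brute force over all input-to-output permutations, using queue lengths as weights. Brute force is only practical for very small N and is not how a real scheduler would do it; the queue-length matrix and the function name are hypothetical.

```python
from itertools import permutations


def max_weight_matching(queue_len):
    """Return (schedule, weight): the matching with the largest total queue length.

    queue_len[i][j] is the number of packets queued at input i for output j.
    Exhaustive search over N! permutations, for illustration only.
    """
    n = len(queue_len)
    best_perm, best_weight = None, -1
    for perm in permutations(range(n)):
        weight = sum(queue_len[i][perm[i]] for i in range(n))
        if weight > best_weight:
            best_perm, best_weight = perm, weight
    # Drop pairs with an empty queue: nothing to transfer there.
    schedule = [(i, j) for i, j in enumerate(best_perm) if queue_len[i][j] > 0]
    return schedule, best_weight


if __name__ == "__main__":
    # Hypothetical queue lengths (0-indexed inputs and outputs).
    queues = [[3, 3, 0],
              [2, 0, 0],
              [0, 0, 2]]
    print(max_weight_matching(queues))
```

Weighting by queue length makes the scheduler favor long queues, which is the sense in which it looks beyond a single slot.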
General relation with bipartite matching
• Stability of infinite input-buffered switch iff we can decompose the traffic as a convex linear combination of 0,1 sub-stochastic matrices
• Birkhoff-von Neumann principle
• This links packets and flows to circuits
• Corollary: if we know the traffic matrix well, then we can provide stable service through a TDM schedule
• Delay effects?
• Robustness to poor knowledge of the traffic?
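The corollary can be made concrete with a small decomposition routine. The sketch below assumes the traffic is given as an integer matrix of slots needed per frame whose row and column sums are all equal (a scaled doubly stochastic matrix; a sub-stochastic matrix would first need padding, which is not shown). It repeatedly extracts a permutation and the number of slots to hold it. The function names are illustrative.

```python
def perfect_matching(matrix):
    """Return perm with matrix[i][perm[i]] > 0 for every i, or None."""
    n = len(matrix)
    owner = [None] * n  # owner[j] = input currently assigned to output j

    def try_assign(i, visited):
        for j in range(n):
            if matrix[i][j] > 0 and j not in visited:
                visited.add(j)
                # Take a free output, or re-route the input that holds it.
                if owner[j] is None or try_assign(owner[j], visited):
                    owner[j] = i
                    return True
        return False

    for i in range(n):
        if not try_assign(i, set()):
            return None
    perm = [None] * n
    for j, i in enumerate(owner):
        perm[i] = j
    return perm


def tdm_schedule(slots_needed):
    """Split the demand matrix into (slots, permutation) pairs."""
    remaining = [row[:] for row in slots_needed]
    schedule = []
    while any(any(row) for row in remaining):
        perm = perfect_matching(remaining)
        if perm is None:
            raise ValueError("row and column sums must all be equal")
        slots = min(remaining[i][perm[i]] for i in range(len(perm)))
        for i, j in enumerate(perm):
            remaining[i][j] -= slots
        schedule.append((slots, perm))
    return schedule


if __name__ == "__main__":
    demand = [[2, 1, 0],
              [1, 0, 2],
              [0, 2, 1]]  # hypothetical demand, 3 slots per frame
    for slots, perm in tdm_schedule(demand):
        print(f"hold crossbar setting {perm} for {slots} slot(s)")
```

Each pair is one circuit-like crossbar configuration, so the frame repeats as a fixed TDM schedule. How well this copes with delay and with an inaccurate traffic matrix is exactly the open question the slide raises.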
LANs
• The driver behind LANs can be roughly thought of as increasing the reach and sharing of a bus
• Traditional Ethernet: CSMA/CD, shared
• Other approach: token ring, for instance Fiber Distributed Data Interface (FDDI)
• Switched networks: Lines are not shared but go through a router/switch
[Figure: users attached to a shared ring]
IEEE/ANSI 802 standards
802.1 bridging
802.2 logical link control
802.3: CSMA/CD (Ethernet)
802.4: Token bus
802.5: Token Ring
802.6: DQDB MAN (Distributed queue dual bus)
802.9: IIS LAN (Integrated services)
802.11: Wireless LAN
802.12: DPAM (Demand priority access method)
Each of 802.3 through 802.12 has both a medium access and a physical standard
Evolution of Ethernet
• Ethernet emerged from the ideas of shared media such as ALOHA, and the first Ethernet was built at Xerox PARC in the early 1970s
• Ethernet is not completely 802.3, but a close approximation (there are some differences in the packet)
• Ethernet node:
• MAC enforces CSMA/CD and performs:
– Transmit and receive message data encapsulation:
    • Framing
    • Addressing
    • Error detection
– Media access management:
    • Medium allocation (collision avoidance)
    • Contention resolution (collision handling)
• PLS: physical signaling, Manchester encoding
• AUI (attachment unit interface) manages data in (DI), data out (DO) and control in (CI)
• Medium attachment unit (MAU): transmits and receives data, loops data back from DO to DI to indicate valid Tx and Rx path, detects collisions, sends signal quality error signal, performs jabber function, checks link integrity
[Figure: Ethernet node stack: host system bus, system interface, MAC, PLS, AUI signals (DI, DO, CI), MAU, RG 58 coax]
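The Manchester encoding performed by the PLS can be sketched in a few lines. Each bit becomes two half-bit line levels with a transition in the middle, which is what lets the receiver recover the clock. The polarity used here (1 as low-to-high, 0 as high-to-low) is the convention usually attributed to IEEE 802.3 and should be read as an assumption; the function names are illustrative.

```python
def manchester_encode(bits):
    """Map each data bit to two half-bit line levels."""
    levels = []
    for bit in bits:
        if bit not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        # Assumed convention: 1 -> low then high, 0 -> high then low.
        levels.extend((0, 1) if bit else (1, 0))
    return levels


def manchester_decode(levels):
    """Recover data bits; every bit cell must contain a mid-bit transition."""
    if len(levels) % 2:
        raise ValueError("need an even number of half-bit levels")
    bits = []
    for first, second in zip(levels[0::2], levels[1::2]):
        if first == second:
            raise ValueError("missing mid-bit transition")
        bits.append(1 if (first, second) == (0, 1) else 0)
    return bits


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    line = manchester_encode(data)
    print(line, manchester_decode(line))
```

The cost of the guaranteed transition is that the line toggles at up to twice the bit rate.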
Increasing Ethernet bandwidth – the first step
• The first Ethernet went up to 10 Mbps: 10BASE-T ran over phone-grade twisted pair, with a repeater in the middle of a star configuration acting as a virtual shared medium (traditional 10BASE5 and Cheapernet 10BASE2, on thick and thin coax respectively, were also laid out)
• Ethernet over fiber (10BASE-F) was developed, extending the distance between MAUs to 2 km instead of 500 m in coax
• 1990: the Etherswitch was marketed by Kalpana to boost LAN performance rather than as a bridge to interconnect different LANs, and in 1993 full-duplex interconnect was also introduced by Kalpana
• Still, each port could only deliver 10 Mbps; the option for a higher-speed (100 Mbps) connection was FDDI, which was expensive
Fast Ethernet
• In 1992, Grand Junction introduced 100 Mbps Ethernet
• Standardization was done by the Fast Ethernet Alliance, while the IEEE struggled between 802.3 and a demand-priority camp, which created the 802.12 group
• Later 802.3u standardized 100BASE-T
• Main differences between 10BASE-T and 100BASE-T:
– No more mixing segments (coax with multiple devices attached), all cabling is point to point between terminal equipment or repeaters
– Shorter distances – 100 m for Cat 5, Cat 3 and 130 m for fiber (160 m if all fiber network)
– Kept the MAC but changed elements below to adapt to 100 Mbps - replaced the AUI with the media independent interface (MII), added a reconciliation sublayer (going from bit-serial to nibble-serial), went from Manchester encoding to NRZ
• 10 GigE is emerging as a new standard http://www.10gea.org/Tech-whitepapers.htm
10 Gigabit Ethernet
• 10 GigE is emerging as a new standard
• The standard is being developed with SONET interoperability in mind, with a view towards expansion in the MAN and WAN end-to-end Ethernet arena
• In particular, the load will be matched to OC-192 loads
• Task force 802.3ae is in charge of developing the 10 GE standard
• Also 10 Gigabit Ethernet Alliance http://www.10gea.org/
Evolution to switched LANs
• VLANs were introduced to allow for smaller broadcast groups:
– the standardization efforts have not yet yielded interoperable VLANs, they are still proprietary solutions
– VLANs require a frame extension (802.3ac) to convey VLAN information via tagging (802.1Q) (2 tags of 16 bits each), approved in 1998
• Layer 3 switches implement some routing in hardware:
– Routers were generally used for interconnecting LANs and for remote WAN connections
– Switches traditionally had little intelligence but were very fast
– Layer 3 switches still perform layer 2 switching but also some routing functionality in ASICs
– They also implement VLANs
– Generally support only IP
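The two 16-bit fields carried by 802.1Q tagging can be illustrated with Python's `struct` module. The sketch below assumes the usual layout: a tag protocol identifier of 0x8100 followed by a tag control field holding a 3-bit priority, a 1-bit drop-eligible/canonical-format flag (called `dei` here) and a 12-bit VLAN ID. The helper names are illustrative and the code only builds and parses the 4-byte tag, not a full frame.

```python
import struct

TPID_8021Q = 0x8100  # value identifying an 802.1Q-tagged frame


def build_vlan_tag(vlan_id, priority=0, dei=0):
    """Pack the 4-byte tag: 16-bit TPID followed by 16-bit tag control info."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID is a 12-bit field")
    if not 0 <= priority <= 7:
        raise ValueError("priority is a 3-bit field")
    if dei not in (0, 1):
        raise ValueError("dei is a 1-bit field")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)  # network byte order


def parse_vlan_tag(tag):
    """Unpack a 4-byte tag into its fields."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return {"priority": tci >> 13, "dei": (tci >> 12) & 1, "vlan_id": tci & 0xFFF}


if __name__ == "__main__":
    tag = build_vlan_tag(100, priority=5)
    print(tag.hex(), parse_vlan_tag(tag))
```

The 12-bit ID field is what bounds the number of distinct VLANs that a tag can name.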
The next step in Ethernet - Gigabit Ethernet
• The Gigabit Ethernet Alliance (May 1996) started the push for Gigabit Ethernet, mostly standardized as 802.3z in 1998
• Main characteristics:
– The MAC itself was modified so that there is 200 m network span with a single repeater
– The MII was changed to GMII, Tx and Rx data paths widened to 8 bits
– Adoption of 8bit/10bit fibre channel encoding
– Carrier extension: extending or padding from 64-byte minimum to 512-byte minimum to maintain compatibility
– Frame bursting to enhance efficiency:
    • worst-case efficiency for 100 Mb/s CSMA/CD is [equation not recovered]
    • for 1000 Mb/s with CSMA/CD is [equation not recovered]
[Equation labels: minimum packet length, preamble length, inter-frame gap]
Frame-bursting for Efficiency
• Frame bursting to enhance efficiency
• Worst-case efficiency for 100 Mb/s CSMA/CD is [equation not recovered]
• For 1000 Mb/s with CSMA/CD it is [equation not recovered]
• If we allow n frames to be transmitted in a burst after the first frame, then worst-case efficiency is [equation not recovered]
• Efficiency gains beyond a burst of 65,536 bits are minimal; efficiency is about 72% at that value
[Equation labels: minimum packet length, preamble length, inter-frame gap, slot time]
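The worst-case efficiencies can be estimated from the labeled terms. The sketch below assumes efficiency is useful frame bits divided by total channel time, and uses commonly quoted values that are not stated on the slide: a 512-bit minimum frame, 64-bit preamble, 96-bit inter-frame gap and a 4096-bit Gigabit slot time (the carrier-extended first frame). It is a simplified model, not the slide's own formula.

```python
MIN_FRAME_BITS = 512      # 64-byte minimum frame (assumed)
PREAMBLE_BITS = 64        # preamble plus start delimiter (assumed)
IFG_BITS = 96             # inter-frame gap (assumed)
GIGABIT_SLOT_BITS = 4096  # 512-byte slot after carrier extension (assumed)


def efficiency_100mbps():
    """Back-to-back minimum-size frames without carrier extension."""
    return MIN_FRAME_BITS / (MIN_FRAME_BITS + PREAMBLE_BITS + IFG_BITS)


def efficiency_gigabit(extra_frames=0):
    """First frame is extended to one slot; extra_frames more follow in the burst."""
    useful = (extra_frames + 1) * MIN_FRAME_BITS
    first = GIGABIT_SLOT_BITS + PREAMBLE_BITS + IFG_BITS
    rest = extra_frames * (MIN_FRAME_BITS + PREAMBLE_BITS + IFG_BITS)
    return useful / (first + rest)


if __name__ == "__main__":
    print(f"100 Mb/s, minimum frames:     {efficiency_100mbps():.1%}")
    print(f"1000 Mb/s, no bursting:       {efficiency_gigabit():.1%}")
    # About 91 further minimum frames fit in a 65,536-bit burst under this model.
    print(f"1000 Mb/s, 91 frames in burst: {efficiency_gigabit(91):.1%}")
```

Under these assumptions the model gives roughly 76% at 100 Mb/s, 12% at 1000 Mb/s without bursting, and about 72% with a 65,536-bit burst, which agrees with the figure quoted in the last bullet.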
Another LAN application: storage access
• In the open systems world, the dominant I/O technology is the small computer system interface (SCSI), which transfers data in blocks; standardized in 1986 as ANSI X3T9
• SCSI drawbacks:
– Two or more I/O controllers cannot easily share SCSI devices on the same I/O bus, so a single server controls connections between users and their data
– Address on an I/O bus: 8 or 16 addresses depending on implementation
– Distance 25 m
[Figure: server connected over SCSI channels to storage devices]
A new type of LAN – the SAN
• In the same way that early LANs developed from extending the bus, the requirement for more storage has driven extending the SCSI interface to many devices and eventually replacing a single storage device with a full network, the storage area network (SAN)
• Based on the Fibre Channel (FC) protocol:
– Gigabit per second bandwidth (1063 Mbps) and theoretically up to 4 Gbps
– Allows SCSI in serial form rather than the parallel form usually found in SCSI (also supports HIPPI and IPI I/O protocols)
– Distance of up to 10 km
– 24-bit address identifier – up to 16 million ports
FC
• Upper level protocols include application, device drivers, operating systems
• Common services are striping, hunt groups, multicast
• Framing: frames of up to 2112 bytes, sequences (one or more frames), exchanges (uni- or bidirectional set of non-concurrent sequences), packets (one or more exchanges)
Layer stack (top to bottom):
Upper level protocols
FC4 Protocol mappings
FC3 Common services
FC2 Framing protocol
FC1 Encode/decode
FC0 Physical
[Figure labels: Port Level, Node Level]
Different types of FC SAN architectures
• Point-to-point
• Arbitrated loop topology:
– up to 126 devices in a serial loop configuration
– Each port discovers when it has been attached
– No collisions
– Fair access: every port wanting to initiate traffic gets to do so before another port gets a second shot
[Figure label: hub]
Different types of FC SAN architectures
• Fabric topology
• A common fabric topology is cascaded switches
[Figure labels: FC switch, Host I/O controller]
This is not a shared bus!
Commerzbank Brocade set-up
Other alternatives to SANs
• Embedded disk drives
• Directly attached storage: attached by SCSI directly, possibly shared among servers
• Network attached storage is in front of the server, directly attached to the network, rather than behind the server as a SAN
– Protocol is generally NFS vs. FC for SAN
– Network is Ethernet vs. FC for SAN
– Source and target are client/server or server/server vs. server/device for SAN
– Transfers files vs. device blocks for SAN
– Connection is direct on network vs. I/O bus or channel on server for SAN
– Has an embedded file system
High availability in the enterprise
[Figure: primary and secondary switches joined by an inter-switch connection; attached devices have Tx/Rx ports with primary and secondary links over GigE or FC]
MANs
• MANs are a fuzzy area since they may operate as large LANs or simply as the last leg of a WAN
• Certain protocols are particularly oriented towards MANs, such as DQDB, a dual bus that is either folded or not folded:
– Exhibited certain issues with utilization fairness
– Not very flexible in its layout architecture
[Figure: dual bus (two head ends, three nodes) and folded bus (one head end, three nodes)]
Resilient Packet Ring
• Rings for packet access in the MAN
• Resilient packet ring (RPR) alliance and IEEE working group 802.17 (started December 2000)
• Oriented towards IP
• Recovery is done using traditional self-healing ring approach
• Maintains the same architecture as SONET rings and FDDI, but changes the MAC
WANs
• WANs are predominantly implemented over optical networks
• The underlying protocol is SONET (synchronous optical network), or SDH (synchronous digital hierarchy) in Europe and Japan
• Synchronous, so framing is in terms of timing
• Lowest-speed SONET runs at STS-1, 51.84 Mbps
• STS frames may be concatenated with a single header, which contains pointers to the different headers of the STS frames
• SONET provides very tight requirements on reliability
• Typical implementations are UPSR or BLSR
• Recovery must occur within 50 ms, detection of a problem occurs within 2.4 microseconds
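The 51.84 Mbps figure follows from the STS-1 frame structure, and higher rates are integer multiples of it. The sketch below assumes the standard frame of 9 rows by 90 byte columns sent 8000 times per second (values not stated on the slide); the function name is illustrative.

```python
STS1_FRAME_BYTES = 9 * 90   # 9 rows x 90 columns (assumed standard frame)
FRAMES_PER_SECOND = 8000    # one frame every 125 microseconds (assumed)


def sts_rate_mbps(n):
    """Line rate of STS-n / OC-n in Mbps, as n byte-interleaved STS-1 streams."""
    return n * STS1_FRAME_BYTES * 8 * FRAMES_PER_SECOND / 1e6


if __name__ == "__main__":
    for n in (1, 3, 12, 48, 192):
        print(f"STS-{n:<3d} {sts_rate_mbps(n):9.2f} Mbps")
```

With these values STS-48 comes to about 2.5 Gbps and STS-192 to about 10 Gbps, the per-wavelength speeds usually quoted as OC-48 and OC-192.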
WANs
• WANs are increasingly dense and require extensive network management
• Provisioning across WANs in a short time is a growing concern as the reselling market becomes more fluid
• WANs are increasingly called upon to perform functions heretofore reserved for LANs or MANs, so there is increasing convergence
• Speed per wavelength is now OC-48 (2.5 Gbps) or OC-192 (10 Gbps), possibly going towards 40 Gbps
Access to the Optical Infrastructure
• Two trends in optical access:
– IP, GE being pushed closer to the core
– streaming media pushing core-type traffic closer to the edge
• How should access be architected:
– role of network management
– types of nodes
[Figure: network tiers]
Core: SONET, x on WDM
MAN: SONET, ATM, x on WDM
Local: GE, FC, ATM, TCP/IP
Access: MPLS or other encapsulation
Eytan ModianoSlide 1
Fast packet switching
Eytan Modiano, Massachusetts Institute of Technology
Packet switches
• A packet switch consists of a routing engine (table look-up), a switch scheduler, and a switch fabric.
• The routing engine looks up the packet address in a routing table and determines which output port to send the packet to.
– Packet is tagged with port number
– The switch uses the tag to send the packet to the proper output port
First Generation Switches
• Computer with multiple line cards
– CPU polls the line cards
– CPU processes the packets
• Simple, but performance is limited by processor speeds and bus speeds
• Examples: Ethernet bridges and low end routers
Second Generation switches
• Most of the processing is now done in the line cards
– Route table look-up, etc.
– Line cards buffer the packets
– Line cards send packets to the proper output port
• Advantages: CPU and main memory are no longer the bottleneck
• Disadvantage: Performance limited by bus speeds
– Bus BW must be N times LC speed (N ports)
• Example: Cisco 7500 series router
Third generation switches
• Replace shared bus with a switch fabric
• Performance depends on the switch fabric, but potentially can alleviate the bus bottleneck
[Figure: input line cards (LC) feeding an N by N switch fabric that feeds output line cards, with a controller]
Input buffer architecture
• Packets buffered at input rather than output
– Switch fabric does not need to be as fast
• During each slot, the scheduler establishes the crossbar connections to transfer packets from the inputs to the outputs
– Maximum of one packet from each input
– Maximum of one packet to each output
• Head of line (HOL) blocking – when the packets at the head of two or more input queues are destined to the same output, only one can be transferred and the others are blocked
Throughput analysis of input queued switches
• HOL blocking limits throughput because some inputs (and consequently outputs) are kept idle during a slot even when they have other packets to send in their queues
• Consider an NxN switch and again assume that inputs are saturated (always have a packet to send)
• Uniform traffic => each packet is destined to each output with equal probability (1/N)
• Now, consider only those packets at the head of their queues (there are N of them!)
Throughput analysis, continued
• Let $Q_i^m$ be the number of HOL packets destined to node i at the end of the mth slot
$$Q_i^m = \max(0,\; Q_i^{m-1} + A_i^m - 1)$$
• Where $A_i^m$ = number of new HOL messages addressed to node i that arrive to the HOL during slot m. Now,
$$P(A_i^m = l) = \binom{C^{m-1}}{l} (1/N)^l (1 - 1/N)^{C^{m-1} - l}$$
• Where $C^{m-1}$ = number of HOL messages that departed during the m-1 slot = number of new HOL arrivals
• As N approaches infinity, $A_i^m$ becomes Poisson of rate C/N where C is the average number of departures per slot
Throughput analysis, continued
• In steady-state, $Q_i$ behaves as an M/D/1 queue of rate $\bar{A}$ and,
$$\bar{Q}_i = \frac{2\bar{A} - \bar{A}^2}{2(1 - \bar{A})}$$
• Notice however that the total number of packets addressed to the outputs is N (number of HOL packets). Hence,
$$\sum_{i=1}^{N} Q_i = N \;\Rightarrow\; \bar{Q}_i = \frac{2\bar{A} - \bar{A}^2}{2(1 - \bar{A})} = 1$$
• We can now solve, using the quadratic equation, to obtain:
$$\bar{A} = \text{utilization} = 2 - \sqrt{2} \approx 0.58$$
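The quadratic step can be written out explicitly. Setting the mean M/D/1 occupancy equal to 1 and clearing the denominator gives:

```latex
\begin{align*}
\frac{2\bar{A} - \bar{A}^2}{2(1 - \bar{A})} = 1
  &\;\Longrightarrow\; 2\bar{A} - \bar{A}^2 = 2 - 2\bar{A} \\
  &\;\Longrightarrow\; \bar{A}^2 - 4\bar{A} + 2 = 0 \\
  &\;\Longrightarrow\; \bar{A} = \frac{4 \pm \sqrt{16 - 8}}{2} = 2 \pm \sqrt{2}
\end{align*}
```

Only the root below 1 is a valid utilization, so $\bar{A} = 2 - \sqrt{2} \approx 0.586$.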
Summary of input queued switches
• The maximum throughput of an input queued switch is limited by HOL blocking to 58% (for large N)
– Assuming uniform traffic and FCFS service
• Advantages of input queues:
– Simple
– Bus rate = line rate
• Disadvantages: Throughput limitation
Overcoming HOL blocking
• If inputs are allowed to transfer packets that are not at the head of their queues, throughput can be substantially improved (not FCFS)
Example:
• How does the scheduler decide which input to transfer to which output?
Backlog matrix
• Each entry in the backlog matrix represents the number of packets in input i’s queue that are destined to output j
• During each slot the scheduler can transfer at most one packet from each input and at most one packet to each output
– The scheduler must choose one packet (at most) from each row and column of the backlog matrix
– This can be done by solving a bi-partite graph matching problem
– The bi-partite graph consists of N nodes representing the inputs and N nodes representing the outputs
[Figure: 3x3 backlog matrix, rows = inputs 1 to 3, columns = outputs 1 to 3; extracted entries: 3, 3, 2, 0, 2, 0, 0, 0, 0]
Bi-partite graph representation
• There is an edge in the graph from an input to an output if there is a packet in the backlog matrix to be transferred from that input to that output
– For previous backlog matrix, the bi-partite graph is:
• Definition: A matching is a set of edges, such that no two edges share a node
– Finding a matching in the bi-partite graph is equivalent to finding a set of packets such that no two packets share a row or column in the backlog matrix
• Definition: A maximum matching is a matching with the maximum possible number of edges
– Finding a maximum matching is equivalent to finding the largest set of packets that can be transferred simultaneously
Maximum Matchings
• Algorithms for finding maximum matching exist
• The best known algorithm takes $O(N^{2.5})$ operations
– Too long for large N
• Alternatives
– Sub-optimal solutions
– Maximal matching: A matching that cannot be made any larger by adding edges, for a given backlog matrix
– For previous example:
    (1-1,3-3) is maximal
    (2-1,1-2,3-3) is maximum
• Fact: The number of edges in a maximal matching ≥ 1/2 the number of edges in a maximum matching
Achieving 100% throughput in an input queued switch
• Finding a maximum matching during each time slot does not eliminate the effects of HOL blocking
– Must look beyond one slot at a time in making scheduling decisions
• Definition: A weighted bi-partite graph is a bi-partite graph with costs associated with the edges
• Definition: A maximum weighted matching is a matching with the maximum total edge weight
• Theorem: A scheduler that chooses during each time slot the maximum weighted matching, where the weight of link (i,j) is equal to the length of queue (i,j), achieves full utilization (100% throughput)
– Proof: see “Achieving 100% throughput in an input queued switch” by N. McKeown et al., IEEE Transactions on Communications, Aug. 1999.