A High Performance ATM Switch Architecture
Hong Xu Chen
A thesis submitted for the degree of Doctor of Philosophy at Swinburne University of Technology
Faculty of Information and Communication Technology
Swinburne University of Technology
September 2006
Except where otherwise indicated, this thesis is my own original work.
Hong Xu Chen
September 2006
Acknowledgements
I would like to take this opportunity to thank my supervisor, Associate Professor
Jim Lambert, for his consistent guidance, teaching and support during my PhD
candidature. I would also like to thank Professor Brad Gibson, Professor Jun Han, Dr.
Hai Vu and Ms Charlotte Swain for their help during my thesis examination and their
amendments in finalising the thesis. In addition, I would like to thank Associate Professor
Bin Qiu from Monash University and Dr. Xi Ying Hu. I would like to thank my aunt, Ms
Zhang Han Qiu, for her initial financial support, which gave me the opportunity to pursue
both my Master's and PhD degrees in Australia. Finally, I would like to thank my mother, Zhang
Lin Xia, and my father, Chen Chang Ling, for their teaching, advice and support in my
everyday life. The degree is for them.
Abstract
ATM grew out of the ITU-T's Broadband Integrated Services Digital
Network (B-ISDN) standardisation effort. It was originally conceived as a high-speed transfer
technology for voice, video and data over public networks. The ATM Forum has
since broadened the ITU-T's vision of ATM to cover use over both public and private networks,
multi-protocol support and mobile ATM. ATM has also found applications in High
Performance Computing (HPC).
ATM is a packet switching technique based on a virtual circuit mechanism. Data
flows are statistically multiplexed and communication resources are dynamically shared.
A high-performance ATM switch is therefore essential for delivering quality of service (QoS).
This thesis reviews typical ATM switch architecture designs and analyses their
design problems. The research objective is to propose a switch architecture that
solves or mitigates these problems and achieves superior performance. The
goal is an integrated ATM switch architecture that handles both unicast and
multicast packets. Unlike the usual multicast ATM switch design, which
cascades a cell copy network with a unicast switching network, the proposed
architecture processes packets in a single switching block and allows
unicast and multicast packets to coexist without competing. The design has a
simple topology and operating principle and is easy to implement; furthermore, no copy
network is required. Three major components form the core of the new
architecture: a parallel buffering strategy for improved buffer performance, a
fast table lookup algorithm for packet duplication and routing, and a relay ring
controller that resolves the contention arising when multiple packets are destined
for the same output port.
A mathematical model is presented and its numerical results are analysed. In
addition, simulation algorithms for the proposed switching design are presented and the
design is compared against switches with input and output buffering strategies. The
simulation results are then compared and analysed against the numerical results.
A multicast traffic model is also presented. The performance of the proposed
switch under this model is evaluated through simulation and compared
against an output buffering switch under the same multicast traffic model.
The performance analysis shows that the proposed switch architecture achieves
high throughput with a low cell loss rate and low delay. Its performance matches or
exceeds that of the output buffering strategy, so the proposed design overcomes
the problems associated with input and output buffering.
This thesis also analyses the complexity of the proposed switch architecture and
suggests a topology for building a large-scale ATM switch. The suitability and feasibility
of a production implementation are also addressed.
Table of Contents

ACKNOWLEDGEMENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ACRONYMS
PUBLICATIONS

1 INTRODUCTION
1.1 BROAD OBJECTIVES
1.2 MAJOR CONTRIBUTIONS
1.3 ATM SWITCH BACKGROUND
1.4 DETAILED OBJECTIVES
1.5 OUTLINE OF THE PROPOSED APPROACH
1.6 ORGANISATION OF THE THESIS

2 LITERATURE REVIEW OF ATM SWITCH ARCHITECTURES
2.1 LITERATURE REVIEW OF ATM SWITCH ARCHITECTURES
2.1.1 Batcher-Banyan network
2.1.2 Knockout switch
2.1.3 Shared memory and medium switch
2.1.4 Crossbar switch
2.1.5 Summary
2.2 LITERATURE REVIEW OF MULTICAST ATM SWITCH ARCHITECTURE
2.2.1 Starlite switch
2.2.2 Knockout switch
2.2.3 Turner's broadcast switch
2.2.4 A recursive multistage structure for multicast ATM switching
2.2.5 Tony Lee's multicast switch
2.2.6 SCOQ Switch
2.2.7 ORCN multicast switch
2.2.8 Summary

3 NEW MULTICAST SWITCH ARCHITECTURE
3.1 BUS INTERFACES
3.2 ATM CELL INFORMATION STRUCTURE
3.3 MGT TRANSLATION TABLE
3.4 TRUNK NUMBER TRANSLATOR (TNT)
3.5 RELAY RING
3.6 TIMING DIAGRAM OF THE PROPOSED SWITCH
3.7 ABR AND VBR TRAFFIC DISCUSSION
3.8 SUMMARY

4 MATHEMATICAL MODELLING AND NUMERICAL RESULTS OF THE PROPOSED SWITCH DESIGN
4.1 MATHEMATICAL MODELLING
4.2 NUMERICAL RESULTS
4.2.1 Computer aid program
4.2.2 Gauss elimination algorithm with partial pivoting
4.2.3 Numerical results analysis
4.3 MULTICAST TRAFFIC MODEL
4.4 SUMMARY

5 SIMULATION DESIGN AND RESULTS COMPARISON WITH PROPOSED SWITCH, INPUT QUEUING SWITCH AND OUTPUT QUEUING SWITCH
5.1 SIMULATION FOR PROPOSED SWITCH DESIGN
5.2 SIMULATION FOR INPUT BUFFERING SWITCH
5.3 SIMULATION FOR OUTPUT BUFFERING SWITCH
5.4 SUMMARY

6 SIMULATION DESIGN AND RESULTS COMPARISON UNDER MULTICAST TRAFFIC WITH PROPOSED SWITCH AND OUTPUT QUEUING SWITCH
6.1 SIMULATION FOR PROPOSED SWITCH DESIGN UNDER MULTICAST TRAFFIC
6.2 SIMULATION FOR OUTPUT BUFFERING SWITCH DESIGN UNDER MULTICAST TRAFFIC
6.3 SIMULATION FOR PROPOSED SWITCH DESIGN WITH PRIORITY QUEUE STRATEGY UNDER MULTICAST TRAFFIC
6.4 SUMMARY

7 COMPLEXITY AND FEASIBILITY ANALYSIS OF THE PROPOSED SWITCH DESIGN
7.1 COMPLEXITY AND FEASIBILITY ANALYSIS OF PROPOSED SWITCH
7.2 COMPLEXITY COMPARISON WITH TYPICAL ATM SWITCHING NETWORK TOPOLOGIES
7.3 BUILDING A LARGE SCALE ATM SWITCH
7.4 SUMMARY

8 CONCLUSION

REFERENCES

APPENDIX A MATLAB PROGRAM FOR CALCULATING THE NUMERICAL RESULTS FOR THE QUEUING MODEL OF THE PROPOSED SWITCH
APPENDIX B SIMULATION PROGRAM FOR PROPOSED SWITCH
APPENDIX C SIMULATION PROGRAM FOR INPUT BUFFERING SWITCH
APPENDIX D SIMULATION PROGRAM FOR OUTPUT BUFFERING SWITCH
APPENDIX E SIMULATION PROGRAM FOR PROPOSED SWITCH UNDER MULTICAST TRAFFIC
APPENDIX F SIMULATION PROGRAM FOR OUTPUT BUFFERING SWITCH UNDER MULTICAST TRAFFIC
APPENDIX G SIMULATION PROGRAM FOR PROPOSED SWITCH WITH PRIORITY QUEUE BUFFER CONTROL UNDER MULTICAST TRAFFIC
List of Figures

FIGURE 1 BATCHER-BANYAN SWITCHING NETWORK [1]
FIGURE 2 INTERNAL BLOCKING IN THE BATCHER-BANYAN NETWORK
FIGURE 3 KNOCKOUT SWITCH ARCHITECTURE [9]
FIGURE 4 BUS INTERFACE ARCHITECTURE [9]
FIGURE 5 SHARED MEMORY SWITCH ARCHITECTURE [43]
FIGURE 6 SHARED MEDIUM SWITCH ARCHITECTURE [39]
FIGURE 7 CROSSBAR SWITCH ARCHITECTURE [27]
FIGURE 8 STARLITE SWITCH ARCHITECTURE [4]
FIGURE 9 KNOCKOUT SWITCH ARCHITECTURE [22]
FIGURE 10 BUS INTERFACE ARCHITECTURE WITH A MULTICAST FUNCTION [22]
FIGURE 11 TURNER'S BROADCAST SWITCH ARCHITECTURE [17]
FIGURE 12 RECURSIVE MULTICAST SWITCH ARCHITECTURE [19]
FIGURE 13 TONY LEE'S MULTICAST SWITCH ARCHITECTURE [6]
FIGURE 14 COPY NETWORK STRUCTURE [6]
FIGURE 15 ALGORITHM OF CELL REPLICATION [6]
FIGURE 16 SCOQ MULTICAST SWITCH ARCHITECTURE [20]
FIGURE 17 COPY NETWORK STRUCTURE [18]
FIGURE 18 ORCN MULTICAST SWITCH ARCHITECTURE [18]
FIGURE 19 PROPOSED MULTICAST ATM SWITCH ARCHITECTURE
FIGURE 20 BUS INTERFACE STRUCTURE
FIGURE 21 ATM CELL STRUCTURE
FIGURE 22 PROPOSED VCI INFORMATION STRUCTURE
FIGURE 23 TABLE LOOKUP ILLUSTRATION FOR LOCN
FIGURE 24 STRUCTURE OF THE RELAY RING
FIGURE 25 CONTROL LOGIC OF EACH RELAY
FIGURE 26 CONTROL LOGIC FOR PRIORITY QUEUE IN EACH RELAY
FIGURE 27 PHYSICAL MODEL OF A PARTICULAR OUTPUT
FIGURE 28 DISCRETE TIME QUEUING MODEL OF THE ATM SWITCH DESIGN
FIGURE 29 STATE TRANSITION DIAGRAM OF THE MODELLED SYSTEM
FIGURE 30 FLOW DIAGRAM FOR A COMPUTER AID PROGRAM TO CALCULATE …
FIGURE 31 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 WITH L=3
FIGURE 32 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 WITH L=10
FIGURE 33 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 WITH L=30
FIGURE 34 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 WITH L=3, 10 AND 30
FIGURE 35 MEAN QUEUE LENGTH VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 WITH L=3, 10 AND 30
FIGURE 36 AVERAGE PACKET DELAY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 WITH L=3, 10 AND 30
FIGURE 37 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 WITH L=3
FIGURE 38 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 WITH L=10
FIGURE 39 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 WITH L=30
FIGURE 40 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 WITH L=3, 10 AND 30
FIGURE 41 SIMULATION PROGRAM FLOW CHART FOR PROPOSED SWITCH DESIGN
FIGURE 42 PROGRAM FLOW CHART FOR DISTRIBUTING GENERATED PACKETS TO ITS …
FIGURE 43 PROGRAM FLOW CHART FOR APPLYING THE RELAY RING
FIGURE 44 PROGRAM FLOW CHART FOR APPLYING THE FIFO QUEUING SCHEME
FIGURE 45 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13
FIGURE 46 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13
FIGURE 47 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13
FIGURE 48 MEAN QUEUE LENGTH VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13
FIGURE 49 MEAN QUEUE LENGTH VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13
FIGURE 50 MEAN QUEUE LENGTH VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13
FIGURE 51 PACKET LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13
FIGURE 52 PACKET LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13
FIGURE 53 PACKET LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13
FIGURE 54 SIMULATION PROGRAM FLOW CHART FOR THE SWITCH …
FIGURE 55 PROGRAM FLOW CHART FOR APPLYING A CONTENTION CONTROL SCHEME
FIGURE 56 PROGRAM FLOW CHART FOR COUNTING HOL BLOCKING
FIGURE 57 PROGRAM FLOW CHART FOR TRANSMITTING PACKETS WITH RANDOM SELECTION POLICY
FIGURE 58 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD FOR INPUT QUEUING SWITCH
FIGURE 59 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD FOR INPUT QUEUING SWITCH
FIGURE 60 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD FOR INPUT QUEUING SWITCH
FIGURE 61 DELAY VERSUS OFFERED TRAFFIC LOAD FOR INPUT QUEUING SWITCH
FIGURE 62 DELAY VERSUS OFFERED TRAFFIC LOAD FOR INPUT QUEUING SWITCH
FIGURE 63 DELAY VERSUS OFFERED TRAFFIC LOAD FOR INPUT QUEUING SWITCH
FIGURE 64 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD FOR INPUT QUEUING …
FIGURE 65 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD FOR INPUT …
FIGURE 66 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD FOR INPUT QUEUING …
FIGURE 67 SIMULATION PROGRAM FLOW CHART FOR THE SWITCH WITH OUTPUT …
FIGURE 68 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR THE …
FIGURE 69 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR THE …
FIGURE 70 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR THE …
FIGURE 71 DELAY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR THE …
FIGURE 72 DELAY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR THE …
FIGURE 73 DELAY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR THE …
FIGURE 74 PACKET LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR …
FIGURE 75 PACKET LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13
FIGURE 76 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13
FIGURE 77 SIMULATION PROGRAM FLOW CHART FOR PROPOSED SWITCH UNDER MULTICAST TRAFFIC
FIGURE 78 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH L=50 AND BURST TRAFFIC LENGTH=50
FIGURE 79 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH L=100 AND BURST TRAFFIC LENGTH=100
FIGURE 80 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH L=200 AND BURST TRAFFIC LENGTH=200
FIGURE 81 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH L=50,100,200 AND BURST TRAFFIC LENGTH=50,100,200
FIGURE 82 AVERAGE QUEUE LENGTH VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH L=50,100,200 AND BURST TRAFFIC LENGTH=50,100,200
FIGURE 83 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH L=50 AND BURST TRAFFIC LENGTH=50
FIGURE 84 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH L=100 AND BURST TRAFFIC LENGTH=100
FIGURE 85 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH L=200 AND BURST TRAFFIC LENGTH=200
FIGURE 86 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR …
FIGURE 87 SIMULATION PROGRAM FLOW CHART FOR OUTPUT BUFFERING SWITCH UNDER MULTICAST TRAFFIC
FIGURE 88 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR OUTPUT BUFFERING SWITCH VERSUS PROPOSED SWITCH WITH L=400 AND BURST TRAFFIC LENGTH=50
FIGURE 89 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR OUTPUT BUFFERING SWITCH VERSUS PROPOSED SWITCH WITH L=800 AND BURST TRAFFIC LENGTH=100
FIGURE 90 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR OUTPUT BUFFERING SWITCH VERSUS PROPOSED SWITCH WITH L=1600 AND BURST TRAFFIC LENGTH=200
FIGURE 91 DELAY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR OUTPUT BUFFERING SWITCH VERSUS PROPOSED SWITCH WITH L=400 AND BURST TRAFFIC LENGTH=50
FIGURE 92 DELAY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR OUTPUT BUFFERING SWITCH VERSUS PROPOSED SWITCH WITH L=800 AND BURST TRAFFIC LENGTH=100
FIGURE 93 DELAY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR OUTPUT BUFFERING SWITCH VERSUS PROPOSED SWITCH WITH L=1600 AND BURST TRAFFIC LENGTH=200
FIGURE 94 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR OUTPUT BUFFERING SWITCH VERSUS PROPOSED SWITCH WITH L=400 AND BURST TRAFFIC LENGTH=50
FIGURE 95 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR OUTPUT BUFFERING SWITCH VERSUS PROPOSED SWITCH WITH L=800 AND BURST TRAFFIC LENGTH=100
FIGURE 96 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR OUTPUT BUFFERING SWITCH VERSUS PROPOSED SWITCH WITH L=1600 AND BURST TRAFFIC LENGTH=200
FIGURE 97 SIMULATION PROGRAM FLOW CHART FOR PROPOSED SWITCH WITH PRIORITY QUEUE BUFFER CONTROL
FIGURE 98 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH PRIORITY QUEUE BUFFER CONTROL WITH L=10 VERSUS OUTPUT BUFFERING SWITCH WITH L=50 FOR BURST TRAFFIC LENGTH=50 AND PRIORITY QUEUE COUNT=5
FIGURE 99 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH PRIORITY QUEUE BUFFER CONTROL WITH L=20 VERSUS OUTPUT BUFFERING SWITCH WITH L=100 FOR BURST TRAFFIC LENGTH=100 AND PRIORITY QUEUE COUNT=5
FIGURE 100 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH PRIORITY QUEUE BUFFER CONTROL WITH L=40 VERSUS OUTPUT BUFFERING SWITCH WITH L=200 FOR BURST TRAFFIC LENGTH=200 AND PRIORITY QUEUE COUNT=5
FIGURE 101 DELAY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH PRIORITY QUEUE BUFFER CONTROL WITH NL=80 VERSUS OUTPUT BUFFERING SWITCH WITH L=80 FOR BURST TRAFFIC LENGTH=50 AND PRIORITY QUEUE COUNT=5
FIGURE 102 DELAY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH PRIORITY QUEUE BUFFER CONTROL WITH NL=160 VERSUS OUTPUT BUFFERING SWITCH WITH L=160 FOR BURST TRAFFIC LENGTH=100 AND PRIORITY QUEUE COUNT=5
FIGURE 103 DELAY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH PRIORITY QUEUE BUFFER CONTROL WITH NL=320 VERSUS OUTPUT BUFFERING SWITCH WITH L=320 FOR BURST TRAFFIC LENGTH=200 AND PRIORITY QUEUE COUNT=5
FIGURE 104 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH PRIORITY QUEUE BUFFER CONTROL WITH NL=80 VERSUS OUTPUT BUFFERING SWITCH WITH L=80 FOR BURST TRAFFIC LENGTH=50 AND PRIORITY QUEUE COUNT=5
FIGURE 105 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH PRIORITY QUEUE BUFFER CONTROL WITH NL=160 VERSUS OUTPUT BUFFERING SWITCH WITH L=160 FOR BURST TRAFFIC LENGTH=100 AND PRIORITY QUEUE COUNT=5
FIGURE 106 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH PRIORITY QUEUE BUFFER CONTROL WITH NL=320 VERSUS OUTPUT BUFFERING SWITCH WITH L=320 FOR BURST TRAFFIC LENGTH=200 AND PRIORITY QUEUE COUNT=5
FIGURE 107 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH PRIORITY QUEUE BUFFER CONTROL WITH L=10 AND BURST TRAFFIC LENGTH=50 FOR PRIORITY QUEUE COUNT=5 AND 8
FIGURE 108 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH PRIORITY QUEUE BUFFER CONTROL WITH L=20 AND BURST TRAFFIC LENGTH=100 FOR PRIORITY QUEUE COUNT=5 AND 8
FIGURE 109 THROUGHPUT VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH PRIORITY QUEUE BUFFER CONTROL WITH L=40 AND BURST TRAFFIC LENGTH=200 FOR PRIORITY QUEUE COUNT=5 AND 8
FIGURE 110 AVERAGE QUEUE LENGTH VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH PRIORITY QUEUE BUFFER CONTROL WITH L=10 AND BURST TRAFFIC LENGTH=50 FOR PRIORITY QUEUE COUNT=5 AND 8
FIGURE 111 AVERAGE QUEUE LENGTH VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH PRIORITY QUEUE BUFFER CONTROL WITH L=20 AND BURST TRAFFIC LENGTH=100 FOR PRIORITY QUEUE COUNT=5 AND 8
FIGURE 112 AVERAGE QUEUE LENGTH VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH PRIORITY QUEUE BUFFER CONTROL WITH L=40 AND BURST TRAFFIC LENGTH=200 FOR PRIORITY QUEUE COUNT=5 AND 8
FIGURE 113 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH PRIORITY QUEUE BUFFER CONTROL WITH L=10 AND BURST TRAFFIC LENGTH=50 FOR PRIORITY QUEUE COUNT=5 AND 8
FIGURE 114 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH PRIORITY QUEUE BUFFER CONTROL WITH L=20 AND BURST TRAFFIC LENGTH=100 FOR PRIORITY QUEUE COUNT=5 AND 8
FIGURE 115 CELL LOSS PROBABILITY VERSUS OFFERED TRAFFIC LOAD BETWEEN 0 AND 0.13 FOR PROPOSED SWITCH WITH PRIORITY QUEUE BUFFER CONTROL WITH L=40 AND BURST TRAFFIC LENGTH=200 FOR PRIORITY QUEUE COUNT=5 AND 8
FIGURE 116 THREE-STAGE MARS SWITCHING NETWORK ARCHITECTURE
FIGURE 117 THREE-STAGE CLOS SWITCHING NETWORK ARCHITECTURE
List of Tables
TABLE 1 CELL LOSS PROBABILITY FOR PROPOSED SWITCH WITH N=16 ..................................................................................................................... 128
TABLE 2 CELL LOSS PROBABILITY FOR PROPOSED SWITCH WITH N=64 ..................................................................................................................... 129
TABLE 3 CELL LOSS PROBABILITY FOR OUTPUT QUEUING SWITCH WITH N=16 ............................................................................................... 129
TABLE 4 CELL LOSS PROBABILITY FOR OUTPUT QUEUING SWITCH WITH N=64 .............................................................................................. 130
Acronyms
ABR Available Bit Rate
AF Address Filter
ATM Asynchronous Transfer Mode
BBN Broadcast Banyan Network
B-ISDN Broadband Integrated Services Digital Network
CAM Content Addressable Memory
CBR Constant Bit Rate
CDN Cyclic Distribution Network
CRP Contention Resolution Processors
FIFO First In First Out
HOL Head of Line
HPC High Performance Computing
ITU-T International Telecommunication Union Telecommunication
Standardisation Sector
LOCN Logic Output Channel Number
MCN Multicast Channel Number
MGT Multicast and Group Translator
MPLS Multi-Protocol Label Switching
NC Number of Copies
ORCN Output-Reserved Copy Network
QoS Quality of Services
TNT Trunk Number Translator
TDM Time Division Multiplexing
UMI Unicast/Multicast Indicator
VBR Variable Bit Rate
VC Virtual Channel
VCI Virtual Channel Identifier
VP Virtual Path
VPI Virtual Path Identifier
Publications
[1] Hongxu Chen, J. Lambert, A. Pitsillides, ‘RC-BB Switch: a High Performance Switching Network for B-ISDN’, IEEE Proceedings, GLOBECOM, November 1995.
[2] Hongxu Chen, J. Lambert, A. Pitsillides, ‘Input Queuing Strategy for Batcher-Banyan Switching Network’, Proceedings, ATNAC, December 1995.
[3] Hongxu Chen, J. Lambert, ‘Design of a High Performance Multicast ATM Switch for B-ISDN’, IEEE Proceedings, GLOBECOM, November 1998, pp. 1859-64.
1 Introduction
Asynchronous Transfer Mode (ATM) technology is based on the efforts of the
International Telecommunication Union Telecommunication Standardization Sector
(ITU-T) Study Group XVIII to develop a Broadband Integrated Services Digital Network
(B-ISDN) for the high-speed transfer of voice, video, and data through public networks.
Through the efforts of the ATM Forum, ATM is capable of transferring voice,
video, and data through private networks and across public networks. ATM continues to
evolve as the various standards groups finalise specifications that allow interoperability
among the equipment produced by vendors in the public and private networking
industries, multi-protocol support and mobile ATM. There are also some ATM
applications in the High Performance Computing (HPC) area.
This study aims to overcome some weaknesses in ATM switch technology and
has broader relevance to switch technology in general (not necessarily in the context of
ATM). This may also have some relevance to other high speed switching applications
and to variable size packet switching such as MPLS, but the main focus here is ATM
switches.
This thesis was delayed due to external circumstances and because my former supervisor retired during the last phase of my PhD study.
1.1 Broad objectives
The main objective of this research is to identify the key weaknesses of the major
ATM switch architectures that have been used in the past, and seek to propose an
architecture that will overcome these weaknesses and optimise switch performance.
1.2 Major contributions
The major contributions to this field of study are:
a new multicast ATM switch architecture is proposed that has a simple design,
a simple switching algorithm and also performs well. It is modular, and can
be scaled up to a larger size. Based on a reverse Knockout switch architecture,
it was presented in the research paper ‘Design of a high performance
multicast ATM switch for B-ISDN’ in the Proceedings of the 1998 IEEE
GLOBECOM conference — one of the most prestigious international
conferences in this field. Performance analysis for the new multicast ATM
switch architecture is provided. A queuing model based on a multiple-access
protocol is introduced and the numerical results are plotted and analysed. The
analysis shows that the proposed switch has excellent performance characteristics.
A performance comparison for input and output buffering switches is
provided and analysed. The comparison shows that the proposed switch
design has better performance than the input buffering switch, and also
performs as well as the output buffering switch that has been proven to have
optimal performance relative to any other buffering strategy
a new mechanism to copy cells for multicast without using a copy
network is proposed. It has a simple internal bus structure combined with
MGT (multicast and group translator) and TNT (trunk number translator).
This mechanism was also presented in the research paper ‘Design of a high
performance multicast ATM switch for B-ISDN’ in the Proceedings of the
1998 IEEE GLOBECOM conference
a new routing algorithm based on table lookup technique is proposed that can
handle both unicast and multicast packets. The routing algorithm is
implemented in MGT. A new table structure that can handle both unicast and
multicast packets is also proposed. This was presented in the research paper
‘Design of a high performance multicast ATM switch for B-ISDN’ in the
Proceedings of the 1998 IEEE GLOBECOM conference
a new contention resolution mechanism called a relay ring is proposed. A
parallel buffering strategy is introduced to solve the HOL blocking problem
associated with input buffering. The relay ring is introduced to solve
contention among packets for the same output link without the internal
speedup required by output buffering. This was initially presented in the
research paper ‘RC-BB switch: a high performance switching network for B-
ISDN’ in the Proceedings of the 1995 IEEE GLOBECOM conference — one
of the most prestigious international conferences in this field. It was also used
in two additional research papers ‘Input queuing strategy for the Batcher-
Banyan switching network’ in the Proceedings of the 1995 Australian
Telecommunications Conference — one of the most prestigious national
conferences in this field, and ‘Design of a high performance multicast ATM
switch for B-ISDN’ in the Proceedings of the 1998 IEEE GLOBECOM
conference
the priority queue switching algorithm to improve switch performance is
introduced and easily implemented within the relay ring. The detailed control
logic of the relay ring for the priority queue switching algorithm is also
presented and described. The priority queue idea was initially presented in the
research paper ‘Queuing in high-performance packet switching’ in the 1988 IEEE Journal on Selected Areas in Communications
an extensive list of references is provided, comprising 114 research papers, journal articles and books on ATM switching, HPC and queuing theory
well-designed and tested simulation programs for the proposed switch and the
input and output buffering switches are provided. Well drawn and logical
simulation flow charts are described. The detailed C++ programs are
provided in the appendixes. The simulation results are compared and
analysed against numerical results for the proposed switch and the input and
output buffering switches
a multicast traffic model is proposed. Simulation for the proposed switch and
the output buffering switch using the multicast traffic model are provided.
Well drawn and logical simulation flow charts are described. The detailed
C++ programs are provided in the appendixes. Simulation results for the
proposed switch and the output buffering switch are compared and analysed
complexity, feasibility and scalability analyses are provided. The analysis approaches the design from a new angle, covering topology, buffering, scalability, implementation and functionality aspects. The complexity is also
compared with some other typical ATM switching architectures. The analysis
shows that the new multicast ATM switch architecture has a simple design, a
simple topology and a simple switching algorithm. The architecture is also
modular, scalable and feasible.
1.3 ATM switch background
In this section, an overview of ATM switch research is introduced, thus
establishing a framework and perspective for the specific objectives of the research, as
follows:
new communication services such as video on demand, voice-over IP,
teleconferencing and distributed data processing have varying requirements of
bandwidth, transfer delay, tolerated information loss and call blocking probability
ATM was proposed by ITU to address these requirements
Asynchronous Transfer Mode (ATM) is based on the efforts of the ITU-T
Broadband Integrated Services Digital Network (B-ISDN) standard. ATM is a
packet switch technique based on a virtual circuit mechanism. Data flows are
statistically multiplexed and communication resources are dynamically shared
the aim of ATM switch design is to allocate switch and transmission bandwidth to
satisfy the quality of service (QoS) parameters without underutilising network
resources.
A large number of switch architectures have been proposed for implementing an ATM
switch [1-65]. We can group them into three major classes: multistage switches, shared
memory or shared medium switches and crossbar switches. In multistage switches such
as the Banyan switch [14] and its many subclasses, internal link blocking can occur due
to more than one cell contending for the same internal link, and output blocking can also
occur due to more than one cell being destined for the same output. In shared memory
[40], [43], [51], shared medium [39] or crossbar switches such as the crossbar switch
[24], [27] and the Knockout switch [9], internal link blocking cannot occur, but output
blocking cannot be avoided. Queuing is therefore required.
There are three types of queuing: input, shared and output queuing. Input queuing
is good for resolving contention at the input. But it needs contention control logic to
transfer the packet from the input queue to the output port without internal contention.
This approach suffers from Head of Line (HOL) blocking, which limits throughput to only about 58 per cent with first-in-first-out input queues. Output queuing accepts
packets from every input port simultaneously during one time slot. However, only a
single packet may be served by an output port, thus causing possible output contention.
Therefore output buffering is required to absorb this contention. Output queuing has the best performance characteristics, but it needs internal links that are N times faster in order to accept N packets destined for the same output port simultaneously from the N input ports. The
shared buffer approach still provides for output queuing, but rather than have a separate
queue for each output, all memory is pooled into a shared queue. The shared queue
approach requires fewer queues than output queuing, because several separate queues are
combined in a single memory. More complicated control logic is required to ensure that
the single memory performs the FIFO discipline to all output ports.
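The 58 per cent HOL figure (more precisely, 2 − √2 ≈ 0.586 as N grows) can be checked with a small Monte Carlo sketch. This is illustrative code written for this discussion, not one of the thesis simulation programs; it assumes saturated inputs, uniform random destinations and random selection among contending HOL cells.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Estimate the saturation throughput of an n x n switch with FIFO input
// queues: every input always holds a head-of-line (HOL) cell, destinations
// are uniform, and each output serves one randomly chosen contender per slot.
double hol_saturation_throughput(int n, int slots, unsigned seed = 1) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dest(0, n - 1);
    std::vector<int> hol(n);                      // destination of each HOL cell
    for (int& d : hol) d = dest(rng);
    long long delivered = 0;
    for (int t = 0; t < slots; ++t) {
        std::vector<std::vector<int>> contenders(n);
        for (int i = 0; i < n; ++i) contenders[hol[i]].push_back(i);
        for (int o = 0; o < n; ++o) {             // each output grants one winner
            if (contenders[o].empty()) continue;
            std::uniform_int_distribution<int> pick(0, (int)contenders[o].size() - 1);
            int winner = contenders[o][pick(rng)];
            ++delivered;
            hol[winner] = dest(rng);              // saturated input: a new HOL cell appears
        }
    }
    return double(delivered) / (double(n) * slots);
}
```

For n = 32 and a few tens of thousands of slots this settles a little above 0.586, approaching the asymptotic limit from above as n grows.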
The Batcher-Banyan, Knockout and crossbar switches with good interconnection
structures and good performance have received more attention and are preferably chosen
to implement the cell-routing function. But output blocking will certainly occur when
more than one cell is destined for the same output port.
Several studies have been completed on output port contention resolution [3-5],
[30], [31], [53]. The Starlite switch was designed based on the Batcher-Banyan switch by
Huang in 1984 [4]. A trap network is located between the Batcher and Banyan networks.
If there is more than one cell with the same destination address, the trap network will let one pass and feed the rest back to the input of the Batcher network for the next cycle. Two
major drawbacks of the trap network are complexity of implementation and the
possibility of causing out-of-sequence cells. Another architecture based on the Batcher-
Banyan switch is the moonshine switch proposed by Hui in 1987 [3]. It deploys a three-
phase algorithm in conjunction with input queuing at each input port. The three phases
are the arbitration phase, which checks whether contending cells are waiting at the input
by sending special ‘messages’ called requests; the acknowledgement phase which
informs the waiting cells of the winning requests selected from those contending requests
in phase one; and the sending phase which sends the winning cells acknowledged in
phase two through the network. Since the first two phases do not perform real data transfer, but merely introduce processing overhead, a 14 per cent speedup is required as described by
Hui, which is a drawback of this architecture. Another drawback is that the input buffer
can only achieve about 58 per cent of the maximum throughput.
The switch architectures described so far are based on multistage interconnection
networks comprised of small switching elements. Now a fully interconnected topology,
in the sense that every input has a non-overlapping direct path to every output so that no
cell blocking or contention may occur internally, is introduced. The Knockout Switch is
an example [9]. It uses a fully interconnected topology to passively broadcast all input
packets to all outputs. The bus interface performs several functions. First, it filters out all
packets that are not destined to the output port. This operation effectively achieves the
switching function in a fully distributed manner, but requires N simple filtering elements per output port. Next, it buffers the packets destined for that output from all input lines.
The buffer is shared, is filled in a cyclical manner, and is served to the output port by
FIFO. The structure of this topology is simple and has low latency, and the bus structure
makes multicast much easier. But there is still a possibility of packet loss in the
concentrator in the bus interface in this switch architecture.
The crossbar switch still has its advantages since it is always possible to set up a
connection between any arbitrary input and output pair [24], [27]. Internal blocking will
not occur in the crossbar switch. Low latency is another advantage. But output blocking
may still occur and it also uses the maximum number of cross points within the switch
and it is difficult to build a large scale switch.
Another important fact for switch design is the queuing strategy. In input queuing,
a FIFO buffer is placed at each input port [29]. Only the packet at the head of the queue
can be transmitted. The advantage for input queuing is that the packet sequence is always
in order. The drawback is poor performance due to head of line (HOL) blocking. In this
case, the switch interconnection network must also transfer the packet from the input buffer to the output port without knowledge of internal conditions. Therefore internal buffering or an internal control mechanism is required for a multistage switching architecture. Depending
on the control algorithm, the internal control mechanism could be simple (round robin) or
more complex.
In the output queuing strategy, a buffer is located at each output port [24], [26].
Good performance is a proven advantage in [2] and [8]. The drawback is that it needs
internal speedup in the case of multiple packets destined for the same output port. For
multistage switching architecture, it also needs internal buffering or internal control
mechanism for transferring packets from the input port to the output buffer (controlling
the internal collision).
In the shared queuing strategy, a single buffer space is shared by all the inputs for writing and all the outputs for reading [43], [51], [63]. The advantages are proven performance [2], [96] and centralised control over routing, signalling and packet sequencing. The drawback is that a fast internal processor and fast internal links are needed, so a hardware limitation exists. It also needs complicated switching and buffer management algorithms.
There is another very special parallel queuing strategy described in [2], [24] and
[61]. Each queue is split into N separate queues, one for each output link. In [24], the
switch is built based on crossbar switch topology. The parallel buffers are placed at each
crosspoint. As proven in [24], the performance of this switch is optimal, just as good as that of the output buffering switch. But it uses the maximum number of cross points and has no multicast function. In [61], the parallel buffers are shared by all input ports, and good performance is again the advantage. But it has no multicast function and needs a
fast internal link if more than one packet goes to the same buffer.
In addition, multicast is a function that ATM switch designs must support. Many future networking services for B-ISDN, such as video on demand, voice-over IP,
teleconferencing and distributed data processing will require transmission of information
from one source to multiple destinations simultaneously. The multicast function in the
ATM switch is designed to support those networking services. Usually a multicast switch
is constructed by cascading a cell copy network and a routing network. A complex issue
for multicast switch design is to generate as many copies as the number of target users
and to route the copied cells with satisfactory performance.
Many multicast switching studies have been reported [17-23], [41], [47-48], [54],
[63-65]. They are mainly grouped into three classes: tandem class [6], [17-18], [47-48],
[63-64], recursive class [19], [21], [54] and multicast bus class [20], [22-23], [65].
The tandem class is usually a copy network followed by a point-to-point routing
network. In this approach, a multicast cell is replicated in the copy network, which is then
sent through the routing network as a unicast cell. The advantages are constant latency
and no modification is needed in the routing network. However, there are two drawbacks
with this approach: replicated multicast cells must compete with normal unicast cells in the copy network, which can cause cell loss, and unicast cells experience additional delays due to the copy network.
The recursive class recirculates the outputs of the routing network back to the
input. In this approach, cells that cannot reach their destination in the first pass are
recirculated back in order to replicate and try to pass through the routing network again.
The advantage of this approach is that no copy network is needed and the switch structure
is simple, but the cell replication requires cell recirculation which introduces additional
delays and non-constant latency.
The multicast bus class has separate multicast buses. Cell replication can be
performed easily without contending with unicast cells, but additional hardware is needed
for the multicast function. Large-scale implementation is also difficult.
In this section, we have introduced new communication services, ATM, the ATM
switch and ATM switch research, the multicast ATM switch and queuing strategy. By
studying ATM switches and multicast ATM switches, research objectives are established and the researcher’s approach is clearly defined.
1.4 Detailed objectives
By now, we have analysed the merits and drawbacks of different ATM switching topologies and buffering strategies. This identifies the problems that we are seeking to
solve in this research project.
Our research objectives are to find a switching architecture that is simple,
modular, maintainable, fault tolerant and feasible according to switch design principles.
The proposed switch architecture should:
solve the HOL blocking problem associated with input buffering and the
internal speedup requirement for output buffering and shared buffering
have the multicast function without the duplicated copy network or packet
recirculation that is usually required
seek a switching/routing algorithm that is easy, fast and programmable
demonstrate a performance that is at least as good or better than the output
buffering switch
find a mathematical model for theoretical support
develop the simulation algorithms and programs to prove the performance of
the proposed switch architecture
seek an expandable architecture to build a large scale ATM switch based on
the proposed switch architecture
be flexible and provide support for ATM, ABR and VBR traffic.
1.5 Outline of the proposed approach
First, we need to study typical ATM switches and multicast ATM switches. By
analysing their advantages and drawbacks, we should arrive at a new design that can
inherit advantages and overcome drawbacks, thereby:
creating a new multicast ATM switch design
applying a mathematical model to prove the proposed switch design
theoretically and finding a multicast traffic model for multicast simulation
applying simulation and comparing simulation results with numerical results
from a mathematical model to prove the mathematical model
applying simulation to compare new switch design with input queuing switch
and output queuing switch to prove the performance
applying simulation for the new switch design and output queuing switch
under a multicast traffic model and comparing the results to prove the
performance
providing complexity analysis
proposing a topology to build a large scale switch.
1.6 Organisation of the thesis
The rest of the thesis is organised as follows: Chapter 2 offers a literature review
of ATM switch architectures including typical ATM switching network topologies and
typical ATM multicast switching topologies. Chapter 3 describes the new proposed
switch architecture. The queuing model for the proposed switch architecture and its
numerical results are introduced in Chapter 4. A performance comparison against a
mathematical model for the input and output buffering switch using simulation is offered
in Chapter 5. Chapter 6 analyses the performance for the proposed switch design using a
multicast traffic model and compares its performance against the output buffering switch.
Chapter 7 provides a complexity and feasibility analysis of the proposed switch design.
This chapter also proposes a solution to build a large scale ATM switch based on the
proposed switch design. Chapter 8 is the concluding chapter in the thesis, followed by the
references and appendixes.
2 Literature review of ATM switch architectures
Many designs for ATM switches have been proposed. The purpose here is to
review existing switch designs, assessing their strengths and weaknesses in order to develop a better design. As a multicast function is required by the ATM switch, the
review of multicast ATM switches is introduced in section 2.2.
2.1 Literature review of ATM switch architectures
In this section, various ATM switches are examined. The switches are classified
according to whether they are a multistage switching network, a fully interconnected
topology, a shared memory and shared medium switch or a crossbar switch.
2.1.1 Batcher-Banyan network
Figure 1 shows a block diagram of an 8 × 8 Batcher-Banyan network. A Batcher
sorting network is placed in front of the Banyan routing network with a shuffle-exchange
network connecting them.
Figure 1 Batcher-Banyan switching network [1]
The Batcher network is constructed of sorting elements (each with two inlets)
which compare the bits of the destination address and switch to either ‘pass’ or
‘exchange’. In Figure 1, it sorts the cells in ascending order according to their destination
address and places the cell with the lower address at the upper outlet of the Batcher
network. The arrow in Figure 1 points to the outlet at which the larger number is to be
routed. If only a single cell is present at one of two inlets, it will be taken as the lower
value [33]. Figure 1 shows how the Batcher sorting network sorts three cells with
destination addresses 010, 011, 111. After sorting, cells arriving at the inlets of the Banyan network are already in ascending order. This ensures non-blocking operation inside the Banyan network, provided no two cells are destined for the same outlet, as proven by Lee in [6].
The Banyan network is a simple self-routing network constructed from 2 × 2
switching elements. Each element is a 2 × 2 crossbar switch and routes the incoming cell
according to the value of a bit in its destination address, 0 or 1. No matter from which
inlet a cell enters the network, it will always be routed stage by stage to the outlet which
corresponds to its destination address. Figure 1 shows how three cells with destination
addresses 010, 011, 111 are routed to their own outlet via the Banyan network.
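The bit-by-bit self-routing property can be sketched for the shuffle-exchange (Omega) member of the banyan family: before each stage the lines undergo a perfect shuffle, and the 2 × 2 element then delivers the cell to its upper or lower outlet according to one destination address bit, most significant first. This is an illustrative sketch for this discussion, not code from the thesis, and `omega_route` is a hypothetical name.

```cpp
// Trace one cell through a log2(n)-stage shuffle-exchange (Omega) network.
// 'line' is the link index the cell currently occupies; the perfect shuffle
// is a one-bit left rotation, and the switching element overwrites the least
// significant bit with the current destination address bit.
int omega_route(int input, int dest, int log2n) {
    int n = 1 << log2n;
    int line = input;
    for (int s = 0; s < log2n; ++s) {
        line = ((line << 1) | (line >> (log2n - 1))) & (n - 1); // perfect shuffle
        int bit = (dest >> (log2n - 1 - s)) & 1;                // MSB-first address bit
        line = (line & ~1) | bit;   // element outlet: 0 = upper, 1 = lower
    }
    return line;  // equals dest, whatever inlet the cell entered from
}
```

Whatever inlet the cell enters from, it emerges at its destination outlet, which is exactly the self-routing behaviour described above.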
The problem with this non-blocking switch network is that, if no buffering strategy is adopted and more than one cell with the same destination address arrives at the Banyan routing network, output port blocking will certainly occur and internal link blocking may occur as well. Figure 2 shows how internal link blocking occurs when two
incoming cells have the same destination address.
Figure 2 Internal blocking in the Batcher-Banyan network
A rectangular N × N (for N = 2^n) Banyan network is constructed from identical 2 × 2 switching elements arranged in log2N stages; each stage consists of N/2 switching elements. This makes it much more suitable than crossbar structures for the construction of large switch fabrics. Unfortunately, the Banyan is an internally blocking network and its performance degrades rapidly as the size of the network increases. The number of elements in the Batcher sorting network is (N/4)((log2N)^2 + log2N).
Implementing the Batcher network becomes increasingly difficult for larger switch sizes, because the placement and orientation of the sorting elements in the Batcher network become more and more complex, especially in the sorting stage and the first merging stage. In addition, the Batcher network grows on the order of N(log2N)^2, so many more switching stages are required in the Batcher network than in the Banyan network.
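The two element-count formulas, (N/2)·log2N for the Banyan and (N/4)((log2N)^2 + log2N) for the Batcher, can be tabulated directly. A small helper, written for this discussion with hypothetical function names:

```cpp
#include <cassert>
#include <cmath>

// Switching-element counts for an N x N fabric (N a power of two):
// Banyan: log2(N) stages of N/2 elements each;
// Batcher: (N/4)((log2 N)^2 + log2 N) sorting elements.
long long banyan_elements(long long n) {
    long long lg = std::llround(std::log2((double)n));
    return (n / 2) * lg;
}
long long batcher_elements(long long n) {
    long long lg = std::llround(std::log2((double)n));
    return (n / 4) * (lg * lg + lg);
}
```

For N = 8 this gives 12 Banyan and 24 Batcher elements; at N = 1024 the gap widens to 5,120 versus 28,160, which is the scaling problem noted above.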
2.1.2 Knockout switch
The Knockout switch architecture is shown in Figure 3. It uses a fully
interconnected topology to passively broadcast all input packets to all outputs [9]. It has
two basic characteristics: 1) each input has a separate broadcast bus 2) each output has
access to the packets arriving at all inputs. Figure 3 illustrates these characteristics where
each of the N inputs is placed directly on a separate broadcast bus and each output
passively interfaces to the complete set of N buses. This simple structure has several
important features: 1) with each input having a direct path to every output, no internal
switching collision occurs. The only congestion in the switch takes place at the interface
to each output where multiple packets can arrive simultaneously with the same output
address, 2) the switch architecture is modular, 3) the bus structure can achieve a higher
transmission rate, 4) it is easier to implement broadcast and multicast functions with the
bus structure.
Figure 4 illustrates the architecture of the bus interface associated with each output in
the Knockout switch. It has three major components: the packet filter, concentrator and
shared buffer. The packet filter checks incoming packets that will be discarded if their
address is not for this output port. The concentrator achieves an N to L concentration of
the input lines. Maximum L packets can make it through the concentrator if there are
more than L packets destined for this output port. This is where the cell loss occurs. The
Knockout tournament principle is implemented to achieve the concentration. It is from
this principle that the term Knockout Switch originates. The shared buffer consists of a
shifter and L separate FIFO buffers. It is the equivalent of a single queue with L inputs
and one output under FIFO discipline. Complete sharing of the L FIFO buffers is
controlled by the shifter that works in a cyclic manner.
Figure 3 Knockout switch architecture [9]
Figure 4 Bus interface architecture [9]
In the Knockout switch, the number of cell filters is N^2 and the number of concentrators is N. The number of interconnection wires in the crossbar-link network is N^2. In the concentrator, more and more knockout tournament stages and tournament elements
will be required when the switch size N grows. It is a good alternative for constructing
non-blocking switching elements and switches of modest size.
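The loss introduced by the N-to-L concentrator can be quantified. Under independent, uniform Bernoulli arrivals with offered load p per input, the cell loss probability given in the Knockout paper [9] is P_loss = (1/p) Σ over k = L+1..N of (k − L) C(N,k) (p/N)^k (1 − p/N)^(N−k). A direct evaluation, written as a sketch for this discussion rather than taken from the thesis code:

```cpp
#include <cassert>
#include <cmath>

// Knockout N-to-L concentrator cell loss probability under uniform
// Bernoulli traffic with offered load p per input (formula from [9]).
double knockout_loss(int n, int l, double p) {
    double q = p / n;     // probability a given input sends a cell to this output
    double loss = 0.0;
    for (int k = l + 1; k <= n; ++k) {
        // binomial pmf computed via log-gamma for numerical stability
        double log_pmf = std::lgamma(n + 1.0) - std::lgamma(k + 1.0)
                       - std::lgamma(n - k + 1.0)
                       + k * std::log(q) + (n - k) * std::log1p(-q);
        loss += (k - l) * std::exp(log_pmf);    // (k - L) cells are knocked out
    }
    return loss / p;
}
```

With p = 0.9 and N = 32, choosing L = 8 already pushes the loss well below 10^-5, which is why a modest concentrator width suffices in practice.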
2.1.3 Shared memory and medium switch
The shared memory switch architecture is illustrated in Figure 5. In this approach,
there is only one buffer that is shared by all input and output ports [40], [43], [51]. The
memory controller decides the order in which cells are read from the memory. It has better performance in terms of throughput versus cell loss under heavy load. But the
drawback is that the shared memory must operate N times faster than the port speed. In
addition, the control algorithm for the memory controller is very complicated. Therefore
a faster processor is required and more software control algorithms are involved in the
memory controller.
A shared memory switch has very simple hardware architecture compared with
other switch architectures described in this section. But as it needs memory access N times faster than the port speed, it is not suitable for large N due to physical limitations. It can only be
used to build switching elements that can be used as a building block in larger multistage
switching systems.
Figure 5 Shared memory switch architecture [43]
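The N-times speedup can be made concrete with a little arithmetic. In each cell slot the shared memory must absorb up to N writes and supply up to N reads, so its bandwidth must be at least 2N times the port rate. Taking 155.52 Mb/s (STM-1) as an assumed example port speed, which the text does not fix:

```cpp
#include <cassert>

// Minimum shared-memory bandwidth in Gb/s for an n-port switch:
// up to n writes plus n reads per cell slot, i.e. 2n times the port rate.
// The 155.52 Mb/s default is an assumed example value (STM-1), not from the thesis.
double required_memory_bandwidth_gbps(int n_ports, double port_mbps = 155.52) {
    return 2.0 * n_ports * port_mbps / 1000.0;
}
```

Sixteen such ports already demand roughly 5 Gb/s of memory bandwidth, and 64 ports about 20 Gb/s, which illustrates the physical limitation mentioned above.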
Figure 6 shows the shared medium switch architecture. The shared medium could
be a ring, bus or dual bus. Time-division multiplexed buses are a popular example of this
approach [39]. The incoming packets are sequentially broadcast on the TDM bus in a
round-robin manner. At each output, address filters pass the appropriate cells to the
output buffers based on their routing tag. The bus speed must be at least N times faster
than the port speed to eliminate input queuing. This will place a physical and hardware
technology limitation on the switch specification. In addition, the address filters and
output buffers also have to operate at bus speed to avoid cell loss due to multiple packets
destined to the same output port.
The shared medium switch has a very simple hardware architecture. But as the TDM bus needs to run N times faster than the port speed, it is not suitable for large N due to physical limitations. It
can only be used to build switching elements that can be used as a building block in
larger multistage switching systems.
Figure 6 Shared medium switch architecture [39]
2.1.4 Crossbar switch
The crossbar switch is the simplest example of a matrix-like space division fabric
that physically interconnects any of the N inputs to any of the N outputs as shown in
Figure 7. It is easy to implement. The address filter (AF) at each cross point checks whether the incoming packet carries the output address that the AF is assigned to, and if so passes the packet to the output port. The controller is responsible for resolving output contention when more than one packet is destined for the same output port. Various contention resolution algorithms can be implemented, such as round robin or random selection. No internal blocking exists in this topology, but in order to avoid packet loss at the output port, either the output port and the controller must operate N times faster than the input port, or output buffers must be placed at each cross point [24], [27].
Figure 7 Crossbar switch architecture [27]
Crossbar designs have a complexity in paths, or cross points, that grows as N², where N is the number of input (or output) ports of the ATM switch. Thus they do not scale well to large sizes. They are, however, very useful for constructing non-blocking, self-routing switching elements and switches of modest size.
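The two points above, the N² crosspoint growth and per-output contention resolution, can be illustrated with a small sketch (the round-robin arbiter is one of the algorithms mentioned; function names are illustrative):

```python
# Crosspoint count of an N x N crossbar, and a round-robin pick
# among the inputs contending for one output port in a slot.

def crosspoints(n):
    """Number of crosspoints in an N x N crossbar: N squared."""
    return n * n

def round_robin_winner(requesting_inputs, num_inputs, last_winner):
    """Pick one input among those contending for an output port,
    scanning from just after the input served last time (fairness)."""
    for offset in range(1, num_inputs + 1):
        candidate = (last_winner + offset) % num_inputs
        if candidate in requesting_inputs:
            return candidate
    return None  # no input wants this output in this slot

print(crosspoints(16))                       # 256
print(round_robin_winner({0, 2, 3}, 4, 0))   # 2: first requester after 0
```

Doubling the port count quadruples the crosspoints, which is the scaling limit noted above.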
2.1.5 Summary
In this section, four typical ATM switch topologies have been introduced and their advantages and drawbacks analysed. So far, most ATM switch architecture designs have been based on the above topologies [1-65]. In principle, contention resolution, performance, multicasting, scalability and complexity are the issues that researchers address, and many approaches have been proposed to improve or solve one or more of them. In the following section, multicast ATM switches based on these four topologies are studied and analysed.
2.2 Literature review of multicast ATM switch architecture
Many designs for multicast ATM switches have been proposed [114]. The purpose here is to review existing designs and assess their strengths and weaknesses, so that a better design can be proposed.

In the following sections, various multicast ATM switches are examined, classified according to their copy networks and recursive structure.

2.2.1 Starlite switch
Figure 8 Starlite switch architecture [4]
The structure of the Starlite switch [4] is shown in Figure 8. It is based on the Batcher-Banyan switch architecture and carries out multicasting by a two-stage copy network placed above the concentrator, as shown above. The first stage is a sorting network. New cells entering the switching fabric are placed on the input ports on the left, and multicasting cells on the input ports on the right. Multicasting cells are special cells that contain the channel identification of the cells they want to copy and the destination port to which the copy is to be sent. The sorting network sorts the cells by their source addresses (channel identification), so that each new cell appears next to the multicasting cells that want to copy it. The copy network then duplicates each new cell
into all of its copies and injects the cells into the concentrator. In this cell replication process, the Starlite switch assumes synchronisation of the source and destinations, and an 'empty packet set-up procedure' is also required. This approach is not feasible in a broadband packet network, where packets usually experience delay variation due to buffering, multiplexing and switching over a multiple-hop connection.
2.2.2 Knockout switch
Figure 9 Knockout switch architecture [22]
The original Knockout switch [9] does not support multicasting. To support multicasting, in addition to the N bus interfaces, M multicast modules [22] are specially designed to handle multicast packets. As shown in Figure 9, each multicast module has N inputs and one output. The inputs are the signals from the input interface modules, and the output drives one of the M buses for broadcasting to all N bus interfaces. Two approaches have been proposed to implement the multicast module.
A block diagram of the multicast module is illustrated in Figure 10. The incoming
cells are selected through the cell filters which are set to accept multicasting cells only.
The selection principle adopted is the same as in the original Knockout switch: an N-to-L (L << N) knockout concentrator is used, and the L 'winners' from the concentrator are stored in an L-input, one-output FIFO buffer after proper shifting. Upon exit from the buffer, a multicasting cell enters the cell duplicator, which produces copies with different destination addresses in their headers. The duplicated cells are sent along the broadcast bus to the required bus interfaces. In this scheme, the various destination addresses of the replicated cells are obtained by table lookup.
Figure 10 Bus interface architecture with a multicast function [22]
The Knockout switch is based on a crossbar network and therefore performs best among the reviewed designs. It has a simple principle for multicasting and cell duplication, but extra hardware and buses are required for multicasting.
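The N-to-L knockout selection described above can be sketched as a behavioural model. A real concentrator is a tournament of 2x2 switching elements; the random sampling here only mimics its unbiased input/output behaviour (at most L of the arriving cells survive a slot) and is not the hardware algorithm:

```python
import random

# Behavioural model of the N-to-L knockout selection (L << N):
# of the cells arriving in one slot, at most L "win" and enter
# the shared FIFO; the rest are knocked out and lost.

def knockout_concentrate(cells, L, rng=random):
    if len(cells) <= L:
        return list(cells), []           # no contention: all pass
    winners = rng.sample(cells, L)       # L survivors of the tournament
    losers = [c for c in cells if c not in winners]
    return winners, losers

arrivals = [f"cell{i}" for i in range(10)]
winners, losers = knockout_concentrate(arrivals, L=3)
print(len(winners), len(losers))  # 3 7
```

The knockout principle accepts this small, engineered loss probability in exchange for buffers and filters that need only run at L (rather than N) times the port speed.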
2.2.3 Turner’s broadcast switch
The switch fabric architecture of Turner’s broadcast switching network [17] is
shown in Figure 11. This is also an ATM switch based on Banyan switching topology.
Figure 11 Turner's broadcast switch architecture [17]
Here CP is the Connection Processor, CN the Copy Network, PP the Packet Processor, DN the Distribution Network, BGT the Broadcast and Group Translator, and RN the Routing Network.
When a broadcast cell with K destinations passes through the copy network, it is replicated so that K copies of the cell emerge from the copy network. Unicast cells pass through the copy network unchanged. The broadcast and group translators then determine the routing information for each cell in the rest of the switch; based on this translated routing information, the distribution and routing networks route the cells to the proper outgoing packet processor.
The novelty of Turner's switch is its clear design logic and flexible broadcast capability. However, it is a blocking switch: when two cells arrive at the same switching element in the routing network and attempt to exit on the same output link, a collision occurs. Therefore buffers are required at every internal node in the routing network to prevent packet loss from cell collisions. An extra copy network is also required to implement the multicast function.
2.2.4 A recursive multistage structure for multicast ATM switching
This switch provides self-routing switching nodes with a multicast facility, based
on a buffered N x N multistage interconnection network (a kind of Banyan network) with
external links connecting outlets to inlets [19]. The structure of this multicast switch is
shown in Figure 12. Such a network is able to route a cell to the addressed output for
transmission and generate M copies of the same cell (with M<<N) on M pre-defined
adjacent outlets. M is the 'multiplication factor' of the multicast connection network (MCN). In order to reach more than M outputs, some of the M cells generated in the first crossing of the network are recycled back to the corresponding inputs, each generating further cells until the requested number of copies is obtained.
For an MCN with a general multiplication factor M, if the copy bit in the routing tag is set to 0 (unicast cell), the input cell is addressed to a single output line. Conversely, if the copy bit is set to 1 (multicast cell), C copies (with 2 <= C <= M) are simultaneously generated on C consecutive outlets, starting from the addressed output. The number C is specified in the routing tag.
If the requested copy number B (with 2 <= B <= N) is less than or equal to M, all copies can be generated in a single network crossing; if B > M, more crossings are necessary. The concentrators at the inlets merge input and recycled cells, while the binary switches at the outlets manage forwarding and recycling by routing the cells toward their upper or lower output line respectively.
Although the proposed recursive mechanism seems very simple, the switching elements inside the switch have to perform more operations, which makes them more complex and raises the hardware complexity. In addition, the recycling lines introduce delay, and cells may leave the switch out of sequence.
Figure 12 Recursive multicast switch architecture [19]
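The number of crossings the recycling scheme needs can be estimated with a simplified count, under the assumption that every cell produced in one crossing can itself be replicated up to the multiplication factor M on the next pass:

```python
# Simplified crossing count for the recursive copy scheme: the
# number of copies can grow by at most a factor of M per crossing,
# so B copies need roughly log_M(B) passes through the network.

def crossings_needed(copies_requested, M):
    crossings, copies = 0, 1
    while copies < copies_requested:
        copies *= M
        crossings += 1
    return crossings

print(crossings_needed(4, 4))   # 1: B <= M, a single crossing suffices
print(crossings_needed(16, 4))  # 2
print(crossings_needed(17, 4))  # 3: one extra copy forces another pass
```

Each extra crossing adds a full pass through the network, which is the source of the delay and possible cell mis-sequencing noted above.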
2.2.5 Tony Lee’s multicast switch
Figure 13 Tony Lee’s multicast switch architecture [6]
Figure 14 Copy network structure [6]
The copy network structure is illustrated in Figure 14. It consists of a broadcast Banyan network with switch nodes capable of cell replication. When a multicasting cell, carrying a set of arbitrary n-bit destination addresses, arrives at a node in stage k, cell routing and replication are determined by the k-th bit of all the destination addresses in the header. If these bits are all 0 or all 1, the cell is sent to the upper or lower link respectively. Otherwise, the cell and its replica are sent on both links with modified headers: the header of the cell sent to the upper (or lower) link contains those addresses from the original header whose k-th bit equals 0 (or 1). The header modification is performed by the node whenever the cell is replicated. In this way, the set of paths from any input to a set of outputs forms a binary tree embedded in the network, called an 'input-output tree' or 'multicasting tree'. Figure 15 shows an example of the corresponding multicasting tree.
Figure 15 Algorithm of cell replication [6]
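The per-stage replication rule just described can be sketched in a few lines, assuming stage 1 examines the most significant address bit (the set representation of the header address list is illustrative):

```python
# Sketch of the replication rule at one stage of the broadcast
# Banyan network: a node at stage k inspects the k-th most
# significant bit of every destination address in the header.
# All 0 -> upper link only; all 1 -> lower link only; mixed ->
# replicate, splitting the address set by that bit.

def replicate_at_stage(dest_addrs, stage, n_bits):
    """dest_addrs: set of n_bits-wide output addresses.
    Returns (upper_set, lower_set); an empty set means no cell
    is sent on that link."""
    bit = n_bits - stage                       # stage 1 tests the MSB
    upper = {a for a in dest_addrs if not (a >> bit) & 1}
    lower = {a for a in dest_addrs if (a >> bit) & 1}
    return upper, lower

# A cell for outputs {001, 101, 110} splits at stage 1: the copy on
# the upper link keeps {001}, the copy on the lower link keeps {101, 110}.
print(replicate_at_stage({0b001, 0b101, 0b110}, stage=1, n_bits=3))
```

Applying the rule stage by stage grows exactly the binary input-output tree described above, with one leaf per requested output.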
Lee's actual multicast switch [6] is shown in Figure 13. When multicasting cells are received at the running adder network, the numbers of copies specified in the cell headers are summed recursively. The dummy address encoders then form new headers consisting of two fields: a dummy address interval and an index reference. The dummy address interval, formed by adjacent running sums, is represented by two binary numbers, the minimum and the maximum; the index reference is equal to the minimum of the address interval and is later used by the trunk number translators to determine the copy index. The broadcast Banyan network replicates cells as shown in Figure 15. When the copies finally appear at the outputs, the trunk number translators compute the copy index for each copy from the output address and the index reference. The broadcast channel number together with the copy index forms a unique identifier for each copy, which the trunk number translators translate into a trunk number; this is added to the cell header and used by the switch to route the cell to its final destination.
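The running adder and dummy address encoding can be sketched as follows; the header is shown as a dictionary rather than bit fields, and the function name is illustrative:

```python
from itertools import accumulate

# Sketch of the running adder network and dummy address encoders:
# from the copy counts requested at each input, adjacent running
# sums form the dummy address interval [min, max] of consecutive
# copy-network outputs, and the index reference is the interval
# minimum.

def dummy_address_headers(copy_counts):
    running_sums = [0] + list(accumulate(copy_counts))
    headers = []
    for i, count in enumerate(copy_counts):
        if count == 0:
            headers.append(None)  # idle input: no cell this slot
        else:
            low, high = running_sums[i], running_sums[i + 1] - 1
            headers.append({"interval": (low, high), "index_ref": low})
    return headers

# Inputs requesting 2, 0 and 3 copies get output ranges 0-1 and 2-4.
print(dummy_address_headers([2, 0, 3]))
```

Because the intervals are contiguous and non-overlapping, every requested copy is assigned a distinct copy-network output before replication begins.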
There are two problems with this design. The first is copy network overflow: the total number of copies requested exceeds the number of outputs of the copy network. The second is output port conflict in the routing network, which occurs when multiple cells request the same output port concurrently.
Beyond these problems, there is also a serious problem inside the broadcast Banyan network. When more than one multicasting tree exists in the network, an internal link may be used by more than one tree, a phenomenon called 'internal conflict' or 'internal blocking'. This degrades switch performance and increases the cell loss probability. In addition to the extra copy network that is required, the cell duplication algorithm inside the copy network is complex and difficult to implement.
2.2.6 SCOQ Switch
Figure 16 SCOQ multicast switch architecture [20]
Here SN is the sorting network and SM is the switching module.
Figure 16 shows the multicasting SCOQ switch [20], based on the Batcher-Banyan network. It is modified from the original design [7] by adding a copy network in a feedback loop. A multicasting SCOQ switch comprises a sorting network, L switching modules, a copy network and N input buffers. The sorting network and switching modules operate as in the original SCOQ switch. Multicast and broadcast cells are fed back through the copy network, where they are duplicated according to their requests; the destination addresses of the replicated cells are assigned by the trunk number translators inside the copy network. A multicasting cell is thus replaced by several unicasting cells.
Cell duplication in the multicasting SCOQ switch is performed in the feedback
configuration. The advantages of this method are: (1) there is no interference between
unicasting cells and the duplication of multicasting cells, (2) the copy network is non-
blocking, even without an extra selection or arbitration mechanism, and (3) the buffers at
the input ports operate independently and without central control.
Although feedback duplication has the above advantages, it presents a serious problem of its own. The copy network works on a similar principle to Tony Lee's, so it inherits the drawbacks described in section 2.2.5. The main drawback is that multicast trees cross each other, so an internal link in the copy network may be used by more than one tree; this causes internal blocking as more and more cells request replication in the same switch cycle, rapidly degrading and destabilising switch performance [114]. In addition, extra sorting and copy networks are required to perform the multicasting function.
2.2.7 ORCN multicast switch
The shared-buffer copy networks in the various multicast architectures can be divided into two categories according to the cell replication mechanism: (1) those in which copies are generated recursively, that is, by feeding some of the copies back to the input side of the network, and (2) those in which copies are generated by a broadcast Banyan network (BBN) whose output ports are reserved before replication. The former is called the 'recursive copy network' (RCN) and the latter the 'output-reserved copy network' (ORCN). The BBN has already been discussed in the earlier section on Tony Lee's multicast switch [6]; the ORCN is shown in Figure 17.
The ORCN [18] consists of a cyclic distribution network (CDN), a set of
contention resolution processors (CRP), a BBN, and a set of Trunk Number Translators
(TNTs). The CRPs are coordinated through a token ring. The objective of the CDN is to
distribute the master multicasting cells to the CRPs cyclically and this will ensure that all
CRPs are shared uniformly. Furthermore, by making the active incoming master
multicasting cells cyclically concentrated and the corresponding outputs sequence of the
master multicasting cells monotonic, cells will not block in the BBN.
The CDN consists of a running adder network and a reverse Banyan network. The
main functions of the CRPs are (1) to store the master multicasting cells distributed by
the CDN and process them in FIFO order, and (2) to update the header of the master
multicasting cell in order to reserve as many consecutive outputs of the BBN as the
number of copies requested. The combination of the CDN and a token-ring reservation
scheme ensures the cyclically non-blocking property of the BBN.
Figure 17 Copy network structure [18]
Figure 18 ORCN multicast switch architecture [18]
A copy network is still required as extra hardware. Although the design of the copy network and cell duplication is logical, it requires a cyclic distribution network, a token ring and a broadcast Banyan network, which is redundant and complicated.
2.2.8 Summary
In this section, seven typical multicast ATM switch designs have been introduced and their advantages and drawbacks analysed. Most multicast ATM switch designs are based on these typical topologies [1-65], and many approaches have been proposed to improve or solve one or more of their drawbacks. In the following chapter, a new multicast ATM switch design is proposed, based on a modified reverse Knockout switch with the bus interfaces redesigned to suit the multicast requirement. It addresses the drawbacks of the typical multicast switching topologies.
3 New multicast switch architecture
The design of this switch is based on the idea that output contention is likely to
come in bursts. But in the long term, the incoming traffic is likely to be managed so that
the combined rate of cells for any given output is statistically within the capacity of that
output port. This means that the switch fabric would not require speedup, provided there
was an available path from every input to every output at the line rate of the output.
The following main features are required for a better ATM switch:
• simple
• modular
• maintainable
• fault tolerant
• feasible
• solves the HOL blocking problem associated with input buffering and the internal speedup requirement of output buffering and shared buffering
• provides the multicast function without the extra copy network or packet recirculation that is usually required
• avoids internal speedup
• uses a switching/routing algorithm that is easy, fast and programmable
• performs at least as well as an output-buffered switch.
The design of the proposed multicast switch is shown in Figure 19. It is constructed based on the reverse Knockout switch: the outputs of the Knockout switch become the inputs of the proposed switch, and the inputs of the Knockout switch become its outputs. The bus interfaces are responsible for cell routing, cell duplication and cell buffering. A parallel buffer strategy is adopted in each bus interface to solve the HOL blocking problem of input buffering, and this strategy also achieves high performance. A relay ring for each output link resolves contention among packets waiting for the same output link, without the internal link speedup required by output buffering and shared buffering. A fast table lookup combined with a special purpose decoding strategy (the CAM technique) [25] is used in each bus interface to ensure fast cell duplication for multicast switching and fast cell routing.
Figure 19 Proposed multicast ATM switch architecture
3.1 Bus interfaces
As mentioned above, bus interfaces are responsible for cell routing, cell
duplication and cell buffering. The internal structure of the bus interface is shown in
Figure 20. The incoming cell is placed in the latch. The MGT retrieves the address
information from the latch, decodes it, and triggers a related TNT (for a unicast cell) or
related TNTs (for a multicast cell). The latch is also triggered by MGT when the MGT
successfully decodes the address information. The transmission bus broadcasts the
incoming cell to the TNT associated with each output link. If that TNT is triggered by
MGT, it will allow the cell to pass through, otherwise it will block the cell. In the case of
unicast, the triggered TNT will allow the incoming cell to pass straight through. For
multicast, it will insert its output link number in the copied multicast cell header, then
pass it through. In each bus interface, the buffers are organised in a parallel fashion. Each
buffer is associated with a designated destination address. It has been proven by [2] and
[24] that this parallel buffering strategy has optimal performance (as the output buffering
strategy does). Each occupied buffer sends a request signal to the relay ring for each time
slot. The relay ring provides contention resolution between bus interfaces, releasing only one buffer among those competing for the same destination address.
Figure 20 Bus interface structure
3.2 ATM cell information structure
The ATM cell structure, defined by the International Telecommunication Union (ITU), is shown in Figure 21. The two address information fields are called the VPI and VCI respectively. A virtual connection (VC) carries data, voice or video, and each type of traffic requires at least one unique VC. The ATM switch routes cells based on either the VPI or the VCI, which work in a similar way to the DLCI in Frame Relay and the LCN in X.25. In our design, the VPI or VCI is proposed in the format shown in Figure 22, where UMI is the Unicast/Multicast Indicator, NC is the Number of Copies and MCN is the Multicast Channel Number.
Figure 21 ATM cell structure
Figure 22 Proposed VCI information structure
3.3 MGT translation table
The MGT is responsible for cell routing and cell duplication for multicast. The
transmission bus in the bus interface is particularly good for cell broadcast and multicast.
With MGT and the transmission bus, there is no need to use a redundant copy network. A
fast routing table lookup with CAM (Content Addressable Memory) technique is used
[25]. The table is organised as shown in Figure 23 and is grouped according to the copy
number in the header information of the multicast packet. The unicast and multicast
addresses are distinguished by the most significant bit of the packet address (‘0’ means
individual address and ‘1’ means group address, as defined for Ethernet addressing).
When a packet arrives, the MGT extracts its packet address and determines whether it is a unicast or multicast address. If it is the latter, the MGT reads the NC field to determine which address group to search. Within the group selected by NC, the destination address (for unicast) or the MCN (for multicast) is matched against the table entries. When the entry is found, the associated LOCN is read to trigger the relevant TNT or TNTs.
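The lookup can be sketched as a software model; the hardware uses a CAM [25], and the 8-bit address width, the bit masks and the table contents below are illustrative assumptions, not the actual field layout:

```python
# Software model of the MGT lookup. Tables are grouped by the NC
# field; the most significant address bit distinguishes unicast
# ('0', individual address) from multicast ('1', group address).

def mgt_lookup(address, nc, unicast_table, multicast_tables):
    """Return the LOCN bit list: one bit per output link, a '1'
    meaning the corresponding TNT is triggered."""
    if not address & 0x80:                # MSB 0: individual address
        return unicast_table[address]
    group = multicast_tables[nc]          # NC selects the group to search
    mcn = address & 0x7F                  # remaining bits carry the MCN
    return group[mcn]

unicast = {0x05: [0, 0, 0, 0, 0, 1, 0, 0]}           # one output link
multicast = {2: {0x01: [1, 0, 0, 1, 0, 0, 0, 0]}}    # NC = 2: two links
print(mgt_lookup(0x05, 0, unicast, multicast))  # triggers TNT 5 only
print(mgt_lookup(0x81, 2, unicast, multicast))  # triggers TNTs 0 and 3
```

Grouping by NC keeps each search small, which is what makes the CAM match fast enough for per-cell operation.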
Figure 23 Table lookup illustration for LOCN
3.4 Trunk number translator (TNT)
TNTs are organised in parallel fashion. Each TNT is associated with a particular
output link and acts as a switch controlled by the LOCN from the MGT. When a TNT receives a '1' from the MGT, it closes its circuit and connects the transmission bus to the associated buffer. The other function of the TNT is trunk number translation: when a packet comes in, the TNT checks whether it is a unicast or multicast packet. If it is the former, the TNT passes it straight through; otherwise the TNT copies its pre-stored output trunk number into the header of the multicast packet and then passes it to the buffer.
3.5 Relay ring
The relay ring is responsible for contention control: each ring serves one output destination, and each relay within it serves one input port. The structure of the relay ring is shown in Figure 24. It consists of N relays connected in a ring, controlled by the priority token (a status register in each relay) and by the request signals from the buffers for the same destination in the different bus interfaces. Each relay implements the logic function shown in Figure 25, where Rij is a relay input representing a request from the buffer of input port i (bus interface i) destined for output port j. ENi is another relay input; it comes from the previous relay in the ring, indicates whether the previous relay has released its priority token, and determines whether this relay may send the acknowledgment Aij. Status (the priority token) is an internal register that remembers whether this relay sent out a packet last time and therefore whether it now holds the priority token. Aij is a relay output that acknowledges the buffer that sent Rij, allowing it to send a packet. ENi+1 is the other relay output and passes the transmission right to the next relay in the ring. In Figure 25, '0' represents no, '1' represents yes and 'x' represents any value.
Each relay is an electronic switch that accepts a request signal Ri from bus interface i and decides whether to send the acknowledgment Ai, allowing the buffer contents in bus interface i to be output, or to transfer the enable signal to the next relay in the ring. Inside each relay, the status register (priority token) remembers whether this relay sent out a packet last time, in which case it must give away its priority to the next relay. The relay order in a ring affects the priority allocated to each individual relay: the relay on the left-hand side has the higher priority.
Figure 24 Structure of the relay ring
At the beginning of each time slot, the relay whose status register is set sends out the enable signal. Initially the status register of the far right-hand relay is set, so that the relay on the far left-hand side has the highest priority. If a relay receiving the enable signal has an active request, it sends an acknowledgment back to the caller and disables the subsequent relays in the ring; otherwise it passes the enable signal to the next relay, allowing it to make the decision. In this way the enable signal travels through the ring until it encounters the first relay with an active request in the current time slot; that request is acknowledged and all further relays are disabled. Therefore, in a given time slot only one acknowledgment can be sent out by the relay ring, even if more than one relay is triggered. The relay that sends the acknowledgment then sets its status register and holds the priority token, so in the next time slot it is the one that sends out the enable signal, giving away its priority. The enable signal again travels through the ring until it hits an active request; the relay triggered by that request sends out an acknowledgment, disabling all subsequent relays, and sets its status register to hold the priority token.
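One slot of the ring's operation, as described above, can be sketched behaviourally (the list-of-booleans representation of the requests is illustrative; hardware signal details are omitted):

```python
# Behavioural sketch of one time slot of the relay ring: the
# enable signal starts just after the relay holding the priority
# token (status register set) and travels round the ring until it
# meets the first active request; that relay alone is acknowledged
# and takes the token for the next slot.

def relay_ring_slot(requests, token_holder):
    """requests: one bool per relay (buffer has a cell waiting).
    token_holder: index of the relay whose status register is set.
    Returns (acknowledged_relay_or_None, next_token_holder)."""
    n = len(requests)
    for offset in range(1, n + 1):       # wraps all the way round
        relay = (token_holder + offset) % n
        if requests[relay]:
            return relay, relay          # winner holds the token next
    return None, token_holder            # idle slot: token stays put

# Two buffers contend; they are served in round-robin order.
winner1, token = relay_ring_slot([False, True, True, False], token_holder=3)
winner2, token = relay_ring_slot([False, True, True, False], token)
print(winner1, winner2)  # 1 2
```

Because the winner becomes the next token holder (and so the lowest-priority relay in the following scan), persistent requesters are served in turn, which is the fairness property claimed for the dynamic priority reassignment.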
Figure 25 Control logic of each relay
The concept of the priority token allows dynamic priority reassignment which
maintains fairness for each bus interface. Each relay ring is responsible for resolving
contention among packets from each bus interface with the same output destination.
The control logic of each relay (Figure 25) can be summarised by the following truth table, where Status' is the value of the status register after the slot:

Status  ENi  Rij  |  Aij  ENi+1  Status'
  0      0    0   |   0     0      0
  0      0    1   |   0     0      0
  0      1    0   |   0     1      0
  0      1    1   |   1     0      1
  1      x    x   |   0     1      0

A different contention control algorithm can be implemented by combining the buffer control with the relay ring. For example, the buffer control can set Rij to 11, indicating that the buffer needs high priority: after sending out the current packet, the relay does not release the priority token, so this buffer can keep sending packets in the next time slot. To stop a buffer holding the priority token for too long, the status register can count how many times the buffer has held the token; after a certain number of time slots, the relay must release the priority token and give the other buffers a chance to send packets.
Figure 26 Control logic for priority queue in each relay
Figure 26 shows the control logic in each relay for the priority queue algorithm, where Rij, ENi, Aij, Status and ENi+1 have the same meanings as in Figure 25, except that Rij is now two bits: '00' represents no request, '10' one transmission right and '11' multiple transmission rights, while 'x' represents any value. Status is also different: the value s represents how many times a buffer may transmit its packets consecutively, and it can be a fixed preset value or set dynamically by the buffer control. When ENi is 1, Rij is '10' and Status is 0, buffer i requests a single transmission right: the relay sends Aij = 1 to acknowledge buffer i and at the same time sets the Status register to s, so that it releases its transmission rights in the next time slot. When ENi is 1, Rij is '11' and Status is less than s, buffer i requests multiple transmission rights: the relay sends Aij = 1 and increments the Status register by 1, holding its transmission rights for the next time slot. The relay continues to operate in this way until its Status register reaches s, whereupon it releases its transmission rights in the next time slot. The corresponding truth table is:

Status  ENi  Rij  |  Aij  ENi+1  Status'
  0      0    00  |   0     0      0
  0      0    10  |   0     0      0
  0      0    11  |   0     0      0
  0      1    00  |   0     1      0
  0      1    10  |   1     0      s
  <s     1    11  |   1     0      +1
  s      x    x   |   0     1      0

The buffer control algorithm and the holding time in the relay are configurable and programmable. The buffer control checks whether the buffer has reached a certain length: if so, it sends Rij = 11, otherwise Rij = 10. When Rij = 11 and ENi = 1, the relay sends Aij = 1 and at the same time checks whether the status register has reached the pre-configured value s. If not, it increments the status register by 1 and holds the priority token; otherwise it passes the priority token to the next relay in the ring (ENi+1 = 1).
Using the same principle, other algorithms such as longest queue can easily be implemented. For the longest queue, only the buffer control algorithm needs to change so that Rij = 11 is generated for multiple transmission rights; the buffer lengths must be checked to decide which queue is the longest. The easiest way to approximate the longest queue for the parallel buffer scheme in the proposed switch is to set a threshold in each bus interface: when the threshold is crossed, Rij = 11 is generated for that buffer. The relay algorithm is then the same as that shown in Figure 26 for the priority queue scheme. In fact, no structural difference in the buffer control algorithm is required between the priority queue and the longest queue, as both check buffer length; the only real difference is the meaning of the threshold value in each algorithm.
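A single relay's decision in the priority-queue variant of Figure 26 can be modelled directly from its truth table. The tuple encoding of the outputs is illustrative, and the behaviour for combinations not listed in the table (such as '10' with an intermediate Status) is an assumption of this sketch:

```python
# Truth-table model of one relay in the priority-queue variant
# (Figure 26). request is the two-bit Rij: '00' none, '10' one
# transmission right, '11' multiple rights; s is the configured
# holding limit. Returns (ack, pass_enable, new_status).

def relay_step(status, enabled, request, s):
    if status == s:                  # limit reached: release the token
        return 0, 1, 0
    if not enabled:                  # enable never arrived this slot
        return 0, 0, status
    if request == "00":              # nothing to send: pass the enable
        return 0, 1, 0
    if request == "10":              # one right: serve, release next slot
        return 1, 0, s
    return 1, 0, status + 1          # '11': serve and keep the token

print(relay_step(0, True, "10", 3))  # (1, 0, 3): served once, will release
print(relay_step(1, True, "11", 3))  # (1, 0, 2): serving, holding the token
print(relay_step(3, True, "11", 3))  # (0, 1, 0): limit s reached, released
```

Setting Status straight to s on a single-right grant reuses the same release path as the multi-right case, which keeps the relay logic to one small state machine.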
3.6 Timing diagram of the proposed switch
To show more clearly how the proposed switch processes an incoming packet, the timing of a packet's passage through the switch is described in this section.
When a packet arrives at a particular input port, it travels through a serial-to-parallel (S/P) converter and enters the latch. As soon as the packet header has entered the latch (which takes 5 bytes, or 40 bit times), the MGT retrieves the packet header, performs a routing table lookup, and triggers the relevant buffer or buffers to accept the packet from the latch. When the packet enters the buffer(s), the R signal is sent to the relay ring, which scans through the ring and triggers the buffer transmission with the A signal once the logic condition of the first eligible relay is met. This takes a relatively short time, as the relay ring has at most N relays to scan. The transmitted packet travels through a parallel-to-serial (P/S) converter to the output. Input, buffering and the relay ring are clocked according to the input line and together take exactly one time slot, which is the switch processing time. The output is clocked according to the output line rate and triggered by the relay ring.
3.7 ABR and VBR traffic discussion
ATM switches need to handle ABR and VBR traffic as defined by the ATM
Forum and in [16]. ABR stands for Available Bit Rate, one of the five ATM Forum
defined service categories. In this service type, the network makes its best effort to
deliver as many cells as possible but cannot absolutely guarantee cell delivery. ABR
supports variable bit rate data traffic with flow control, a minimum guaranteed data
transmission rate and
specified performance parameters. In return for the user regulating its traffic flow, the
network offers minimal cell loss for accepted traffic. VBR stands for Variable Bit Rate.
VBR traffic is bursty but centred on an average bandwidth. VBR is divided into
real-time (RT-VBR) and non-real-time (NRT-VBR) traffic, and it requires the same
service guarantees, that is, delay, cell loss and timing, as those provided for CBR.
In order to handle ABR and VBR traffic, switches usually rely on large
buffers to smooth traffic bursts. With the proposed modular switch architecture, each
relay in the relay ring can be programmed, according to the customer's service
subscription, with different control logic, such as the priority queue and longest queue
schemes shown in the previous section, to optimise buffer behaviour under ABR and
VBR traffic. In Chapter 7, we shall show that the proposed switch performs well under
traffic bursts.
3.8 Summary
In this chapter, we have described our proposed ATM switch architecture design.
For each switch component, we have explained its responsibility and the issues it
attempts to address or solve in terms of the switch design, and we can see that:
• There is no internal blocking with this switch design. Output contention is
inevitable, and it is resolved by the relay ring and the parallel buffering strategy. The
only possible packet loss is due to buffer overflow under heavy traffic conditions
with multiple packets destined for the same output port. Chapters 4 and 5 will show
that the cell loss rate is very low
• The parallel buffering strategy adopted in this design can achieve good
performance. Chapters 4 and 5 will show that its performance equals that of the
output buffering strategy, which has been proven to be optimal in [2] and [8]
• It is easy to implement contention control with different control algorithms by
using a combination of parallel buffers and relay rings
• The multicast function is implemented in the same design with a combination of the
MGT, the transmission bus and the parallel buffers
• No copy network or packet recirculation mechanism is required; cell duplication is
achieved directly from the address information in the packet header
• Simple switching architecture and control logic are the characteristics of the
proposed switch. Scalability and feasibility issues will be discussed in Chapter 8
• There is no need for internal speedup, as packets passing through the switch fabric
are controlled by relay rings, and each relay works in parallel at the line speed
• Each bus interface and each relay ring can be individually plugged into or added to
the output bus. Due to this modular feature, the switch is very easy to maintain and
repair, and backup bus interfaces and relay rings can be implemented for fault
tolerance
• The handling of ABR and VBR traffic was discussed: each relay ring can be
programmed with different control logic, such as the longest queue and the priority
queue, to best smooth the traffic bursts caused by ABR and VBR traffic.
4 Mathematical modelling and numerical results of the proposed switch design

To validate the proposed switch architecture, a mathematical model is required. As
it can be seen in Figure 19, the switch architecture design is very modular. Each bus
interface acts independently and is responsible for cell routing, cell duplication, cell
buffering and cell multicasting. Each relay ring is also independent and is responsible for
contention resolution of its dedicated output port. Due to this modular and independent
feature of the switch architecture design, its modelling can be simplified.
With a parallel buffering strategy in each bus interface, all buffers are
independent. As described in the previous chapter, each output is controlled by the
combination of the associated buffer in each bus interface and dedicated relay ring. If we
focus on a particular output, all other outputs work in the same way, differing only in
the output address. Figure 27 illustrates the physical model of a particular output. Due
to the modular and independent features of this design, modelling a particular output is
sufficient to establish the switch performance.
Figure 27 Physical model of a particular output
As we can see from Figure 27, this physical model is well suited to analysis using
multi-access protocol theory. Several mathematical models based on multi-access
protocol theory have been proposed [66-74]. A
non-linear model is proposed by Tasaka [72] and by Woodward [10]. While it is
a very comprehensive model, it is difficult to solve the large non-linear equations to
arrive at the equilibrium values that are used to calculate the performance measures.
Aura Ganz and Imrich Chlamtac have proposed a linear solution for the multi-
access protocol [67-69]. This is also a comprehensive model, using the restricted
occupancy urn model to calculate the number of ways of distributing n
indistinguishable balls among m distinguishable urns under the K (occupancy)
restriction. With this restricted occupancy urn model, the exact number of buffers with
n packets can be obtained, and by solving N*L linear equations the performance
measures can be derived (where N is the switch size and L is the buffer capacity).
The mathematical model that we use in this chapter was proposed by
Efstathios D. Sykas, Dionysios E. Karvelas and Emmanuel N. Protonotarios [66]. This
is also a linear solution and it matches our physical model perfectly. It has a better
algorithm for calculating the state transition probabilities than the one in [67-69], but it
still has N*(L+1) linear equations to solve, which requires a computer program (where
N is the switch size and L is the buffer capacity).
4.1 Mathematical modelling
Multi-access protocols are required when several users attempt to communicate
through a common channel. The analysis of multi-access protocols and the evaluation of
their performance is an extremely interesting and important theoretical problem. In
practice, a finite buffer capacity is used so that better throughput and lower cell loss
probabilities can be achieved at the expense of an increase in delay. Such schemes can be
modelled as discrete-time queuing systems consisting of N queues. Each has a buffer
capacity of L packets. The packet transmission among the N queues is managed
according to the rules of the deployed multi-access protocol for common communication
channels. In general, the behaviour of such a queuing system can be described as an N-
dimensional Markov chain with state vector b = (b1, b2, ---, bN), where bi is the number of
packets queued at the ith queue, i = 1, 2, ---, N. Such a system has (L+1)^N states and
therefore [(L+1)^N]^2 possible state transitions, so it is difficult to calculate the
transition probabilities. In the case of a symmetric communication channel, b carries
more information than is necessary to
compute performance characteristics such as throughput, delay, cell loss, etc. It is then
possible to describe the system state by the vector n = (n1, n2, ---, nL), where ni is the
number of buffers that have i packets, i = 1, 2, ---, L. Obviously n0 = N - n1 - n2 - --- - nL,
and the number of possible states is the binomial coefficient C(N+L, L). When L < N,
this results in a considerable reduction of the system state space and makes the state
transition probabilities easier to calculate. This reduction of the state space is crucial for
computing the performance measures of the queuing model.
According to the analysis in the previous chapter, we focus our attention on a
particular output; due to the modular and independent features of the designed switch,
the problem can be simplified in this way.
Following Efstathios D. Sykas, Dionysios E. Karvelas and Emmanuel N.
Protonotarios [66], the model consists of N buffers b1, b2, ---, bN connected to a
common communication channel (the relay ring), where each buffer has a capacity of L
packets. The switch is synchronised to work in fixed time slots. Packets have a constant
length, and their transmission time equals the slot duration. Every buffer generates
packets with a fixed probability r per slot and works independently of the other buffers.
Packets arriving at a full buffer are lost and never recovered. Packets queued in a
buffer are served in a first-in-first-out order. Busy buffers attempting transmission of the
packet at the head of queue follow the perfect scheduling protocol. This is the ideal multi-
access protocol where there is a successful transmission in every slot as long as there are
busy buffers. In other words, the perfect scheduling protocol guarantees at least one
successful packet transmission in every time slot provided that at least one of the N
buffers is not empty. This reflects and matches the working principle of the relay ring
introduced in the previous chapter. There may be zero, one, or multiple packet
transmission attempts in one time slot. Colliding packets follow the round robin rule to
obtain permission (the Priority Token) for transmission. Blocked packets must wait for
the next time slot to try again and no
blocked packet will be lost. The communication channel (relay ring) is assumed to have
no operation delay. A packet is removed from its buffer as soon as it transmits
successfully. Figure 28 shows the discrete time queuing model for the ATM switch
design.
Figure 28 Discrete time queuing model of the ATM switch design
In Figure 28, Bi represents a buffer holding i packets; p represents the probability that
one of the packets in the buffers passes the relay ring in each time slot (all packets are
destined for the same output, according to the working principle of the relay ring); b
represents the probability that a packet arrives at a buffer in a time slot; S represents a
successful packet transmission; and F represents a failed packet transmission.
The assumptions for the rules that control the behaviour of the random multiple-
access scheme (relay ring) are now explained. First, the conditional probabilities s(n)
follow the perfect scheduling protocol; s(n) is the probability that there is one
successful transmission given n busy buffers competing for packet transmission in the
same time slot. Second, the probability of a successful transmission from one of the n
busy buffers is 1/n. This matches the property of the relay ring in that the contribution
of each buffer to
the successful transmission is the same (fairness property). Third, the input traffic
distributed at each buffer follows the uniform distribution. Together the second and third
assumptions express the symmetry property of the common communication channel
(relay ring).
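These model assumptions can also be exercised empirically. The following Monte-Carlo sketch is an illustrative Python simulation, separate from the analytical model; the within-slot ordering (arrivals before the single departure) is an assumption made for this sketch:

```python
import random

def simulate(N=8, L=10, r=0.1, slots=20000, seed=1):
    """Monte-Carlo sketch of the modelled system: N buffers of capacity
    L, Bernoulli(r) arrivals per buffer per slot, and one head-of-line
    departure per slot chosen round-robin among the busy buffers
    (perfect scheduling); blocked packets wait and are never lost."""
    rng = random.Random(seed)
    queues = [0] * N
    served = lost = arrived = 0
    token = 0  # round-robin pointer (the Priority Token)
    for _ in range(slots):
        # arrivals first (an ordering assumption, not from the text)
        for i in range(N):
            if rng.random() < r:
                arrived += 1
                if queues[i] < L:
                    queues[i] += 1
                else:
                    lost += 1  # arrival at a full buffer is lost
        # perfect scheduling: exactly one success whenever any buffer is busy
        busy = [i for i in range(N) if queues[i] > 0]
        if busy:
            winner = min(busy, key=lambda k: (k - token) % N)
            queues[winner] -= 1
            served += 1
            token = (winner + 1) % N
    return served / slots, lost / max(arrived, 1)
```

For a light load such as N = 8, L = 10 and r = 0.05, the measured throughput settles near the offered load N*r = 0.4 with negligible loss.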
We now build a queuing model with a reduced system state space. As described
earlier, the state of a symmetric multiple-access protocol can be expressed by the
vector n = (n1, n2, ---, nL). In order to make the calculation of the state transition
probabilities easier and solvable, the state space must be reduced as much as possible.
The following facts are taken into consideration: a) the number of busy buffers
n = n1 + n2 + --- + nL appropriately represents the throughput characteristics of our
model; b) the number of packets j queued in one buffer, for instance buffer b1,
represents every other buffer's queuing behaviour. The first fact means that the
probability of a successful transmission depends only on the number of busy buffers.
The second fact follows from the symmetry property of the common communication
channel (relay ring). As described in [66], the complexity of the system can be
significantly reduced if the system state is expressed by the vector s = (n, j). The
possible states are then reduced from the binomial coefficient C(N+L, L) to
(N+1)*(L+1). In addition, L+1 of those states, namely (0,1), (0,2), ---, (0,L) and (N,0),
are never reached, which allows the system state space to be further reduced. This is
because there is at least one busy buffer (n ≥ 1) if buffer b1 has at least one packet in its
queue (j ≥ 1); similarly, buffer b1 has at least one packet in its queue if there are N busy
buffers.
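The size of this reduction is easy to verify numerically; a short Python check for the N = 8, L = 3 case used later in this chapter, taking the unreduced occupancy count to be the stars-and-bars value C(N+L, L):

```python
from math import comb

N, L = 8, 3                      # switch size and buffer capacity

full_chain = (L + 1) ** N        # states of the N-dimensional chain b
occupancy = comb(N + L, L)       # states of n = (n1, ---, nL), stars and bars
reduced = (N + 1) * (L + 1)      # states of s = (n, j)
reachable = reduced - (L + 1)    # drop (0,1), ---, (0,L) and (N,0)

print(full_chain, occupancy, reduced, reachable)
```

For N = 8 and L = 3 this gives 65536 states for the full chain, 165 occupancy states, and only 36 states (32 reachable) for s = (n, j).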
A two-dimensional Markov chain is constructed with the vector s = (i, j), where i
is the number of busy buffers in the set B = {b2, b3, ---, bN} (buffer b1 is excluded). The
approach described here follows [66] and uses the number of busy buffers in the system
and the queue length at a particular buffer in order to derive an efficient, approximate
expression of the queuing model. The symmetry of the common communication channel
(relay ring) is used to estimate the probability that a busy buffer has exactly one packet in
its queue, which is needed to calculate the state transition probabilities. The reason that
the state vector s = (n , j) is modified to s = (i, j) is to derive a more generic expression
for calculating state transition probabilities. Figure 29 shows the state transition diagram
of the modelled system (a particular output of the ATM switch design).
Figure 29 State transition diagram of the modelled system
As illustrated in Figure 29, the system is in state (i,j) when there are j packets in
the queue at buffer b1 and i busy buffers among the buffers in the set B = {b2, b3, ---, bN}.
The number in parentheses indicates the equation that calculates the transition probability.
The state (i-1, j-1) is never reached, and there is no need to calculate its transition
probability, because only one packet is allowed to transmit in each time slot. If the
number of packets in buffer b1 drops from j to j-1, buffer b1 must have transmitted the
packet, so the number of busy buffers in the set B = {b2, b3, ---, bN} cannot decrease to
i-1 in the same slot. Conversely, if the number of busy buffers in set B decreases to i-1,
one of the buffers in set B must have transmitted the packet, so buffer b1 cannot drop to
j-1. For a more detailed description of how the expressions for the state transition
probabilities are derived,
refer to [66]. Here, only the equations used to calculate the state transition
probabilities and the performance measures are listed. The numerical analysis and
numerical results are described in the following sections.
n = i + u(j)                                                       (1)

where u(j) indicates whether station s1 (buffer b1) is busy:

u(j) = 1 if j ≠ 0, and u(j) = 0 if j = 0                           (2)
Let s* denote the station that transmitted in the case of a successful transmission, let
E- be the set of idle stations in the system with station s1 excluded, and let
S- = {s2, s3, ---, sN} (station s1 excluded).

P[unsuccessful transmission | s = (i, j)] = 1 - s(n)               (3)

s(n) = u(n)   (perfect scheduling protocol)                        (4)

where s(n) is the probability of a successful transmission given that there are n busy
stations in total. The perfect scheduling protocol guarantees a successful transmission
in every slot as long as there are busy buffers.

P[successful transmission, s* = s1 | s = (i, j)] = u(j) * s(n)/n = u(j) * e(n)    (5)

where e(n) = s(n)/n.

P[successful transmission, s* ∈ S- | s = (i, j)] = i * s(n)/n = i * e(n)          (6)

P[k packet arrivals at the stations in E- | s = (i, j)] = b(N-i-1, k, r)          (7)
where b(M, m, r) is the binomial probability

b(M, m, r) = C(M, m) * r^m * (1-r)^(M-m)

with C(M, m) denoting the binomial coefficient.

P[s1 receives a new packet | s = (i, j)] = r * u(L-j)                             (8)

P[s1 does not receive a new packet | s = (i, j)] = 1 - r * u(L-j)                 (9)

P[s* has exactly one packet in its queue | s* ∈ S-, s = (i, j)] = x(n)            (10)
where x(n) = P[s1 has exactly one packet in its queue | s1 is busy and there are n busy
buffers].

As the state vector s does not allow the exact calculation of the probability in (10),
based on the symmetry property described above it is estimated as the conditional
probability that a typical busy station has exactly one packet in its queue, and it is
assumed to depend only on the number n of busy buffers in the system. It should be
clear that (10) is the critical assumption needed to analyse our system, that is, to
establish the Markovian property. The state transition probabilities are given below.
For s = (i, j) and s' = (i-1, j), with i = 1, 2, ---, N-1 and j = 0, 1, ---, L:

p(s,s') = P[successful transmission, s* ∈ S-] * P[s* has only one packet in its queue
and does not receive a new one] * P[0 new packets received in E-] * P[s1 does not
receive a new packet]

p(s,s') = i * e(n) * x(n) * (1-r)^(u(L-1)+u(L-j)) * b(N-i-1, 0, r)                (11)
For s = (i, j) and s' = (i-1, j+1), with i = 1, 2, ---, N-1 and j = 0, 1, ---, L-1:

p(s,s') = P[successful transmission, s* ∈ S-] * P[s* has only one packet in its queue
and does not receive a new one] * P[0 new packets received in E-] * P[s1 receives a
new packet]

p(s,s') = i * e(n) * x(n) * r * (1-r)^u(L-1) * b(N-i-1, 0, r)                     (12)
For s = (i, j) and s' = (k, j-1), with i = 0, 1, ---, N-1, j = 1, 2, ---, L and
k = i, i+1, ---, N-1:

p(s,s') = P[successful transmission, s* = s1] * P[s1 does not receive a new packet] *
P[k-i new packets received in E-]

p(s,s') = e(n) * (1-r)^u(L-j) * b(N-i-1, k-i, r)                                  (13)
For s = (i, j) and s' = (k, j+1), with i = 0, 1, ---, N-1, j = 0, 1, ---, L-1 and
k = i, i+1, ---, N-1:

p(s,s') = P[unsuccessful transmission] * P[s1 receives a new packet] * P[k-i new
packets received in E-] + P[successful transmission, s* ∈ S-] * P[s* has more than
one packet in its queue] * P[s1 receives a new packet] * P[k-i new packets
received in E-] + P[successful transmission, s* ∈ S-] * (P[s* has only one packet
in its queue and receives a new one] * P[k-i new packets received in E-] + P[s* has
only one packet in its queue and does not receive a new one] * P[k-i+1 new
packets received in E-]) * P[s1 receives a new packet]

p(s,s') = r * b(N-i-1, k-i, r) * [1 - s(n) + i * e(n) * (1 - x(n))
          + i * e(n) * x(n) * (r * u(L-1) + ((N-k-1)/(k-i+1)) * r * (1-r)^(u(L-1)-1))]   (14)
For s = (i, j) and s' = (k, j), with i = 0, 1, ---, N-1, j = 0, 1, ---, L and
k = i, i+1, ---, N-1:

p(s,s') = P[unsuccessful transmission] * P[s1 does not receive a new packet] * P[k-i
new packets received in E-] + P[successful transmission, s* ∈ S-] * P[s* has more
than one packet in its queue] * P[s1 does not receive a new packet] * P[k-i new
packets received in E-] + P[successful transmission, s* ∈ S-] * (P[s* has only one
packet in its queue and receives a new one] * P[k-i new packets received in E-] +
P[s* has only one packet in its queue and does not receive a new one] * P[k-i+1
new packets received in E-]) * P[s1 does not receive a new packet] + P[successful
transmission, s* = s1] * P[s1 receives a new packet] * P[k-i new packets received
in E-]

p(s,s') = (1 - r * u(L-j)) * b(N-i-1, k-i, r) * [1 - s(n) + i * e(n) * (1 - x(n))
          + i * e(n) * x(n) * (r * u(L-1) + ((N-k-1)/(k-i+1)) * r * (1-r)^(u(L-1)-1))
          + u(j) * e(n) * r * u(L-j) / (1 - r * u(L-j))]                                 (15)
w(s) = Σ over s' of w(s') * p(s',s)                                               (16)

Σ over s of w(s) = 1                                                              (17)

In equations (16) and (17), w(s) represents the steady-state probability that the system
is in state s = (i, j), where i = 0, 1, ---, N-1 and j = 0, 1, ---, L.
x(n) = w(n-1, 1) / (Σ from j=1 to L of w(n-1, j)),   n = 1, 2, ---, N             (18)

x(n) = 1,   n = 1, 2, ---, N, when L = 1                                          (19)
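The elementary quantities defined above, u(j), s(n), e(n) and b(M, m, r), translate directly into code; a small Python sketch whose function names mirror the symbols (the n = 0 handling of e(n) is a defensive assumption):

```python
from math import comb

def u(j):
    # u(j) = 1 if j != 0 else 0, equation (2)
    return 1 if j != 0 else 0

def s_prob(n):
    # perfect scheduling: a success occurs whenever any buffer is busy, eq. (4)
    return u(n)

def e(n):
    # e(n) = s(n)/n, the per-buffer success probability, eq. (5);
    # returning 0 for n = 0 is an assumption for convenience
    return s_prob(n) / n if n > 0 else 0.0

def b(M, m, r):
    # binomial probability of m arrivals among M stations, arrival rate r, eq. (7)
    return comb(M, m) * r ** m * (1 - r) ** (M - m)
```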
The main performance measures such as throughput, delay, average queue length and cell
loss rate are given as follows.
The throughput S of the system is defined as the average number of successfully
transmitted packets per time slot:

S = Σ from i=0 to N-1, Σ from j=0 to L of w(i, j) * s(n)                          (20)

(with n = i + u(j) as in equation (1)). The average queue length Q is the mean number
of packets in a buffer:

Q = Σ from i=0 to N-1, Σ from j=1 to L of j * w(i, j)                             (21)

As we are applying the parallel buffering strategy, S1 represents the throughput of
buffer b1:

S1 = Σ from i=0 to N-1, Σ from j=1 to L of w(i, j) * e(n)                         (22)

The delay D (mean waiting time) is defined as the average number of time slots a
packet waits to be transmitted successfully:

D = Q / S1                                                                        (23)

The cell loss rate is defined as the probability that a packet is rejected because it
arrives when its buffer is full:

P_loss = Σ from i=0 to N-1 of w(i, L)                                             (24)
4.2 Numerical results
The main problem in obtaining numerical results lies in solving equations (16), (17)
and (18). As the number of buffers N (the size of the switch) grows, the linear
equations (16) and (17) and the non-linear equation (18) become larger and more
difficult to solve, so a computer-aided solution is certainly needed for larger N. The
following iterative
algorithm is used in the computer program to calculate steady-state probability and
performance measures:
a) initialise x(n): set x(n) = 1 if L = 1, otherwise set x(n) = 0.5; [66] assigns an
arbitrary initial value 0 < x(n) < 1
b) calculate the coefficients of ws
c) solve the linear equations (16) and (17)
d) calculate the throughput S and use it as the convergence criterion
e) recalculate x(n) from the ws obtained in step c)
f) repeat steps b) to e) until the convergence criterion is met.
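Steps a) to f) form a fixed-point iteration on x(n) with the throughput S as its convergence test. The Python skeleton below shows only that control flow; toy_update is a stand-in contraction (with fixed point 0.8) used purely so the sketch runs, not the real steps b) to e):

```python
def iterate(update, x0, max_iter=10, tol=0.0):
    """Generic shape of steps a)-f): repeatedly apply `update`, which maps
    the current x(n) estimate to (new_x, S), until the throughput estimate
    S stops changing (or max_iter is reached)."""
    x, prev_S = x0, None
    for _ in range(max_iter):
        x, S = update(x)
        if prev_S is not None and abs(S - prev_S) <= tol:
            break
        prev_S = S
    return x, S

# Toy stand-in: in the real program, `update` would build the coefficients,
# solve (16)-(17) by Gauss elimination and recompute x(n) from (18).
def toy_update(x):
    new_x = 0.5 * x + 0.4  # contraction with fixed point 0.8
    return new_x, new_x    # reuse the iterate itself as the "throughput"

x, S = iterate(toy_update, 0.5, max_iter=50, tol=1e-12)
```

In the thesis program, the iteration count is capped at 10 and the difference in S between iterations serves as the stopping test, matching steps d) and f).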
4.2.1 Computer-aided program
Such a program is definitely needed for larger N, as it is almost impossible to solve
the N*(L+1) linear equations by hand or by calculator. Examining equations (1) to (24),
the simple equations (1) to (10) can easily be implemented as functions in MATLAB.
MATLAB is used because it can be installed and run on a PC. Its main advantage is a
good implementation of matrix operations, which makes it easier to solve linear
equations; it also provides binomial coefficient functions and good programming
capabilities.
As mentioned before, the main problem is to find a good algorithm to solve the large
linear equations (16) and (17). Gauss elimination is a traditional and well-known
algorithm for solving large-scale systems of linear equations; its goal is to produce a
solution x to the system A*x = b, where A is a matrix of order N and b is a vector of
length N. For a detailed description of the Gauss elimination algorithm, see the next
section and [105].
The variable index in equation (16) also makes the calculation of the steady-state
probability ws tricky: the right p(s',s) must be found in order to calculate the right
ws. The method used in our program is to calculate all the state transition probabilities
and store them in two-dimensional arrays, with the number of rows varying according
to equations (11) to (15) and the number of columns fixed at 5. In each array, the first
and second columns store the state s, the third and fourth columns store the state s',
and the fifth column stores the transition probability from state s to state s'. The
steady-state probabilities ws are stored in an array of length N*(L+1), the number of
states required in this queuing model as described in this chapter. Another
one-dimensional array of length N*(L+1) x N*(L+1) is defined to store the coefficients
calculated from equations (11) to (15); it is one-dimensional because of the Gauss
elimination implementation in [105]. The program flow chart is shown in Figure 30.
The detailed program implementation is shown in Appendix A.
(Flow chart: start; initialise x(n); while fewer than 10 iterations remain, initialise and
calculate the state transition probabilities as per equations (11) to (15); fill in the
coefficient array of ws for the linear equations (16) and (17); initialise the vectors ws
and B; solve the linear equations using the Gauss elimination algorithm; calculate the
throughput S and compare it with its value from the previous iteration; if the difference
is zero or the iteration counter reaches 10, calculate the performance measures and end;
otherwise recalculate the vector x(n) from equation (18), store the current S for
comparison, and go to the next iteration.)
Figure 30 Flow diagram of the computer-aided program that calculates
state transition probabilities and performance measures,
where *** indicates that the program flow continues at the other box containing ***.
4.2.2 Gauss elimination algorithm with partial pivoting
4.2.2.1 System of linear equations
A system of linear equations is shown as follows; each equation is linear:
a11x1 + a12x2 + ... + a1nxn = b1
a21x1 + a22x2 + ... + a2nxn = b2
a31x1 + a32x2 + ... + a3nxn = b3
.
.
.
an1x1 + an2x2 + ... + annxn = bn
The above system can also be written in matrix form:
AX = B
where A is an n by n matrix with elements
a11, a12, ... a1n
a21, a22, ... a2n
a31, a32, ... a3n
.
.
.
an1, an2, ... ann
and
X and B are n-vectors with components (x1, x2, ... xn) and (b1, b2, ... bn) respectively.
4.2.2.2 Gaussian elimination method
Let the system of equations be:
a11x1 + a12x2 + ... + a1nxn = b1
a21x1 + a22x2 + ... + a2nxn = b2
a31x1 + a32x2 + ... + a3nxn = b3
.
.
.
ai1x1 + ai2x2 + ... + ainxn = bi
.
.
.
an1x1 + an2x2 + ... + annxn = bn
Phase 1
Step 1: Eliminate x1 from 2nd, 3rd, 4th, ... nth equations. In other words, turn the
coefficient of x1 to zero in 2nd, 3rd, 4th, ... nth equations. This can be achieved by
subtracting appropriate multiples of the first equation from 2nd, 3rd, 4th, ... nth equations.
That is,
aij <- aij - (ai1/a11)*a1j, for 1 <= j <= n, and
bi <- bi - (ai1/a11)*b1, for 2 <= i <= n
After these operations the system will become,
a11x1 + a12x2 + ... + a1nxn = b1
0 + a22x2 + ... + a2nxn = b2
0 + a32x2 + ... + a3nxn = b3
.
.
.
0 + ai2x2 + ... + ainxn = bi
.
.
.
0 + an2x2 + ... + annxn = bn
Note that the coefficients in the 2nd, 3rd, 4th, ... nth equations are different from the
original coefficients, even though we used the same notation.
Step 2: Now eliminate the coefficients of x2 from the 3rd, 4th, ... nth equations without
changing the first equation and the zero coefficients. This could be accomplished by the
following operations:
aij <- aij - (ai2/a22)*a2j, for 2 <= j <= n, and
bi <- bi - (ai2/a22)*b2, for 3 <= i <= n
Using these types of operation to eliminate the variables x1, x2, x3, ..., xn-1 below the
diagonal, the system will look like,
a11x1 + a12x2 + a13x3 + ... + a1nxn = b1
0 + a22x2 + a23x3 + ... + a2nxn = b2
0 + 0 + a33x3 + ... + a3nxn = b3
.
.
.
0 + 0 + ... + aiixi + ... + ainxn = bi
.
.
.
0 + 0 + ... + 0 + annxn = bn
where the matrix of the system is an upper triangular one.
Phase 2
We need to solve for the variables xn, xn-1, xn-2 ... x1. At the beginning of the phase the
system would take the following form:
a11x1 + a12x2 + a13x3+ ... + a1nxn = b1
a22x2 + a23x3+... + a2nxn = b2
a33x3+... + a3nxn = b3
.
.
.
an-1,n-1xn-1 + an-1,nxn = bn-1.
annxn = bn
The backward substitution starts from the last equation by solving for xn
xn = bn/ann
Substituting the value for xn in the last but one equation and solving for xn-1 we get
xn-1 = {bn-1 - an-1,nxn}/an-1,n-1
By a similar process we can get the value of xi as follows:
xi = {bi - Sum[aijxj]}/aii
where the summation is over j varying from i+1 to n
The above equations give xi for i = n-1, n-2, ...1
If the arithmetic is exact, and the matrix A is not singular, then the answer
computed in this manner will be exact (provided no zeros appear on the diagonal).
However, as computer arithmetic is not exact, there will be some truncation and rounding
error in the answer. The cumulative effect of this error may be very significant if the loss
of precision occurs at an early stage in the computation. In particular, if a numerically
small number appears on the diagonal of the row, then its use in the elimination of
subsequent rows may lead to differences being computed between very large and very
small values with a consequential loss of precision. Unless this problem is addressed it
will not be possible to proceed with the computation.
One of the ways around this problem is to ensure that small values (especially
zeros) do not appear on the diagonal and, if they do, to remove them by rearranging the
matrix and vectors. In partial or column pivoting, we rearrange the rows of the matrix
and the right-side vector to bring the numerically largest value in the column onto the
diagonal.
Note that the variables remain in the same order which simplifies the
implementation of this procedure. The right side vector, however, has been rearranged.
Partial pivoting may be implemented for every step of the solution process, or only when
the diagonal values are sufficiently small as to potentially cause a problem. Pivoting for
every step will lead to smaller errors being introduced through numerical inaccuracies,
but the continual reordering will in effect make the calculation process slower.
There are several versions of the Gauss elimination algorithm implementation.
For other versions, please refer to [105] and [106].
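The two phases described above can be written compactly. A Python sketch of Gauss elimination with partial pivoting (the thesis program itself is in MATLAB; see Appendix A):

```python
def gauss_solve(A, b):
    """Solve A*x = b by Gaussian elimination with partial (column)
    pivoting, followed by backward substitution. A is an n-by-n list
    of lists and b a list of length n; both are copied so the caller's
    data is not modified."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    # Phase 1: forward elimination with partial pivoting
    for col in range(n - 1):
        # bring the numerically largest value in the column onto the diagonal
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if A[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]   # row swap, right side too
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for j in range(col, n):
                A[row][j] -= factor * A[col][j]
            b[row] -= factor * b[col]
    # Phase 2: backward substitution from the last equation upwards
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        if A[i][i] == 0.0:
            raise ValueError("matrix is singular")
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x
```

Partial pivoting is exercised whenever a zero (or small) value would otherwise sit on the diagonal; for example, A = [[0, 1], [1, 0]] is solved only because the rows are swapped first.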
4.2.3 Numerical results analysis
The switch performance is evaluated with performance measures such as throughput,
average packet delay, average queue length and packet loss probability.
• Throughput indicates the average number of successfully transmitted packets
per time slot
• Average packet delay shows the average time slots required for a packet in a
buffer to be transmitted successfully
• Average queue length indicates the average number of packets in a buffer in a
time slot
• Packet loss probability shows the probability that a buffer is full when a new
packet arrives.
In this chapter, we choose a range of buffer sizes and traffic loads to evaluate
throughput, average packet delay, average queue length and packet loss probability
behaviour. We are particularly interested to see whether the throughput and packet loss
probability can be improved by increasing buffer size, whether the queue length and
packet delay grow with buffer size, and the load at which the switch starts to saturate.
Note that the traffic load in [66] is defined in terms of each individual buffer
generating packets with a fixed probability r per time slot, independently of the other
parallel buffers in the same bus interface. The numerical results are calculated with the
switch size N = 8, so the actual traffic load for the given bus interface is N x r. The
following graphs show a per-buffer load ranging from 0 to 0.125, corresponding to a
full bus interface load of 0 to 100 per cent. That is why the switch performance
measures start to saturate at a traffic load of 0.125, as shown in this chapter.
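The conversion from per-buffer load to bus-interface load is a simple product and can be checked in a couple of lines of Python:

```python
N = 8  # switch size used for the numerical results

for r in (0.05, 0.1, 0.125):
    # actual traffic load offered to one bus interface is N * r
    print(f"r = {r}: bus interface load = {N * r:.0%}")
```

At r = 0.125 the bus interface load reaches 100 per cent, which is where the performance curves below begin to saturate.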
The numerical results derived from the previously described computer-aided program
are shown and analysed in this section. The figures are drawn with gnuplot under
Windows; for detailed drawing steps, refer to the gnuplot on-line help.
Note that all the figures in this section are drawn with the switch size fixed at 8.
Of course, the numerical results for larger systems can easily be derived with the
computer program described in the previous section by simply changing the switch
size definition in the program.
4.2.3.1 Introduction to figures 31 to 40
The following figures show the modelled performance of the proposed switch for
buffer loads from 0 to 0.125 (representing the full load). This approach is consistent with
the approach in section 5.1, so that direct comparisons can be made.
Figures 31 to 34 show the system throughput versus the offered traffic load for
buffer lengths L of 3, 10 and 30; Figure 34 shows all three curves together. The
throughput is almost equal to the offered traffic load until it approaches 1.0, at which
point the switch begins to saturate. The buffer size has little effect on throughput, so
the proposed switch architecture can achieve high performance with a very small
buffer size.
The average queue length versus the offered traffic load is shown in Figure 35 for
L=3, 10 and 30 together. The queue length is very small (no more than two cells) up to
and including full load.
Figure 36 shows the average packet delay versus the offered traffic load for L=3,
10 and 30 together. Its characteristics are similar to those of the average queue length:
the delay is very small at first but increases sharply as the switch approaches
saturation. Increasing the buffer size introduces more delay.
[Plot: throughput S versus offered traffic load r, for L=3]
Figure 31 Throughput versus offered traffic load between 0 and 0.13 with L=3
[Plot: throughput S versus offered traffic load r, for L=10]
Figure 32 Throughput versus offered traffic load between 0 and 0.13 with L=10
[Plot: throughput S versus offered traffic load r, for L=30]
Figure 33 Throughput versus offered traffic load between 0 and 0.13 with L=30
[Plot: throughput S versus offered traffic load r; red line L=3, green line with crosses L=10, blue line with squares L=30]
Figure 34 Throughput versus offered traffic load between 0 and 0.13 with L=3, 10 and 30
[Plot: average queue length Q versus offered traffic load r; red line L=3, green line with crosses L=10, blue line with squares L=30]
Figure 35 Mean queue length versus offered traffic load between 0 and 0.13 with L=3, 10 and 30
[Plot: delay D versus offered traffic load r; red line L=3, green line with crosses L=10, blue line with squares L=30]
Figure 36 Average packet delay versus offered traffic load between 0 and 0.13 with L=3, 10 and 30
[Plot: cell loss probability Pr (logarithmic scale) versus offered traffic load r, for L=3]
Figure 37 Cell loss probability versus offered traffic load between 0 and 0.13 with L=3
[Plot: cell loss probability Pr (logarithmic scale) versus offered traffic load r, for L=10]
Figure 38 Cell loss probability versus offered traffic load between 0 and 0.13 with L=10
[Plot: cell loss probability Pr (logarithmic scale) versus offered traffic load r, for L=30]
Figure 39 Cell loss probability versus offered traffic load between 0 and 0.13 with L=30
[Plot: cell loss probability Pr (logarithmic scale) versus offered traffic load r; red line L=3, green line L=10, blue line L=30]
Figure 40 Cell loss probability versus offered traffic load between 0 and 0.13 with L=3, 10 and 30
Figures 37 to 39 show the cell loss probability versus the offered traffic load for
different buffer sizes L; the cell loss probability is drawn on a logarithmic scale.
Figure 40 shows all three curves together. At light loads the cell loss is vanishingly
small, meaning that there is no loss within the numerical accuracy of the computation.
As the load increases, a turning point is reached where measurable loss appears. As the
buffer size is increased, the turning point occurs at a higher load and the maximum cell
loss probability decreases. This means that increasing the buffer size reduces cell loss
significantly.
4.3 Multicast traffic model
Multicast traffic is usually modelled with a burst traffic model based on a two-state
(ON/OFF) Markov chain [113]. In the OFF state no traffic is generated, whereas in the ON
state traffic is generated at line speed. While in the ON state, the packet stream is
divided into consecutive bursts; all packets in one burst have the same destination.
Simulation will be used to calculate performance for the proposed switch, the
proposed switch with priority queue control and the output buffering switch using the
same burst traffic model. This will be introduced in Chapter 7.
There are six important parameters in the simulation:
1) The binomial traffic is generated independently from the multicast traffic; it is the
normal traffic.
2) The multicast cycle is the simulation length during which the traffic generator
generates multicast packets cyclically, with the multicast traffic distributed to a
particular port.
3) The burst length is the simulation length of the burst traffic.
4) Mixed traffic load = normal traffic + multicast traffic.
5) Multicast traffic = 0, when simulation ≥ multicast cycle.
6) Multicast traffic = 1, when simulation ≤ multicast cycle + burst length.
4.4 Summary
In this chapter we have analysed the ATM switch design and proposed a queuing
model based on a multiple-access protocol model well suited to the modular features of
the switch. There are many research papers about queuing models for multiple-access
protocols in which the advantages and drawbacks are introduced and analysed. After
careful consideration and evaluation, the queuing model proposed by Sykas, Karvelas
and Protonotarios [66] is adopted. A detailed description and analysis of this queuing
model is given. The equations used to calculate the state transition probabilities and
performance measures are listed.
In order to solve the large system of linear equations required to calculate the state
transition probabilities, a computer-aided program is introduced. Gaussian elimination
with partial pivoting is used to solve the linear equations. The numerical results are given in
the figures and further analysed. Due to the nature of the traffic load definition in [66],
the proposed switch saturates at a buffer traffic load of 0.125 (corresponding to full load
for a switch size of N=8). The numerical result analysis shows that the proposed ATM
switch can achieve high throughput, low packet loss probability, short average queue
length and short average packet delay with a very small buffer size. The performance
comparison with other switch designs will be examined in the next chapter.
5 Simulation design and results comparison with proposed switch,
input queuing switch and output queuing switch
Simulation is used here as a consistency check with the numerical results derived
in the previous chapter, and to compare the performance of the proposed switch with that
of input and output buffering switches.
In this chapter, all the simulation programs are written in the C language and
compiled with the GNU gcc compiler, as the GNU C++ library provides good random
number generators for the binomial and uniform distributions, and the programs can also
run on a PC. The simulation length is set to 10^9 cell times, the largest value that
gives a reasonable computation time. Cell time is represented as discrete steps of fixed
size (the cell service time). Simulation programs for the proposed switch design and for
the input and output buffering strategies are introduced, and their simulation results
are compared and analysed in the following sections. The Windows version of gnuplot is
used to draw the performance results.
5.1 Simulation for proposed switch design
In this program, the traffic generator follows a binomial distribution, which can be
treated as a combination of independent Bernoulli trials. From the definition of the
offered traffic load given in the previous two chapters, this traffic generator matches
that definition exactly.
As described before, the parallel buffering strategy is deployed in the proposed
switch design (see Figure 20). We focus our attention on a particular output port: N
parallel buffers request packet transmission from the relay ring, and the request
acknowledgement is decided by the priority token scheme in the relay ring. We assume
that the packets generated by the traffic generator are distributed to the buffers
according to a uniform distribution. The priority token scheme is implemented to control
packet release among the N parallel buffers, and a FIFO queue is also implemented.
The performance measures in the program are defined as follows:
• throughput is the average number of packets successfully transmitted per time slot
• mean queue length is the average number of packets in a buffer per time slot
• packet loss probability is the average rate at which a generated packet is rejected
because its buffer is full.
These follow the same definitions given in [66]. Here, we are going to use the
program flow chart to concisely present the simulation design and program
implementation. The simulation results will be shown and analysed below.
[Flow chart: initialise buffers, buffer pointers and performance measure counters →
traffic generation with binomial distribution → distribute generated packets to their
associated buffers with uniform distribution → apply relay ring with priority token
control scheme → transmit packet and apply FIFO queuing scheme → collect performance
data; repeat until the simulation is finished, then compute the performance measures.]
Figure 41 Simulation program flow chart for proposed switch design
[Flow chart: generate an input buffer number with uniform distribution, ensuring the
number is unique in this time slot (at most one packet arrival per buffer per time slot
if multiple packets are generated); if the associated buffer is full, increment that
buffer's packet loss counter, otherwise set the buffer and increment its buffer pointer;
repeat until all newly generated packets have been placed in their associated buffers.]
Figure 42 Program flow chart for distributing generated packets to their associated buffer with uniform distribution
[Flow chart: starting from the buffer pointed to by the current token position, find the
first non-empty buffer; if one exists, increment the throughput counter and the
successful transmission counter for that buffer and apply the FIFO queuing scheme;
finally increment the token position pointer.]
Figure 43 Program flow chart for applying the relay ring to the priority token control scheme
Refer to Appendix B for the detailed implementation of the blocks that collect
performance data and compute the performance measures. These blocks are implemented
strictly according to the definitions given earlier in this section. The simulation
results are plotted and analysed below, following the order of the program flow charts.
[Flow chart: check whether more than one packet is queued in the buffer pointed to by
the current token position; if not, set the first element in the buffer to zero and set
the buffer pointer to -1 to indicate that the buffer is empty; otherwise shift the
buffer elements forward one by one to remove the first element and decrement the buffer
pointer by one.]
Figure 44 Program flow chart for applying the FIFO queuing scheme
To allow a proper comparison, all the simulation results are plotted against the
numerical results shown in Chapter 4 for the mathematical model analysis. Ideally, we
should expect perfect matches for all performance measures: throughput, average queue
length and packet loss probability. Figures 45 to 53 show quite a close match for all the
performance measures using both methods (mathematical model and simulation). The
throughput measures shown in Figures 45 to 47 for both methods are almost identical. As
we can see in Figures 48 to 50, the average queue length for both methods match closely
until the load reaches 0.12, which is equivalent to 0.96 of the full switch load. There is a
difference between theory and simulation for a traffic load greater than 0.12, because the
switch starts to saturate after the load reaches 0.12 and the simulation queue length then
starts to increase sharply. The simulation values are taken to be more accurate. The
agreement is very close over the range of operating loads of most importance to us. The
cell loss probability measures are shown in Figures 51 to 53; they agree reasonably
well, taking the logarithmic scale into account.
[Plot: throughput S versus offered traffic load r, for L=3; red line simulation, green line with crosses theory]
Figure 45 Throughput versus offered traffic load between 0 and 0.13 for the proposed switch design with L=3
[Plot: throughput S versus offered traffic load r, for L=10; red line simulation, green line with crosses theory]
Figure 46 Throughput versus offered traffic load between 0 and 0.13 for the proposed switch design with L=10
[Plot: throughput S versus offered traffic load r, for L=30; red line simulation, green line with crosses theory]
Figure 47 Throughput versus offered traffic load between 0 and 0.13 for the proposed switch design with L=30
[Plot: average queue length Q versus offered traffic load r, for L=3; red line simulation, green line with crosses theory]
Figure 48 Mean queue length versus offered traffic load between 0 and 0.13 for the proposed switch design with L=3
[Plot: average queue length Q versus offered traffic load r, for L=10; red line simulation, green line with crosses theory]
Figure 49 Mean queue length versus offered traffic load between 0 and 0.13 for the proposed switch design with L=10
[Plot: average queue length Q versus offered traffic load r, for L=30; red line simulation, green line with crosses theory]
Figure 50 Mean queue length versus offered traffic load between 0 and 0.13 for the proposed switch design with L=30
[Plot: cell loss probability Pr (logarithmic scale) versus offered traffic load r, for L=3; red line theory, green line simulation]
Figure 51 Packet loss probability versus offered traffic load between 0 and 0.13 for the proposed switch design with L=3
[Plot: cell loss probability Pr (logarithmic scale) versus offered traffic load r, for L=10; red line theory, green line simulation]
Figure 52 Packet loss probability versus offered traffic load between 0 and 0.13 for the proposed switch design with L=10
[Plot: cell loss probability Pr (logarithmic scale) versus offered traffic load r, for L=30; red line theory, green line simulation]
Figure 53 Packet loss probability versus offered traffic load between 0 and 0.13 for the proposed switch design with L=30
5.2 Simulation for input buffering switch
In this program, the traffic generator follows a binomial distribution, which can be
treated as a combination of independent Bernoulli trials. From the definition of the
offered traffic load described in [8] and [77], this traffic generator matches the
definition.
We assume that the packets generated by the traffic generator are distributed to
the buffers according to a uniform distribution, and that their output port addresses
also follow a uniform distribution. A random selection policy for contention control is
implemented in the program for the case where multiple packets compete for the same
output port, and a FIFO queue is also implemented.
The performance measures in the program are defined as follows:
• throughput is the average number of packets successfully transmitted per time slot
• mean queue length is the average number of packets in a buffer per time slot
• packet loss probability is the average rate at which a generated packet is rejected
because its buffer is full. HOL blocking is taken into consideration when calculating
the switch throughput.
Here, we are going to use the program flow chart to concisely present the
simulation design and program implementation. The simulation results will be
shown and analysed below.
[Flow chart: initialise buffers, buffer pointers, contention controllers and performance
measure counters → traffic generation with binomial distribution → distribute generated
packets to their associated buffers with uniform distribution → assign output port
numbers to the generated packets by uniform distribution → apply contention control
scheme → transmit packets, using the random selection policy if more than one packet
targets the same output port, and apply the FIFO queuing scheme → count HOL blocking for
the throughput calculation → collect performance data; repeat until the simulation is
finished, then compute the performance measures.]
Figure 54 Simulation program flow chart for the switch with input queuing strategy
When assigning output port numbers to newly generated packets, an array is
defined to store the output port numbers. There is no duplication check on the generated
output port numbers because multiple packets can be addressed to the same output port.
For a more detailed implementation see Appendix C.
Refer to Figure 42 in the previous section for distributing generated packets to
their associated buffer with uniform distribution. The only difference is that “1” is set in
the associated buffer in the proposed switch simulation when the buffer-full check is
negative, whereas the input queuing simulation stores the output port number in the
associated buffer instead.
Please refer to Figure 44 for the FIFO queuing scheme, and to Appendix C for the
detailed implementation of the blocks that collect performance data and compute the
performance measures. These blocks are implemented strictly according to the definitions
given in this section. The simulation results are plotted and analysed below.
Figure 55 Program flow chart for applying a contention control scheme
[Flow chart: for each input buffer, if it is not empty, save the current input buffer
pointer to the contention controller of the output port addressed by the first element
in the buffer and increment the packet counter for that output port; repeat until all
input buffers have been checked.]
Figure 56 Program flow chart for counting HOL blocking used to calculate the throughput
[Flow chart: for each output port, if the port is not idle, increment the throughput
counter by one; repeat until all output ports have been checked.]
[Flow chart: for each output port, check whether more than one packet has been addressed
to it; if exactly one, increment the successful transmission counter for the input
buffer pointed to by the contention controller and apply the FIFO queuing scheme; if
more than one, first select one at random with uniform distribution, then do the same;
repeat until all output ports have been checked.]
Figure 57 Program flow chart for transmitting packets with random selection policy if more than one packet is transmitted to the same output port
[Plot: throughput S versus offered traffic load r; red line input buffering switch with L=24, green line proposed switch with NL=24]
Figure 58 Throughput versus offered traffic load for input queuing switch versus the proposed switch with L=24
[Plot: throughput S versus offered traffic load r; red line input buffering switch with L=80, green line proposed switch with NL=80]
Figure 59 Throughput versus offered traffic load for input queuing switch versus the proposed switch with L=80
[Plot: throughput S versus offered traffic load r; red line input buffering switch with L=240, green line proposed switch with NL=240]
Figure 60 Throughput versus offered traffic load for input queuing switch versus the proposed switch with L=240
[Plot: delay D versus offered traffic load r; red line input buffering switch with L=24, green line proposed switch with NL=24]
Figure 61 Delay versus offered traffic load for input queuing switch versus the proposed switch with L=24
[Plot: delay D versus offered traffic load r; red line input buffering switch with L=80, green line proposed switch with NL=80]
Figure 62 Delay versus offered traffic load for input queuing switch versus the proposed switch with L=80
[Plot: delay D versus offered traffic load r; red line input buffering switch with L=240, green line proposed switch with NL=240]
Figure 63 Delay versus offered traffic load for input queuing switch versus the proposed switch with L=240
[Plot: cell loss probability Pr (logarithmic scale) versus offered traffic load r; red line input buffering switch with L=24, green line proposed switch with NL=24]
Figure 64 Cell loss probability versus offered traffic load for input queuing switch versus the proposed switch with L=24
[Plot: cell loss probability Pr (logarithmic scale) versus offered traffic load r; red line input buffering switch with L=80, green line proposed switch with NL=80]
Figure 65 Cell loss probability versus offered traffic load for input queuing switch versus the proposed switch with L=80
[Plot: cell loss probability Pr (logarithmic scale) versus offered traffic load r; red line input buffering switch with L=240, green line proposed switch with NL=240]
Figure 66 Cell loss probability versus offered traffic load for input queuing switch versus the proposed switch with L=240
Figures 58 to 66 are drawn for an offered traffic load from 0 to 1, which is the
agreed traffic load definition for the input buffering model. We should expect the
proposed switch to demonstrate much better performance than the input buffering switch.
Comparing the simulation results derived in this section with the results for traffic
loads 0 to 0.13 derived for the proposed switch design in the previous section, the
proposed switch clearly performs much better. The throughput of the proposed switch can
reach 0.9999 (see Figures 45 to 47), whereas the maximum value for input buffering is
about 0.62 (see Figures 58 to 60). There is also much less delay for the proposed
switch: examining Figures 61 to 63, the turning point for the proposed switch is at a
traffic load of about 0.96, after which the delay starts to increase, whereas the
turning point for input buffering is at a traffic load of about 0.55. The packet loss
probability for the proposed switch is also much better than that of the input buffering
switch.
As shown in Figures 64 to 66, the packet loss probability is minimal before the
turning point; after that point the curves rise quickly. From Figures 64 to 66 there is
little change in the turning point of the input buffering switch as the buffer size
increases, whereas the turning point of the proposed switch shifts markedly to the
right. This means that there is little improvement in cell loss probability for the
input buffering switch once the buffer size reaches a certain value, such as L=24, but
the cell loss probability for the proposed switch improves greatly as the buffer size is
increased. The turning point for input buffering occurs at a lower load than that of the
proposed switch, which means that the input buffering switch starts to lose packets even
when the traffic load is small.
There is an exception in Figure 64, in which the turning point for the proposed
switch occurs at a lower load than for the input buffering switch. This is because the
proposed switch has L=3 for each buffer, whereas the input buffering switch has L=24.
Beyond the turning point for the input buffering switch (about load r = 0.55), the
proposed switch still performs far better. Thus, the performance of the proposed switch
is superior to that of the input buffering switch.
5.3 Simulation for output buffering switch
In this simulation program, the traffic generator follows a binomial distribution,
which can be treated as a combination of independent Bernoulli trials. From the
definition of the offered traffic load described in [8] and [77], this traffic generator
matches the definition. The difference lies in the success probability: the success
probability in the definition of the mathematical model is p/N, but we use p for the
simulation. The main reason for doing so is to make the simulation conditions as close
as possible to those used for the proposed switch design described in the previous
section.
We assume that the packets generated by the traffic generator are distributed to
the output buffer according to uniform distribution. Packets in the buffer are transmitted
by a first come first serve policy and a FIFO queue is also implemented.
The performance measures in the program are defined as follows:
• throughput is the average number of packets transmitted successfully per time slot
• mean queue length is the average number of packets in a buffer per time slot
• packet loss probability is the fraction of generated packets that are rejected because their buffer is full.
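Given these definitions, the measures reduce to ratios of counters accumulated over the run. The following sketch uses hypothetical counter names for illustration; the actual bookkeeping is in Appendix D.

```c
/* Counters accumulated over the whole simulation run. */
struct stats {
    long slots;        /* number of simulated time slots              */
    long transmitted;  /* packets sent successfully                   */
    long queue_sum;    /* buffer occupancy summed over all slots      */
    long generated;    /* packets produced by the traffic generator   */
    long rejected;     /* packets dropped because the buffer was full */
};

/* throughput S: average packets transmitted per time slot */
double throughput(const struct stats *s)
{
    return (double)s->transmitted / s->slots;
}

/* mean queue length Q: average packets in the buffer per time slot */
double mean_queue_length(const struct stats *s)
{
    return (double)s->queue_sum / s->slots;
}

/* packet loss probability Pr: fraction of generated packets rejected */
double loss_probability(const struct stats *s)
{
    return (double)s->rejected / s->generated;
}
```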
Here, the program flow chart is used to concisely present the simulation design
and the program implementation. The simulation results will be shown and analysed
below.
[Flow chart: Start → Initialise buffers, buffer pointers and performance-measure counters → Traffic generation with binomial distribution → Distribute generated packets to the tagged output buffer with uniform distribution → Transmit packet by first come first serve policy and apply FIFO queuing scheme → Collect performance data → if the simulation is not finished, repeat; otherwise compute the performance measures → End]
Figure 67 Simulation program flow chart for the switch with output buffering strategy
Refer to Figure 42 in section 5.1 for distributing generated packets to the tagged output buffer with uniform distribution. One difference is that only a tagged output buffer is considered in this simulation, whereas all buffers are counted in the simulations for the proposed switch design and the input buffering switch. The other difference is that, in the simulation of the proposed switch design, a one is written into the associated buffer if the buffer is not full, whereas in this simulation the input port number is written into the tagged output buffer instead.
Refer to Figure 44 for the FIFO queuing scheme, and Appendix D for a more detailed program implementation of the collect-performance-data and compute-performance-measures blocks. These program blocks are strictly implemented according to the definitions given in this section. The simulation results plotted in the charts are shown and analysed below.
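The FIFO discipline itself can be sketched as a circular buffer. This is an illustrative sketch under assumed names, not the code of Appendix D.

```c
#define L 24  /* buffer capacity, matching e.g. the L=24 runs */

struct fifo {
    int slots[L];
    int head, tail, count;
};

/* Enqueue at the tail; returns 0 and drops the packet if the buffer
 * is full (a cell loss event). */
static int fifo_put(struct fifo *q, int packet)
{
    if (q->count == L)
        return 0;
    q->slots[q->tail] = packet;
    q->tail = (q->tail + 1) % L;
    q->count++;
    return 1;
}

/* Dequeue from the head (first come, first served); -1 if empty. */
static int fifo_get(struct fifo *q)
{
    if (q->count == 0)
        return -1;
    int packet = q->slots[q->head];
    q->head = (q->head + 1) % L;
    q->count--;
    return packet;
}
```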
We should expect the proposed switch to perform as well as the output buffering
switch. By examining the simulation results for the proposed switch and output buffering
switch as shown in Figures 68 to 76, the results for both simulations are very close. In
terms of throughput, the proposed switch performs as well as the output buffering switch.
The throughput characteristic for both switches is almost identical as the switch buffer is incremented. The proposed switch is far superior in terms of delay. As we can see from Figures 71 to 73, as the buffer size increases, the turning point of the curves moves toward the maximum traffic load (1, or 0.125 in the simulation scale). But the turning point for the proposed switch is always better than that of the output buffering switch. This means that the proposed switch always has a smaller delay than the output buffering switch.
For cell loss probability shown in Figures 74 to 76, the output buffering switch is the
better performer. This is because the output buffer is one single buffer with size L,
whereas the proposed switch has N smaller buffers with size L/N.
In summary:
• the throughput for the proposed switch is as good as the output queuing switch
• the proposed switch has better delay characteristics than the output queuing switch because each output port is served by its own smaller buffers
• the output buffering switch has better cell loss probability than the proposed switch
because it has a single buffer, whereas the proposed switch has N smaller buffers.
[Plot: Throughput S versus Offered Traffic Load r. Red line: output buffering switch with L=24; green line: proposed switch with NL=24.]
Figure 68 Throughput versus offered traffic load between 0 and 0.13 for the output buffering switch versus the proposed switch with L=24
[Plot: Throughput S versus Offered Traffic Load r. Red line: output buffering switch with L=80; green line: proposed switch with NL=80.]
Figure 69 Throughput versus offered traffic load between 0 and 0.13 for the output buffering switch versus the proposed switch with L=80
[Plot: Throughput S versus Offered Traffic Load r. Red line: output buffering switch with L=240; green line: proposed switch with NL=240.]
Figure 70 Throughput versus offered traffic load between 0 and 0.13 for the output buffering switch versus the proposed switch with L=240
[Plot: Delay D versus Offered Traffic Load r. Red line: output buffering switch with L=24; green line: proposed switch with NL=24.]
Figure 71 Delay versus offered traffic load between 0 and 0.13 for the output buffering switch versus the proposed switch with L=24
[Plot: Delay D versus Offered Traffic Load r. Red line: output buffering switch with L=80; green line: proposed switch with NL=80.]
Figure 72 Delay versus offered traffic load between 0 and 0.13 for the output buffering switch versus the proposed switch with L=80
[Plot: Delay D versus Offered Traffic Load r. Red line: output buffering switch with L=240; green line: proposed switch with NL=240.]
Figure 73 Delay versus offered traffic load between 0 and 0.13 for the output buffering switch versus the proposed switch with L=240
[Plot: Cell Loss Probability Pr (log scale, 1e-09 to 0.1) versus Offered Traffic Load r. Red line: output buffering switch with L=24; green line: proposed switch with NL=24.]
Figure 74 Packet loss probability versus offered traffic load between 0 and 0.13 for the output buffering switch versus the proposed switch with L=24
[Plot: Cell Loss Probability Pr (log scale, 1e-09 to 0.1) versus Offered Traffic Load r. Red line: output buffering switch with L=80; green line: proposed switch with NL=80.]
Figure 75 Packet loss probability versus offered traffic load between 0 and 0.13 for the output buffering switch versus the proposed switch with L=80
[Plot: Cell Loss Probability Pr (log scale, 1e-09 to 0.1) versus Offered Traffic Load r. Red line: output buffering switch with L=240; green line: proposed switch with NL=240.]
Figure 76 Cell loss probability versus offered traffic load between 0 and 0.13 for the output buffering switch versus the proposed switch with L=240
5.4 Summary
In this chapter, we have presented simulation designs for the proposed switch
design, the input buffering switch and the output buffering switch. The simulation results
are shown and analysed. The overall conclusion is that the numerical results from the mathematical model and the simulation results match well for the proposed switch. The proposed switch design has superior performance compared with the input buffering switch, and performs as well as the output buffering switch, which has been proven to have optimal performance among switches with other buffering strategies.
6 Simulation design and results comparison under multicast traffic for the proposed switch and the output queuing switch
In this chapter, we will show how the proposed switch performs under multicast
traffic. As the output buffering switch is always considered to have the best performance,
we will compare switches under the same multicast traffic model. We should expect the
proposed switch to have the same or similar performance behaviour as the output
buffering switch. In addition, since the priority queue strategy will improve switch
performance, we will compare the proposed switch with priority queue strategy with the
output buffering switch under the same multicast traffic model. We should expect the
proposed switch with priority queue strategy to have the same performance behaviour as
the output buffering switch.
All the simulation programs in this chapter are written in the C language and compiled using the GNU gcc compiler. The simulation length is set to 10^9 cell times, this being the largest value giving reasonable computation time. Cell time is represented as discrete steps of fixed size (the cell service time). Simulation programs for the proposed switch design and the output buffering strategy are introduced, and the simulation results are compared and analysed in the following sections. The Windows version of gnuplot is used to draw the performance results.
6.1 Simulation for proposed switch design under multicast traffic
In this program, the normal traffic generator still follows the binomial distribution
as described in Chapters 4 and 5. In addition, another traffic generator that follows the
ON/OFF traffic model in section 4.3 will be used to model multicast traffic. The ON and OFF periods repeat in a regular cycle.
The switching algorithm and performance measures are still the same, as
described in Chapters 4 and 5. The only difference between this simulation and the
simulation in section 5.1 is the ON/OFF traffic generator. The program flow chart is
shown in Figure 77, and it concisely presents the simulation design and program
implementation. The simulation results will be shown and analysed below.
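With a regular cycle, the ON/OFF source reduces to a deterministic schedule: ON (generating burst traffic) for the first burst-length slots of each cycle, OFF for the remainder. The following sketch is an illustration only; the names are assumptions, and the actual scheduling is shown in the flow chart and Appendix listings.

```c
/* Returns 1 if slot t falls in the ON (burst) phase of a regular
 * ON/OFF cycle: the source is ON for the first burst_length slots
 * of every cycle_length-slot period, and OFF for the remainder. */
static int burst_is_on(long t, long cycle_length, long burst_length)
{
    return (t % cycle_length) < burst_length;
}
```

In the simulation loop this test would decide, each slot, whether the multicast burst generator fires in addition to the normal binomial generator.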
[Flow chart: Start → Initialise buffers, buffer pointers and performance-measure counters → Traffic generation with binomial distribution → Distribute generated packets to their associated buffer with uniform distribution → Apply Relay Ring with Priority Token control scheme → generate multicast burst traffic while within the multicast burst window → Transmit packet and apply FIFO queuing scheme → Collect performance data → if the simulation is not finished, repeat; otherwise compute the performance measures → End]
Figure 77 Simulation program flow chart for proposed switch under multicast traffic
[Plot: Throughput S versus Offered Traffic Load r; L=50 and Burst Length=50.]
Figure 78 Throughput versus offered traffic load between 0 and 0.13 for proposed switch with L=50 and burst traffic length=50
[Plot: Throughput S versus Offered Traffic Load r; L=100 and Burst Length=100.]
Figure 79 Throughput versus offered traffic load between 0 and 0.13 for proposed switch with L=100 and burst traffic length=100
[Plot: Throughput S versus Offered Traffic Load r; L=200 and Burst Length=200.]
Figure 80 Throughput versus offered traffic load between 0 and 0.13 for proposed switch with L=200 and burst traffic length=200
[Plot: Throughput S versus Offered Traffic Load r. Red line: L=50 and Burst Length=50; green line with crosses: L=100 and Burst Length=100; blue line with squares: L=200 and Burst Length=200.]
Figure 81 Throughput versus offered traffic load between 0 and 0.13 for proposed switch with L=50,100,200 and burst traffic length=50,100,200
[Plot: Average Queue Length Q versus Offered Traffic Load r. Red line: L=50 and Burst Length=50; green line with crosses: L=100 and Burst Length=100; blue line with squares: L=200 and Burst Length=200.]
Figure 82 Average queue length versus offered traffic load between 0 and 0.13 for proposed switch with L=50,100,200 and burst traffic length=50,100,200
[Plot: Cell Loss Probability Pr (log scale, 1e-09 to 0.01) versus Offered Traffic Load r; L=50 and Burst Length=50.]
Figure 83 Cell loss probability versus offered traffic load between 0 and 0.13 for proposed switch with L=50 and burst traffic length=50
[Plot: Cell Loss Probability Pr (log scale, 1e-09 to 0.01) versus Offered Traffic Load r; L=100 and Burst Length=100.]
Figure 84 Cell loss probability versus offered traffic load between 0 and 0.13 for proposed switch with L=100 and burst traffic length=100
[Plot: Cell Loss Probability Pr (log scale, 1e-09 to 0.01) versus Offered Traffic Load r; L=200 and Burst Length=200.]
Figure 85 Cell loss probability versus offered traffic load between 0 and 0.13 for proposed switch with L=200 and burst traffic length=200
[Plot: Cell Loss Probability Pr (log scale, 1e-09 to 0.01) versus Offered Traffic Load r. Red line: L=50 and Burst Length=50; green line: L=100 and Burst Length=100; blue line: L=200 and Burst Length=200.]
Figure 86 Cell loss probability versus offered traffic load between 0 and 0.13 for proposed switch with L=50,100,200 and burst traffic length=50,100,200
The simulation results for the proposed switch under the multicast traffic model are shown in Figures 78 to 86. The throughput results (Figures 78 to 80, combined in Figure 81) are close to ideal, with saturation only at almost full load. The proposed switch still achieves high throughput under mixed traffic load. The average queue length (Figure 82) remains quite low until the load reaches 0.12, which is equivalent to 0.96 of full switch load; it then starts to increase sharply as the switch approaches saturation. Cell loss probability is shown in Figures 83 to 85 and combined in Figure 86. We can see that the rising point shifts to the right as the burst length (and the buffer size) increases. Cell loss remains below 10^-9 in these simulations before the rising point; it then increases, but is still less than 10^-7 at traffic load 0.12. One conclusion from these tests is that the buffer size L must be equal to or greater than the burst traffic length in order to obtain a satisfactory cell loss probability. Another conclusion is that increasing the buffer size shifts the rising point to the right, even when the burst traffic length also increases, and therefore helps to reduce the switch cell loss probability.
6.2 Simulation for output buffering switch design under multicast traffic
In this program, the normal traffic generator still follows the binomial distribution
as described in Chapters 4 and 5. In addition, another traffic generator that follows the
ON/OFF traffic model in section 4.3 will be used to model multicast traffic. The ON and OFF periods repeat in a regular cycle.
The switching algorithm and performance measures are still the same, as
described in Chapters 4 and 5. The only difference between this simulation and the
simulation in section 5.3 is the ON/OFF traffic generator. The program flow chart is
shown in Figure 87 and it concisely presents the simulation design and program
implementation. The simulation results will be shown and analysed below. Refer to
Appendix F for more detailed program implementation.
[Flow chart: Start → Initialise buffers, buffer pointers and performance-measure counters → Traffic generation with binomial distribution → Distribute generated packets to the tagged output buffer with uniform distribution → generate multicast burst traffic while within the multicast burst window → Transmit packet by first come first serve policy and apply FIFO queuing scheme → Collect performance data → if the simulation is not finished, repeat; otherwise compute the performance measures → End]
Figure 87 Simulation program flow chart for output buffering switch under multicast traffic
The simulation results for the output buffering switch under the multicast traffic
model are shown in Figures 88 to 96 and the results are drawn against the proposed
switch under the same conditions. As we can see in Figures 88 to 90, there is no difference in the throughput measure between the output buffering switch and the proposed switch: both achieve optimal throughput under mixed traffic. They also behave similarly for the delay measure. As shown in Figures 91 to 93, the curves for both switches are very close to each other before the traffic load reaches 0.12, which is equivalent to 0.96 of full switch load. The delay starts to increase sharply after a load of 0.12 as the switch becomes saturated. On close inspection, the proposed switch has a superior delay performance to the output buffering switch, although the output buffering switch's performance improves as the buffer size increases. Examining Figures 94 to 96, we can see that the rising point shifts to the right as the buffer size is incremented. The cell loss probability is less than 10^-9 before the rising point and increases slowly after it. One conclusion for cell loss probability is that the buffer size L must be equal to or greater than the burst traffic length to obtain an improved cell loss probability. Another conclusion is that increasing the buffer size shifts the rising point, even when the burst traffic length also increases, and therefore improves the switch's cell loss probability. Finally, the output buffering switch has superior cell loss probability with the round robin queuing control algorithm; as the buffer size increases, the difference in cell loss probability between the switches decreases.
[Plot: Throughput S versus Offered Traffic Load r; Burst Length=50. Red line: output buffering switch with L=400; green line: proposed switch with NL=400.]
Figure 88 Throughput versus offered traffic load between 0 and 0.13 for output buffering switch versus proposed switch with L=400 and burst traffic length=50
[Plot: Throughput S versus Offered Traffic Load r; Burst Length=100. Red line: output buffering switch with L=800; green line: proposed switch with NL=800.]
Figure 89 Throughput versus offered traffic load between 0 and 0.13 for output buffering switch versus proposed switch with L=800 and burst traffic length=100
[Plot: Throughput S versus Offered Traffic Load r; Burst Length=200. Red line: output buffering switch with L=1600; green line: proposed switch with NL=1600.]
Figure 90 Throughput versus offered traffic load between 0 and 0.13 for output buffering switch versus proposed switch with L=1600 and burst traffic length=200
[Plot: Delay D versus Offered Traffic Load r; Burst Length=50. Red line: output buffering switch with L=400; green line: proposed switch with NL=400.]
Figure 91 Delay versus offered traffic load between 0 and 0.13 for output buffering switch versus proposed switch with L=400 and burst traffic length=50
[Plot: Delay D versus Offered Traffic Load r; Burst Length=100. Red line: output buffering switch with L=800; green line: proposed switch with NL=800.]
Figure 92 Delay versus offered traffic load between 0 and 0.13 for output buffering switch versus proposed switch with L=800 and burst traffic length=100
[Plot: Delay D versus Offered Traffic Load r; Burst Length=200. Red line: output buffering switch with L=1600; green line: proposed switch with NL=1600.]
Figure 93 Delay versus offered traffic load between 0 and 0.13 for output buffering switch versus proposed switch with L=1600 and burst traffic length=200
[Plot: Cell Loss Probability Pr (log scale, 1e-09 to 0.01) versus Offered Traffic Load r; Burst Length=50. Red line: output buffering switch with L=400; green line: proposed switch with NL=400.]
Figure 94 Cell loss probability versus offered traffic load between 0 and 0.13 for output buffering switch versus proposed switch with L=400 and burst traffic length=50
[Plot: Cell Loss Probability Pr (log scale, 1e-09 to 0.01) versus Offered Traffic Load r; Burst Length=100. Red line: output buffering switch with L=800; green line: proposed switch with NL=800.]
Figure 95 Cell loss probability versus offered traffic load between 0 and 0.13 for output buffering switch versus proposed switch with L=800 and burst traffic length=100
[Plot: Cell Loss Probability Pr (log scale, 1e-09 to 0.01) versus Offered Traffic Load r; Burst Length=200. Red line: output buffering switch with L=1600; green line: proposed switch with NL=1600.]
Figure 96 Cell loss probability versus offered traffic load between 0 and 0.13 for output buffering switch versus proposed switch with L=1600 and burst traffic length=200
6.3 Simulation for proposed switch design with priority queue strategy under multicast traffic
In this program, the normal traffic generator still follows the binomial distribution
as described in Chapters 4 and 5. In addition, another traffic generator that follows the
ON/OFF traffic model in section 4.3 will be used to model multicast traffic. The ON and OFF periods repeat in a regular cycle.
The switching algorithm and performance measures are still the same, as
described in Chapters 4 and 5. The only difference between this simulation and the
simulation in section 5.1 is the ON/OFF traffic generator and the priority queuing control
algorithm. The program flow chart is shown in Figure 97 and it concisely presents the
simulation design and program implementation. The simulation results will be shown and
analysed below. Refer to Appendix G for detailed program implementation.
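As an illustration of the priority queue control idea, high-priority cells (e.g. multicast burst cells) can be served ahead of normal cells, with at most a fixed priority queue count served per slot, as in the figures that follow (priority queue count=5). This is a hypothetical sketch with assumed names, not the Appendix G implementation; only queue occupancies are modelled, for brevity.

```c
struct pqueues {
    struct queue { int len; } high, low;  /* occupancy of each class */
};

/* Serve up to priority_count high-priority cells first, then fill the
 * remaining service opportunities from the normal queue.  Returns the
 * number of cells transmitted this slot given capacity opportunities. */
static int serve_slot(struct pqueues *q, int capacity, int priority_count)
{
    int hi = q->high.len < priority_count ? q->high.len : priority_count;
    if (hi > capacity)
        hi = capacity;
    int lo = capacity - hi;
    if (lo > q->low.len)
        lo = q->low.len;
    q->high.len -= hi;
    q->low.len -= lo;
    return hi + lo;
}
```

Capping the number of high-priority cells served per slot prevents a long burst from starving the normal queue entirely, which is the trade-off the priority queue count parameter controls.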
[Flow chart: Start → Initialise buffers, buffer pointers and performance-measure counters → Traffic generation with binomial distribution → Distribute generated packets to their associated buffer with uniform distribution → Apply Relay Ring with Priority Queue control scheme → generate multicast burst traffic while within the multicast burst window → Transmit packet with priority queue control and apply FIFO queuing scheme → Collect performance data → if the simulation is not finished, repeat; otherwise compute the performance measures → End]
Figure 97 Simulation program flow chart for proposed switch with priority queue buffer control
[Plot: Throughput S versus Offered Traffic Load r; Burst Length=50 and Priority Queue Count=5. Red line: output buffering switch with L=80; green line: proposed switch with NL=80.]
Figure 98 Throughput versus offered traffic load between 0 and 0.13 for proposed switch with priority queue buffer control with L=10 versus output buffering switch with L=50 for burst traffic length=50 and priority queue count=5
[Plot: Throughput S versus Offered Traffic Load r; Burst Length=100 and Priority Queue Count=5. Red line: output buffering switch with L=160; green line: proposed switch with NL=160.]
Figure 99 Throughput versus offered traffic load between 0 and 0.13 for proposed switch with priority queue buffer control with L=20 versus output buffering switch with L=100 for burst traffic length=100 and priority queue count=5
[Plot: Throughput S versus Offered Traffic Load r; Burst Length=200 and Priority Queue Count=5. Red line: output buffering switch with L=320; green line: proposed switch with NL=320.]
Figure 100 Throughput versus offered traffic load between 0 and 0.13 for proposed switch with priority queue buffer control with L=40 versus output buffering switch with L=200 for burst traffic length=200 and priority queue count=5
[Plot: Delay D versus Offered Traffic Load r; Burst Length=50 and Priority Queue Count=5. Red line: output buffering switch with L=80; green line: proposed switch with NL=80.]
Figure 101 Delay versus offered traffic load between 0 and 0.13 for proposed switch with priority queue buffer control with NL=80 versus output buffering switch with L=80 for burst traffic length=50 and priority queue count=5
[Plot: Delay D versus Offered Traffic Load r; Burst Length=100 and Priority Queue Count=5. Red line: output buffering switch with L=160; green line: proposed switch with NL=160.]
Figure 102 Delay versus offered traffic load between 0 and 0.13 for proposed switch with priority queue buffer control with NL=160 versus output buffering switch with L=160 for burst traffic length=100 and priority queue count=5
[Plot: Delay D versus Offered Traffic Load r; Burst Length=200 and Priority Queue Count=5. Red line: output buffering switch with L=320; green line: proposed switch with NL=320.]
Figure 103 Delay versus offered traffic load between 0 and 0.13 for proposed switch with priority queue buffer control with NL=320 versus output buffering switch with L=320 for burst traffic length=200 and priority queue count=5
[Plot omitted: cell loss probability Pr versus offered traffic load r; red line: output buffering switch with L=80; green line: proposed switch with NL=80]
Figure 104 Cell loss probability versus offered traffic load between 0 and 0.13 for proposed switch with priority queue buffer control with NL=80 versus output buffering switch with L=80 for burst traffic length=50 and priority queue count=5
[Plot omitted: cell loss probability Pr versus offered traffic load r; red line: output buffering switch with L=160; green line: proposed switch with NL=160]
Figure 105 Cell loss probability versus offered traffic load between 0 and 0.13 for proposed switch with priority queue buffer control with NL=160 versus output buffering switch with L=160 for burst traffic length=100 and priority queue count=5
[Plot omitted: cell loss probability Pr versus offered traffic load r; red line: output buffering switch with L=320; green line: proposed switch with NL=320]
Figure 106 Cell loss probability versus offered traffic load between 0 and 0.13 for proposed switch with priority queue buffer control with NL=320 versus output buffering switch with L=320 for burst traffic length=200 and priority queue count=5
[Plot omitted: throughput S versus offered traffic load r for proposed switch with priority queue buffer control, L=10, burst length=50; red line: priority queue count=5; green line with crosses: priority queue count=8]
Figure 107 Throughput versus offered traffic load between 0 and 0.13 for proposed switch with priority queue buffer control with L=10 and burst traffic length=50 for priority queue count=5 and 8
[Plot omitted: throughput S versus offered traffic load r for proposed switch with priority queue buffer control, L=20, burst length=100; red line: priority queue count=5; green line with crosses: priority queue count=8]
Figure 108 Throughput versus offered traffic load between 0 and 0.13 for proposed switch with priority queue buffer control with L=20 and burst traffic length=100 for priority queue count=5 and 8
[Plot omitted: throughput S versus offered traffic load r for proposed switch with priority queue buffer control, L=40, burst length=200; red line: priority queue count=5; green line with crosses: priority queue count=8]
Figure 109 Throughput versus offered traffic load between 0 and 0.13 for proposed switch with priority queue buffer control with L=40 and burst traffic length=200 for priority queue count=5 and 8
[Plot omitted: average queue length Q versus offered traffic load r for proposed switch with priority queue buffer control, L=10, burst length=50; red line: priority queue count=5; green line with crosses: priority queue count=8]
Figure 110 Average queue length versus offered traffic load between 0 and 0.13 for proposed switch with priority queue buffer control with L=10 and burst traffic length=50 for priority queue count=5 and 8
[Plot omitted: average queue length Q versus offered traffic load r for proposed switch with priority queue buffer control, L=20, burst length=100; red line: priority queue count=5; green line with crosses: priority queue count=8]
Figure 111 Average queue length versus offered traffic load between 0 and 0.13 for proposed switch with priority queue buffer control with L=20 and burst traffic length=100 for priority queue count=5 and 8
[Plot omitted: average queue length Q versus offered traffic load r for proposed switch with priority queue buffer control, L=40, burst length=200; red line: priority queue count=5; green line with crosses: priority queue count=8]
Figure 112 Average queue length versus offered traffic load between 0 and 0.13 for proposed switch with priority queue buffer control with L=40 and burst traffic length=200 for priority queue count=5 and 8
[Plot omitted: cell loss probability Pr versus offered traffic load r, L=10, burst length=50; red line: priority queue count=5; green line: priority queue count=8]
Figure 113 Cell loss probability versus offered traffic load between 0 and 0.13 for proposed switch with priority queue buffer control with L=10 and burst traffic length=50 for priority queue count=5 and 8
[Plot omitted: cell loss probability Pr versus offered traffic load r, L=20, burst length=100; red line: priority queue count=5; green line: priority queue count=8]
Figure 114 Cell loss probability versus offered traffic load between 0 and 0.13 for proposed switch with priority queue buffer control with L=20 and burst traffic length=100 for priority queue count=5 and 8
[Plot omitted: cell loss probability Pr versus offered traffic load r, L=40, burst length=200; red line: priority queue count=5; green line: priority queue count=8]
Figure 115 Cell loss probability versus offered traffic load between 0 and 0.13 for proposed switch with priority queue buffer control with L=40 and burst traffic length=200 for priority queue count=5 and 8
The simulation results for the proposed switch with the priority queuing control
algorithm under the multicast traffic model are shown in Figures 98 to 115, drawn
against the output buffering switch. As Figures 98 to 100 show, there is no difference
in the throughput measure between the output buffering switch and the proposed
switch: both achieve optimal throughput under mixed traffic. They also behave
similarly for delay, which is very low below a traffic load of 0.12, equivalent to
0.96 of full switch load (see Figures 101 to 103). The delay starts to grow sharply after
load 0.12 as the switch approaches saturation, with the proposed switch showing
slightly better delay characteristics than the output buffering switch, as shown in
Figures 101 to 103. Examining Figures 104 to 106, we can see that the rising point
shifts to the right as the buffer size increases. The cell loss probability is less than
10^-9 before the rising point and increases quickly after it. One conclusion for cell
loss probability is that the buffer size L can be smaller than the burst traffic length
for the proposed switch with a priority queue control algorithm while still giving
superior cell loss probability; the total buffer size NL, however, must be equal to or
greater than the burst traffic length in order to obtain improved cell loss probability.
Another conclusion is that increasing the buffer size shifts the rising point, even when
the burst traffic length is also increased, which improves the switch's cell loss
probability. The final conclusion is that the output buffering switch still has superior
cell loss probability even against the priority queuing control algorithm; but as the
buffer size in the proposed switch continues to increase, the cell loss probability of
the proposed switch approaches that of the output buffering switch, as shown in
Figures 104 to 106.
By examining Figures 107 to 115, we can see that there is minimal difference in
performance measures when the priority queue count is increased from 5 to 8. Therefore,
we can achieve high performance with a priority queue buffer control (small control
count) while fairness between parallel buffers is still maintained.
6.4 Summary
Given the above analysis, our conclusions are as follows:
• the proposed switch can achieve optimal throughput behaviour like the output
buffering switch under mixed binomial and burst traffic
• both switches have similar delay behaviour. The proposed switch has slightly better
delay characteristics than the output buffering switch
• the total buffer length has to be equal to or greater than the burst traffic length for
both switches in order to obtain improved cell loss probability
• increasing the buffer size helps to improve cell loss probability
• the output buffering switch achieves better cell loss probability than the proposed
switch with round robin buffer control. This is because round robin buffer control
maintains fairness between parallel buffers but cannot smooth burst traffic, while
the output buffering switch has enough buffers to smooth burst traffic
• the proposed switch with a priority queue buffer control can achieve better cell loss
probability with a smaller buffer size L. But even though a priority queue buffer
control greatly improves cell loss probability and requires fewer buffers, the total
buffer size still has to be equal to or greater than the burst traffic length. The output
buffering switch still has slightly better cell loss probability behaviour than the
proposed switch, because a priority queue buffer control still needs to maintain
fairness between parallel buffers and cannot fully smooth burst traffic, while the
output buffering switch always has enough buffers to smooth burst traffic
• increasing the priority queue buffer control counter will not significantly improve
switch performance. This indicates that we can achieve better performance while
maintaining fairness between parallel buffers
• the output buffering switch always has optimal cell loss probability characteristics,
but it needs an N-times speedup in the internal link in order to avoid internal cell
loss due to multiple cells destined for the same output port. The larger the switch
size N, the harder it is to implement the output buffering switch.
7 Complexity and feasibility analysis of the proposed switch design
Switch complexity will decide whether the implementation of the switch is feasible,
the suitability of the switch element or switch fabric, or whether the switch can be
expanded to a larger scale. In this chapter, we will analyse the complexity and feasibility
of the proposed switch and compare it with typical ATM switches in terms of complexity.
Finally, the expansion required for a large scale switch will be discussed.
7.1 Complexity and feasibility analysis of proposed switch
In this section, we will analyse the complexity and feasibility of the proposed
switch. Re-examining the proposed switch architecture in Figures 19 and 20, it is
evident that it is an N x N switch module (N inputs and N outputs) with a fully connected
topology. Further, it is constructed on the reverse Knockout switch architecture,
and as such can be treated as a type of crossbar topology. But unlike the crossbar switch,
it does not require the maximum number of cross points (N x N) to implement the
packet switching function. Instead, it uses N switching function blocks (bus interfaces).
Since the parallel buffering strategy is adopted in each bus interface, the total number of
buffers for the proposed switch architecture is N x N x b, where b is the buffer size and
depends on the performance requirements of the system. The input and output
buffering switches, by contrast, only require N x b buffers, which is a disadvantage of the
proposed switch architecture. However, as shown in Chapter 6, the proposed switch
with a priority queue buffer control can achieve high performance with a smaller
buffer size. Therefore, the buffer size NL for the proposed switch can be equal to
or smaller than the buffer size L for both the input and output buffering switches, while
its performance is better than the input buffering switch and as good as that of the
output buffering switch. The MGT and TNT in each bus interface can be implemented
using memory technology with simple control logic, as described in Chapter 3. The
multicast function is realised using the MGT and TNT, which removes the need for the
copy network required by other switch topologies and greatly reduces hardware
complexity. N relay rings with simple control logic are required to carry out the
contention control task for each output port. A contention control mechanism is also
required for the crossbar switch and other switch topologies, such as the multistage
switch. Here, the N relay rings can be made in a separate IC chip module. The parallel
buffers in the bus interface and the relay rings communicate via a data bus (parallel
communication) or shift registers (serial communication). A data bus on an IC chip uses
more pins, which requires a larger chip, whereas a shift register only uses one pin,
allowing a smaller chip. Such a modular design for the bus interface and relay rings
makes the switching system easy to implement. The control logic in the MGT, TNT and
each relay is very simple. As the switch size increases, the control logic in the relay ring
must operate N times faster. It seems likely that the processing speed of VLSI
technology will allow a very large switch to meet the processing speed requirement of
the control logic.
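As a rough software illustration of the table-lookup multicast mechanism just described, the MGT can be modelled as an associative map from a multicast group identifier to a set of output ports. All names and fields below are illustrative assumptions for this sketch, not the thesis hardware design; a hardware CAM would perform the match in one cycle, and a Python dict gives the same associative behaviour in software.

```python
class BusInterface:
    """Toy model of one bus interface's multicast group table (MGT).

    The dict plays the role of the CAM: it maps a cell's multicast
    group id to the set of destination output ports, so a cell is
    duplicated to every port in the set.
    """

    def __init__(self, mgt):
        # mgt: multicast group id -> set of destination output ports
        self.mgt = mgt

    def route(self, group_id):
        # Unicast is the degenerate case of a one-port group;
        # an unknown group id yields no destinations (cell discarded).
        return self.mgt.get(group_id, set())


bi = BusInterface({7: {0, 2, 5}, 9: {3}})
assert bi.route(7) == {0, 2, 5}   # multicast: duplicate to three ports
assert bi.route(9) == {3}         # unicast
assert bi.route(42) == set()      # unknown group: discard
```

Because the lookup result is just a port set, the same mechanism serves unicast and multicast without a separate copy network, which is the point made above.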
Another advantage of the proposed modular switch design is that it can be well
maintained. If any fault occurs on any one of the bus interfaces, it only affects the traffic
from that particular input port, and can be repaired by simply replacing the faulty bus
interface without interrupting the operations of the whole switching system. On the other
hand, if an intermediate switching point fails in multistage switching topology, it will
affect multiple input/output paths and may also be difficult to locate.
In most switching equipment fault tolerance is required. This is usually
implemented by duplicating the entire switch fabric to serve as a standby in case of
failure. Since all bus interfaces in the proposed switch design are identical, a fault
tolerance can be implemented by adding an extra bus interface as a standby. This standby
could take over the operation of any one of the N bus interfaces if a failure occurs.
Certainly some control mechanism for detecting which bus interface is faulty and
replacing it with a standby is required. The safest implementation of fault tolerance is to
allocate a standby bus interface to every bus interface, although this increases switch
complexity and hardware cost.
7.2 Complexity comparison with typical ATM switching network topologies
From a topology point of view, shared memory and shared medium switches are the
simplest because they only have N input and output ports and one shared memory or
medium. A Banyan network needs log2N stages and (N/2)*log2N switch elements. But as
it is a blocking switch, if the non-blocking function is required, a Batcher network
is placed in front of the Banyan network, adding (N/16)((log2N)^2 + log2N) stages and
(N/4)((log2N)^2 + log2N) sorting elements. A crossbar switch needs N^2 AFs and N
controllers, and the Knockout switch needs only N bus interfaces. The proposed switch
needs N bus interfaces and N relay rings.
From a buffering aspect, shared memory and shared medium switches need N x L buffers. A
Banyan network needs 2 x L x (N/2)*log2N = N x L x log2N buffers, as it still has
internal blocking even when the Batcher sorting network is applied, as shown in Figure
2. A crossbar switch needs L buffers in each AF in order to resolve output contention,
giving N^2 x L buffers in total. The Knockout switch needs N x L buffers.
The proposed switch needs N^2 x L buffers. Therefore, the crossbar switch and the
proposed switch require the most buffers.
From an implementation point of view, the Banyan network is simple: each stage only
needs to check whether the corresponding routing bit is "1" or "0" to switch the packet. A
shared memory switch requires more complex software to maintain packet sequence, the
queuing policy and the switching algorithm. A shared medium switch needs an N-times-faster
TDM bus. A crossbar switch only needs AFs and a controller for the queuing policy,
switching algorithm and contention control, and should be easy to implement. For the
Knockout switch, each bus interface needs N packet filters, an N-to-L concentrator and L
packet buffers, and the N-to-L concentrator (an N-to-L tournament) is not easy to
implement. For the proposed switch, the bus interface and relay ring have very simple
control logic and switching algorithms, and should be easy to implement.
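The bit-per-stage self-routing property of the Banyan network mentioned above can be sketched as follows. This is a minimal illustration of the routing rule only, not an implementation of any switch discussed here:

```python
def banyan_route(dest, n_stages):
    """Self-route through a Banyan network: at stage s the switch
    element inspects bit s of the destination address (most
    significant bit first) and takes output 0 or 1 accordingly."""
    path = []
    for s in range(n_stages):
        bit = (dest >> (n_stages - 1 - s)) & 1
        path.append(bit)
    return path


# An 8x8 Banyan has log2(8) = 3 stages; destination port 5 = 101b,
# so the packet takes the upper, lower, upper outputs in turn.
assert banyan_route(5, 3) == [1, 0, 1]
```

The simplicity of this rule is exactly why no routing table is needed inside the fabric: the destination address itself steers the packet.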
From a scalability point of view, it is difficult to build a large scale switch fabric
with a Banyan network in terms of circuit layout, as more and more cross connections
between switch elements are required across a large switch fabric. For the shared
memory switch, it becomes harder to maintain packet sequence, switching delay
and the switching software. For a shared medium switch, the bottleneck is the TDM bus.
For the crossbar switch, it is less problematic to build a large scale switch. Even though it
has N^2 cross points, this should not be a problem because a bus structure can be applied
at both input and output; the only limitation is the controller, whose speed determines the
size of the switch fabric. For the Knockout switch, the limitation is the tournament
mechanism of the N-to-L concentrator: the larger the switch fabric, the harder the
tournament is to implement and the longer the switch delays become. For the
proposed switch, the only limitation is the speed of the relay ring, which determines
the size of the switch fabric. As shown in the tables below, the buffer size required for the
proposed switch decreases as the switch size increases, because the probability of a
packet arriving at a particular buffer/output becomes 1/N. In contrast, the
buffer size required for the output buffering switch increases with the switch size.
From a functionality point of view, the proposed switch can handle both unicast
and multicast traffic. The Banyan network, shared memory switch, shared medium
switch, Knockout switch and crossbar switch do not have a multicast function; to
handle multicast traffic, extra hardware or topology is required, as discussed in
Chapter 2. The Starlite switch, Turner's broadcast switch, the Recursive switch, Tony
Lee's multicast switch, the SCOQ switch and the ORCN multicast switch are built on
the Banyan or Batcher-Banyan network, and all inherit this network's
advantages and drawbacks, as described above. For a more detailed discussion of
multicast switches, refer to Chapter 2. Each of these switches introduces some kind of
copy network with its own copy algorithm or implementation; these are complex to
implement and increase the complexity of the switch topology, as additional
switching elements or hardware are required. For the Knockout switch,
extra multicast modules or bus interfaces (using a table lookup technique for packet
copying) are required, which increase the complexity of the switch topology while
inheriting the advantages and drawbacks of the Knockout switch. For a more detailed
discussion of the Knockout multicast switch, refer to section 2.2.2. No copy network
and no extra hardware are required for the multicast function in the proposed switch:
with the same topology and implementation, it performs both unicast and multicast
functions.
In summary, by taking N, L, cross connection/point, speed of switching
component and switching control logic as the complexity factors, we arrive at the
following conclusions:
1) the Banyan network has log2N stages, (N/2)*log2N switch elements, N x L x log2N
buffers, N x (log2N - 1) cross connections, a 2-times speedup for output buffering in
each switching element and a simple switching control logic. The Batcher sorting
network adds (N/16)((log2N)^2 + log2N) stages and (N/4)((log2N)^2 +
log2N) sorting elements
2) a shared memory switch has a single switching element, N x L buffers, zero cross
connections, no internal speedup and a complex switching control logic
3) a shared medium switch has a single switching element, N x L buffers, zero cross
connections, an N-times-speed TDM bus and a simple switching control logic
4) a Knockout switch has N bus interfaces, N x L buffers, N^2 cross points, a faster N-to-L
concentrator (N-to-L tournament) and a simple switching control logic
5) a crossbar switch has a single switching element, N^2 x L buffers, N^2 cross points, no
internal speedup and a simple switching control logic
6) the proposed switch has N bus interfaces and N relay rings, N^2 x L buffers, N^2 cross
points, no internal speedup and a simple switching control logic
7) the Knockout switch, crossbar switch and proposed switch require N^2 cross points,
the most of the switch topologies compared here. But as they all have bus-type
connections, N^2 cross points should not be a problem in building a large scale switch
8) as shown in Tables 1 to 4, for the proposed switch, the buffer size L in each bus
interface does not change as the switch size increases, even though the total
buffer size is still N^2 x L. Due to the modular structure of the proposed switch,
each bus interface operates independently, which is recommended for building a
switching fabric with a larger N value. For the output queuing switch, the buffer
size L must increase significantly with the switch size in order to maintain a good
cell loss probability.
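The complexity figures in items 1) to 7) can be tabulated with a short script. This is purely a convenience sketch: the formulas are taken directly from the list above, and the dictionary keys are illustrative names, not thesis terminology.

```python
import math

def complexity(N, L):
    """Tabulate the complexity figures listed above as functions of
    switch size N and per-buffer length L."""
    lg = math.log2(N)
    return {
        "banyan_stages": lg,
        "banyan_elements": (N / 2) * lg,
        "banyan_buffers": N * L * lg,
        "batcher_stages": (N / 16) * (lg**2 + lg),
        "batcher_elements": (N / 4) * (lg**2 + lg),
        "knockout_buffers": N * L,
        "crossbar_buffers": N**2 * L,     # same total as the proposed switch
        "proposed_buffers": N**2 * L,
        "cross_points": N**2,             # Knockout, crossbar and proposed
    }


c = complexity(16, 10)
assert c["banyan_stages"] == 4
assert c["banyan_elements"] == 32
assert c["proposed_buffers"] == 2560
```

Evaluating the same function for N=16 and N=64 makes the scaling comparison in item 8) concrete: the N^2 terms dominate growth while per-interface L stays fixed.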
Offered traffic load r L=10 L=20 L=30
0 0 0 0
0.003 0 0 0
0.006 0 0 0
0.009 0 0 0
0.012 0 0 0
0.015 0 0 0
0.018 0 0 0
0.021 0 0 0
0.024 0 0 0
0.027 0 0 0
0.03 0 0 0
0.033 0 0 0
0.036 0 0 0
0.039 0 0 0
0.042 0 0 0
0.045 0 0 0
0.048 0 0 0
0.051 0 0 0
0.054 0.000003 0 0
0.057 0.000012 0 0
0.06 0.000455 0.000005 0
Note: offered traffic load = 1/16 = 0.0625 represents full traffic load
Table 1 Cell loss probability for proposed switch with N=16
Offered traffic load r L=10 L=20 L=30
0 0 0 0
0.001 0 0 0
0.002 0 0 0
0.003 0 0 0
0.004 0 0 0
0.005 0 0 0
0.006 0 0 0
0.007 0 0 0
0.008 0 0 0
0.009 0 0 0
0.01 0 0 0
0.011 0 0 0
0.012 0 0 0
0.013 0 0 0
0.014 0 0 0
0.015 0.000002 0 0
0.016 0.023622 0.022086 0.022516
Note: the offered traffic load = 1/64 = 0.015625 represents full traffic load
Table 2 Cell loss probability for proposed switch with N=64
Offered traffic load r L=30 L=60 L=90 L=120
0 0 0 0 0
0.003 0 0 0 0
0.006 0 0 0 0
0.009 0 0 0 0
0.012 0 0 0 0
0.015 0 0 0 0
0.018 0 0 0 0
0.021 0 0 0 0
0.024 0 0 0 0
0.027 0 0 0 0
0.03 0 0 0 0
0.033 0 0 0 0
0.036 0 0 0 0
0.039 0 0 0 0
0.042 0 0 0 0
0.045 0 0 0 0
0.048 0 0 0 0
0.051 0.000001 0 0 0
0.054 0.000025 0 0 0
0.057 0.00024 0.000003 0 0
0.06 0.003137 0.000285 0.000072 0.000038
Note: the offered traffic load = 1/16 = 0.0625 represents full traffic load
Table 3 Cell loss probability for output queuing switch with N=16
Offered traffic load r L=30 L=60 L=90 L=120
0 0 0 0 0
0.001 0 0 0 0
0.002 0 0 0 0
0.003 0 0 0 0
0.004 0 0 0 0
0.005 0 0 0 0
0.006 0 0 0 0
0.007 0 0 0 0
0.008 0 0 0 0
0.009 0 0 0 0
0.01 0 0 0 0
0.011 0 0 0 0
0.012 0 0 0 0
0.013 0 0 0 0
0.014 0.000155 0 0 0
0.015 0.003305 0.000321 0.000014 0
0.016 0.030095 0.024204 0.023562 0.023082
Note: the offered traffic load = 1/64 = 0.015625 represents full traffic load
Table 4 Cell loss probability for output queuing switch with N=64
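As an analytical cross-check on the style of results in Tables 1 to 4, the classical cell loss expression used in Knockout-type analyses, the expected fraction of cells lost because more than L of N uniformly loaded Bernoulli inputs address the same output in a slot, can be evaluated numerically. Note that rho here is the standard per-input load in [0, 1], which differs from the load normalisation used in the tables (where full load is 1/N), so the numbers are illustrative rather than directly comparable:

```python
from math import comb

def cell_loss(N, L, rho):
    """Probability that a cell is lost because more than L of the N
    inputs send a cell to the same output in one slot, with each input
    loaded at rho and addressing outputs uniformly (the standard
    Knockout-style binomial analysis)."""
    p = rho / N  # probability one input sends a cell to this output
    # Expected number of cells dropped per slot, normalised by the
    # expected number of arrivals rho.
    lost = sum((k - L) * comb(N, k) * p**k * (1 - p)**(N - k)
               for k in range(L + 1, N + 1))
    return lost / rho


assert cell_loss(16, 16, 0.9) == 0            # L = N: nothing can be lost
assert cell_loss(16, 8, 0.9) < cell_loss(16, 4, 0.9)  # more room, less loss
```

The qualitative behaviour matches the tables: loss falls steeply as the per-output acceptance L grows relative to the offered load.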
7.3 Building a large scale ATM switch
In order to judge the feasibility of the switch architecture design, a main
requirement is the ability to build a large scale switch. In this section, we analyse
and propose the best way to build a large scale ATM multicast switch based on the
proposed switching model. In Chapter 3, we showed the simple control logic and
routing algorithm for the relay ring and MGT; the hardware implementation should be
simple. The LOCN (routing) table in the MGT and the TNT number in each bus interface
should be programmable. With a simple, modular design architecture and modern VLSI
technology, it should be possible to build a fairly large switch module. However, this
theory should be tested by implementation of the switch in the laboratory.
Figure 116 Three-stage MARS switching network architecture
There are quite a few approaches to building a large scale switching network
based on N x N switch modules. Well-known approaches are the three-stage Clos
network [23], [27], [46] and the three-stage MARS network [33]. The two are very
similar, except that the MARS network is based on N x N switching modules, while the
Clos network is cascaded from uneven switching modules: n x m modules at the first
stage, N/n x N/n modules at the second stage and m x n modules at the third stage. A
switch expansion approach based on a Knockout switching module has also been
proposed by Yu-Shuan Yeh [9]; this approach needs to increase the concentrator size in
the switching module, which relies on hardware technology.
The best way to increase the size of the proposed switch design without any
changes to the switching module is to use the MARS or Clos networks, as shown in
Figures 116 and 117. Refer to reference [79] for the reasons why the Clos network is not
constructed from even switching modules, and to reference [50] for the blocking
probability of the MARS and Clos networks. The proposed switch architecture could be
arranged unevenly, such as n x m or m x n, by changing the parallel buffer number and
relay ring size to match the output port number in order to fit the Clos network. Both the
MARS and Clos networks provide multiple routes between each input and output. There
is no interconnection blocking inside the switch, but contention on the outgoing
connections of each switching module could occur, and this is resolved by each
switching module respectively. A very large switching network can be built with the
three-stage MARS or Clos topology based on the current switching module.
Figure 117 Three-Stage Clos switching network architecture
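The module arrangement of the three-stage Clos network described above can be sketched numerically, together with Clos's classical strictly non-blocking condition m >= 2n - 1. The condition is standard Clos theory rather than a result from this thesis, and the function names are illustrative:

```python
def clos_module_count(N, n, m):
    """Module counts for a three-stage Clos network built from n x m
    input modules, (N/n) x (N/n) centre modules and m x n output
    modules, as described above."""
    assert N % n == 0, "N must be divisible by the input module size n"
    return {"input": N // n, "centre": m, "output": N // n}


def strictly_nonblocking(n, m):
    # Clos's classical condition for a strictly non-blocking
    # three-stage network: m >= 2n - 1 centre-stage modules.
    return m >= 2 * n - 1


mods = clos_module_count(64, 8, 15)
assert mods == {"input": 8, "centre": 15, "output": 8}
assert strictly_nonblocking(8, 15)       # m = 2n - 1 is just enough
assert not strictly_nonblocking(8, 14)
```

This makes concrete why the proposed n x m / m x n rearrangement of the switch module matters: the centre-stage count m is the knob that trades hardware for blocking probability.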
7.4 Summary
We have discussed the complexity, maintainability and fault tolerance of the
proposed switch architecture, and we have compared its complexity against other
typical ATM switches. In summary, the proposed switch design is modular and has a
very simple topology and control logic. No copy network is needed to fulfil the
multicast function, and the modular design makes the switch easy to maintain and
fault tolerant. The only drawback is that the number of buffers required is N x N x b,
whereas the number of buffers required for the input and output buffering switches is
N x b. However, as shown in Tables 1 to 4, the proposed switch needs only a small b to
equal the performance of the output buffering switch, especially for a large switch size
N. It has been shown that a fairly large switching module can be built from the
proposed switch design, and two strategies for building a large scale switch have been
proposed in this chapter. The larger the switching module, the fewer modules are
needed to build a large scale switch, and the lower the complexity of the resulting
switch.
8 Conclusion
A high performance ATM multicast switch has been proposed and designed based on
the reverse Knockout switch structure. It consists of bus interfaces and relay rings and
achieves high performance with high throughput, small buffer size, short packet delay
and low cell loss probability, and includes the following:
• a parallel buffering strategy to solve the HOL blocking problem associated with
input buffering
• a fast table lookup with CAM technique to achieve fast cell duplication and cell
routing for unicast and multicast switch function
• a relay ring is introduced to solve contention among packets from each bus
interface destined for the same output link, without the internal speedup required
by output buffering.
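The relay ring's contention resolution can be modelled in software as a rotating-priority arbiter: the bus interface holding the priority token wins, and the token then moves past the winner. This is a simplified sketch under assumed semantics, not the hardware design:

```python
def relay_ring_arbitrate(requests, token):
    """One arbitration pass of a relay ring for a single output port.

    `requests` is a list of N booleans, one per bus interface, set when
    that interface has a packet for this output; `token` is the index
    currently holding priority. The winner is the first requester at or
    after the token position, and the token then advances past the
    winner, so every interface gets the same chance over time."""
    N = len(requests)
    for i in range(N):
        idx = (token + i) % N
        if requests[idx]:
            return idx, (idx + 1) % N  # winner, next token position
    return None, token                 # no requests in this slot


winner, token = relay_ring_arbitrate([False, True, True, False], 0)
assert (winner, token) == (1, 2)
winner, token = relay_ring_arbitrate([False, True, True, False], token)
assert (winner, token) == (2, 3)       # fairness: the other requester wins next
```

The second call shows the fairness property claimed above: a losing interface moves to the front of the priority order in the following slot.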
The idea of a priority token is adopted to achieve dynamic priority reassignment on
the relay ring and to maintain fairness for each bus interface: each bus interface has
the same probability of sending a packet. The design also includes the following:
• the whole switching architecture has a simple topology and simple control
logic that is programmable and easy to implement
• by programming different control logic, such as longest queue and
priority queue service, in each relay ring, the proposed switch supports the
traffic bursts caused by ABR and VBR traffic
• a queuing model based on a multiple-access protocol is introduced; the numerical
results are analysed and compared with those obtained from the queuing models
of the input and output buffering strategies
• simulations are also implemented for the three buffering strategies, and the
simulation results are analysed and compared with their respective numerical
results
• the performance analysis shows that the proposed switch design has superior
performance and can equal that of the output buffering switch, which
has been proven to have optimal performance relative to any other buffering
strategy
• the complexity, maintainability and fault tolerance of the proposed
switch architecture are discussed in Chapter 7. The analysis shows that the
proposed switch design is modular, has a very simple topology and control logic,
and needs no copy network to fulfill the multicast function. It is easy to
maintain and fault tolerant because of its modular characteristics. The
only drawback is that the number of buffers required is N x N x b, compared with
N x b for the input buffering switch and the output buffering switch. As discussed,
this is not considered to be a significant limitation with modern VLSI memory
technology
• the scalability issue is also addressed in Chapter 7. With the simple, modular
architecture of the proposed switch and modern VLSI technology, it
should be feasible to build a fairly large switch module. However, this
should be verified by implementing the switch in the laboratory.
When time and funding allow, two additional investigations will be undertaken.
The first will measure the time delay in the relay ring to estimate how large a switch
module can be built. The second will build a prototype switch module using current
VLSI technology to validate the design and determine the maximum speed of the
proposed switch.
As new technologies emerge, ATM technology could become obsolete. However, it
remains in broad use in current network infrastructure because it still provides a
commercial service and brings financial benefits to network operators and service
providers, while offering QoS guarantees that technologies such as IP do not.
Moreover, switching technology in general offers superior features: higher
throughput, lower latency and increased bandwidth. This thesis has achieved its
objective of analysing the strengths and weaknesses of various previously proposed
ATM architectures and of proposing a high performance alternative.
References

[1] H. Ahmadi, W.E. Denzel, ‘A survey of modern high-performance switching techniques,’ IEEE J. Select. Areas Commun., vol. 7, Sept. 1989. [2] Y. Oie, T. Suda, M. Murata, D. Kolson & H. Miyahara, ‘Survey of switching techniques in high-speed networks and their performance,’ in Proceedings, INFOCOM'90, San Francisco, CA, 3–7 June, 1990, pp. 1242-51. [3] J. Y. Hui, E. Arthurs, ‘A broadband packet switch for integrated transport,’ IEEE J. Select. Areas Commun., vol. 5, no. 8, pp. 1264-73, 1987. [4] A. Huang, S. Knauer, ‘Starlite: A wideband digital switch,’ in Proceedings, GLOBECOM'84, Atlanta, GA, Dec. 1984, pp. 121-25. [5] K. E. Batcher, ‘Sorting networks and their applications,’ in Proceedings, AFIPS Spring Joint Comput. Conf., 1968, pp. 307-13. [6] T. T. Lee, ‘Non-blocking copy network for multicast packet switching,’ IEEE J. Select. Areas Commun., vol. 6, pp. 1455-67, Dec. 1988. [7] D. X. Chen, J. W. Mark, ‘SCOQ: A fast packet switch with shared concentration and output queueing,’ IEEE/ACM Trans. Networking, vol. 1, no. 1, pp. 142-51, 1993. [8] M. G. Hluchyj, M. J. Karol, ‘Queueing in high-performance packet switching,’ IEEE J. Select. Areas Commun., vol. SAC-6, pp. 1587-97, Dec. 1988. [9] Y. S. Yeh, M. G. Hluchyj & A. S. Acampora, ‘The Knockout switch: A simple, modular architecture for high-performance packet switching,’ IEEE J. Select. Areas Commun., vol. 5, no. 8, pp. 1274-82, 1987. [10] M. E. Woodward, Communication and Computer Networks: Modelling with discrete-time queues, London, Pentech Press, 1993. [11] J. N. Giacopelli, W. D. Sincoskie & M. Littlewood, ‘Sunshine: A high
performance self-routing broadband packet switch architecture,’ in Proceedings, ISS'90, 1990, pp. 123-29.
[12] B. Bingham, H. Bussey, ‘Reservation-based contention resolution mechanism for Batcher-Banyan packet switches,’ IEEE Electronics Letters, vol. 24, no. 13, pp. 772-73, 23 June 1988. [13] J. S. Turner, ‘Design of an integrated services packet network,’ IEEE J. Select. Areas Commun., vol. 4, no. 8, pp. 1373-79, 1986.
[14] M. J. Narasimha, ‘The Batcher-Banyan self-routing network: Universality and simplification,’ IEEE Trans. Commun. vol. 36, no. 10, pp. 1175-78, Oct. 1988. [15] Y. M. Kim, K. Y. Lee, ‘PR-Banyan: A packet switch with a pseudo randomizer for non-uniform traffic,’ ICC'91, pp. 408-12, 1991. [16] Martin de Prycher, Asynchronous Transfer Mode: solution for broadband ISDN, New York, Ellis Horwood, 1991. [17] J. S. Turner, ‘Design of a Broadcast Packet Switching Network,’ in Proceedings, INFOCOM’86, 1986, pp. 667-75. [18] W. D. Zhong, Y. Onozato & J. Kaniyil, ‘A Copy Network with Shared Buffers for Large-Scale Multicast ATM Switching,’ IEEE/ACM Trans. Networking, vol. 1, no. 2, April 1993, pp. 157-65. [19] R. Cusani, F. Sestini, ‘A Recursive Multistage Structure for Multicast ATM Switching,’ IEEE INFOCOM’91, Miami, Florida, April 1991, pp. 171-80. [20] D. Chen, J. Mark, ‘Multicasting in the SCOQ Switch,’ in Proceedings, IEEE INFOCOM'94, pp. 290-97. [21] W. D. Zhong, S. Shimamoto, Y. Onozato & J. Kaniyil, ‘A Recursive Copy
Network for a Large Multicast ATM Switch,’ in Proceedings, ISS’92, Oct. 1992, vol. 2, pp. 161-65.
[22] K. Eng, M. Hluchyj & Y. Yeh, ‘Multicast and Broadcast Services in a Knockout Packet Switch,’ IEEE INFOCOM’88, pp. 29-34. [23] K. S. Chan, Sammy Chan, Kwan L. Yeung, K. T. Ko & Eric W. M. Wong, ‘Clos-
Knockout: A Large-Scale Modular Multicast ATM Switch,’ in Proceedings, Globecom’96, vol. 2, Nov. 1996, pp. 1358-62.
[24] E. Del Re, R. Fantacci, ‘Performance Evaluation of Input and Output Queueing Techniques in ATM Switching Systems,’ IEEE Trans. Communications, vol. 41, no. 10, Oct. 1993, pp. 1565-74. [25] A. J. McAuley, P. Francis, ‘Fast Routing Table Lookup Using CAMs,’ IEEE INFOCOM 93, vol. 3, Mar. 1993, pp. 1382-91. [26] H. Jonathan Chao, Byeong-Seog Choe, ‘Design and Analysis of a Large-Scale Multicast Output Buffered ATM Switch,’ IEEE/ACM Transactions on Networking, vol. 3, no. 2, Apr. 1995, pp. 126-38. [27] Tse-Yun Feng, ‘A Survey of interconnection Networks,’ Computer, vol. 14, pp. 12- 27, Dec. 1981.
[28] P. S. Min, H. Saidi, M. V. Hegde, ‘A Nonblocking Architecture for Broadband Multichannel Switching,’ IEEE/ACM Transactions on Networking,’ vol. 3, NO. 2, April, 1995, pp. 181-198. [29] Y. Oie, T. Suda, M. Murata, H. Miyahara, ‘Survey of the Performance of Nonblocking Switches with FIFO Input Buffers,’ IEEE Proc. of ICC' 90, April 1990. [30] A. Pattavina, ‘Nonblocking Architectures for ATM Switching,’ IEEE Comm. Magazine, Feb 1993, pp. 38-48. [31] E. W. Zegura, ‘Architectures for ATM Switching Systems,’ IEEE Comm. Magazine, Feb 1993, pp. 28-37. [32] Batcher, K. E., ‘Sorting Networks and Their Applications,’ AFIPS Conference Proceedings 1968, SJCC, pp. 307-314. [33] J. J. Degan, G. W. R. Luderer, A. K. Vaidya, ‘Fast Packet Technology for Future Switches,’ AT&T Technical Journal, March-April 1989, pp. 36-50. [34] F. A. Tobagi, ‘Fast Packet Switch Architectures for Broadband Integrated Services Digital Networks,’ Proceedings of the IEEE, vol. 78, January 1990. [35] Chuan-Lin Wu, Tse-Yun Feng, ‘On a Class of Multistage Interconnection Networks,’ IEEE Transactions on Computers, vol. C-29, no. 8, August 1980, pp. 694-702. [36] P. Coppo, M. D' Ambrosio, R. Melen, ‘Optimal Cost/Performance Design of ATM Switches,’ IEEE/ACM Transactions on Networking, vol. 1, no. 5, Oct 1993. [37] M. Beshai, E. Munter, ‘Multi-Tera-Bit/s Switch Based on Burst Transfer and Independent Shared Buffers,’ IEEE Proc. GLOBECOMM' 95, Nov. 1995, pp. 1724- 1730. [38] H. Ahmadi et al, ‘A High Performance Switch Fabric for Integrated Circuit and Packet Switching,’ IEEE Proc. INFOCOM' 88, New Orleans, LA, Mar. 1988, pp. 9- 18. [39] J. P. Coudreuse, M. Servel, ‘Prelude: An Asynchronous Time-Division Switched Network,’ IEEE Proc. ICC' 87, Seattle, WA, June 1987, pp. 769-773. [40] I. S. Gopal, I. Cidon, H. Meleis, ‘Paris: An Approach to Integrated Private Networks,’ IEEE Proc. ICC' 87, Seattle, WA, June 1987, pp. 764-773.
[41] X. Liu, H. T. Mouftah, ‘Design of a High Performance Nonblocking Copy Network for Multicast ATM Switching,’ IEE Proc. Comm. Vol. 141, No. 5, Oct. 1994, pp. 317-324. [42] R. Y. Awdeh and H. T. Mouftah, ‘The Expanded Delta Fast Packet Switch,’ Proc. IEEE ICC' 94, May 1994, pp. 397-401. [43] S. Kumar, D. P. Agrawal, ‘A Shared-Buffer Direct-Access (SBDA) Switch Architecture for ATM-Based Networks,’ Proc. IEEE ICC' 94, May 1994, pp. 101-105. [44] H. Saidi, P. S. Min, ‘Nonblocking Multi-Channel Switching in ATM Networks,’ Proc. IEEE ICC' 94, May 1994, pp. 415-419. [45] R. Venkateswaran, C. S. Raghavendra, ‘Multicast Switch Based on Tandem Expanded Delta Network,’ Proc. IEEE GLOBECOM' 95, Nov. 1995, pp. 1707-1711. [46] S. Kuroyanagi, K. Hironishi, T. Maeda, ‘Optical Cross-Connect Architecture Using Free-Space Optical Switches Based on PI-LOSS Topology,’ Proc. IEEE GLOBECOM' 95, Nov. 1995, pp. 2112-2117. [47] Xiaoqiang Chen, V. Kumar, ‘Multicast Routing in Self-Routing Multistage Networks,’ Proc. IEEE INFOCOM'94, pp. 306-314. [48] M. R. Hashemi, A. Leon-Garcia, ‘A Multicast Single-Queue Switch with a Novel Copy Mechanism,’ Proc. IEEE INFOCOM' 98, pp. 800-807. [49] A. Jajszczyk, M. Roszkiewicz, ‘ORANGE --- A New Class of ATM Switching Networks,’ Proc. IEEE ICC' 97, vol. 1, pp. 462-466. [50] M. Listanti, L. Veltri, ‘Blocking Probability of Three-Stage Multicast Switches,’ Proc. IEEE ICC' 98, vol. 1, pp. 623-629. [51] H. Yamanaka, H. Saito, H. Kondoh, et al., ‘Scalable Shared-Buffering ATM Switch with a Versatile Searchable Queue,’ IEEE J. Select. Areas Comm., vol. 15, No. 5, June 1997, pp. 773-783. [52] K. L. Eddie Law, A. Leon-Garcia, ‘A Large Scalable ATM Multicast Switch,’ IEEE J. Select. Areas Comm., vol. 15, No. 5, June 1997, pp. 844-854. [53] Jeen-Fong Lin, Sheng-De Wang, ‘High-Performance Low-Cost Non-Blocking Switch for ATM,’ IEEE INFOCOM' 96, vol. 2, pp. 818-821. [54] F. Sestini, ‘Recursive Copy Generation for Multicast ATM Switching,’ IEEE/ACM Trans. Networking, vol. 5, No. 3, June 1997, pp. 329-335.
[55] Hyoung-Il Lee, Seung-Woo Seo, Hyuk-Jae Jang, ‘A High Performance ATM Switch Based on the Augmented Composite Banyan Network,’ Proc. IEEE ICC' 98, vol.1, pp. 309-313. [56] H. J. Siegel, R. J. McMillen, ‘The Multistage Cube: A Versatile Interconnection Network,’ Computer, vol. 14, Dec. 1981, pp. 65-76. [57] D. Knox, S. Panchanathan, ‘Parallel Searching Techniques for Routing Table Lookup,’ IEEE Proc. INFOCOM 93, Vol. 3, Mar. 1993, pp. 1400-1405. [58] H. S. Kim, ‘Multinet Switch: Multistage ATM Switch Architecture with Partially Shared Buffers,’ IEEE Proc. INFOCOM 93, Mar. 1993, pp. 473-480. [59] S. Sibal, Ji Zhang, ‘On a Class of Banyan Networks and Tandem Banyan Switching Fabrics,’ IEEE Proc. INFOCOM 93, Mar. 1993, pp. 481-488. [60] P. Harubin, S. Chowdhury, B. Sengupta, ‘The Design of a Distributor in a Large ATM Switch,’ IEEE Proc. GLOBECOM' 95, Nov. 1995, pp. 2080-2086. [61] S. H. Kang, C. Oh, D. K. Sung, ‘A High Speed ATM Switch with Common Parallel Buffers,’ IEEE Proc. GLOBECOM' 95, Nov. 1995, pp. 2087-2091. [62] H. Jonathan Chao, J. S. Park, ‘Architecture Designs of A Large-Capacity ABACUS ATM Switch,’ IEEE Proc. GLOBECOM' 98, Nov. 1998, pp. 1793-1798. [63] J. D. Ho, N. K. Sharma, ‘A Growable Shared-Memory Based Multicast ATM Switch,’ IEEE Proc. GLOBECOM' 98, Nov. 1998, pp. 1829-1834. [64] N. K. Sharma, J. D. Ho, ‘Modular Design of a Large Multicast ATM Switch,’ IEEE Proc. GLOBECOM' 98, Nov. 1998, pp. 1841-1846. [65] Ming-Huang Guo, Ruay-Shiung Chang, ‘A Simple Multicast ATM Switches Based on Broadcast Buses,’ IEEE Proc. GLOBECOM' 98, Nov. 1998, pp. 1835-1840. [66] E. D. Sykas, D. E. Karvelas, E. N. Protonotarios, ‘Queueing Analysis of Some Buffered Random Multiple Access Schemes,’ IEEE Trans. Comm. Vol. COM-34, No. 8, August 1986, pp. 790-798. [67] A. Ganz, I. Chlamtac, ‘A Linear Solution to Queueing Analysis of Synchronous Finite Buffer Networks,’ IEEE Trans. Comm. Vol. 38, No. 4, April 1990, pp. 440- 446. [68] I. Chlamtac, O. 
Ganz, ‘An Optimal Hybrid Demand Access Protocol for Unrestricted Topology Broadcast Networks,’ IEEE Proc. INFOCOM' 86, Miami, FL, April 1986, pp. 204-213.
[69] A. Ganz, I. Chlamtac, ‘Queueing Analysis of Finite Buffer Token Networks,’ 1988 ACM SIGMETRICS, Santa Fe, NM, May 1988, pp. 30-36. [70] F. A. Tobagi, ‘Multiaccess Protocols in Packet Communication Systems,’ IEEE Trans. Comm. Vol. COM-28, No. 4, April 1980, pp. 468-488. [71] Leonard Kleinrock, ‘On Queueing Problems in Random-Access Communications,’ IEEE Trans. Information Theory, Vol. IT-31, No. 2, Mar. 1985, pp. 166-175. [72] S. Tasaka, ‘Performance Analysis of Multiple Access Protocols,’ MIT Press, 1986. [73] R. Rom, M. Sidi, ‘Multiple Access Protocols: Performance and Analysis,’ Springer-Verlag, 1990. [74] H. Takagi, ‘Analysis of Polling Systems,’ MIT Press, 1986. [75] J. Walrand, ‘An Introduction to Queueing Networks,’ Prentice-Hall, 1988. [76] R. W. Wolff, ‘Stochastic Modelling and the Theory of Queues,’ Prentice-Hall, 1989. [77] Mark J. Karol, Michael G. Hluchyj, Samuel P. Morgan, ‘Input Versus Output Queueing on a Space-Division Packet Switch,’ IEEE Trans. Comm., vol. COM-35, No. 12, Dec. 1987, pp. 1347-1356. [78] Michael G. Hluchyj, Mark J. Karol, ‘Queueing in High-Performance Packet Switching,’ IEEE J. Select. Areas Comm., vol. SAC-6, pp. 1587-1597, Dec. 1988. [79] M. D'Ambrosio, R. Melen, ‘Performance Analysis of ATM Switching Architectures: A Review,’ CSELT Technical Reports, vol. XX, No. 3, June 1992, pp. 265-281. [80] Y.-C. Jenq, ‘Performance Analysis of a Packet Switch Based on a Single-Buffered Banyan Network,’ IEEE J. Select. Areas Comm., vol. SAC-1, pp. 1014-1021. [81] G. N. Higginbottom, ‘Performance Evaluation of Communication Networks,’ Boston: Artech House, 1998. [82] H. J. Siegel, ‘Interconnection Networks for Large-Scale Parallel Processing: Theory and case studies,’ New York: McGraw-Hill, 1990. [83] Leonard Kleinrock, ‘Queueing Systems,’ New York: John Wiley & Sons, 1975. [84] R. O. Onvural, ‘Asynchronous Transfer Mode Networks: Performance Issues,’ Boston: Artech House, 1994. [85] T. G.
Robertazzi, ‘Computer Networks and Systems: Queueing Theory and Performance Evaluation,’ New York: Springer-Verlag, 1991.
[86] Joseph Y. Hui, ‘Switching and Traffic Theory for Integrated Broadband Networks,’ Kluwer Academic Publishers, 1992. [87] C. Kolias, Leonard Kleinrock, ‘Performance Analysis of Multiplane Nonblocking ATM Switches,’ IEEE Proc. GLOBECOM' 98, pp. 1781-1786. [88] O. J. Boxma, W. P. Groenendijk, ‘Waiting Times in Discrete-Time Cyclic-Service Systems,’ IEEE Trans. Comm., vol. 36, No. 2, Feb. 1988, pp. 164-170. [89] J. B. Evans, E. Duron, Yizhen Wang, ‘Analysis and Implementation of a Priority Knockout Switch,’ IEEE Proc. INFOCOM 93, Mar. 1993, pp. 1099-1106. [90] E. W. Zegura, ‘Evaluating Blocking Probability in Distributors,’ IEEE Proc. INFOCOM 93, Mar. 1993, pp. 1107-1116. [91] Myung J. Lee, David S. Ahn, ‘Cell Loss Analysis and Design Trade-Offs of Nonblocking ATM Switches with Nonuniform Traffic,’ IEEE/ACM Trans. Networking, vol. 3, No. 2, April 1995, pp. 199-210. [92] B.-S. Choe, H. Jonathan Chao, ‘Performance Analysis of A Large-Scale Multicast Output Buffered ATM Switch,’ Proc. IEEE INFOCOM'94, pp. 1472-1479. [93] Wesley W. Chu, ‘Buffer Behavior for Batch Poisson Arrivals and Single Constant Output,’ IEEE Trans. Comm. Tech., vol. COM-18, No. 5, Oct. 1970, pp. 613-618. [94] C. G. Kang, H. H. Tan, ‘Queueing Analysis of Explicit Priority Assignment Partial Buffer Sharing Schemes for ATM Networks,’ IEEE Proc. INFOCOM 93, Mar. 1993, pp. 810-819. [95] T.-H. Lee, S.-J. Liu, ‘Performance Analysis of a Large Scale ATM Switch with Input and Output Buffers,’ Proc. IEEE INFOCOM'94, pp. 1465-1471. [96] G. Bianchi, Jonathan S. Turner, ‘Improved Queueing Analysis of Shared Buffer Switching Networks,’ IEEE Proc. INFOCOM 93, Mar. 1993, pp. 1392-1399. [97] Yuji Oie, K. Kawahara, M. Murata, H. Miyahara, ‘Performance Analysis of Internally Unbuffered Large Scale ATM Switch with Bursty Traffic,’ IEEE Proc. INFOCOM 93, Mar. 1993, pp. 1270-1279. [98] S. Gianatti, A. Pattavina, ‘Performance Analysis of Shared-Buffered Banyan Networks under Arbitrary Traffic Patterns,’ IEEE Proc. 
INFOCOM 93, Mar. 1993, pp. 943-952. [99] J. F. Hayes, R. Breault, M. K. Mehmet-Ali, ‘Performance Analysis of a Multicast Switch,’ IEEE Trans. Comm., vol. 39, No. 4, April 1991, pp. 581-587.
[100] A. K. Choudhury, E. L. Hahne, ‘Buffer Management in a Hierarchical Shared Memory Switch,’ Proc. IEEE INFOCOM'94, pp. 1410-1419. [101] Bin Zhou, M. Atiquzzaman, ‘Performance of Output-Multibuffered Multistage Interconnection Networks under General Traffic Patterns,’ Proc. IEEE INFOCOM'94, pp. 1448-1455. [102] Hongxu Chen, J. Lambert, A. Pitsillides, ‘RC-BB Switch: a High Performance Switching Network for B-ISDN’, IEEE Proc. GLOBECOM'95, Nov. 1995. [103] Hongxu Chen, J. Lambert, A. Pitsillides, ‘Input Queuing Strategy for Batcher- Banyan Switching Network’, Proc. ATNAC'95, Dec. 1995. [104] Hongxu Chen, J. Lambert, ‘Design of a High Performance Multicast ATM Switch for B-ISDN’ , IEEE Proc. GLOBECOM' 98, Nov. 1998, pp. 1859-1864. [105] Xiu Shi Liang, ‘ Data Structures and Algorithms in C,’ Beijing: Qinghua University Publisher, 1993. [106] G. James, D. Burley, D. Clements, P. Duke, J. Searl, J. Wright, ‘Modern Engineering Mathematics,’ New York: Addison-Wesley, 1993. [107] ‘MPLS Tutorial,’ http://www2.rad.com/networks/2000/mpls. [108] Sadikin Djumin, ‘Gigabit Networking: High-Speed Routing and Switching,’ http://www.cse.ohio-state.edu/~jain/cis788-97/ftp/gigabit_nets/ [109] Shishir Agrawal, ‘IP SWITCHING,’ http://www.cse.ohio-state.edu/~jain/cis788-97/ip_switching/ [110] K.A.Hawick, H.A.James, M.Buchhorn, M.Rezny, ‘An ATM-based Distributed High Performance Computing System,’ Proc. HPCN Europe’97 Vienna, 28-30 April 1997. [111] J.A.Mathew, K.A.Hawick, ‘Applying ATM to Distributed and High Performance Computing on Local and Wide Area Networks,’ Technical Report DHPC-016, 14 August 1997. [112] ‘Switch-based Architecture vs Bus-based Architecture,’ http://h18002.www1.hp.com/alphaserver/technology/switch-based.html [113] Meina. Song, Junde. Song, and Renjie. Pi, ‘An Improved Multicast Traffic Scheduling Scheme in the Packet-Switching Systems,’ Proc. ATNAC 2003, Dec. 2003. 
[114] Ming-Huang Guo, Ruay-Shiung Chang, ‘Multicast ATM Switches: Survey and Performance Evaluation,’ Computer Communication Review, ACM SIGCOMM, April 1998.
Appendix A MATLAB program for calculating the numerical results for the queuing model of the proposed switch

The computer program implementation to calculate the numerical results for the queuing model of the proposed switch architecture is given here. The program was implemented using MATLAB.
%this is the beginning of the main program in MATLAB
N=8;
lm(1)=3; lm(2)=5; lm(3)=8; lm(4)=10; lm(5)=20; lm(6)=30;
for i=1:3, rtn(i)=0.0; end
for j=1:20, THS(j)=0.0; AQL(j)=0.0; PLOS(j)=0.0; xaxis(j)=0.0; end
r=0.01;
for j=1:20,
  if (j==1) xaxis(j)=0; else xaxis(j)=r; end
  rtn=nrmat(r,lm(6),N);
  THS(j)=rtn(1); AQL(j)=rtn(2); PLOS(j)=rtn(3);
  r=r+0.01;
end
%this is the end of the main program in MATLAB
%functions are defined below
function rtn = nrmat(x,y,z)
for i=1:3, rtn(i)=0.0; end
cnt=0; loop=0; sTS=0.0;
for i=1:z,
  for j=1:(y+1), calinx(i,j)=cnt+1; cnt=cnt+1; end
end
for i=1:z,
if (y==1) xn(i)=1.0; else xn(i)=0.5; end end for loop=0:30, for i=1:(z*(y+1)), for j=1:(z*(y+1)), tp1(i,j)=0.0; tp2(i,j)=0.0; tp3(i,j)=0.0; tp4(i,j)=0.0; tp5(i,j)=0.0; w(i,j)=0.0; end end for i=1:(z-1), for j=0:y, inx=calinx(i+1,j+1); iny=calinx(i,j+1); nvl=fn(i,j); tp1(inx,iny)=i*en(nvl)*xn(nvl)*((1-x)^(ux(y-1)+ux(y-j)))*bb((z-i-1),0,x); end end for i=1:(z-1), for j=0:(y-1), inx=calinx(i+1,j+1); iny=calinx(i,j+2); nvl=fn(i,j); tp2(inx,iny)=i*en(nvl)*xn(nvl)*x*((1-x)^ux(y-1))*bb((z-i-1),0,x); end end for i=0:(z-1), for j=1:y, for k=i:(z-1), inx=calinx(i+1,j+1); iny=calinx(k+1,j); nvl=fn(i,j); tp3(inx,iny)=en(nvl)*((1-x)^ux(y-j))*bb((z-i-1),(k-i),x); end end end for i=0:(z-1), for j=0:(y-1), for k=i:(z-1), inx=calinx(i+1,j+1); iny=calinx(k+1,j+2); nvl=fn(i,j); if (nvl==0) tp4(inx,iny)=x*bb((z-i-1),(k-i),x)*(1-sn(nvl)+i*en(nvl)); else tp4(inx,iny)=x*bb((z-i-1),(k-i),x)*(1-sn(nvl)+i*en(nvl)*(1-xn(nvl)+xn(nvl)*x*(ux(y-1)+((z-k-1)/(k-i+1))*((1-x)^(ux(y-1)-1))))); end
end end end for i=0:(z-1), for j=0:y, for k=i:(z-1), inx=calinx(i+1,j+1); iny=calinx(k+1,j+1); nvl=fn(i,j); if (nvl==0) tp5(inx,iny)=(1-x*ux(y-j))*bb((z-i-1),(k-i),x)*(1-sn(nvl)+i*en(nvl))+ux(y-j)*ux(j)*en(nvl)*(x/(1-x*ux(y-j))); else tp5(inx,iny)=(1-x*ux(y-j))*bb((z-i-1),(k-i),x)*(1-sn(nvl)+i*en(nvl)*(1-xn(nvl)+xn(nvl)*x*(ux(y-1)+((z-k-1)/(k-i+1))*((1-x)^(ux(y-1)-1))))+ux(y-j)*ux(j)*en(nvl)*(x/(1-x*ux(y-j)))); end end end end for i=1:(z*(y+1)), for j=1:(z*(y+1)), if (i==j) w(i,j)=tp1(j,i)+tp2(j,i)+tp3(j,i)+tp4(j,i)+tp5(j,i); else w(i,j)=tp1(j,i)+tp2(j,i)+tp3(j,i)+tp4(j,i)+tp5(j,i)+1; end end end for i=1:(z*(y+1)), wk(i)=0.0; ws(i)=1.0; end B=ws'; X=w\B; wk=X'; TS=ThroughPut(y,z,wk,calinx); for i=1:z, inx=calinx(i,2); xn(i)=wk(inx)/sumWKX(i,y,wk,z,calinx); end dlTS=abs(TS-sTS); if (dlTS<=0.001) break; end sTS=TS; end avgql=averageQlnth(y,z,wk,calinx); pktloss = Reject(y,z,wk,calinx); rtn(1)=TS; rtn(2)=avgql; rtn(3)=pktloss; function S = ThroughPut(l,n,wx,cinx) S=0; for i=0:(n-1) for j=0:l,
inx=cinx(i+1,j+1); nvl=fn(i,j); S=S+wx(inx)*sn(nvl); end end function aql = averageQlnth(l,n,wx,cinx) aql=0.0; for i=0:(n-1), for j=1:l, inx=cinx(i+1,j+1); aql=aql+j*wx(inx); end end function ploss = Reject(l,n,wx,cinx) ploss=0.0; for i=0:(n-1), inx=cinx(i+1,l+1); ploss=ploss+wx(inx); end function b = bb(x,y,z) b=nchoosek(x,y)*(z^y)*((1-z)^(x-y)); function u = ux(x) if x==0 u=0; else u=1; end function s = sn(i) if (i==0) s=0; else s=1; end function e = en(i) if i<1 e=0; else e=sn(i)/i; end function swk = sumWKX(i,l,wx,n,cinx) swk=0.0; for j=2:(l+1), inx=cinx(i,j); swk=swk+wx(inx); end function n = fn(i,j) n=i+ux(j);
Appendix B Simulation program for proposed switch

The simulation program implementation to calculate the performance measures for the proposed switch architecture is given here. The program was compiled with a g++
compiler and run using a Linux operating system. #include <stdio.h> #include <MLCG.h> #include <Binomial.h> #include <DiscUnif.h> #define MAX_SIM (long int)1000000 #define MAX_QLNTH (long int)30 #define MAX_IN (long int)8 #define LAMBDA (double)0.01 #define SIM_TIMES (long int)1 /* global variable of queue and queue pointers */ int queue[MAX_IN][MAX_QLNTH], qptr[MAX_IN]; void fifo(int qinx) { int index=0; if (qptr[qinx]>0) { for (index=1; index<(qptr[qinx]+1); index++) { queue[qinx][index-1]=queue[qinx][index]; } index=qptr[qinx]; queue[qinx][index]=0; qptr[qinx]-=1; } else { queue[qinx][index]=0; qptr[qinx]=-1; //represent an empty queue } } void sim(double lambda) { double totalpkt=0.0, totalpass=0.0, totaloss=0.0; long int loop, simTimes=SIM_TIMES, pktinQ=0, qct=0; double loss[MAX_IN], pktpass[MAX_IN], meanQlnth[MAX_IN]; int goflag=1, inloop, token=0, tpos, inum, Anum=0; int qloss=0, chk, inchk[MAX_IN]; double num, throughput=0.0, celloss=0.0, mql=0.0; /* initialise each input queues */ for (loop=0; loop<MAX_IN; loop++) { for (inloop=0; inloop<MAX_QLNTH; inloop++) { queue[loop][inloop]=0; }
qptr[loop]=-1; //represent empty queue loss[loop]=0; inchk[loop]=0; pktpass[loop]=0; meanQlnth[loop]=0; } //create Binomial and Discrete time uniform RNG MLCG G(0,1); MLCG outG(0,1); Binomial TrafficGen(MAX_IN,lambda,&G); DiscreteUniform outPortNum(0,(MAX_IN-1),&outG); for (loop=0; loop<MAX_SIM; loop++) { num=TrafficGen(); /* Packet arrive in one time slot and follow Binomial distribution */ totalpkt+=num; for (Anum=0; Anum<num; Anum++) /* put packet in its associated queue */ { inum=(int)outPortNum(); /* packet arrive to a particular input queue follow uniform distribution */ if (Anum>0) //In one time slot, there is only one packaget for each input queue { for (chk=0; chk<Anum; chk++) { if (inum==inchk[chk]) { inum=(int)outPortNum(); chk=0; } } } inchk[Anum]=inum; //save the current input queue number for checking duplicated distribution to the same input queue in one time slot qloss=qptr[inum]; //count cell loss if (qloss == (MAX_QLNTH-1)) //queue is full { loss[inum]+=1; /* incoming packet is lost */ } else //put new packet into the input queue { queue[inum][qloss+1]=1; qptr[inum]+=1; } } for (qct=0; qct<MAX_IN; qct++) //count average queue length { meanQlnth[qct]+=(double)(qptr[qct]+1); } /* packet depart */ tpos=token; while(queue[tpos][0]==0)
{ tpos=(tpos+1)%MAX_IN; if (token==tpos) //there is no packaet to send { throughput+=1; goflag=0; break; } } if (goflag==1) { pktpass[tpos]+=1;
    fifo(tpos);
    token=(tpos+1)%MAX_IN; /* token is given to next relay */
  }
  else /* nobody has a packet to send */
  {
    goflag=1; /* put depart flag back for next iteration */
  }
  if (loop==(MAX_SIM-1))
  {
    simTimes=simTimes-1;
    if (simTimes>0) loop=0;
  }
}
for (loop=0; loop<MAX_IN; loop++)
{
  totalpass+=pktpass[loop];
  totaloss+=loss[loop];
  pktinQ+=qptr[loop]+1;
  mql+=meanQlnth[loop];
}
printf("load %f, throughput %f, packet pass %f, packet loss %f, packet in queue %ld, total packet %f\n", lambda, throughput, totalpass, totaloss, pktinQ, totalpkt);
mql=mql/(MAX_SIM*MAX_IN*SIM_TIMES);
throughput=throughput/(MAX_SIM*SIM_TIMES);
throughput=1.0-throughput;
celloss=totaloss/totalpkt;
printf("Throughput --- %f\n", throughput);
printf("Mean Queue Length --- %f\n", mql);
printf("Cell Loss --- %f\n", celloss);
}
int main()
{
  int number=0;
  double load=LAMBDA;
  for (number=0; number<13; number++)
  {
sim(load); load+=0.01; } return 0; }
Appendix C Simulation program for input buffering switch

The simulation program implementation to calculate the performance measures for the input buffering switch is given here. The program was compiled with a g++ compiler and run using a Linux operating system.

#include <stdio.h>
#include <MLCG.h>
#include <Binomial.h>
#include <Uniform.h>
#define MAX_SIM (long int)1000000
#define MAX_QLNTH (long int)30
#define MAX_IN (long int)8
#define LAMBDA (double)0.01
#define SIM_TIMES (long int)1
/* global variable of queue and queue pointers */
int queue[MAX_IN][MAX_QLNTH], qptr[MAX_IN];
void fifo(int qinx)
{
  int index=0;
if (qptr[qinx]>0) { for (index=1; index<(qptr[qinx]+1); index++) { queue[qinx][index-1]=queue[qinx][index]; } index=qptr[qinx]; queue[qinx][index]=-1; qptr[qinx]-=1; } else { queue[qinx][index]=-1; qptr[qinx]=-1; //represent an empty queue } } void sim(double lambda) { double totalpkt=0.0, totalpass=0.0, totaloss=0.0; long int loop, simTimes=SIM_TIMES, pktinQ=0, qct=0; double loss[MAX_IN], pktpass[MAX_IN], meanQlnth[MAX_IN]; int tmp, controller[MAX_IN][MAX_IN], inloop, token=0, inum, Anum=0; int destination=0, qloss=0, counter[MAX_IN], RDselect=0, inlet, chk, inchk[MAX_IN], outlet[MAX_IN]; double num, throughput=0.0, celloss=0.0, mql=0.0; /* initialise each input queues */ for (loop=0; loop<MAX_IN; loop++) { for (inloop=0; inloop<MAX_QLNTH; inloop++) { queue[loop][inloop]=-1;
  }
  for (inlet=0; inlet<MAX_IN; inlet++)
  {
    controller[loop][inlet]=-1;
  }
  qptr[loop]=-1;
  loss[loop]=0;
  inchk[loop]=0;
  pktpass[loop]=0;
  counter[loop]=0;
  meanQlnth[loop]=0;
}
//create Binomial and Discrete time uniform RNG
MLCG outG(0,1);
MLCG inG(0,1);
MLCG selG(0,1);
MLCG G(0,1);
Binomial TrafficGen(MAX_IN,lambda,&G);
Uniform outPortNum(0,(MAX_IN-1),&outG);
Uniform inPortNum(0,(MAX_IN-1),&inG);
Uniform RDGselect(0,(MAX_IN-1),&selG);
for (loop=0; loop<MAX_SIM; loop++)
{
  num=TrafficGen(); /* Packet arrive in one time slot and follow Binomial distribution */
  totalpkt+=num;
  /* Allocate output port for those new arrivals */
  for (Anum=0; Anum<num; Anum++)
  {
    outlet[Anum]=(int)outPortNum();
  }
  for (Anum=0; Anum<num; Anum++) /* put packet in its associated queue */
  {
    inum=(int)inPortNum(); /* packet arrive to a particular queue follow uniform distribution */
    if (Anum>0)
    {
      for (chk=0; chk<Anum; chk++)
      {
        if (inum==inchk[chk])
        {
          inum=(int)inPortNum();
          chk=0;
        }
      }
    }
    inchk[Anum]=inum;
    qloss=qptr[inum];
    if (qloss == (MAX_QLNTH-1))
    {
      loss[inum]+=1; /* incoming packet is lost */
    }
else { queue[inum][qloss+1]=outlet[Anum]; qptr[inum]+=1; } } /* end of putting packet into the queue */ for (qct=0; qct<MAX_IN; qct++) { meanQlnth[qct]+=(double)(qptr[qct]+1); } /* reset contention controller */ for (inlet=0; inlet<MAX_IN; inlet++) { counter[inlet]=0; } /* contention control */ /* counter is used to solve the contention that more than one packet are destined to the same out port controller is used to remember the input queue from where the packet want to go to this output port */ for (inlet=0; inlet<MAX_IN; inlet++) { if (queue[inlet][0]!=-1) /* queue is not empty and has a packet to send*/ { destination=queue[inlet][0]; tmp=counter[destination]; controller[destination][tmp]=inlet; counter[destination]+=1; } } /* end of contention control */ /* find out idle output port to count switch throughput */ for (chk=0; chk<MAX_IN; chk++) { if (counter[chk]==0) /* there is no packet destined to this output port */ { throughput+=1; } } /* Depart packet with random selection under uniform distribution */ for (chk=0; chk<MAX_IN; chk++) { if (counter[chk]==1) /* there is only one packet depart */ { tmp=controller[chk][0]; pktpass[tmp]+=1; fifo(tmp); }
    if (counter[chk]>1) //more than one packet for the same output port
    {
      RDselect=(int)RDGselect()%counter[chk];
      tmp=controller[chk][RDselect];
      pktpass[tmp]+=1;
      fifo(tmp);
    }
  }
  if (loop==(MAX_SIM-1))
  {
    simTimes=simTimes-1;
    if (simTimes>0) loop=0;
  }
}
for (loop=0; loop<MAX_IN; loop++)
{
  totalpass+=pktpass[loop];
  totaloss+=loss[loop];
  pktinQ+=qptr[loop]+1;
  mql+=meanQlnth[loop];
}
printf("load %f, throughput %f, packet pass %f, packet loss %f, packet in queue %ld, total packet %f\n", lambda, throughput, totalpass, totaloss, pktinQ, totalpkt);
mql=mql/(double)MAX_SIM;
mql=mql/(double)MAX_IN;
mql=mql/(double)SIM_TIMES;
throughput=throughput/(double)(SIM_TIMES*MAX_SIM);
throughput=throughput/(double)MAX_IN;
throughput=1.0-throughput;
celloss=totaloss;
celloss=celloss/totalpkt;
printf("Throughput --- %f\n", throughput);
printf("Mean Queue Length --- %f\n", mql);
printf("Cell Loss --- %f\n", celloss);
}
int main()
{
  int number=0;
  double load=LAMBDA;
  for (number=0; number<13; number++)
  {
    sim(load);
    load+=0.01;
  }
  return 0;
}
Appendix D Simulation program for output buffering switch

The simulation program implementation to calculate the performance measures for the output buffering switch is given here. The program was compiled with a g++ compiler and run using a Linux operating system.

#include <stdio.h>
#include <MLCG.h>
#include <Binomial.h>
#include <Uniform.h>
#define MAX_SIM (long int)1000000
#define MAX_QLNTH (long int)30
#define MAX_IN (long int)8
#define LAMBDA (double)0.01
#define SIM_TIMES (long int)1
/* global variable of queue and queue pointer */
int queue[MAX_QLNTH], qptr=0; //inPktct[MAX_IN];
void fifo()
{
  int index=0;
  if (qptr>0)
  {
    for (index=1; index<(qptr+1); index++)
    {
      queue[index-1]=queue[index];
    }
    queue[qptr]=-1;
    qptr-=1;
  }
  else
  {
    queue[index]=-1;
    qptr=-1;
  }
}
void sim(double lambda)
{
  long int loop, simTimes=SIM_TIMES;
  double loss=0.0, pktpass=0.0, meanQlnth=0.0;
  int inloop, inum, Anum=0;
  int qloss=0, chk, inchk[MAX_IN];
  double totalpkt=0.0, num, throughput=0.0;
  /* initialise each input queues */
  for (loop=0; loop<MAX_IN; loop++)
  {
    inchk[loop]=0;
  }
  for (inloop=0; inloop<MAX_QLNTH; inloop++)
  {
    queue[inloop]=-1;
156
qptr=-1; } //create Binomial and Discrete time uniform RNG MLCG outG(0,1); MLCG G(0,1); Binomial TrafficGen(MAX_IN,lambda,&G); Uniform outPortNum(0,(MAX_IN-1),&outG); for (loop=0; loop<MAX_SIM; loop++) { num=TrafficGen(); /* Packet arrive in one time slot and follow binomial distribution */ totalpkt+=num; for (Anum=0; Anum<num; Anum++) /* put packet in output queue */ { inum=(int)outPortNum(); /* packet arrive to a particular queue follow uniform distribution */ if (Anum>0) { for (chk=0; chk<Anum; chk++) { if (inum==inchk[chk]) { inum=(int)outPortNum(); chk=0; } } } inchk[Anum]=inum; qloss=qptr; if (qloss == (MAX_QLNTH-1)) { loss+=1; /* incomming packet is lost */ } else { queue[qloss+1]=inum; qptr+=1; } } /* end queuing packets */ meanQlnth+=(double)(qptr+1); /* packet depart */ if (queue[0]==-1) { throughput+=1; //no packet to send } else { pktpass+=1;
fifo(); } if (loop==(MAX_SIM-1))
157
{ simTimes=simTimes-1; if (simTimes>0) loop=0; } } printf("load %f, throughput %f, packet pass %f, packet loss %f, packet in queue %d, total packet %f\n", lambda, throughput, pktpass, loss, (qptr+1), totalpkt); throughput=throughput/(double)(SIM_TIMES*MAX_SIM); throughput=1.0-throughput; loss=loss/totalpkt; meanQlnth=meanQlnth/(double)MAX_SIM; meanQlnth=meanQlnth/(double)SIM_TIMES; printf("Throughput --- %f\n", throughput); printf("Mean Queue Length --- %f\n", meanQlnth); printf("Cell Loss --- %f\n", loss); } int main() { int number=0; double load=LAMBDA; for (number=0; number<13; number++) { sim(load); load+=0.01; } return 0; }
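The MLCG, Binomial and Uniform classes used in these listings come from the old GNU libg++ library, which is no longer distributed with modern compilers. An equivalent arrival process can be sketched with the standard &lt;random&gt; header; the function names slot_arrivals and pick_input below are illustrative, not part of the thesis program:

```cpp
#include <random>
#include <cassert>

const int MAX_IN = 8;   // same switch size as the listings

// Number of packets arriving in one time slot ~ Binomial(MAX_IN, lambda),
// matching the Binomial TrafficGen object in the listings.
int slot_arrivals(double lambda, std::mt19937 &g) {
    std::binomial_distribution<int> d(MAX_IN, lambda);
    return d(g);
}

// Port assignment of a packet ~ Uniform{0, ..., MAX_IN-1},
// matching the Uniform/DiscreteUniform outPortNum object.
int pick_input(std::mt19937 &g) {
    std::uniform_int_distribution<int> d(0, MAX_IN - 1);
    return d(g);
}
```

The std::mt19937 engine replaces the MLCG generator; any seeded engine from &lt;random&gt; serves the same role in this sketch.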
Appendix E

Simulation program for proposed switch under multicast traffic

The simulation program used to calculate the performance measures for the proposed switch under multicast traffic is given here. The program was compiled with the g++ compiler and run under the Linux operating system.

#include <stdio.h>
#include <MLCG.h>
#include <Binomial.h>
#include <DiscUnif.h>

#define MAX_SIM (long int)1000000
#define MAX_QLNTH (long int)50
#define MAX_IN (long int)8
#define LAMBDA (double)0.01
#define SIM_TIMES (long int)1
#define MULTICAST_LNTH (long int)50

/* global queues and queue pointers */
int queue[MAX_IN][MAX_QLNTH], qptr[MAX_IN];

void fifo(int qinx)
{
  int index=0;
  if (qptr[qinx]>0)
  {
    for (index=1; index<(qptr[qinx]+1); index++)
    {
      queue[qinx][index-1]=queue[qinx][index];
    }
    index=qptr[qinx];
    queue[qinx][index]=0;
    qptr[qinx]-=1;
  }
  else
  {
    queue[qinx][index]=0;
    qptr[qinx]=-1;   /* represents an empty queue */
  }
}

void sim(double lambda)
{
  double totalpkt=0.0, totalpass=0.0, totaloss=0.0;
  long int loop, simTimes=SIM_TIMES, pktinQ=0, qct=0;
  double loss[MAX_IN], pktpass[MAX_IN], meanQlnth[MAX_IN];
  int goflag=1, inloop, token=0, tpos, inum, Anum=0;
  int qloss=0, multicast=1, chk, inchk[MAX_IN];
  double num, throughput=0.0, celloss=0.0, mql=0.0;

  /* initialise each input queue */
  for (loop=0; loop<MAX_IN; loop++)
  {
    for (inloop=0; inloop<MAX_QLNTH; inloop++)
    {
      queue[loop][inloop]=0;
    }
    qptr[loop]=-1;   /* represents an empty queue */
    loss[loop]=0;
    inchk[loop]=0;
    pktpass[loop]=0;
    meanQlnth[loop]=0;
  }

  /* create binomial and discrete uniform random number generators */
  MLCG G(0,1);
  MLCG outG(0,1);
  Binomial TrafficGen(MAX_IN,lambda,&G);
  DiscreteUniform outPortNum(0,(MAX_IN-1),&outG);

  for (loop=0; loop<MAX_SIM; loop++)
  {
    num=TrafficGen();   /* packets arriving in one time slot follow a binomial distribution */
    totalpkt+=num;
    for (Anum=0; Anum<num; Anum++)   /* put each packet in its associated queue */
    {
      inum=(int)outPortNum();   /* the input queue of each packet follows a uniform distribution */
      if (Anum>0)   /* at most one packet per input queue in one time slot */
      {
        for (chk=0; chk<Anum; chk++)
        {
          if (inum==inchk[chk])
          {
            inum=(int)outPortNum();
            chk=0;
          }
        }
      }
      inchk[Anum]=inum;   /* record the queue number to avoid two packets in the same queue in one slot */
      qloss=qptr[inum];   /* check for cell loss */
      if (qloss == (MAX_QLNTH-1))   /* queue is full */
      {
        loss[inum]+=1;   /* incoming packet is lost */
      }
      else   /* put the new packet into the input queue */
      {
        queue[inum][qloss+1]=1;
        qptr[inum]+=1;
      }
    }

    /* multicast packet generation */
    if ((loop >= multicast*100000)&&(loop <= (multicast*100000+MULTICAST_LNTH)))
    {
      qloss=qptr[1];   /* check for cell loss */
      if (qloss == (MAX_QLNTH-1))   /* queue is full */
      {
        loss[1]+=1;   /* incoming packet is lost */
      }
      else
      {
        queue[1][qloss+1]=1;
        qptr[1]+=1;
      }
      totalpkt+=1;
      if (loop == (multicast*100000+MULTICAST_LNTH))
        multicast+=1;
    }

    for (qct=0; qct<MAX_IN; qct++)   /* accumulate average queue length */
    {
      meanQlnth[qct]+=(double)(qptr[qct]+1);
    }

    /* packet departure */
    tpos=token;
    while(queue[tpos][0]==0)
    {
      tpos=(tpos+1)%MAX_IN;
      if (token==tpos)   /* there is no packet to send */
      {
        throughput+=1;
        goflag=0;
        break;
      }
    }
    if (goflag==1)
    {
      pktpass[tpos]+=1;
      fifo(tpos);
      token=(tpos+1)%MAX_IN;   /* the token is passed to the next input */
    }
    else   /* nobody has a packet to send */
    {
      goflag=1;   /* reset the departure flag for the next iteration */
    }
    if (loop==(MAX_SIM-1))
    {
      simTimes=simTimes-1;
      if (simTimes>0)
        loop=0;
    }
  }

  for (loop=0; loop<MAX_IN; loop++)
  {
    totalpass+=pktpass[loop];
    totaloss+=loss[loop];
    pktinQ+=qptr[loop]+1;
    mql+=meanQlnth[loop];
  }
  printf("load %f, throughput %f, packets passed %f, packets lost %f, packets in queue %ld, total packets %f\n",
         lambda, throughput, totalpass, totaloss, pktinQ, totalpkt);
  mql=mql/(MAX_SIM*MAX_IN*SIM_TIMES);
  throughput=throughput/(MAX_SIM*SIM_TIMES);
  throughput=1.0-throughput;
  celloss=totaloss/totalpkt;
  printf("Throughput --- %f\n", throughput);
  printf("Mean Queue Length --- %f\n", mql);
  printf("Cell Loss --- %f\n", celloss);
}

int main()
{
  int number=0;
  double load=LAMBDA;
  for (number=0; number<13; number++)
  {
    sim(load);
    load+=0.01;
  }
  return 0;
}
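The departure scan in the listing above can be isolated as one small function: starting at the token position, the first input whose head-of-line slot is occupied wins the time slot, and the slot is idle only when every queue is empty. This sketch (the names hol and next_sender are illustrative, not from the thesis code) reproduces that round-robin search:

```cpp
#include <cassert>

const int MAX_IN = 8;   // same switch size as the listings

// Round-robin token scan: return the index of the first input, starting
// from the token position and wrapping around, whose head-of-line flag is
// non-zero; return -1 when no input has a packet (idle slot).
int next_sender(const int hol[MAX_IN], int token) {
    int tpos = token;
    do {
        if (hol[tpos] != 0) return tpos;   // this input sends in the slot
        tpos = (tpos + 1) % MAX_IN;
    } while (tpos != token);
    return -1;                             // all queues are empty
}
```

After a successful scan the listing passes the token to (winner + 1) % MAX_IN, so each input is at most MAX_IN - 1 positions away from being served next.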
Appendix F

Simulation program for output buffering switch under multicast traffic

The simulation program used to calculate the performance measures for the output buffering switch under multicast traffic is given here. The program was compiled with the g++ compiler and run under the Linux operating system.

#include <stdio.h>
#include <MLCG.h>
#include <Binomial.h>
#include <Uniform.h>

#define MAX_SIM (long int)1000000
#define MAX_QLNTH (long int)50
#define MAX_IN (long int)8
#define LAMBDA (double)0.01
#define SIM_TIMES (long int)1
#define MULTICAST_LNTH (long int)50

/* global queue and queue pointer */
int queue[MAX_QLNTH], qptr=0;

void fifo()
{
  int index=0;
  if (qptr>0)
  {
    for (index=1; index<(qptr+1); index++)
    {
      queue[index-1]=queue[index];
    }
    queue[qptr]=-1;
    qptr-=1;
  }
  else
  {
    queue[index]=-1;
    qptr=-1;   /* represents an empty queue */
  }
}

void sim(double lambda)
{
  long int loop, simTimes=SIM_TIMES;
  double loss=0.0, pktpass=0.0, meanQlnth=0.0;
  int inloop, multicast=1, inum, Anum=0;
  int qloss=0, chk, inchk[MAX_IN];
  double totalpkt=0.0, num, throughput=0.0;

  /* initialise the input check array and the output queue */
  for (loop=0; loop<MAX_IN; loop++)
  {
    inchk[loop]=0;
  }
  for (inloop=0; inloop<MAX_QLNTH; inloop++)
  {
    queue[inloop]=-1;
    qptr=-1;
  }

  /* create binomial and discrete uniform random number generators */
  MLCG outG(0,1);
  MLCG G(0,1);
  Binomial TrafficGen(MAX_IN,lambda,&G);
  Uniform outPortNum(0,(MAX_IN-1),&outG);

  for (loop=0; loop<MAX_SIM; loop++)
  {
    num=TrafficGen();   /* packets arriving in one time slot follow a binomial distribution */
    totalpkt+=num;
    for (Anum=0; Anum<num; Anum++)   /* put each packet in the output queue */
    {
      inum=(int)outPortNum();   /* the source of each packet follows a uniform distribution */
      if (Anum>0)   /* at most one packet per source in one time slot */
      {
        for (chk=0; chk<Anum; chk++)
        {
          if (inum==inchk[chk])
          {
            inum=(int)outPortNum();
            chk=0;
          }
        }
      }
      inchk[Anum]=inum;
      qloss=qptr;
      if (qloss == (MAX_QLNTH-1))   /* queue is full */
      {
        loss+=1;   /* incoming packet is lost */
      }
      else
      {
        queue[qloss+1]=inum;
        qptr+=1;
      }
    }   /* end queuing packets */

    /* multicast packet generation */
    if ((loop >= multicast*100000)&&(loop <= (multicast*100000+MULTICAST_LNTH)))
    {
      qloss=qptr;
      if (qloss == (MAX_QLNTH-1))   /* queue is full */
      {
        loss+=1;   /* incoming packet is lost */
      }
      else
      {
        queue[qloss+1]=1;
        qptr+=1;
      }
      totalpkt+=1;
      if (loop == (multicast*100000+MULTICAST_LNTH))
        multicast+=1;
    }

    meanQlnth+=(double)(qptr+1);

    /* packet departure */
    if (queue[0]==-1)
    {
      throughput+=1;   /* no packet to send */
    }
    else
    {
      pktpass+=1;
      fifo();
    }
    if (loop==(MAX_SIM-1))
    {
      simTimes=simTimes-1;
      if (simTimes>0)
        loop=0;
    }
  }

  printf("load %f, throughput %f, packets passed %f, packets lost %f, packets in queue %d, total packets %f\n",
         lambda, throughput, pktpass, loss, (qptr+1), totalpkt);
  throughput=throughput/(double)(SIM_TIMES*MAX_SIM);
  throughput=1.0-throughput;
  loss=loss/totalpkt;
  meanQlnth=meanQlnth/(double)MAX_SIM;
  meanQlnth=meanQlnth/(double)SIM_TIMES;
  printf("Throughput --- %f\n", throughput);
  printf("Mean Queue Length --- %f\n", meanQlnth);
  printf("Cell Loss --- %f\n", loss);
}

int main()
{
  int number=0;
  double load=LAMBDA;
  for (number=0; number<13; number++)
  {
    sim(load);
    load+=0.01;
  }
  return 0;
}
Appendix G

Simulation program for proposed switch with priority queue buffer control under multicast traffic

The simulation program used to calculate the performance measures for the proposed switch with priority queue buffer control under multicast traffic is given here. The program was compiled with the g++ compiler and run under the Linux operating system.

#include <stdio.h>
#include <MLCG.h>
#include <Binomial.h>
#include <DiscUnif.h>

#define MAX_SIM (long int)1000000
#define MAX_QLNTH (long int)50
#define MAX_IN (long int)8
#define LAMBDA (double)0.01
#define SIM_TIMES (long int)1
#define MULTICAST_LNTH (long int)50

/* global queues and queue pointers */
int queue[MAX_IN][MAX_QLNTH], qptr[MAX_IN];

void fifo(int qinx)
{
  int index=0;
  if (qptr[qinx]>0)
  {
    for (index=1; index<(qptr[qinx]+1); index++)
    {
      queue[qinx][index-1]=queue[qinx][index];
    }
    index=qptr[qinx];
    queue[qinx][index]=0;
    qptr[qinx]-=1;
  }
  else
  {
    queue[qinx][index]=0;
    qptr[qinx]=-1;   /* represents an empty queue */
  }
}

void sim(double lambda)
{
  double totalpkt=0.0, totalpass=0.0, totaloss=0.0;
  long int loop, simTimes=SIM_TIMES, pktinQ=0, qct=0;
  double loss[MAX_IN], pktpass[MAX_IN], meanQlnth[MAX_IN];
  int goflag=1, inloop, token=0, tpos, inum, Anum=0;
  int qloss=0, multicast=1, chk, inchk[MAX_IN];
  int priority=88, priorityCnt=8;   /* priority=88 means no priority queue is active */
  double num, throughput=0.0, celloss=0.0, mql=0.0;

  /* initialise each input queue */
  for (loop=0; loop<MAX_IN; loop++)
  {
    for (inloop=0; inloop<MAX_QLNTH; inloop++)
    {
      queue[loop][inloop]=0;
    }
    qptr[loop]=-1;   /* represents an empty queue */
    loss[loop]=0;
    inchk[loop]=0;
    pktpass[loop]=0;
    meanQlnth[loop]=0;
  }

  /* create binomial and discrete uniform random number generators */
  MLCG G(0,1);
  MLCG outG(0,1);
  Binomial TrafficGen(MAX_IN,lambda,&G);
  DiscreteUniform outPortNum(0,(MAX_IN-1),&outG);

  for (loop=0; loop<MAX_SIM; loop++)
  {
    num=TrafficGen();   /* packets arriving in one time slot follow a binomial distribution */
    totalpkt+=num;
    for (Anum=0; Anum<num; Anum++)   /* put each packet in its associated queue */
    {
      inum=(int)outPortNum();   /* the input queue of each packet follows a uniform distribution */
      if (Anum>0)   /* at most one packet per input queue in one time slot */
      {
        for (chk=0; chk<Anum; chk++)
        {
          if (inum==inchk[chk])
          {
            inum=(int)outPortNum();
            chk=0;
          }
        }
      }
      inchk[Anum]=inum;   /* record the queue number to avoid two packets in the same queue in one slot */
      qloss=qptr[inum];   /* check for cell loss */
      if (qloss == (MAX_QLNTH-1))   /* queue is full */
      {
        loss[inum]+=1;   /* incoming packet is lost */
      }
      else   /* put the new packet into the input queue */
      {
        queue[inum][qloss+1]=1;
        qptr[inum]+=1;
      }
    }

    /* multicast packet generation */
    if ((loop >= multicast*100000)&&(loop <= (multicast*100000+MULTICAST_LNTH)))
    {
      priority = 1;      /* queue 1 carries the multicast burst and is given priority */
      priorityCnt = 8;
      qloss=qptr[1];     /* check for cell loss */
      if (qloss == (MAX_QLNTH-1))   /* queue is full */
      {
        loss[1]+=1;   /* incoming packet is lost */
      }
      else   /* put the new packet into the input queue */
      {
        queue[1][qloss+1]=1;
        qptr[1]+=1;
      }
      totalpkt+=1;
      if (loop == (multicast*100000+MULTICAST_LNTH))
      {
        multicast+=1;
        priority = 88;   /* burst finished: disable the priority queue */
      }
    }

    for (qct=0; qct<MAX_IN; qct++)   /* accumulate average queue length */
    {
      meanQlnth[qct]+=(double)(qptr[qct]+1);
    }

    /* packet departure */
    tpos=token;
    while(queue[tpos][0]==0)
    {
      tpos=(tpos+1)%MAX_IN;
      if (token==tpos)   /* there is no packet to send */
      {
        throughput+=1;
        goflag=0;
        break;
      }
    }
    if (goflag==1)
    {
      pktpass[tpos]+=1;
      fifo(tpos);
      if ((tpos == priority)&&(priorityCnt>0))
      {
        token = tpos;    /* the priority queue keeps the token */
        priorityCnt-=1;
      }
      else
        token=(tpos+1)%MAX_IN;   /* the token is passed to the next input */
    }
    else   /* nobody has a packet to send */
    {
      goflag=1;   /* reset the departure flag for the next iteration */
    }
    if (loop==(MAX_SIM-1))
    {
      simTimes=simTimes-1;
      if (simTimes>0)
        loop=0;
    }
  }

  for (loop=0; loop<MAX_IN; loop++)
  {
    totalpass+=pktpass[loop];
    totaloss+=loss[loop];
    pktinQ+=qptr[loop]+1;
    mql+=meanQlnth[loop];
  }
  printf("load %f, throughput %f, packets passed %f, packets lost %f, packets in queue %ld, total packets %f\n",
         lambda, throughput, totalpass, totaloss, pktinQ, totalpkt);
  mql=mql/(MAX_SIM*MAX_IN*SIM_TIMES);
  throughput=throughput/(MAX_SIM*SIM_TIMES);
  throughput=1.0-throughput;
  celloss=totaloss/totalpkt;
  printf("Throughput --- %f\n", throughput);
  printf("Mean Queue Length --- %f\n", mql);
  printf("Cell Loss --- %f\n", celloss);
}

int main()
{
  int number=0;
  double load=LAMBDA;
  for (number=0; number<13; number++)
  {
    sim(load);
    load+=0.01;
  }
  return 0;
}
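The priority buffer control in the listing above amounts to a single rule for moving the token: after the priority queue is served, the token stays with it while its service credits last; otherwise the token rotates round-robin as usual. A sketch of that rule (the names advance_token and credits are illustrative, not from the thesis code):

```cpp
#include <cassert>

const int MAX_IN = 8;   // same switch size as the listings

// Token update after input `served` transmits: hold the token at the
// priority input while it still has credits, otherwise rotate round-robin.
int advance_token(int served, int priority, int &credits) {
    if (served == priority && credits > 0) {
        --credits;
        return served;                 // token stays at the priority input
    }
    return (served + 1) % MAX_IN;      // normal round-robin rotation
}
```

With credits reset to 8 at the start of each multicast burst, as in the listing, the priority queue can be served up to nine slots in a row before the token moves on.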