Cross-bar switch multiple microprocessor system

Barry Wilkinson and Hamid Abachi describe a cross-bar switch multiple microprocessor system constructed as a prototype research vehicle

The main objective of this work is to design and construct a small multiple microprocessor system based on the cross-bar switch architecture. The motives behind this are two-fold: firstly, there seems to be great potential in the architecture, and secondly, few previous constructional projects employing this architecture have been reported. A system is presented employing three Z80 microprocessors in a master-slave configuration with a floppy disc based support system. The various design decisions are outlined, and an estimate is made of the reliability of the hardware. This work, in particular, has highlighted the flexibility and simplicity of the architecture, in addition to the obvious potential for high speed of operation. It is also found that parallel programs can easily map onto the system, though this is not dealt with here.

multimicroprocessors cross-bar switch

A computer system containing a number of microprocessor devices has become an attractive proposition because of the prospect of increased speed of operation, flexibility, greater reliability and fault tolerance. This has led to a number of computer architectures being proposed and a limited number of systems being constructed. The first notable system in the UK was the CYBA-M computer system now at UMIST, Manchester. This employed a single multi-port memory architecture and 16 8080-type microprocessors. An alternative structure, favoured by microprocessor manufacturers, is the single time-shared bus architecture. A number of standard buses have been devised capable of supporting multiple microprocessors. Examples are Multibus from Intel, Z-bus from Zilog and Versa-bus from Motorola. This paper is concerned with a third type of structure, known as a cross-bar switch architecture, and presents the details of a prototype system constructed.

The common feature of all these systems is that they are 'tightly coupled'; the processors are physically very close and the electrical interconnection is such as to allow a high degree of interaction between processes at high speed. Complete computers interconnected through relatively slow serial links belong to a different class of distributed computer systems. The objectives of such systems are markedly different from closely-coupled systems and are often concerned with resource sharing and multi-user operation. In contrast, the tightly-coupled composite multiple microprocessor systems are also concerned with raw speed of operation and efficiency in a single-user environment. Additional factors are reliability, the nature of the computation and whether it naturally maps onto a multiple processor system.

Department of Electrical and Electronic Engineering, University College, Newport Road, Cardiff CF2 1TA, UK

CROSS-BAR SWITCH ARCHITECTURE

In a cross-bar switch system, there are a number of processors and a number of memory modules. The connection between processors and memory modules is provided through electronic switches, usually arranged in a matrix form. At any instant, each processor may connect to a different memory module. One of the most important hardware characteristics of this architecture is the simultaneous transmission of data or instructions between processors and their selected memory modules. The system can also be re-configured to suit the tasks at hand. It may be possible to select more than one memory module for one processor thus expanding the immediate memory space. For this it is necessary for each memory module to use different addresses. Alternatively, memory modules can be given the same addresses which allows simultaneous copying of information into several memory modules.
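The switching behaviour described above can be illustrated with a small software model. This is only a sketch: the class and method names are hypothetical, and the rule that a memory module serves at most one processor at any instant is an assumption made for the illustration rather than a statement about the hardware.

```python
class Crossbar:
    """Toy model of a cross-bar switch matrix (illustrative names, not from the paper)."""

    def __init__(self, n_processors, n_modules):
        self.n_processors = n_processors
        self.n_modules = n_modules
        self.closed = set()  # (processor, module) pairs whose switch is closed

    def connect(self, processor, module):
        """Close the switch joining a processor to a memory module."""
        if not (0 <= processor < self.n_processors and 0 <= module < self.n_modules):
            raise IndexError("no such switch")
        # Assumption: a module serves at most one processor at any instant.
        for p, m in self.closed:
            if m == module and p != processor:
                raise ValueError(f"module {module} is in use by processor {p}")
        self.closed.add((processor, module))

    def disconnect(self, processor, module):
        """Open the switch joining a processor to a memory module."""
        self.closed.discard((processor, module))

    def modules_of(self, processor):
        """Return the modules currently connected to a processor."""
        return sorted(m for p, m in self.closed if p == processor)
```

In this model three processors can each hold a different module at the same time, and a processor can be given a second module once that module is free, which corresponds to expanding its immediate memory space.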

SYSTEM DESIGN

The primary research objective of this work is to design and construct a multiple microprocessor system based on the cross-bar switch architecture and to investigate the inherent features of this system.

Firstly, decisions need to be made regarding the general operation and features to be incorporated. The system can be arranged into one of the following types [1, 2]:

• master-slave
• with separate executive for each processor
• with symmetrical or anonymous treatment of all processors.

The classical 'master-slave' approach is chosen here because of its hardware simplicity, ease of implementing the operating system and user software. All the cross-bar switches are controlled by one processor, the master processor. This processor also has responsibility for overall control of the system and delegating tasks to the slave processors. This implies that parallel processes are assigned statically to particular processors. Therefore, process and processor become inextricably linked.

The intention is to provide a hardware research vehicle of the minimum size consistent with investigating reasonable parallel computations. A two-processor system would not be capable of parallel computation with significant interaction between processes, especially as one processor is responsible for overall control in addition to any processing it may undertake. Consequently, this leads to a three-processor system as the minimum size, which is duly selected. The processor chosen is the Z80 microprocessor because good hardware and software support is available within the Department of Electrical and Electronic Engineering at University College, Cardiff.

vol 7 no 2 march 1983 0141-9331/83/100075-05 $03.00 © 1983 Butterworth & Co (Publishers) Ltd 75

The minimum number of shared memory modules in a three-processor system is three, which allows three active processors to use different memory modules simultaneously. This necessitates a 3 x 3 cross-bar switch, or nine bus switches controlled by the processor designated as the master processor. It is clear that each of the processors must also be provided with local read/write memory, permanent bootstrap program memory, and input/output devices.

Input/output devices could be considered as a shared resource in the same position as the shared memory modules. This would be flexible but would increase the switch matrix size to at least 3 x 4. An alternative is to dedicate input/output to the master processor. This would lead to the traditional operating system concept of all input/output being controlled by the supervisory or master process(or). Here, input/output is provided locally on all processors so that a particular input/output device becomes inextricably linked with a particular processor. This eases testability of the processor systems as they become self supporting.

Direct control of the slave processors by the master, and communication between processors, need to be provided. The former can be achieved by programmed output circuitry controlling important inputs of the slaves, such as reset, halt and interrupt. Additionally, programmed input circuitry can be used to examine important status conditions of the slave processors and also the address lines. For slave-to-master communication, a dedicated interrupt mechanism is chosen for signaling from the slave to the master. Finally, it is anticipated that locations in the memory modules will be used for information transfers.

For program development and general support, a floppy disc based system is necessary. This will be called the support system.

SYSTEM DESCRIPTION

The overall block diagram is shown in Figure 1. One processor with local read/write memory, EPROM program memory and input/output interfaces constitutes a complete microprocessor system. No internal signal buffering is necessary for this system and none is provided. There are three systems: the master and two slaves. The support system connects to the master system via a serial port on the master system. This is adequate as high speed transfers are not required. Programs are compiled or assembled on the support system and then loaded down-line through the master processor into the memory modules. A simple program is resident in the master processor bootstrap EPROM memory for this purpose. Also attached to the master processor are two PIO (programmed input/output) devices and an interrupt controller for master-slave control, and one PIO device to control the cross-bar switches. Each PIO device contains two 8-bit bidirectional ports. The allocation of the bits of each port is given in Table 4.

To connect the master and slave processor systems to the cross-bar switch, buffering is provided between the systems and the switches. The bus signals applied to the switches are as follows:

• 8 bi-directional data lines
• 16 unidirectional address lines
• 2 read/write control lines (MRD and MWR)

There is a further switch select signal applied to individual switches.

The cross-bar switch employs single-level switching. For a 3 x 3 matrix, multi-level switching is inappropriate. The switch design is significant because the number of lines and number of switch units necessitates a very low cost solution. There are three possibilities:

• TTL tristate gates
• Open collector TTL gates
• Analogue transmission switches

Analogue transmission switches have the advantage of being intrinsically bi-directional. Also, there is the possibility that a faulty open-circuit switch can be electrically replaced by three other switches in the matrix to connect between a selected processor and selected memory. However, the least expensive solution is to use open collector gates, such as SN74LS03, and this is the method adopted. For bi-directional lines, two open collector gates are connected back-to-back for each signal line. Each line is terminated with a low value pull-up resistor. For the processor termination, the value is 820 Ω and for the memory termination, the value is 470 Ω, the lowest values consistent with proper operation. Experiments have shown that these values are quite adequate for medium speed microprocessors. An alternative method would be to use active terminators. The calculated worst-case delay experienced by signals between a processor and a memory module is 57.5 ns for data signals in either direction and 69.5 ns for address signals, once selection is made.

Figure 1. Block diagram of cross-bar switch system

microprocessors and microsystems

The shared memory modules are buffered 4K byte units. Each module has a selectable address decode circuit. Normally, each module is given non-overlapping addresses but the option of identical addresses is available. The local memory of each processor starts at address 0000(H), the normal reset location, and may extend to 2000(H). The three shared memory modules are normally given start addresses above this.

Table 1. Digital circuits

Device                             Number of gates   Number used in the system   λp (failure rate)   Total λp
7425                               2                 2                           1.56                3.12
74LS27                             3                 3                           1.81                5.43
74LS32, 74LS03, 74LS00, 75LS150    4                 90                          2.02                181.80
74LS04, 74LS05, 7408               6                 20                          2.35                47.00
75LS152                            8                 3                           2.62                7.86
74LS244                            10                21                          2.85                59.85
74LS138                            16                6                           3.55                21.30
74LS107                            17                2                           3.55                7.10
74LS139, 74LS245                   18                7                           3.57                24.99
74LS123                            23                2                           3.59                7.18
74LS75                             24                2                           3.93                7.86
74LS85                             31                6                           4.6                 27.6
Z-80 CPU                           2200              3                           135.99              407.97
PIO                                2200*             3                           135.99              407.97
AM9519                             2200*             1                           135.99              135.99
UART                               2200*             3                           135.99              407.97

*Worst-case estimates, actual number not known

Figure 2. Slave interrupt request circuitry

There are many possible slave control algorithms and hardware arrangements within the present architecture and components. One is to initialize the slave processors for mode 1 interrupts and to use unconditional direct jump instructions. In mode 1 interrupts, upon receipt of an interrupt, the processor fetches the next instruction from location 0038(H) in bootstrap memory. The interrupt line of each slave is driven by an output of a PIO device (Table 4), which allows the master to cause a slave to fetch the instruction in location 38 at will. An unconditional jump instruction in this location can redirect control to a predetermined location in the memory modules. Each memory module would be given an identical start address (say 9000H), and the unconditional jump from location 38 would be to this start address. The actual memory module would be selected previously by the master. More complicated mechanisms are possible.
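As an illustration of the jump mechanism, the following sketch builds the bytes that would sit at location 0038(H) in a slave's bootstrap memory. It assumes the standard Z80 encoding of the unconditional jump (opcode C3 followed by the target address, low byte first); the function names and the image size are hypothetical and chosen only for the example.

```python
JP_OPCODE = 0xC3       # Z80 "JP nn": unconditional absolute jump
MODE1_VECTOR = 0x0038  # location fetched on a mode 1 interrupt


def jp(target):
    """Encode JP target as opcode, low address byte, high address byte."""
    if not 0 <= target <= 0xFFFF:
        raise ValueError("target must be a 16-bit address")
    return bytes([JP_OPCODE, target & 0xFF, target >> 8])


def bootstrap_image(start_address, size=0x40):
    """Return a bootstrap image whose mode 1 vector jumps to start_address.

    The image size is an arbitrary value for this sketch; unused bytes are
    left as zero, which the Z80 executes as NOP.
    """
    image = bytearray(size)
    image[MODE1_VECTOR:MODE1_VECTOR + 3] = jp(start_address)
    return bytes(image)
```

With the start address of 9000H suggested above, the three bytes at 0038(H) are C3, 00 and 90.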

The interrupt circuitry between the master processor and slave processors uses an AM9519 interrupt controller which provides priority capability. Slave processors generate interrupt requests for this device, which in turn produces a generalized interrupt request to the master processor. Circuitry is provided with each slave processor to generate a slave interrupt request whenever a memory address is produced by the slave processor which is outside the current memory module range. A simple jump, intentional or otherwise, to outside the current address space will cause this. Typically, a jump instruction would be placed at the end of slave processes.

Each slave interrupt request circuit is located after the buffer circuit. The implementation chosen is shown in Figure 2. When a memory address is produced during a memory access cycle, the most significant four bits of the address are compared with four bits stored in a register. If they are the same, no action is taken. If any of the address bits differs from the stored address bits, a slave interrupt request is generated. The subsequent interrupt request granted signal from the master processor causes the memory address bits to be stored. These will then be used for the next comparison, and an interrupt request signal will be generated only when an invalid address is introduced.

Immediately a slave request is generated, a signal, 'hold slave', causes the slave to enter wait cycles. This persists until an interrupt request granted signal is received from the master processor, whereupon the wait cycles are terminated and processing may continue. The invalid address causing the interrupt is available to the master processor through one of the PIO devices.
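The request, hold and grant sequence can be expressed as a small behavioural model. This is a sketch of the logic as described in the text, not a description of the actual circuit: the class name, attribute names and the initial register value are assumptions made for the illustration.

```python
class SlaveRequestCircuit:
    """Behavioural sketch of one slave interrupt request circuit."""

    def __init__(self, initial_page=0):
        self.latched_page = initial_page  # four stored address bits (A12-A15)
        self.hold = False                 # 'hold slave': slave is in wait cycles
        self.pending_page = None          # offending bits, readable by the master

    def memory_access(self, address):
        """Present a 16-bit address; return True if a request is raised."""
        page = (address >> 12) & 0xF
        if page != self.latched_page:
            self.hold = True
            self.pending_page = page
            return True
        return False

    def grant(self):
        """Master grants the request: store the new bits and release the slave."""
        if self.hold:
            self.latched_page = self.pending_page
            self.pending_page = None
            self.hold = False
```

Since the shared memory modules are 4K byte units, the four most significant address bits identify one 4K region, so a single 4-bit comparison is enough to detect an access outside the current module.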

CONSTRUCTION

A view of the complete system is shown in Figure 3. The system is constructed to enable experiments to be conducted, signals to be monitored, and hardware changes to be made where necessary, as would be expected of a prototype system. The major functional units are placed on separate printed circuit boards, and the placement of the printed circuit boards closely corresponds to the system block diagram.

Table 2. Memory components

Device        Number of bits   Number used in system   λp      Total λp
RAM 2111      1024             6                       11.1    66.6
EPROM 2708    8192             3                       12.00   36
RAM 2114      4096             24                      26.7    640.8

Figure 3. View of system

Each microprocessor system is constructed on four small printed circuit boards of dimensions 3.5 x 4 x 1/16 in (25.4 mm = 1 in). One board holds the processor and its own independent crystal-controlled clock, one holds read/write memory, one holds EPROM program memory and one holds a complete asynchronous serial interface. This particular microprocessor system has led to a computer system used within the Department for hardware and software teaching. It has the merit that the few hardware components are easy to recognize and to understand. It is also a very convenient test-bed for undergraduate projects. Finally, it is inexpensive.

Each of these systems connects to other circuitry, the interrupt circuitry, and buffer gates. The signal lines from the buffer gates run into and out of the cross-bar switch at 90° to each other on vero-board strips, with direct printed circuit board connectors mounted in a 3 x 3 matrix for the switch boards. This proved to be convenient for commissioning and very easy to discuss during demonstrations to visitors. However, the overall construction was inadequate. The principal problem is the large number of interconnections in the switch matrix, which grows as the square of the system size as the system is expanded. Even a minimum 3 x 3 system with 26 inputs and 26 outputs plus a switch select line to each switch unit requires 477 connections in the cross-bar switch. The original construction shown here, with direct gold-plated printed circuit board connectors throughout, proved to be unsatisfactory in a prototype system subject to constant experimentation, as the connectors soon became unreliable. These direct connectors were replaced by type DIN41612 indirect connectors in the teaching computer system. This has proved totally satisfactory and would be adopted for any future development of the multiple microprocessor system.
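The figure of 477 connections can be reproduced with a short calculation: each switch unit has 26 input connections, 26 output connections and one select line, and a 3 x 3 matrix has nine switch units. The function below is a sketch with hypothetical names; it simply scales that per-switch count by the number of switch units.

```python
def crossbar_connections(n_processors, n_modules, bus_lines=26, select_lines=1):
    """Count connections in the switch matrix.

    Each switch unit carries bus_lines inputs, bus_lines outputs and
    select_lines select signals; there is one unit per processor/module pair.
    """
    per_switch = 2 * bus_lines + select_lines
    return n_processors * n_modules * per_switch


print(crossbar_connections(3, 3))  # 9 switch units x 53 connections each
```

The count is proportional to the number of switch units, so doubling both the number of processors and the number of memory modules quadruples the number of connections.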

HARDWARE RELIABILITY

An estimate of the reliability of the system has been made. Tables 1, 2 and 3 give the failure rates of the various components used in the system, calculated using the equations and data contained in the USA Military Handbook on Reliability of Electronic Equipment [3]. Assuming a series configuration, the worst-case failure rate of the complete system is obtained by summing the failure rates of the various components. In this case, the system failure rate is 1732/10⁶ h, or a mean time between failures (MTBF) of roughly 600 h. Connectors have been ignored. The MTBF is the mean time between successive failures of a group of components. It gives an indication of the time of the first failure of the system and subsequent failures. The probability of a constant failure rate system not having a single failure during the first operating period equal to the MTBF is 37 per cent [4]. Therefore a system is fairly likely to fail by the end of this period, or in our case within 600 h (25 days) if run continuously.

Table 3. Analogue circuit

Device      Number of transistors   Number of devices used in the system   Failure rate   Total λp
Timer 555   23                      3                                      4.52           13.56

Table 4. Programmed input/output device allocation

Port bits on each device: A0-A7 and B0-B7

Device 1 (outputs):   Slave 1 halt (A0), Slave 2 halt (A1), Slave 1 request granted, Slave 2 request granted, Slave 1 interrupt, Slave 2 interrupt
Device 2 (inputs):    Slave 1 address A12-A15, Slave 2 address A12-A15
Device 3 (outputs):   cross-bar switch control

This would probably be unacceptable to the user. However, the failure may be in any component of the system, and one of the features of the system is that, given proper software and hardware support, the system should continue to operate irrespective of certain hardware failures. These tolerated failures generally result in a reduction of computing power and a decrease in operating speed. In the described system, part of the system must function correctly for the overall system to function correctly because of the master-slave architecture.

If we consider the system as composed of two parts, A and B, such that the system will continue to function, though slower, if B fails but not if A fails, then the failure rate of the system is given by the failure rate of A only. The essential part of our system could be regarded as the master processor, associated local memory and I/O parts and the shared memory together with the three bus switches connecting the shared memory to the master processor. The failure rate of this, the A part, and thus of the system is roughly 1100/10⁶ h or an MTBF of 900 h (38 days). It may be that the system can degrade further with only one shared memory operating. In this case the failure rate of the system is roughly 500/10⁶ h or an MTBF of 1900 h (80 days).

Finally, a considerable improvement can be made by employing military versions of components rather than the commercial products used here. Military versions have failure rates improved by a factor of 300, giving the same improvement in the system failure rate. With military components, the system failure rate becomes 5.77/10⁶ h or an MTBF of 173 000 h (20 years) for a system not tolerant of any failures.
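The MTBF figures in this section follow from the constant failure rate model, in which the MTBF is the reciprocal of the failure rate and the probability of surviving a period t without failure is exp(-t/MTBF). The sketch below reproduces the arithmetic; the function names are hypothetical.

```python
import math


def mtbf_hours(failures_per_million_hours):
    """MTBF in hours for a constant failure rate given per 10^6 h."""
    return 1e6 / failures_per_million_hours


def survival_probability(t_hours, failures_per_million_hours):
    """Probability of no failure in t_hours under a constant failure rate."""
    return math.exp(-t_hours * failures_per_million_hours / 1e6)


commercial = 1732.0            # complete system, per 10^6 h
military = commercial / 300.0  # improvement factor of 300 quoted in the text

print(round(mtbf_hours(commercial)))                     # roughly 600 h
print(round(mtbf_hours(military)))                       # roughly 173 000 h
print(survival_probability(mtbf_hours(commercial), commercial))  # exp(-1)
```

Evaluating the survival probability at t equal to the MTBF gives exp(-1), about 0.37, whatever the failure rate, which is the 37 per cent quoted earlier.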

Examining the failure rates of the subsystems gives an indication of the effects of increased system size on the failure rate and the likely failures. The switch unit has a failure rate equal to approximately one eighth of the failure rate of a single Z80 central processor system (processor, RAM, EPROM and I/O). For the described 3 x 3 system, the failure rate of the total switch matrix is slightly greater than that of one processor system. For a 4 x 4 system, the failure rate of four processor systems is double the failure rate of the sixteen switches. However, if the system is expanded, the number of switch units increases much more rapidly than the number of processor systems and a significant point is reached when the failure rate of the total switch matrix is equal to the failure rate of all the processor systems together. This occurs for an 8 x 8 system, where the failure rate of the eight central processor systems is equal to the failure rate of the 64 switches. For a 16 x 16 system, the failure rate of the 16 processor systems is half the failure rate of the 256 switches.
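These comparisons can be checked with exact fractions, taking the stated approximation that one switch unit has one eighth of the failure rate of one processor system. The names below are hypothetical and the calculation is only as accurate as that approximation.

```python
from fractions import Fraction

# Approximate ratio stated in the text: one switch unit fails at one eighth
# of the rate of one processor system.
SWITCH_TO_PROCESSOR = Fraction(1, 8)


def matrix_rate(n):
    """Failure rate of an n x n switch matrix, in units of one processor system."""
    return n * n * SWITCH_TO_PROCESSOR


def matrix_to_processors(n):
    """Ratio of the switch matrix failure rate to that of all n processor systems."""
    return matrix_rate(n) / n


for n in (3, 4, 8, 16):
    print(n, matrix_rate(n), matrix_to_processors(n))
```

The ratio reduces to n/8, so the switch matrix and the processor systems contribute equally at n = 8 and the matrix dominates beyond that size.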

If a switch fails, the effect on the rest of the system depends upon the mode of the failure. There are three possible output failure modes: open circuit, short circuit to 0 V or permanent '0' output, and short circuit to +5 V. The last is most unlikely in an open-collector gate. A final possibility is an intermediate output level. If the output is open circuit, then all other functional units can still operate but the connection between the processor and memory is lost. This is the situation where analogue switches can be used to advantage, by providing other paths between the processor and memory. For the other faults, output short circuit to 0 V or a permanent '0', and output short circuit to +5 V, the associated memory module cannot be used by any processor. Input short-circuit faults similarly affect the processor.

CONCLUSIONS

A cross-bar switch multiple microprocessor system has been designed and constructed. It has demonstrated that the architecture should not be dismissed as too expensive or unsuitable. On the contrary, the innate simplicity, flexibility, ease of construction and potential high speed of operation offered give this architecture some distinct advantages. As the size of the system is expanded, hardware failures are more likely. However, it has been shown that the failure rate of the switch matrix dominates only in large systems.

REFERENCES

1 Abachi, H R 'Aspects of the design of a multimicroprocessor system', PhD thesis, University College Cardiff (1981)

2 Enslow, P H Multiprocessor and Parallel Processing J. Wiley and Sons, New York (1974)

3 Military Handbook on Reliability Prediction of Electronic Equipment USA report MIL-HDBK-217B, September (1978)

4 Aspinall, D and Dagless, E 'Overview of a development environment' Microprocessors and Microsystems Vol 3 No 6 (1979) pp 301-305
