
Microprocessing and Microprogramming 15 (1985) 277-287. North-Holland.

A Hierarchical Architecture with Independent Processors for Real-Time Systems

Luca Rodda*, Roberto Savioni* and Giacomo R. Sechi**

Istituto di Fisica Cosmica e Tecnologie Relative, Consiglio Nazionale delle Ricerche, Via Bassini, 15/a, 20133 Milano, Italy

"Real-time applications push computer and programming technology to its limits (and sometimes beyond). A real-time system is expected to monitor simultaneous activities with critical timing constraints continuously and reliably. The consequences of system failure can be serious" [7].

Real-Time Systems have characteristics such that they represent the limits of present-day design capabilities. For this reason, it is necessary to devise system architectures that are specifically designed for real-time applications. To meet the imposed design constraints effectively, it is necessary to use a design method and to carry out a careful analysis of the manufacturing technology employed.

The architecture proposed here is based on the use of several processors, which make it possible to perform independent functions of the system concurrently. In order to define this architecture, reference was made to the local environment model. The system therefore consists of independent processors that communicate through the exchange of messages. Control of the operation of the whole system is performed by a supervisor, and is based on microprogramming principles.

The architecture described has numerous advantages, both in terms of performance and in terms of flexibility, modularity and diagnosability. The paper includes an application of this architecture to the design of an instrument for the acquisition and on-line pre-processing of data. Lastly, a performance evaluation study was carried out that made it possible to measure both the effectiveness and the efficiency of the system designed.

Keywords: Real-Time Systems, System Architectures, Multiprocessor Architectures, Concurrent Processors, Pipeline Processors, Control Design Styles, Centralized Architectures, Design Methods, Performance Analysis, Microprogramming Applications, Data Processing.

1. Introduction

The distinctive aspects of real-time systems have been incisively summed up by Brinch Hansen:

* Università degli Studi di Milano, Dipartimento di Fisica
** C.N.R. Milano, IFCTR (Istituto di Fisica Cosmica e Tecnologie Relative)

From the point of view of designers, a real-time system has the following characteristics [7, 9]:

- A real-time system interacts with the environment by identifying a large number of events that occur at high speed. This involves the need to obtain high performance in terms of:

• Response time, which denotes the time between recognition of an event and its management by the system;

• Data movement rate, which denotes the frequency with which the data are transferred to and from the system.

- A real-time system also has to respond to events that occur asynchronously in the environment. The system must therefore have a peak frequency (determined by the minimum time that has to elapse between two consecutive data for them to be distinguishable). This frequency has to be much greater than the average; otherwise, input data might get lost, or output data might not be significant.

- A real-time system has to be validatable. That is, it must be possible to evaluate and prove the correctness of the system, both at the design and at the operating stage.

- A real-time system must be modifiable. The system must be able to be brought up to date without altering its fundamental characteristics.

The critical nature of what is listed above gives rise to the need to design systems specifically devoted to the solution of these problems. This paper presents a study of a general-purpose architecture for the design of real-time data-acquisition and control systems.
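The peak-versus-average requirement above can be made concrete with a small sketch (the figures are ours, purely illustrative): a burst arriving at the peak frequency is absorbed by a buffer while the system drains it at its average rate, and the required buffer depth follows directly.

```python
# Sketch (hypothetical figures): how many samples must be buffered when a
# burst arrives at the peak rate while the system drains at its average rate.

def required_buffer_depth(burst_len, peak_hz, avg_hz):
    """Samples left in the buffer at the end of a burst of `burst_len`
    samples arriving at `peak_hz` while being consumed at `avg_hz`."""
    burst_duration = burst_len / peak_hz   # seconds the burst lasts
    consumed = burst_duration * avg_hz     # samples drained meanwhile
    return max(0, burst_len - int(consumed))

# A burst of 100 samples at 1 MHz, drained at 100 kHz: almost the whole
# burst must fit in the buffer.
print(required_buffer_depth(100, 1_000_000, 100_000))  # -> 90
```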


2. A Concurrent Architecture for Real-Time Systems

As stated in the introduction, the number of constraints to be met when designing real-time systems is particularly high. These constraints stem from the combination of two distinct classes: constraints relating to the class of real-time systems as a whole, and constraints stemming from the specific problem under examination. To ensure that these constraints are complied with, we need a clearly defined design method.

2.1. The Design Method

Definition of a design method involves, above all, examining the concept of "design" itself. Subsequently, we shall describe the design aids used to obtain the desired properties.

As regards the first point, we based ourselves on recognition of designing as involving the definition and transformation of models. Since reality cannot be fully described, a model necessarily ignores certain aspects, and is an abstraction of it. By the term "abstraction", we mean, here, "a simplified description of a system at a given level" [11]. We have therefore chosen to adopt the following definition of design.

By "design" we mean an activity whose objective is the transition from one description of a given object to another. The first description is an established fact, and is a high-level description directly connected with the problem to be solved. The point of arrival is a description at a lower level of abstraction. At this level, the object is described in a certain "target" technology, which actually makes it possible to construct it. According to this definition, design is, generally speaking, a complex activity. Consequently, various steps have to be taken to achieve the transition from the high-level description to the final one. Each of these steps consists in defining a model.

In this sense, a design step corresponds to a particular level of abstraction. At each step, consideration is given to new implementation details until a description is obtained that expresses everything necessary for the final implementation. To ensure that the design is correct, the method has to impose constraints on the possible choices made by the designer. In this sense, the process of successive refinement has to be controlled by means of constraints that reduce the number of choices the designer can make in subdividing the system and in defining its architecture. To render these constraints effective, we introduce the concept of "architectural model".

System architecture is defined as the description of a system that identifies its components and the relationships between them. The main characteristics described by a system architecture are the control structure and the mode of communication between subsystems.

By the term "architectural model" we mean an abstraction that, by means of a set of common characteristics, identifies a class of architectures that differ from each other in the specification of a number of details. An architectural model is useful for design if it is sufficiently general to adapt itself to a significant class of systems. On the other hand, it must also be sufficiently restrictive to bind the designer in such a way that certain correctness characteristics pertaining to the architectural model remain valid when the latter is made specific.

The need to meet the constraints of diagnosability and modularity, and to develop the design according to the design method described, leads to the suggestion that an architectural model based on microprogramming be used [10, 17].

2.2. Hierarchical Architecture

A real-time data-acquisition and control system should, generally speaking, perform a high number of independent functions. These functions must be subject to strict speed constraints in order to be able to respond to events that occur asynchronously in the environment.

In a real-time system based on one or more processors, the events to be managed are associated with a number of processes that must be activated when these events occur. If another process is already active, then it is necessary to be able to switch from one process to another in good time. Consequently, an architecture based on a single processor calls for the definition of complex methods for the synchronization and selection of the functions to be performed.

The critical parameters that affect response time are context switching and latency time. Context switching involves the time and the overhead required for switching from one function to another; latency time is the time interval before switching becomes possible [9]. These two parameters have lower limits due to the need to complete the function in progress, or to save the computation status. In addition, the problem becomes practically insoluble if the management of exceptions is required, or, in general, if it is necessary to manage a high number of consecutive events. In such cases, the existence of a single processor involves a considerable increase in interrupt levels. This renders unacceptable the values assumed by the parameters mentioned for applications of the type dealt with here. The problem can be solved by exploiting the independent nature of the functions to be performed. In this sense, it is necessary to define an architecture with several processors that makes it possible to exploit the degree of concurrency proper to the functions to be performed. In other words, these processors must perform the required functions independently under the control of a processor entirely devoted to the synchronization and activation of those functions. Here, the high overall speeds required are achieved through the parallel performance of a large number of simple functions, without increasing the operating speed to the limits imposed by technology.

The concurrent architecture proposed by us is based on the local environment model [3, 4, 16]. In this model, the system consists of a certain number of independent processors that communicate through the exchange of messages. As with the microprogrammed architectural model, there is a distinction between a supervisor and a group of controlled processors (Fig. 1).

The supervisor receives the messages which denote the status of the other processors and the signals originating from the outside world. In accordance with these messages, the supervisor selects the activations for the controlled sub-systems. The controlled processors receive from the supervisor the control messages that select the function to be performed, and in response send the status messages. The time required to perform a single function may be seen as the time quantum (in the sense of the theory of discrete systems [2]) for these systems; thus, the function is always completed before the sub-system is again influenced by outside orders.
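The supervisory cycle described here can be sketched as follows. The status names and table entries are our own illustration, not taken from the paper: like a microinstruction, each combination of statuses selects a set of activations by simple table lookup.

```python
# Minimal sketch (names are ours, not the paper's) of the supervisory cycle:
# the supervisor reads the latched status messages and, like a microprogram,
# maps each status combination directly to a set of activations.

ACTIVATION_TABLE = {
    # (input processor, computing processor, output processor) -> commands
    ("full", "wait", "wait"): {"computing": "acquire_block"},
    ("wait", "full", "wait"): {"output": "acquire_results"},
    ("wait", "wait", "full"): {"output": "transfer_out"},
}

def supervisor_step(status):
    """One time quantum: select the activations for the current status tuple."""
    return ACTIVATION_TABLE.get(status, {})  # no match -> no activation

print(supervisor_step(("full", "wait", "wait")))  # -> {'computing': 'acquire_block'}
```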

[Figure: the supervisor exchanges command and status messages with Processor 1, Processor 2 and further controlled processors; commands and communications also pass between the supervisor and the user and the external devices.]

Fig. 1. Hierarchical Architecture.

Since the operation of the various sub-systems depends only on the activations and on the status signals (synchronized by registers to ensure that communication takes place as described in the discrete-system model), effective independence of time, considered as an overall quantity, is obtained. We may use, for each sub-system, the clock rate most suited to its operation and to the conditions imposed by the environment. The advantages offered by this architecture are analogous to those offered by the microprogrammed architecture in terms of speed, diagnosability, modularity and adaptability. These advantages are based on the actual achievement of independence and of concurrency between processors.

The architecture proposed makes it possible to employ different operating methods (pipeline or parallel processing), depending on which, on each separate occasion, is more suited to solving the problem in question. In this way, it is possible to obtain considerable speed performance, thanks to the possibility of overlapping the functions performed by the various processors [12, 13].

3. An Application

In this section, we present an example of application of the architecture proposed. This example refers to the design of a typical instrument for the acquisition and on-line pre-processing of data relating to a scientific experiment or a control problem. The specifications of the problem concern the existence of a certain number of data sources, a highly variable acquisition frequency, and the need to perform a wide variety of data-processing operations on the set of data acquired.

Two distinct problems may be identified:

- The problem of acquiring and pre-processing the data. In this case, it is necessary to process the data directly in the form in which they are supplied by the environment and to adapt to the frequency variations imposed by the environment. It is, therefore, necessary to absorb the irregularities in the temporal distribution of the data, check them for correctness, and transform them into a more abstract and compact form. In this way, we obtain a group of data purged of problems of synchronization with the environment and translated into a form suitable for subsequent processing.

- The problem of performing complex calculations on pre-processed data and of supplying the results of such calculations to the output. The characteristics of this problem are substantially different from those of the previous one. In fact, the data are already pre-processed, and thus freed of the problems connected with their temporal distribution. Consequently, it is important to have considerable power (in terms of computing and memory capability) as well as adequate languages and programming aids.

To solve the overall problem proposed, it is therefore useful to combine a true real-time system with a conventional processing system that has suitable output capabilities. In particular, the real-time system has been designed in accordance with the architecture presented in the previous section. The critical parameter in the design and evaluation of this system is the maximum acquisition frequency, both as an average and in terms of peak values. On the other hand, the response time is not decisive for the evaluation of its performance. Consequently, it is particularly worthwhile to employ a pipeline operating mode.

In this mode, we obtain the maximum performance in terms of throughput, although penalizing the system's response time to a greater extent. We may identify three main functions: the data acquisition, the pre-processing of all the data acquired, and the transfer of the results to the processing system.
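The trade-off just described, better throughput at the cost of response time, can be sketched with illustrative stage times (the cycle counts below are invented): in pipeline mode the steady-state throughput is set by the slowest stage, while the latency of a single block is the sum of all stages.

```python
# Sketch (illustrative stage times, not measured values): the three functions
# (acquisition, pre-processing, transfer) overlap in pipeline mode.

def pipeline_metrics(stage_cycles):
    latency = sum(stage_cycles)   # cycles for one block, first to last stage
    interval = max(stage_cycles)  # cycles between successive completed blocks
    return latency, interval

latency, interval = pipeline_metrics([40, 100, 60])
print(latency, interval)  # -> 200 100  (response time worsens, throughput wins)
```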

The system was then subdivided into a control sub-system and a group of controlled sub-systems corresponding to these functions. We can then identify the following independent processors:
- Supervisor
- Input processor
- Computing processor
- Output processor

Communication between the processors takes place through the sending and receiving of messages (control, status, or data messages). Consequently, each of these processors has to have special I/O devices in order to perform this exchange of messages with the environment or with the other processors.

Let us now look at the functions performed by the individual modules and their architecture.


[Figure: microprogrammed control unit of the supervisor, with microprogram memory, pipeline register, address control, condition-selection and address-coding modules; inputs: status of controlled processors, status of external devices, commands from the user, internal synchronization, reset; outputs: control signals to internal modules, activations to processors, activations to external devices, communications to the user.]

Fig. 2. Architecture of the Supervisor.

3.1. Supervisor

According to the specifications of the hierarchical architecture presented, the supervisor has to select, for each combination of statuses and events, the suitable activations for the various processors. This selection takes place, according to the microprogrammed model, through a simple association of the status with the activations.

The architecture of the supervisor is shown in Fig. 2. Basically, it consists of a microprogrammed control unit whose purpose is to perform the function illustrated. The micro-instruction is subdivided into four parts:
- Activations to the processors.
- Activations to external equipment.
- Signals for the user.
- Control signals to the modules of the supervisor.

Each part is, in its turn, subdivided into fields corresponding to the various modules or to the functions to be activated. The modules that make up the supervisor are:
- The synchronization and reset unit. This controls the acquisition of the status messages originating from the other processors and of the requests coming from the environment. It may also reset these signals or inhibit their acquisition at the request of the microinstruction.
- The address coding module. This performs the association between one group of status signals, selected in agreement with the microinstruction, and an address of the microprogram memory.
- The condition-selection unit. This enables selection of one of the status signals or of the external inputs, which has to be evaluated individually.
- The address control module, which determines the address of the next microinstruction.
- The pipeline register, which permits overlapping between the performance of a microinstruction and the fetch of the next one.
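As a rough sketch of how the address coding module can form a microprogram-memory address from a selected group of status signals (the encoding shown is our own assumption, not taken from the paper):

```python
# Sketch (our own encoding, for illustration): the address coding module
# packs a selected group of status bits into a microprogram-memory address,
# so each status combination lands on its own microinstruction.

def encode_address(base, status_bits):
    """Address coding: append the selected status bits to a base address."""
    addr = base
    for bit in status_bits:
        addr = (addr << 1) | bit
    return addr

# Base address 0b101 with selected status group (1, 0) -> 0b10110.
print(encode_address(0b101, (1, 0)))  # -> 22
```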

3.2. Input Processor

The input processor has to perform the following functions:
- Acquire a group of data from outside, controlling communication with the external equipment in order to obtain a correct transfer.
- Make the group of data acquired available to the processor to which it is connected (usually, therefore, the computing processor).

The input processor must permit the acquisition of data with extremely high peak frequencies; on the other hand, transfer to the computing processor must take place in a regular manner under the control of the supervisor. Consequently, the input processor must possess a temporary memorization capacity in order to offset the irregularities in the distribution of the input data. The acquisition and transfer of the data are two separate processor functions. In this way, the processor cannot acquire new data while it transfers data previously acquired.

Consequently, the acquisition frequency depends directly on the duration and conditions of the transfer. This conflicts with one of the constraints imposed: that is, it is not always possible to acquire data that are very close to each other in time. To overcome this problem, the input processor was subdivided into two independent, identical modules: input buffer 1 and input buffer 2. While one of the buffers is busy transferring data, the other can acquire a new group of data. The acquisition frequency is thus no longer influenced by the status of the processor. Each buffer consists of:

- A microprogrammed control unit.
- A data memory with the appropriate circuitry.
- A loading unit for the microprograms.

The structure of the microprogrammed control unit is analogous to that of the supervisor. The controlled part consists of a memory for the data, with the appropriate addressing circuitry and a number of input and output registers. The purpose of the registers is to synchronize the data communications and to permit overlapping between the transfer operations and the memory accesses, that is, between data acquisition and writing into the memory, and between reading from the memory and the transfer of data.

The dimension of the set of data forming a single block varies from a single data item to a maximum number limited only by the dimensions of the memory.

The status messages sent by the input processor to the supervisor represent the combination of the statuses of the two modules, each of which may be in one of the following statuses:

- Buffer Busy (in acquisition): the buffer is busy acquiring a data block.
- Buffer Full: the buffer has completed the acquisition of a data block.
- Buffer Busy (in transfer): the buffer is engaged in the transfer of the data block previously acquired.
- Wait: the buffer is inactive, until a subsequent acquisition request is received.
- Error states: these indicate that some error has been detected in the performance of the function requested.

The command messages sent by the supervisor to the input processor are subdivided in the same way into two parts corresponding to the two buffers, and cause modification of the status of the input processor.
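The buffer statuses and the commands that move a buffer between them can be read as a small state machine; a sketch (the event names are our own reading of the text, not the paper's):

```python
# Sketch of one input buffer's status cycle as we read it from the text
# (wait -> busy acquiring -> full -> busy transferring -> wait); the event
# names are ours.

TRANSITIONS = {
    ("wait", "start_acquisition"):          "busy_acquiring",
    ("busy_acquiring", "block_done"):       "full",
    ("full", "start_transfer"):             "busy_transferring",
    ("busy_transferring", "transfer_done"): "wait",
}

def buffer_next(state, event):
    # Any unexpected (state, event) pair lands in an error state.
    return TRANSITIONS.get((state, event), "error")

s = "wait"
for ev in ("start_acquisition", "block_done", "start_transfer", "transfer_done"):
    s = buffer_next(s, ev)
print(s)  # -> wait  (one full acquire/transfer cycle)
```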

3.3. Output Processor

The output processor has the same function as the input processor. The only differences are due to the fact that it acquires data from the computing processor and transfers them to the external computer, and to the existence of temporal constraints that are less critical than those considered for the input processor. Like the input processor, it is broken down into two independent buffers, to permit overlapping between the acquisition and the transfer of data.

3.4. Computing Processor

Its function is that of performing a certain group of operations on data in accordance with the commands sent by the supervisor. The point of departure of the design of the processing unit is the need for:
- High speed in performing the computations requested.
- High throughput.
- Independence of the speed from the operation selected and the data processed.
- Easy modification of the operations performed.

The only precision limit is imposed by the dimensions of the I/O data. An ALU based on the use of tables for the performance of the computations meets all these requirements. Thus, it is possible to precalculate the


[Figure: the computing processor, with its loading unit, the supervisor-facing microprogrammed control unit (microprogram memory, pipeline register, address control), the data memories with their address circuitry, the synchronization registers, and the table-based ALU with its two table memories.]

Fig. 3. Architecture of the Computing Processor.

values needed as results of the operations and then to memorize them in a table. To perform a calculation, we only have to compute an address for the table. Any unary or binary function can be obtained in the same way, and requires the same time to be carried out, however complex it may be. The dimension required for the table is determined only by the dimension of the data acquired and transferred by the processor. It does not depend on the operation selected. The computing processor consists of a microprogrammed control unit, a controlled part, and a loading unit for the microprograms and the tables (Fig. 3). The function of the loading unit is to control the loading of the microprograms and of the ALU tables by superintending the transfer of data from the outside to the internal memory of the processor.

The microprogrammed control unit has the same structure as that presented for the other processors.

The controlled part consists of:

- Two data memories: one for the input data that have not yet been processed, and the other for the already processed output data.

- Some synchronization registers for the input and output data.

- Some pipeline registers to overlap the computation and the reading and writing of the data in the memories.
- The ALU. This consists of two memories to store the tables: a 256 × 16-bit memory for the unary functions and a 64K × 16-bit memory for the binary functions (the results of the computations are 16-bit values).

The performance of a computation consists simply in reading one of the two memories, using the data item (or the two data items) as the address. It calls for only one clock cycle, whatever the complexity of the calculation. Thanks to the pipeline registers, the result of the previous computation can be written into the memory while a computation is being carried out. In this way, only a few operations have to be


performed sequentially within a single cycle, and the duration of the clock cycle can be reduced. In addition, the computation is pipelined, and one data item can be processed in each cycle.
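The table-based ALU can be sketched directly. The table sizes follow the text (256 × 16 bits for unary functions of an 8-bit datum, 64K × 16 bits for binary functions of two 8-bit data); the example functions, square and saturating add, are our own choice, not the paper's.

```python
# Sketch of the table-based ALU: any unary function of an 8-bit datum fits in
# a 256-entry table, any binary function of two 8-bit data in a 64K-entry
# table; evaluation is a single lookup whatever the function's complexity.
# The example functions (square, saturating add) are our choice.

UNARY = [(x * x) & 0xFFFF for x in range(256)]                        # 256 x 16 bit
BINARY = [min(a + b, 0xFFFF) for a in range(256) for b in range(256)]  # 64K x 16 bit

def alu_unary(x):
    return UNARY[x]              # one "clock cycle": one table read

def alu_binary(a, b):
    return BINARY[(a << 8) | b]  # the two operands together form the address

print(alu_unary(15), alu_binary(200, 100))  # -> 225 300
```

Swapping the operation means reloading the table, which is exactly the role the text assigns to the loading unit.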

The statuses that the computing processor can assume are as follows:

- Cpu busy: the processor is busy and is not available for the acquisition of a new command.
- Wait: the processor is ready to acquire a new group of data from the input processor.
- Cpu ready: the processor is ready to process the group of data acquired.
- Cpu full: the processor has completed the processing of the group of data and is ready to transfer it to the output processor.

4. Conclusions

When our work was completed, we subjected the instrument described in the last section to a process of performance evaluation [8, 15]. This process was applied in order to ascertain whether or not the instrument responded to the performance constraints previously imposed. To this end, the basic performance of the system was evaluated quantitatively in terms of:
- Effectiveness, or external performance, measurements.
- Efficiency measurements.

To evaluate the system, a detailed functional model was defined. In this model, the processors are described in terms of functional units corresponding to the real electronic components. The logical characteristics of these components, described by the appropriate data sheets, were reproduced. In addition, the dependence of their operation on time, in terms of synchronization with the different clock phases, was described. Performance was measured by counting the clock cycles required to perform the various functions of the system [14].
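The cycle-counting idea can be sketched as follows (unit names and durations are invented): the simulator steps every functional unit once per clock cycle and records its status, so any performance figure reduces to counting cycles in the trace.

```python
# Sketch (invented durations) of a per-cycle trace: every functional unit is
# stepped once per clock cycle and its status recorded, so performance is
# read off by counting the cycles spent in each status.

def simulate(n_cycles, busy_until):
    """Trace `n_cycles` cycles; each unit is busy until its given cycle."""
    trace = []
    for t in range(n_cycles):
        status = {u: ("busy" if t < end else "wait") for u, end in busy_until.items()}
        trace.append(status)
    return trace

trace = simulate(4, {"Inp.Proc.1": 2, "Comp.Proc.": 3})
busy_cycles = sum(1 for s in trace if s["Inp.Proc.1"] == "busy")
print(busy_cycles)  # -> 2  (utilization = 2/4 = 50%)
```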

The model described was used by us with positive results in our design activity, and was checked in practice by comparing the results it predicted with those actually obtained from the am2900 E&LK board of AMD, described in [1]. In order to obtain the necessary measurements, the model was used to set up a program simulating the system. Fig. 4 gives an example of the outputs obtained. The configuration used for the tests that follow

[Figure: simulator trace listing, one block per clock cycle (cycles 661-667), reporting for each cycle the status of every unit, e.g. "Inp. Proc. 1 Acq. busy", "Inp. Proc. 1 Full", "Inp. Proc. 2 Acq. busy", "Out. Proc. 1 Tran. busy", "Comp. Proc. Acq. busy", together with the data-ready and data-acquired signals and the datum value.]

Fig. 4. Example of the Outputs of Simulation.


provides for the acquisition of data pairs originating in parallel from two measuring instruments.

The parameters measured were the following:

A. Effectiveness Measurements
1. System throughput in terms of data pairs acquired and processed in the time unit.
2. Maximum peak frequency absorbed by the system.
3. Latency time: in other words, the time interval between data-ready signal and acquisition of the data pair.

B. Efficiency Measurements
1. Utilization factors of processors: the percentage of time during which the processors are active.
2. Efficiency of pipeline: this makes it possible to evaluate the efficiency of the overall system, with particular reference to the balance of the pipeline architecture.

To carry out the performance evaluation process, some tens of tests were done with different workloads. In particular, we evaluated the dependence of the parameters measured on the average frequency of the data acquired and on the dimensions of the group of data acquired in each block. In addition, we studied the influence on the measurements of the irregularities in the temporal data distribution.

Here we give the most significant results obtained. These results are organized, within the various sections, in tables and analytical functions.

4.1. Effectiveness Measurements

4.1.1. System Throughput
The value of this parameter is naturally equal to the average frequency of the incoming data, as long as all the data are processed by the system without losing any of them. Particular interest attaches to the maximum value of this parameter (or system capability), which is equal to the maximum value of the average acquisition frequency. This quantity depends on the dimensions of the set of data acquired in an individual block (Table 1). The analytical formula deduced from the data collected is as follows:

f_max(d) = (15 + 24/d)^-1 clock cycles^-1    (1)

Table 1
Maximum Acquisition Frequency

d                  f_max(d) 1)       Corresponding Values
Dimension of                         Clock =     Clock =
Data Block                           4 MHz       8 MHz

1 data item        2.56 x 10^-2      103 kHz     205 kHz
5 data items       5.05 x 10^-2      202 kHz     404 kHz
10 data items      5.75 x 10^-2      230 kHz     460 kHz
100 data items     6.56 x 10^-2      262 kHz     525 kHz
1000 data items    6.66 x 10^-2      266 kHz     532 kHz
∞                  6.67 x 10^-2      267 kHz     533 kHz

Note: 1) Maximum frequency is expressed in clock cycles^-1.

(in which d = dimension of the data block). The frequency increases with the value of d and very rapidly approaches the maximum value:

f_max(∞) = 1/15 clock cycles^-1.

This value corresponds to an acquisition frequency of 267 kHz (using an internal clock rate of 4 MHz) or 533 kHz (clock rate = 8 MHz).

The maximum acquisition frequency value corresponds to data blocks of infinite size and is determined by the time required by the computing processor to process a single data item. This value is, however, closely approached even for fairly limited data-block dimensions.
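As a cross-check of formula (1), the maximum acquisition frequency and the corresponding physical frequencies can be sketched in a few lines of Python (our own illustration, not part of the original toolchain):

```python
# Illustrative sketch of formula (1): f_max(d) = (15 + 24/d)^-1,
# the maximum average acquisition frequency in clock cycles^-1
# for a data block of dimension d.

def f_max(d):
    """Maximum average acquisition frequency (clock cycles^-1)."""
    return 1.0 / (15.0 + 24.0 / d)

# Reproduce Table 1: frequency in cycles^-1 and in kHz for the
# two internal clock rates considered in the paper.
for d in (1, 5, 10, 100, 1000):
    row = f"{d:5d} | {f_max(d):.2e} cycles^-1"
    for clock_hz in (4e6, 8e6):
        row += f" | {f_max(d) * clock_hz / 1e3:6.0f} kHz"
    print(row)
```

The printed values agree with Table 1: for example, d = 1 gives 2.56 x 10^-2 cycles^-1, i.e. about 103 kHz at a 4 MHz clock.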

4.1.2. Maximum Peak Frequency
The minimum distance between two data pairs for purposes of correct acquisition turned out to be equal to 4 clock cycles. Consequently, the peak frequency is 1 MHz (clock rate = 4 MHz) or 2 MHz (clock rate = 8 MHz). The quantity of data that can be acquired at this frequency is equal to twice the dimension (d) of the data block.

4.1.3. Latency Time
Between data-ready signal and actual data acquisition there is normally an interval of 1 clock cycle (250 ns or 125 ns). The maximum value registered for this parameter is 4 cycles (1 μs or 500 ns), during status changes in the input processor.

4.2. Utilization Factors

We define here some of the symbols used below to compute the efficiency measurements [5].


f = average frequency of data acquired.

S(i) = average processing time of a data pair by the i-th processor.

U(i,f) = utilization factor of the i-th processor. This is equal to the percentage of time during which it is active and depends on f. Consequently, we have:

U(i,f) = f x S(i).

We associate with the index 1 the input processor; with the index 2, the computing processor; and with the index 3, the output processor. For each processor, a measurement was made of S(i), which was found to be constant; the utilization factor U(i) was then calculated.

4.2.1. Computing Processor
With regard to this processor, we have:

S(2) = 15 clock cycles.

Consequently:

U(2, f) = 15 × f.

The limit utilization value is naturally reached at the maximum acquisition frequency f_max(∞). This value is equal to 1 and corresponds to total processor utilization.

Table 2 gives the utilization factor values cor- responding to the maximum frequencies obtainable for different data block dimensions.

4.2.2. Input and Output Processors
In this case, we measured

S(1) = S(3) = 8 clock cycles.

Consequently:

U(1, f) = U(3, f) = 8 × f.

The limit value is

U(1, f_max(∞)) = 0.53.
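The utilization factors of the three processors can likewise be sketched from U(i, f) = f × S(i) and the measured stage times (again our own illustration; the printed values reproduce Table 2):

```python
# Illustrative sketch: utilization factors U(i, f) = f * S(i) for the
# three processors, evaluated at the maximum acquisition frequency
# f_max(d) of formula (1). Stage times S(i) are those measured in
# Sections 4.2.1 and 4.2.2.

S = {1: 8, 2: 15, 3: 8}  # average processing times, in clock cycles

def f_max(d):
    """Maximum average acquisition frequency (clock cycles^-1)."""
    return 1.0 / (15.0 + 24.0 / d)

def U(i, f):
    """Utilization factor of the i-th processor at data frequency f."""
    return f * S[i]

for d in (1, 5, 10, 100, 1000):
    f = f_max(d)
    print(f"d={d:5d}  U(2)={U(2, f):.3f}  U(1)=U(3)={U(1, f):.2f}")
```

At f_max(∞) = 1/15 the computing processor saturates (U(2) = 1), while the input and output processors reach only 8/15 ≈ 0.53, confirming that the computing stage is the bottleneck.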

4.2.3. Supervisor
As regards the supervisor, the utilization factor is difficult to define and is, moreover, a parameter of little significance. In fact, the supervisor always performs the same function. Consequently, it is always

Table 2
Utilization Factors

d                  U(2, f_max(d))    U(1, f_max(d)) =
Dimension of                         U(3, f_max(d))
Data Block

1 data item        0.38              0.21
5 data items       0.76              0.40
10 data items      0.86              0.46
100 data items     0.98              0.52
1000 data items    0.998             0.53
∞                  1.00              0.53

active and its utilization factor is, by definition, 1. However, it is important to stress that its intervention does not significantly reduce the performance of the system as a whole. The supervisor response time influences the maximum acquisition frequency and the utilization factors of the different processors. But this influence turns out to be proportional to 1/d and, as we have seen, is for practical purposes cancelled out even for fairly small data-block dimensions.

4.2.4. Pipeline Efficiency
Let us use the symbols given above and assume that

N = number of data pairs processed.

Pipeline efficiency is defined as follows [6, 13]:

E(N) = (N × Σ(i=1..m) S(i)) / (m × (Σ(i=1..m) S(i) + (N − 1) × S(k)))

in which m = total number of processors, and S(k) = Max(S(i)), i = 1,...,m.

The limit value E(∞) corresponds to processing steady state. In our case, we have:

Table 3
Pipeline Efficiency

N                    E(N)
Number of Data       Pipeline
Pairs Processed      Efficiency

1                    0.33
5                    0.57
10                   0.62
100                  0.68
1000                 0.688
∞                    0.689


E(N) = (N × Σ(i=1..3) S(i)) / (3 × (Σ(i=1..3) S(i) + (N − 1) × S(2)))
     = (31 × N) / (3 × (31 + (N − 1) × 15)).

Table 3 gives the pipeline efficiency values cor- responding to different values of N.
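The pipeline-efficiency formula, specialized to this three-stage design, can be sketched as follows (our own illustration; the printed values reproduce Table 3):

```python
# Illustrative sketch of the pipeline-efficiency formula of
# Section 4.2.4 for the three-stage pipeline of this design:
# S(1) = S(3) = 8 and S(2) = 15 clock cycles, S(k) = Max(S(i)).

def E(N, S=(8, 15, 8)):
    """Pipeline efficiency after N data pairs, for stage times S."""
    m = len(S)                  # number of processors (stages)
    total = sum(S)              # sum of all stage times
    bottleneck = max(S)         # S(k): the slowest stage
    return (N * total) / (m * (total + (N - 1) * bottleneck))

for N in (1, 5, 10, 100, 1000):
    print(f"N={N:5d}  E(N)={E(N):.3f}")
```

For large N the efficiency tends to 31/45 ≈ 0.689, fixed by the ratio between the total stage time (31 cycles) and m times the 15-cycle computing stage; this quantifies the imbalance of the pipeline noted in the measurements.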

References

[1] M. Annunziata, V. Nesci and G.R. Sechi, A Tool for Studying Microprogramming, Sigmicro Newsletter, Vol. 15, N.1 (March 1984) 20-25.

[2] M.A. Arbib, Theory of Abstract Automata (Prentice Hall, Englewood Cliffs, 1969).

[3] M. Boari, A. Fantechi, A. Natali and M. Vanneschi, Modelli ad ambiente locale per la cooperazione tra processi: comunicazione e sincronizzazione [Local-environment models for cooperation between processes: communication and synchronization], Rivista di Informatica, Vol. XII, N.1-2 (1982) 5-38.

[4] R.E. Bryant and J.B. Dennis, Concurrent Programming (M.I.T. Technical Report, Cambridge, Massachusetts, 1979).

[5] J.P. Buzen, Fundamental Laws of Computer System Performance, in: P.P.S. Chen and M. Franklin, Eds., Proceedings of the International Symposium on Computer Performance Modelling, Measurement and Evaluation (1976) 200-210.

[6] T.C. Chen, Parallelism, Pipelining, and Computer Efficiency, Computer Design, 10, 1 (January 1971) 69-74.

[7] P.B. Hansen, Distributed Processes: A Concurrent Programming Concept, Communications of the ACM, Vol. 21, N.11 (November 1978) 934-941.

[8] H. Hellerman and T.F. Conroy, Computer Systems Performance (McGraw-Hill, New York, 1975).

[9] H.J. Hindin and W.B. Rauch-Hindin, Real Time Systems, Electronic Design (January 6, 1983) 288-318.

[10] S.S. Husson, Microprogramming Principles and Practice (Prentice Hall, Englewood Cliffs, 1970).

[11] J.F. Isner, A Fortran Programming Methodology Based on Data Abstractions, Communications of the ACM, Vol. 25, N.10 (October 1982) 686-697.

[12] S.P. Kartashev and S.I. Kartashev, Eds., Designing and Programming Modern Computers and Systems (Prentice Hall, Englewood Cliffs, 1982).

[13] C.V. Ramamoorthy and H.F. Li, Pipeline Architecture, ACM Computing Surveys, 9, 1 (March 1977) 61-102.

[14] L. Rodda and G.R. Sechi, Simulation as a Design Tool, Microprocessing and Microprogramming, Vol. 15, No. 4 (April 1985).

[15] L. Svobodova, Computer Performance Measurement and Evaluation Methods: Analysis and Applications (Elsevier, New York, 1976).

[16] L. Svobodova, B. Liskov and D. Clark, Distributed Computer Systems: Structure and Semantics (M.I.T. Technical Report, Cambridge, Massachusetts, 1980).

[17] M.V. Wilkes, The Best Way to Design an Automatic Calculating Machine, Report of the Manchester University Computer Inaugural Conference, Manchester, England (July 1951) (reprinted in: E.E. Swartzlander, Ed., Computer Design Development, Principal Papers (Hayden Book, 1976)).

Luca Rodda was born in Milan, Italy, in 1960. He is currently completing a degree course in Physics at the University of Milan, and is preparing his thesis at the Istituto di Fisica Cosmica e Tecnologie Relative of C.N.R. in Milan. His current interests include modelling and simulation of discrete systems, and design methodologies for electronic circuits. He is a student member of ACM, IEEE, and the Society for Computer Simulation.

Roberto Savioni was born in Milan, Italy, in 1958. He received his degree in Physics from the University of Milan in 1984, graduating with a thesis concerning a microprogrammed control architecture for applications in physics. His current interests include the study and development of microprogrammed architectures for special-purpose systems.

Giacomo R. Sechi was born in Melzo (MI), Italy, in 1943. He received his degree in Physics from the University of Milan in 1968. He is a researcher in Computer Science at the Istituto di Fisica Cosmica e Tecnologie Relative of C.N.R. (National Research Council) in Milan. His current interests include computer architectures, formal languages and theory of algorithms, design methodologies for reliable systems, and Computer-Aided Design. He is a member of Euromicro, ACM, IEEE and SCS.