


Design of Embedded Systems: Formal Models, Validation, and Synthesis

S. Edwards, L. Lavagno, E. A. Lee, and A. Sangiovanni-Vincentelli

November 5, 1999

Abstract

This paper addresses the design of reactive real-time embedded systems. Such systems are often heterogeneous in implementation technologies and design styles, for example by combining hardware ASICs with embedded software. The concurrent design process for such embedded systems involves solving the specification, validation, and synthesis problems. We review the variety of approaches to these problems that have been taken.

1 Introduction

Reactive real-time embedded systems are pervasive in the electronics system industry. Applications include vehicle control, consumer electronics, communication systems, remote sensing, and household appliances. In such applications, specifications may change continuously, and time-to-market strongly affects success. This calls for the use of software programmable components with behavior that can be fairly easily changed. Such systems, which use a computer to perform a specific function, but are neither used nor perceived as a computer, are generically known as embedded systems. More specifically, we are interested in reactive embedded systems. Reactive systems are those that react continuously to their environment at the speed of the environment. They can be contrasted with interactive systems, which react with the environment at their own speed, and transformational systems, which take a body of input data and transform it into a body of output data [Ber89].


A large percentage of the world-wide market for micro-processors is filled by micro-controllers that are the programmable core of embedded systems. In addition to micro-controllers, embedded systems may consist of ASICs and/or field programmable gate arrays as well as other programmable computing units such as Digital Signal Processors (DSPs). Since embedded systems interact continuously with an environment that is analog in nature, there must typically be components that perform A/D and D/A conversions. A significant part of the design problem consists of deciding the software and hardware architecture for the system, as well as deciding which parts should be implemented in software running on the programmable components and which should be implemented in more specialized hardware.

Embedded systems are often used in life-critical situations, where reliability and safety are more important criteria than performance. Today, embedded systems are designed with an ad hoc approach that is heavily based on earlier experience with similar products and on manual design. Use of higher-level languages such as C helps somewhat, but with increasing complexity, it is not sufficient. Formal verification and automatic synthesis of implementations are the surest ways to guarantee safety. However, both formal verification and synthesis from high levels of abstraction have been demonstrated only for small, specialized languages with restricted semantics. This is at odds with the complexity and heterogeneity found in typical embedded systems.

We believe that the design approach should be based on the use of one or more formal models to describe the behavior of the system at a high level of abstraction, before a decision on its decomposition into hardware and software components is taken. The final implementation of the system should be made as much as possible using automatic synthesis from this high level of abstraction to ensure implementations that are “correct by construction.” Validation (through simulation or verification) should be done as much as possible at the higher levels of abstraction.

A typical hardware architecture for an embedded system is illustrated in Figure 1. This type of architecture combines custom hardware with embedded software, lending a certain measure of complexity and heterogeneity to the design. Even within the software or hardware portions themselves, however, there is often heterogeneity. In software, control-oriented processes might be mixed under the supervision of a multitasking real-time kernel running on a microcontroller. In addition, hard-real-time tasks may run cooperatively on one or more programmable DSPs. The design styles used for these two software subsystems are likely to be quite different from one another, and testing the interaction between them is unlikely to be trivial.

[Figure 1 depicts a system bus connecting an ASIC, a CODEC, dual-ported memory, programmable DSPs running DSP assembly code, and a microcontroller running a real-time operating system with a control panel, a controller process, and a user interface process; the diagram is divided into hardware and software halves.]

Figure 1: A typical reactive real-time embedded system architecture.

The hardware side of the design will frequently contain one or more ASICs, perhaps designed using logic or behavioral synthesis tools. On the other hand, a significant part of the hardware design most likely consists of interconnections of commodity components, such as processors and memories. Again, this time on the hardware side, we find heterogeneity. The design styles used to specify and simulate the ASICs and the interconnected commodity components are likely to be quite different. A typical system, therefore, not only mixes hardware design with software design, but also mixes design styles within each of these categories.

Most often the set of tasks that the system implements is not specified in a rigorous and unambiguous fashion, so the design process requires several iterations to obtain convergence. Moreover, during the design process, the level of abstraction, detail, and specificity in different parts of the design varies. To complicate matters further, the skill sets and design styles used by different engineers on the project are likely to be different. The net result is that during the design process, many different specification and modeling techniques will be used.

Managing the design complexity and heterogeneity is the key problem. We believe that the use of formal models and high-level synthesis for ensuring safe and correct designs depends on understanding the interaction between diverse formal models. Only then can the simplicity of modeling required by verification and synthesis be reconciled with the complexity and heterogeneity of real-world design.

The concurrent design process for mixed hardware/software embedded systems involves solving the following sub-problems: specification, validation, and synthesis. Although these problems cannot be entirely separated, we deal with them below in three successive sections.

2 Specification and modeling

The design process is often viewed as a sequence of steps that transforms a set of specifications described informally into a detailed specification that can be used for manufacturing. All the intermediate steps are characterized by a transformation from a more abstract description to a more detailed one.

A designer can perform one or more steps in this process. For the designer, the “input” description is a specification, and the final description of the design is an implementation. For example, a software designer may see a set of routines written in C as an implementation of her/his design even though several other steps may be taken before the design is ready for manufacturing. During this process, verification of the quality of the design with respect to the demands placed on its performance and functionality has to be carried out. Unfortunately, the descriptions of the design at its various stages are often informal and not logically connected by a set of precise relationships.

We advocate a design process that is based on representations with precise mathematical meaning so that both the verification and the map from the initial description to the various intermediate steps can be carried out with tools of guaranteed performance. Such an approach is standard in certain communities, where languages with strong formal properties are used to ensure robust design. Examples include ML [MTH90], dataflow languages (e.g., Lucid [WA85], Haskell [Dav92]), and synchronous languages (e.g., Lustre, Signal, Esterel [Hal93]).

There is a broad range of potential formalizations of a design, but most tools and designers describe the behavior of a design as a relation between a set of inputs and a set of outputs. This relation may be informal, even expressed in natural language. It is easy to find examples where informal specifications resulted in unnecessary redesigns. In our opinion, a formal model of a design should consist of the following components:

1. A functional specification, given as a set of explicit or implicit relations which involve inputs, outputs, and possibly internal (state) information.

2. A set of properties that the design must satisfy, given as a set of relations over inputs, outputs, and states, that can be checked against the functional specification.

3. A set of performance indices that evaluate the quality of the design in terms of cost, reliability, speed, size, etc., given as a set of equations involving, among other things, inputs and outputs.

4. A set of constraints on performance indices, specified as a set of inequalities.

The functional specification fully characterizes the operation of a system, while the performance constraints bound the cost (in a broad sense). The set of properties is redundant, in that in a properly constructed design, the functional specification satisfies these properties. However, the properties are listed separately because they are simpler and more abstract (and also incomplete) compared to the functional specification. A property is an assertion about the behavior, rather than a description of the behavior. It is an abstraction of the behavior along a particular axis. For example, when designing a network protocol, we may require that the design never deadlock (this is also called a liveness property). Note that liveness does not completely specify the behavior of the protocol; it is instead a property we require our protocol to have. For the same protocol, we may require that any request will eventually be satisfied (this is also called fairness). Again, this does not completely specify the behavior of the protocol, but it is a required property.
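The four components above can be rendered concretely in code. The following Python sketch is our own illustration, not from the paper: the `FormalModel` container, the running-sum specification, and all names are invented, and the specification is narrowed from a relation to a function for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A hypothetical container for the four components of a formal design model.
@dataclass
class FormalModel:
    # 1. Functional specification: input sequence -> output sequence.
    spec: Callable[[List[int]], List[int]]
    # 2. Properties: named predicates over (inputs, outputs).
    properties: Dict[str, Callable[[List[int], List[int]], bool]] = field(default_factory=dict)
    # 3. Performance indices: named cost functions over a run.
    indices: Dict[str, Callable[[List[int], List[int]], float]] = field(default_factory=dict)
    # 4. Constraints: inequalities on indices (index name -> upper bound).
    constraints: Dict[str, float] = field(default_factory=dict)

    def check(self, inputs: List[int]) -> bool:
        """Check the properties and constraints for one input sequence."""
        outputs = self.spec(inputs)
        props_ok = all(p(inputs, outputs) for p in self.properties.values())
        costs_ok = all(self.indices[k](inputs, outputs) <= bound
                       for k, bound in self.constraints.items())
        return props_ok and costs_ok

# Toy design: a running-sum process with a monotonicity property
# (for non-negative inputs) and a bound on output length.
model = FormalModel(
    spec=lambda xs: [sum(xs[: i + 1]) for i in range(len(xs))],
    properties={"monotone_for_nonneg":
                lambda xs, ys: all(a <= b for a, b in zip(ys, ys[1:]))},
    indices={"output_length": lambda xs, ys: float(len(ys))},
    constraints={"output_length": 100.0},
)
print(model.check([1, 2, 3]))  # True
```

Checking a property this way is of course only simulation over one input, not verification for all inputs; the distinction is taken up in the classification that follows.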

Given a formal model of the functional specification and of the properties, we can classify properties into three groups:

1. Properties that are inherent in the model of computation (i.e., they can be shown formally to hold for all specifications described using that model).

(We will define later on what we mean exactly by inputs, outputs, and state information. For now, consider them as sequences of values.)


2. Properties that can be verified syntactically for a given specification (i.e., they can be shown to hold with a simple, usually polynomial-time, analysis of the specification).

3. Properties that must be verified semantically for a given specification (i.e., they can be shown to hold by executing, at least implicitly, the specification for all inputs that can occur).

For example, consider the property of determinate behavior, i.e., the fact that the output of a system depends only on its inputs and not on some internal, hidden choice. Any design described by a dataflow network (a formal model to be described later) is determinate, and hence this property need not be checked. If the design is represented by a network of FSMs, determinacy can be assessed by inspection of the state transition function. In some discrete-event models (for example, those embodied in Verilog and VHDL) determinacy is difficult to prove: it must be checked by exhaustive simulation.
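The syntactic check for FSM determinacy mentioned above can be sketched in a few lines of Python. The transition-relation encoding is our own choice, not a notation from the paper: an FSM is determinate when no (state, input) pair enables more than one transition.

```python
# Syntactic determinacy check: inspect the transition relation directly,
# with no need to execute the machine.
def is_determinate(transitions):
    """transitions: dict mapping (state, input) to a list of successor states."""
    return all(len(succs) <= 1 for succs in transitions.values())

det = {("idle", "go"): ["busy"], ("busy", "done"): ["idle"]}
nondet = {("idle", "go"): ["busy", "error"]}  # hidden choice on one input

print(is_determinate(det))     # True
print(is_determinate(nondet))  # False
```

The check runs in time linear in the size of the transition table, which is what makes it a syntactic (group 2) property rather than a semantic (group 3) one.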

The design process takes a model of the design at a level of abstraction and refines it to a lower one. In doing so, the designer must ensure that the properties at that level of abstraction are verified, that the constraints are satisfied, and that the performance indices are satisfactory. The refinement process also involves mapping constraints, performance indices, and properties to the lower level so that they can be computed for the next level down.

Figure 2 shows a key refinement stage in embedded system design. The more abstract specification in this case is an executable functional model that is closer to the problem level. The specification undergoes a synthesis process (which may be partly manual) that generates a model of an implementation in hardware. That model itself may still be fairly abstract, capturing for example only timing properties. In this example the model is presumably used for hardware/software partitioning.

While Figure 2 suggests a purely top-down process, any real design needs more interaction between specification and implementation. Nonetheless, when a design is complete, the best way to present and document it is top down. This is enough to require that the methodology support top-down design.

(The refinement process can be defined formally once the models of the design are formally specified; see McMillan [McM93].)


[Figure 2 depicts a refinement stage: a specification given in imperative, FSM, dataflow, and discrete-event models is partitioned; software synthesis and a compiler target processor models, while behavioral and logic synthesis target hardware simulation models, with abstraction decreasing from specification through refinement to implementation.]

Figure 2: An example of a design refinement stage, which uses hardware and software synthesis to translate a functional specification into a model of hardware.


2.1 Elements of a Model of Computation

A language is a set of symbols, rules for combining them (its syntax), and rules for interpreting combinations of symbols (its semantics). Two approaches to semantics have evolved, denotational and operational. A language can have both (ideally they are consistent with one another, although in practice this can be difficult to achieve). Operational semantics, which dates back to Turing machines, gives the meaning of a language in terms of actions taken by some abstract machine, and is typically closer to the implementation. Denotational semantics, first developed by Scott and Strachey [Sto77], gives the meaning of the language in terms of relations.

How the abstract machine in an operational semantics can behave is a feature of what we call the model of computation underlying the language. The kinds of relations that are possible in a denotational semantics are also a feature of the model of computation. Other features include communication style, how individual behavior is aggregated to make more complex compositions, and how hierarchy abstracts such compositions.

A design (at all levels of the abstraction hierarchy from functional specification to final implementation) is generally represented as a set of components, which can be considered as isolated monolithic blocks, interacting with each other and with an environment that is not part of the design. The model of computation defines the behavior and interaction of these blocks.

In the sections that follow, we present a framework for comparing elements of different models of computation, called the tagged-signal model, and use it to contrast different styles of sequential behavior, concurrency, and communication. We will give precise definitions for a number of terms, but these definitions will inevitably conflict with standard usage in some communities. We have discovered that, short of abandoning the use of most common terms, no terminology can be consistent with standard usage in all related communities. Thus we attempt to avoid confusion by being precise, even at the risk of being pedantic.

2.1.1 The Tagged-Signal Model

Two of the authors (Lee and Sangiovanni-Vincentelli) have proposed the tagged-signal model [LSV96], a formalism for describing aspects of models of computation for embedded system specification. It is denotational in the Scott and Strachey [Sto77] sense, and it defines a semantic framework (of signals and processes) within which models of computation can be studied and compared. It is very abstract; describing a particular model of computation involves imposing further constraints that make it more concrete.

The fundamental entity in the tagged-signal model is an event, a value/tag pair. Tags are often used to denote temporal behavior. A set of events (an abstract aggregation) is a signal. Processes are relations on signals, expressed as sets of n-tuples of signals. A particular model of computation is distinguished by the order it imposes on tags and the character of processes in the model.

Given a set of values V and a set of tags T, an event is a member of T × V, i.e., an event has a tag and a value. A signal s is a set of events. A signal can be viewed as a subset of T × V. A functional signal is a (possibly partial) function from T to V. The set of all signals is denoted S. A tuple of n signals is denoted s, and the set of all such tuples is denoted S^n.

The different models of time that have been used to model embedded systems can be translated into different order relations on the set of tags T in the tagged-signal model. In particular, in a timed system T is totally ordered, i.e., there is a binary relation < on members of T such that if t1, t2 ∈ T and t1 ≠ t2, then either t1 < t2 or t2 < t1. In an untimed system, T is only partially ordered.

A process P with n signals is a subset of the set of all n-tuples of signals, S^n, for some n. A particular s ∈ S^n is said to satisfy the process if s ∈ P. An s that satisfies a process is called a behavior of the process. Thus a process is a set of possible behaviors, or a relation between signals.

For many (but not all) applications, it is natural to partition the signals associated with a process into inputs and outputs. Intuitively, the process does not determine the values of the inputs, and does determine the values of the outputs. If n = i + o, then (S^i, S^o) is a partition of S^n. A process with i inputs and o outputs is a subset of S^i × S^o. In other words, a process defines a relation between input signals and output signals. An (i + o)-tuple s ∈ S^(i+o) is said to satisfy P if s ∈ P. It can be written s = (s1, s2), where s1 ∈ S^i is an i-tuple of input signals for process P and s2 ∈ S^o is an o-tuple of output signals for process P. If the input signals are given by s1 ∈ S^i, then the set I = {(s1, s2) : s2 ∈ S^o} describes the inputs, and I ∩ P is the set of behaviors consistent with the input s1.

A process F is functional with respect to a partition if it is a single-valued, possibly partial, mapping from S^i to S^o. That is, if (s1, s2) ∈ F and (s1, s3) ∈ F, then s2 = s3. In this case, we can write s2 = F(s1), where F : S^i → S^o is a (possibly partial) function. Given the input signals, the output signals are determined (or there is unambiguously no behavior).
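A minimal concrete rendering of these definitions may help. The Python sketch below is our own illustration, assuming integer tags (a timed system); it models a single-input, single-output functional process, a unit delay.

```python
# An event is a (tag, value) pair; a signal is a set of events.
e1, e2 = (0, "a"), (1, "b")
sig_in = {e1, e2}

# A functional process maps an input signal tuple to an output signal tuple.
# Here: delay every event by one tag (i = o = 1).
def delay(signals):
    (s,) = signals
    return ({(tag + 1, value) for (tag, value) in s},)

# Viewed as a relation, the process is the set of (input, output) pairs it
# admits; any such pair is a behavior of the process.
behavior = ((sig_in,), delay((sig_in,)))
print(sorted(behavior[1][0]))  # [(1, 'a'), (2, 'b')]
```

Because `delay` is single-valued, the output tuple is uniquely determined by the input tuple, which is exactly the functionality condition stated above.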


Consider, as a motivating example introducing these several mechanisms to denote temporal behavior, the problem of modeling a time-invariant dynamical system on a computer. The underlying mathematical model, a set of differential equations over continuous time, is not directly implementable on a digital computer, due to the double quantization of real numbers into finite bit strings, and of time into clock cycles. Hence a first translation is required, by means of an integration rule, from the differential equations to a set of difference equations, which are used to compute the values of each signal with a given tag from the values of some other signals with previous and/or current tags.
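As a toy instance of this translation (our own example, not the paper's), the continuous system dx/dt = -x becomes, under a forward-Euler integration rule with step h, the difference equation x[n+1] = x[n] + h·(-x[n]), which computes the value at each tag from the value at the previous tag:

```python
# Forward-Euler discretization of dx/dt = -x: the differential equation
# becomes a difference equation over integer-tagged clock cycles.
def simulate(x0, h, steps):
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + h * (-xs[-1]))  # next tag's value depends only
    return xs                              # on the previous tag's value

print(simulate(1.0, 0.1, 3))
```

The equation and step size are illustrative choices; any integration rule yields some set of difference equations of this shape.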

If it is possible to identify several strongly connected components in the dependency graph, then the system is decoupled. It then becomes possible to go from the total order of tags implicit in physical time to a partial order imposed by the depth-first ordering of the components. This partial ordering gives us some freedom in implementing the integration rule on a computer. We could, for example, play with scheduling by embedding the partial order into the total order among clock cycles. It is often convenient, for example, to evaluate a component completely, for all tags, before evaluating components that depend on it. Or it is possible to spread the computation among multiple processors.
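The scheduling freedom just described can be sketched as follows: given an acyclic dependency graph among components, any topological order embeds the partial order into a total execution order. The graph and signal names below are invented for illustration, and Kahn's algorithm is one standard way (not the paper's prescription) to compute such an order.

```python
from collections import deque

deps = {            # signal -> signals it depends on
    "x": [],
    "y": ["x"],
    "z": ["x"],
    "out": ["y", "z"],
}

def schedule(deps):
    """Kahn's algorithm: embed the dependency partial order into one
    total order of evaluation."""
    remaining = {s: set(d) for s, d in deps.items()}
    ready = deque(sorted(s for s, d in remaining.items() if not d))
    order = []
    while ready:
        s = ready.popleft()
        order.append(s)
        for t, d in remaining.items():
            d.discard(s)  # s is now available to its dependents
            if not d and t not in order and t not in ready:
                ready.append(t)
    return order

print(schedule(deps))  # ['x', 'y', 'z', 'out']
```

Note that 'y' and 'z' are unordered by the dependencies, so they could equally be swapped or evaluated on separate processors; the schedule shown is just one legal embedding.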

In the end, time comes back into the picture, but the double mapping, from total to partial order, and back to total order again, is essential to

1. prove properties about the implementation (e.g., stability of the integration method, a bound on the maximum execution time, . . . ), and

2. optimize the implementation with respect to a given cost function (e.g., size of the buffers required to hold intermediate signals versus execution time, satisfaction of a constraint on the maximum execution time, . . . ).

2.1.2 State

Most models of computation include components with state, where behavior is given as a sequence of state transitions. In order to formalize this notion, let us consider a process F that is functional with respect to a partition (S^i, S^o). Let us assume for the moment that F belongs to a timed system, in which tags are totally ordered. Then for any tuple of signals s, we can define s>t to be the tuple of the (possibly empty) subsets of the events in s with tags greater than t.

(A dependency graph is a directed graph with a node for each signal, and an edge between two signals whenever the equation for the latter depends on the former.)


Two input signal tuples r and s are in the relation R_t (denoted (r, s) ∈ R_t) if r>t = s>t implies F(r)>t = F(s)>t. This definition intuitively means that process F cannot distinguish between the “histories” of r and s prior to time t. Thus, if the inputs are identical after time t, then the outputs will also be identical. R_t is an equivalence relation, partitioning the set of input signal tuples into equivalence classes for each t. Following a long tradition, we call these equivalence classes the states of F. In the hardware community, components with only one state for each t are called combinational, while components with more than one state for some t are called sequential. Note however that the term “sequential” is used in very different ways in other communities.
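A small finite illustration of this relation may help; it is our own construction, with signals encoded as dictionaries from integer tags to values. The accumulator process below has more than one state at tag 0, so it is sequential in the sense just defined.

```python
# F outputs at each tag the sum of all input values at strictly earlier
# tags: an accumulator, whose state is the running sum.
def F(s):
    return {t: sum(v for u, v in s.items() if u < t) for t in s}

def after(s, t):
    """The restriction s>t: events of s with tags strictly greater than t."""
    return {u: v for u, v in s.items() if u > t}

def related(r, s, t):
    """(r, s) is in R_t: equal inputs after t must give equal outputs
    after t (vacuously true when the inputs already differ after t)."""
    if after(r, t) != after(s, t):
        return True
    return after(F(r), t) == after(F(s), t)

r = {0: 1, 1: 5, 2: 7}
s = {0: 2, 1: 5, 2: 7}   # identical after tag 0, different at tag 0
print(related(r, s, 0))  # False: the accumulator remembers tag 0
```

Because `r` and `s` agree after tag 0 yet produce different outputs, they fall in different equivalence classes at t = 0: the accumulator has (at least) two states there, so it is not combinational.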

2.1.3 Decidability

Components with a finite number of states differ significantly from those with an infinite number of states. For certain infinite-state models (those that are Turing-complete), many desirable properties are undecidable: they cannot be determined in a finite amount of time for all systems. These properties include whether a system will need more memory than is available, whether a system will halt, and how fast a system will run. Hopcroft and Ullman [HU79] discuss these issues at length.

Undecidability is not an insurmountable barrier, and decidability is not sufficient to answer all questions in practice (e.g., because the required run-time may be prohibitive). Many successful systems have been designed using undecidable languages (i.e., those in which questions about some programs are undecidable). Although no algorithm can solve an undecidable problem for all systems, algorithms exist that can solve them for most systems. Buck's Boolean Dataflow scheduler [Buc93], for example, can answer the halting and bounded memory problems for many systems specified in a Turing-complete dataflow model, although it does, necessarily, fail to reach a conclusion for some systems.

The non-terminating nature of embedded systems opens the possibility of using infinite time to solve certain undecidable problems. Parks' [Par95] scheduler, for example, will execute a potentially infinite-state system forever in bounded memory if it is possible to do so. However, it does not answer the question of how much memory is needed or whether the program will eventually halt.

The classical von Neumann model of computation is a familiar model of sequential behavior. (It is formalized in the abstract model called the random access machine, or random access stored program machine [SS63].) A memory stores the state and a processor advances the state through a sequence of memory operations. Most commonly-used programming languages (e.g., C, C++, Lisp, Pascal, FORTRAN) use this model of computation. Often, the memory is viewed as having an unbounded number of finite-valued words, which, when coupled with an appropriate choice of processor instructions, makes the model Turing complete. (Turing-completeness can also be obtained with a finite number of infinite-valued words.) Modern computer systems make this model practical by simulating unbounded memory with an elaborate hierarchy (registers, cache, RAM, hard disk). Few embedded systems, however, can currently afford such a scheme.

2.1.4 Concurrency and Communication

While sequential or combinational behavior is related to individual processes, embedded systems will typically contain several coordinated concurrent processes. At the very least, such systems interact with an environment that evolves independently, at its own speed. But it is also common to partition the overall model into tasks that also evolve more or less independently, occasionally (or frequently) interacting with one another.

Communication between processes can be explicit or implicit. In explicit communication, a sender process informs one or more receiver processes about some part of its state. In implicit communication, two or more processes share a common notion of state.

Time plays a larger role in embedded systems than in classical computation. In classical transformational systems, the correct result is the primary concern; when it arrives is less important (although whether it arrives, the termination question, is important). By contrast, embedded systems are usually real-time systems, where the time at which a computation takes place can be more important than the computation itself.

As we discussed above, different models of time become different order relations on the set of tags T in the tagged-signal model. Recall that in a timed system T is totally ordered, while in an untimed system T is only partially ordered. Implicit communication generally requires totally ordered tags, usually identified with physical time.

The tags in a metric-time system have the notion of a “distance” between them, much like physical time. Formally, there exists a partial function d : T × T → R mapping pairs of tags to real numbers such that d(t1, t2) = 0 if and only if t1 = t2, d(t1, t2) = d(t2, t1), and d(t1, t2) + d(t2, t3) ≥ d(t1, t3).

A discrete-event system is a timed system where the tags in each signal are order-isomorphic with the integers (for a two-sided system) or the natural numbers (for a one-sided system) [LSV96]. Intuitively, this means that any pair of ordered tags has a finite number of intervening tags.

Two events are synchronous if they have the same tag. Two signals are synchronous if each event in one signal is synchronous with an event in the other signal and vice versa. A system is synchronous if every signal in the system is synchronous with every other signal in the system. A discrete-time system is a synchronous discrete-event system.

Synchronous/reactive languages (see, e.g., [Hal93]) are synchronous in exactly this sense. The set of tags in a behavior of the system denotes a global “clock” for the system. Every signal conceptually has an event at every tag, although in some models this event could have a value denoting the absence of an event (called bottom). At each clock tick, each process maps input values to output values. If cyclic communication is allowed, then some mechanism must be provided to resolve or prevent circular dependencies. One possibility is to constrain the output values to have tags corresponding to the next tick. Another possibility (all too common) is to leave the result unspecified, resulting in nondeterminacy (or worse, infinite computation within one tick). A third possibility is to use fixed-point semantics, where the behavior of the system is defined as a set of events that satisfy all processes.
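The fixed-point option can be sketched for a single tick. In the Python fragment below (our own illustration, not any particular language's semantics), `BOTTOM` models the absent/unknown value, and the iteration applies each process's equation until the signal values stabilize; the two-process wiring is invented, and termination is only guaranteed here because the equations are monotone (they never retract a defined value).

```python
BOTTOM = None  # "absent/unknown" value for a signal at this tick

def tick(inputs, equations):
    """Fixed-point iteration for one synchronous instant: start every
    non-input signal at BOTTOM and apply the equations until stable."""
    values = dict(inputs)
    for name in equations:
        values.setdefault(name, BOTTOM)
    while True:
        new = dict(values)
        for name, eq in equations.items():
            new[name] = eq(values)
        if new == values:
            return values  # all processes are satisfied simultaneously
        values = new

# Two processes in a chain: y = not x, then z = y or x.
equations = {
    "y": lambda v: None if v["x"] is None else not v["x"],
    "z": lambda v: None if v["y"] is None else v["y"] or v["x"],
}
print(tick({"x": False}, equations))  # {'x': False, 'y': True, 'z': True}
```

With a genuinely circular wiring and non-monotone equations, this loop could oscillate forever, which is precisely the "infinite computation within one tick" hazard mentioned above.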

Concurrency in physical implementations of systems occurs through some combination of parallelism, having physically distinct computational resources, and interleaving, sharing of a common physical resource. Mechanisms for achieving interleaving vary widely, ranging from operating systems that manage context switches to fully-static interleaving in which concurrent processes are converted (compiled) into a single non-concurrent process. We focus here on the mechanisms used to manage communication between concurrent processes.

Parallel physical systems naturally share a common notion of time, according to the laws of physics. The time at which an event in one subsystem occurs has a natural ordering relationship with the time at which an event occurs in another subsystem. Physically interleaved systems also share a natural common notion of time.

Logical systems, on the other hand, need a mechanism to explicitly share a notion of time. Consider two imperative programs interleaved on a single processor under the control of a time-sharing operating system. Interleaving creates a natural ordering between events in the two processes, but this ordering is generally unreliable, because it heavily depends on scheduling policy, system load, and so on. Some synchronization mechanism is required if those two programs need to cooperate.

More generally, in logically concurrent systems, maintaining a coherent global notion of time as a total order on events can be extremely expensive. Hence in practice this is replaced whenever possible with explicit synchronization, in which the total order is replaced by a partial order. Returning to the example of two processes running under a time-sharing operating system, we take precautions to ensure an ordering of two events only if the ordering of these two events matters.

A variety of mechanisms for managing the order of events, and hence for communicating information between processes, has arisen. Some of the most common ones are:

• Unsynchronized

In an unsynchronized communication, a producer of information and a consumer of the information are not coordinated. There is no guarantee that the consumer reads valid information produced by the producer, and there is no guarantee that the producer will not overwrite previously produced data before the consumer reads the data. In the tagged-signal model, the repository for the data is modeled as a process, and the reading and writing events have no enforced ordering relationship between their tags.

• Read-modify-write

Commonly used for accessing shared data structures, this strategy locks a data structure between a read and a write from a process, preventing any other accesses. In other words, the actions of reading, modifying, and writing are atomic (indivisible). In the tagged-signal model, the repository for the data is modeled as a process where events associated with this process are totally ordered (resulting in a globally partially ordered model). The read-modify-write is modeled as a single event.

• Unbounded FIFO buffered

This is a point-to-point communication strategy, where a producer generates a sequence of data tokens and a consumer consumes these tokens, but only after they have been generated. In the tagged-signal model, this is a simple connection where the signal on the connection is constrained to have totally ordered tags. The tags model the ordering imposed by the FIFO model. If the consumer implements blocking reads, then it imposes a total order on events at all its input signals. This model captures essential properties of both Kahn process networks and dataflow [Kah74].

• Bounded FIFO buffered

In this case, the data repository is modeled as a process that imposes ordering constraints on its inputs (which come from the producer) and its outputs (which go to the consumer). Each of the input and output signals is internally totally ordered. The simplest case is where the size of the buffer is one, in which case the input and output events must be interleaved so that each output event lies between two input events. Larger buffers impose a maximum difference (often called synchronic distance) between the number of input and output events.

Note that some implementations of this communication mechanism may not really block the writing process when the buffer is full, thus requiring some higher level of flow control to ensure that this never happens, or that it does not cause any harm.

• Rendezvous

In the simplest form of rendezvous, implemented for example in Occam and Lotos, a single writing process and a single reading process must simultaneously be at the point in their control flow where the write and the read occur. It is a convenient communication mechanism, because it has the semantics of a single assignment, in which the writer provides the right-hand side and the reader provides the left-hand side. In the tagged-signal model, this is imposed by events with identical tags [LSV96]. Lotos offers, in addition, multiple rendezvous, in which one among multiple possible communications is non-deterministically selected. Multiple rendezvous is more flexible than single rendezvous, because it allows the designer to specify more easily several “expected” communication ports at any given time, but it is very difficult and expensive to implement correctly.
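The single-rendezvous handshake above can be sketched with two one-slot queues: neither side proceeds until both have arrived, and the value passes like a single assignment. This is a hypothetical illustration (the `Rendezvous` class and its method names are invented for this sketch), not how Occam or Lotos implement it:

```python
# Sketch: single rendezvous between one writer and one reader, built from
# two one-slot queues. write() returns only after read() has taken the value.
import threading
import queue

class Rendezvous:
    def __init__(self):
        self._slot = queue.Queue(maxsize=1)   # carries the value
        self._ack = queue.Queue(maxsize=1)    # signals that the read happened

    def write(self, value):
        self._slot.put(value)   # offer the value...
        self._ack.get()         # ...and block until the reader has taken it

    def read(self):
        value = self._slot.get()  # block until the writer offers a value
        self._ack.put(None)       # release the writer
        return value

rv = Rendezvous()
got = []
reader = threading.Thread(target=lambda: got.append(rv.read()))
reader.start()
rv.write(42)    # blocks until the reader's rv.read() completes
reader.join()
print(got)  # [42]
```

In the tagged-signal view, the write and read here are forced to the same tag: neither event can complete without the other.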

Of course, various combinations of the above models are possible. For example, in a partially unsynchronized model, a consumer of data may be required to wait until the first time a producer produces data, after which the communication is unsynchronized.

                     Transmitters  Receivers  Buffer Size  Blocking  Blocking  Single
                                                           Reads     Writes    Reads
Unsynchronized       many          many       one          no        no        no
Read-Modify-Write    many          many       one          yes       yes       no
Unbounded FIFO       one           one        unbounded    yes       no        yes
Bounded FIFO         one           one        bounded      yes       maybe     yes
Single Rendezvous    one           one        one          yes       yes       yes
Multiple Rendezvous  one           one        one          no        no        yes

Table 1: A comparison of concurrency and communication schemes.

The essential features of the concurrency and communication styles described above are presented in Table 1. These are distinguished by the number of transmitters and receivers (e.g., broadcast versus point-to-point communication), the size of the communication buffer, whether the transmitting or receiving process may continue after an unsuccessful communication attempt (blocking reads and writes), and whether the result of each write can be read at most once (single reads).
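The bounded-FIFO row of Table 1 (blocking reads, possibly blocking writes, bounded buffer) can be demonstrated with a standard bounded queue. This is a minimal sketch, assuming a buffer of size 2; the producer blocks when the buffer is full, so the number of produced and consumed tokens can never drift apart by more than the buffer size (the synchronic distance):

```python
# Sketch: bounded FIFO with blocking reads and blocking writes.
# queue.Queue(maxsize=2) makes put() block when 2 tokens are buffered
# and get() block when the buffer is empty.
import threading
import queue

BUF = queue.Queue(maxsize=2)   # synchronic distance bounded by 2
consumed = []

def producer():
    for i in range(10):
        BUF.put(i)             # blocks while the buffer holds 2 tokens

def consumer():
    for _ in range(10):
        consumed.append(BUF.get())   # blocks while the buffer is empty

threads = [threading.Thread(target=producer),
           threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(consumed)  # FIFO order is preserved: [0, 1, 2, ..., 9]
```

Dropping `maxsize` gives the unbounded-FIFO row of the table: reads still block, but writes never do.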

2.2 Common Models of Computation

We are now ready to use the scheme developed in the previous section to classify and analyze several models of computation that have been used to describe embedded systems. We will consider issues such as ease of modeling, efficiency of analysis (simulation or formal verification), automated synthesizability, optimization space versus over-specification, and so on.

2.2.1 Discrete-Event

Time is an integral part of a discrete-event model of computation. Events usually carry a totally-ordered time stamp indicating the time at which the event occurs. A DE simulator usually maintains a global event queue that sorts events by time stamp.

Digital hardware is often simulated using a discrete-event approach. The Verilog language, for example, was designed as an input language for a discrete-event simulator. The VHDL language also has an underlying discrete-event model of computation.

Discrete-event modeling can be expensive: sorting time stamps can be time-consuming. Moreover, ironically, although discrete-event is ideally suited to modeling distributed systems, it is very challenging to build a distributed discrete-event simulator. The global ordering of events requires tight coordination between parts of the simulation, rendering distributed execution difficult.

Discrete-event simulation is most efficient for large systems with large, frequently idle or autonomously operating sections. Under discrete-event simulation, only the changes in the system need to be processed, rather than the whole system. As the activity of a system increases, the discrete-event paradigm becomes less efficient because of the overhead inherent in processing time stamps.

Simultaneous events, especially those arising from zero-delay feedback loops, present a challenge for discrete-event models of computation. In such a situation, events may need to be ordered, but are not.

Consider the discrete-event system shown in Figure 3. Process B has zero delay, meaning that its output has the same time stamp as its input. If process A produces events with the same time stamp on each output, there is ambiguity about whether B or C should be invoked first, as shown in Figure 3(a).

Suppose B is invoked first, as shown in Figure 3(b). Now, depending on the simulator, C might be invoked once, observing both input events in one invocation, or it might be invoked twice, processing the events one at a time. In the latter case, there is no clear way to determine which event should be processed first.

The addition of delta delay makes such nondeterminacy easier to prevent, but does not avoid it completely. It introduces a two-level model of time in which each instant of time is broken into (a potentially infinite number of) totally-ordered delta steps. The simulated time reported to the user, however, does not include delta information. A “zero-delay” process in this model actually has delta delay. For example, Process B would have delta delay, so firing A followed by B would result in the situation in Figure 3(c). The next firing of C will see the event from A only; the firing after that will see the (delta-delayed) event from B.
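The two-level model of time can be sketched with a priority queue keyed on (time, delta) pairs. The following is a hypothetical minimal kernel (signal names and the event-tuple layout are invented for illustration); it reproduces the Figure 3(c) scenario, where a "zero-delay" process B actually adds one delta step:

```python
# Sketch: a minimal discrete-event kernel with two-level (time, delta) stamps.
# A "zero-delay" process forwards its input at the same simulated time but
# one delta step later, so simultaneous events get a deterministic order.
import heapq

def run(events):
    """events: list of (time, delta, signal, value); returns processing order."""
    heapq.heapify(events)
    log = []
    while events:
        time, delta, signal, value = heapq.heappop(events)
        log.append((time, delta, signal, value))
        if signal == "a":   # process B: delta delay from signal a to signal b
            heapq.heappush(events, (time, delta + 1, "b", value))
    return log

# Process A emits on signals "a" and "c" at simulated time 5 (cf. Figure 3).
log = run([(5, 0, "a", 1), (5, 0, "c", 1)])
print(log)
# C sees A's direct output (delta 0) before B's delta-delayed output (delta 1):
# [(5, 0, 'a', 1), (5, 0, 'c', 1), (5, 1, 'b', 1)]
```

The simulated time reported to the user would be the first tuple element only; the delta component exists purely to order events within one instant.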

Other simulators, including the DE simulator in Ptolemy [BHLM94], attempt to statically analyze data precedences within a single time instant. Such precedence analysis is similar to that done in synchronous languages (Esterel, Lustre, and Signal) to ensure that simultaneous events are processed deterministically. It determines a partial ordering of events with the same time stamp by examining data precedences.


Figure 3: Simultaneous events in a discrete-event system. (a) Process A produces events with the same time stamp. Should B or C be fired next? (b) Zero-delay process B has fired. How many times should C be fired? (c) Delta-delay process B has fired; C will consume A’s output next. (d) C has fired once; it will fire again to consume B’s output.

Adding a feedback loop from Process C to A in Figure 3 would create a problem if events circulate through the loop without any increment in time stamp. The same problem occurs in synchronous languages, where such loops are called causality loops. No precedence analysis can resolve the ambiguity. In synchronous languages, the compiler may simply fail to compile such a program. Some discrete-event simulators will execute the program nondeterministically, while others support tighter control over the sequencing through graph annotations.

2.2.2 Communicating Finite State Machines

Finite State Machines (FSMs) are an attractive model for embedded systems. The amount of memory required by such a model is always decidable, and is often an explicit part of its specification. Halting and performance questions are always decidable since each state can, in theory, be examined in finite time. In practice, however, this may be prohibitively expensive.

A traditional FSM consists of:

• a set of input symbols (the Cartesian product of the sets of values of the input signals),

• a set of output symbols (the Cartesian product of the sets of values of the output signals),

• a finite set of states with a distinguished initial state,

• an output function mapping inputs and states to outputs, and

• a next-state function mapping inputs and states to (next) states.

The input to such a machine is a sequence of input symbols, and the output is a sequence of output symbols.
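The definition above translates almost directly into code. The following is a minimal sketch (the parity machine and its state names are invented for illustration): the output and next-state functions are tables over (state, input) pairs, and running the machine maps an input sequence to an output sequence:

```python
# Sketch: a traditional FSM exactly as defined above, with an output function
# and a next-state function, both mapping (state, input symbol) pairs.
def run_fsm(initial, output_fn, next_fn, inputs):
    state, outputs = initial, []
    for symbol in inputs:
        outputs.append(output_fn[state, symbol])  # output function
        state = next_fn[state, symbol]            # next-state function
    return outputs

# Hypothetical two-state machine tracking the parity of 1s seen so far.
output_fn = {("even", 0): "even", ("even", 1): "odd",
             ("odd", 0): "odd",  ("odd", 1): "even"}
next_fn = {("even", 0): "even", ("even", 1): "odd",
           ("odd", 0): "odd",  ("odd", 1): "even"}

print(run_fsm("even", output_fn, next_fn, [1, 1, 0, 1]))
# ['odd', 'even', 'even', 'odd']
```

Because both functions are finite tables, every question about the machine's behavior reduces to finite enumeration, which is the decidability property noted above.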

Traditional FSMs are good for modeling sequential behavior, but are impractical for modeling concurrency or memory because of the so-called state explosion problem. A single machine mimicking the concurrent execution of a group of machines has a number of states equal to the product of the number of states of each machine. A memory has as many states as the number of values that can be stored at each location raised to the power of the number of locations. The number of states alone is not always a good indication of complexity, but it often has a strong correlation.
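The multiplicative blow-up is easy to see concretely. The sketch below (machines reduced to bare state sets, which is all that matters for counting) flattens two concurrent machines into one by taking the Cartesian product of their state sets:

```python
# Sketch: state explosion under concurrent composition. A flat machine that
# mimics several concurrent FSMs needs one state per combination of states,
# i.e., the Cartesian product of the individual state sets.
from itertools import product

def flat_states(*machines):
    """Each machine is represented just by its set of states here."""
    return set(product(*machines))

m1 = {"a", "b", "c"}           # 3 states
m2 = {"w", "x", "y", "z"}      # 4 states
flat = flat_states(m1, m2)
print(len(flat))  # 12, i.e., 3 * 4
```

Adding a third 10-state machine would multiply this to 120, which is why Harel's hierarchy, concurrency, and non-determinism mechanisms below matter: they let the representation stay factored.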

Harel advocated the use of three major mechanisms that reduce the size (and hence the visual complexity) of finite automata for modeling practical systems [HLN+90b]. The first one is hierarchy, in which a state can represent an enclosed state machine. That is, being in a particular state s has the interpretation that the state machine enclosed by s is active. Equivalently, being in state s means that the machine is in one of the states enclosed by s. Under the latter interpretation, the states of s are called “or states.” Or states can exponentially reduce the complexity (the number of states) required to represent a system. They compactly describe the notion of preemption (a high-priority event suspending or “killing” a lower priority task), which is fundamental in embedded control applications.

The second mechanism is concurrency. Two or more state machines are viewed as being simultaneously active. Since the system is in one state of each parallel state machine simultaneously, these are sometimes called “and states.” They also provide a potential exponential reduction in the size of the system representation.

The third mechanism is non-determinism. While often non-determinism is simply the result of an imprecise (maybe erroneous) specification, it can be an extremely powerful mechanism to reduce the complexity of a system model by abstraction. This abstraction can be due either to the fact that the exact functionality must still be defined, or to the fact that it is irrelevant to the properties currently considered of interest. For example, during verification of a given system component, other components can be modeled as non-deterministic entities to compactly constrain the overall behavior. A system component can also be described non-deterministically to permit some optimization during the implementation phase. Non-determinism can also provide an exponential reduction in complexity.

These three mechanisms have been shown in [DH94b] to cooperate synergistically and orthogonally, providing a potential triple exponential reduction in the size of the representation with respect to a single, flat deterministic FSM. (The exact claim in [DH94b] was that “and”-type non-determinism, in which all non-deterministic choices must be successful, rather than hierarchical states, was the third source of exponential reduction, together with “or”-type non-determinism and concurrency; hierarchical states were shown in that paper to be able to simulate “and” non-determinism with only a polynomial increase in size.)

Harel’s Statecharts model uses a synchronous concurrency model (also called synchronous composition). The set of tags is a totally ordered countable set that denotes a global “clock” for the system. The events on signals are either produced by state transitions or inputs. Events at a tick of the clock can trigger state transitions in other parallel state machines at the same clock tick. Unfortunately, Harel left open some questions about the semantics of causality loops and chains of instantaneous (same-tick) events, triggering a flurry of activity in the community that has resulted in at least twenty variants of Statecharts [vdB94].

Most of these twenty variants use the synchronous concurrency model. However, for many applications, the tight coordination implied by the synchronous model is inappropriate. In response to this, a number of more loosely coupled asynchronous FSM models have evolved, including behavioral FSMs [TW95], SDL process networks [TW95], and codesign FSMs [CGH+94a].

A model that is closely related to FSMs is Finite Automata. FAs emphasize the acceptance or rejection of a sequence of inputs rather than the sequence of output symbols produced in response to a sequence of input symbols. Most notions, such as composition and so on, can be naturally extended from one model to the other.

In fact, any of the concurrency models described in this paper can be usefully combined with FSMs. In the Ptolemy project [BHLM94], FSMs are hierarchically nested with dataflow, discrete-event, or synchronous/reactive models [CKL95]. The nesting is arbitrarily deep and can mix concurrency models at different levels of the hierarchy. This very flexible model is called “*charts,” pronounced “star charts,” where the asterisk is meant to suggest a wildcard.

2.2.3 Synchronous/Reactive

In a synchronous model of computation, all events are synchronous, i.e., all signals have events with identical tags. The tags are totally ordered and globally available. Simultaneous events (those in the same clock tick) may be totally ordered, partially ordered, or unordered, depending on the model of computation. Unlike the discrete-event model, all signals have events at all clock ticks, simplifying the simulator by requiring no sorting. Simulators that exploit this simplification are called cycle-based or cycle-driven simulators. Processing all events at a given clock tick constitutes a cycle. Within a cycle, the order in which events are processed may be determined by data precedences, which define microsteps. These precedences are not allowed to be cyclic, and typically impose a partial order (leaving some arbitrary ordering decisions to the scheduler). Cycle-based models are excellent for clocked synchronous circuits, and have also been applied successfully at the system level in certain signal processing applications.
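One cycle of such a simulator can be sketched as follows. This is a hypothetical minimal example (the three processes and their signal names are invented): because the data precedences are acyclic, a topological sort fixes an evaluation order once, and within a tick no event sorting is needed:

```python
# Sketch: one cycle of a cycle-driven simulator. Processes are evaluated in a
# topological order of their data precedences (the microstep order), so no
# per-event time-stamp sorting is required.
from graphlib import TopologicalSorter  # Python 3.9+

def run_cycle(processes, deps, signals):
    """deps maps a process name to the set of processes it must follow."""
    for name in TopologicalSorter(deps).static_order():
        signals.update(processes[name](signals))
    return signals

# Hypothetical three-process chain: src feeds add, which feeds dbl.
processes = {
    "src": lambda s: {"x": 1},
    "add": lambda s: {"y": s["x"] + 2},
    "dbl": lambda s: {"z": 2 * s["y"]},
}
deps = {"add": {"src"}, "dbl": {"add"}}

result = run_cycle(processes, deps, {})
print(result)  # {'x': 1, 'y': 3, 'z': 6}
```

A cyclic `deps` graph would make `TopologicalSorter` raise an error, which mirrors the prohibition on cyclic precedences noted above.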

A cycle-based model is inefficient for modeling systems where events do not occur at the same rate in all signals. While conceptually such systems can be modeled (using, for example, special tokens to indicate the absence of an event), the cost of processing such tokens is considerable. Fortunately, the cycle-based model is easily generalized to multirate systems. In this case, every n-th event in one signal aligns with the events in another.

A multirate cycle-based model is still somewhat limited. It is an excellent model for synchronous signal processing systems where sample rates are related by constant rational multiples, but in situations where the alignment of events in different signals is irregular, it can be inefficient.

The more general synchronous/reactive model is embodied in the so-called synchronous languages [BB91]. Esterel [BS91] is a textual imperative language with sequential and concurrent statements that describe hierarchically-arranged processes. Lustre [HCRP91] is a textual declarative language with a dataflow flavor and a mechanism for multirate clocking. Signal [BG90] is a textual relational language, also with a dataflow flavor and a more powerful clocking system. Argos [Mar91], a derivative of Harel’s Statecharts [Har87], is a graphical language for describing hierarchical finite state machines. Halbwachs [Hal93] gives a good summary of this group of languages.

The synchronous/reactive languages describe systems as a set of concurrently-executing synchronized modules. These modules communicate through signals that are either present or absent in each clock tick. The presence of a signal is called an event, and often carries a value, such as an integer. The modules are reactive in the sense that they only perform computation and produce output events in instants with at least one input event.

Every signal in these languages is conceptually (or explicitly) accompanied by a clock signal, which has meaning relative to other clock signals and defines the global ordering of events. Thus, when comparing two signals, the associated clock signals indicate which events are simultaneous and which precede or follow others. In the case of Signal and Lustre, clocks have complex interrelationships, and a clock calculus allows a compiler to reason about these ordering relationships and to detect inconsistencies in the definition. Esterel and Argos have simpler clocking schemes and focus instead on finite-state control.

Most of these languages are static in the sense that they cannot request additional storage nor create additional processes while running. This makes them well-suited for bounded and speed-critical embedded applications, since their behavior can be extensively analyzed at compile time. This static property makes a synchronous program finite-state, greatly facilitating formal verification.


Verifying that a synchronous program is causal (non-contradictory and deterministic) is a fundamental challenge with these languages. Since computation in these languages is delay-free and arbitrary interconnection of processes is possible, it is possible to specify a program that has either no interpretation (a contradiction where there is no consistent value for some signal) or multiple interpretations (some signal has more than one consistent value). Both situations are undesirable, and usually indicate a design error. A conservative approach that checks for causality problems structurally flags an unacceptably large number of programs as incorrect, because most will manifest themselves only in unreachable program states. The alternative, to check for a causality problem in any reachable state, can be expensive since it requires an exhaustive check of the state space of the program.

In addition to the ability to translate these languages into finite-state descriptions, it is possible to compile these languages directly into hardware. Techniques for translating both Esterel [Ber91] and Lustre [RH92] into hardware have been proposed. The result is a logic network consisting of gates and flip-flops that can be optimized using traditional logic synthesis tools. To execute such a system in software, the resulting network is simply simulated. The technique is also the basis for performing causality checks more efficiently, by means of implicit state space traversal techniques [SBT96].

2.2.4 Dataflow Process Networks

In dataflow, a program is specified by a directed graph where the nodes (called actors) represent computations and the arcs represent totally ordered sequences (called streams) of events (called tokens). In Figure 4(a), the large circles represent actors, the small circle represents a token, and the lines represent streams. The graphs are often represented visually and are typically hierarchical, in that an actor in a graph may represent another directed graph. The nodes in the graph can be either language primitives or subprograms specified in another language, such as C or FORTRAN. In the latter case, we are mixing two of the models of computation from Figure 2, where dataflow serves as the coordination language for subprograms written in an imperative host language.

Dataflow is a special case of Kahn process networks [Kah74, LP95]. In a Kahn process network, communication is by unbounded FIFO buffering, and processes are constrained to be continuous mappings from input streams to output streams. “Continuous” in this usage is a topological property that ensures that the program is determinate [Kah74]. Intuitively, it implies a form of causality without time; specifically, a process can use partial information about its input streams to produce partial information about its output streams. Adding more tokens to the input stream will never result in having to change or remove tokens on the output stream that have already been produced. One way to ensure continuity is with blocking reads, where any access to an input stream results in suspension of the process if there are no tokens. One consequence of blocking reads is that a process cannot test an input channel for the availability of data and then branch conditionally to a point where it will read a different input.
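A two-process Kahn network with blocking reads can be sketched directly with threads and an unbounded FIFO. This is a minimal illustration (the end-of-stream sentinel and the squaring process are invented for this sketch); note the consumer never tests the channel for data, it only performs blocking reads, which is what makes the output independent of scheduling:

```python
# Sketch: a two-process Kahn network. The channel is an unbounded FIFO and the
# consumer uses only blocking reads, so the output stream is determinate
# regardless of how the two threads are interleaved.
import threading
import queue

chan = queue.Queue()   # unbounded FIFO channel
out = []

def producer():
    for i in range(5):
        chan.put(i)
    chan.put(None)     # end-of-stream sentinel (an assumption of this sketch)

def consumer():
    while True:
        token = chan.get()     # blocking read: suspends until a token arrives
        if token is None:
            break
        out.append(token * token)

threads = [threading.Thread(target=producer),
           threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(out)  # [0, 1, 4, 9, 16], on every run
```

Allowing the consumer to poll `chan.empty()` and branch would break the continuity constraint, and the output could then depend on thread timing.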

In dataflow, each process is decomposed into a sequence of firings, indivisible quanta of computation. Each firing consumes and produces tokens. Dividing processes into firings avoids the multitasking overhead of context switching in direct implementations of Kahn process networks. In fact, in many of the signal processing environments, a major objective is to statically (at compile time) schedule the actor firings, achieving an interleaved implementation of the concurrent model of computation. The firings are organized into a list (for one processor) or a set of lists (for multiple processors). Figure 4(a) shows a dataflow graph, and Figure 4(b) shows a single-processor schedule for it. This schedule is a list of firings that can be repeated indefinitely. One cycle through the schedule should return the graph to its original state (here, state is defined as the number of tokens on each arc). This is not always possible, but when it is, considerable simplification results [BML96]. In many existing environments, what happens within a firing can only be specified in a host language with imperative semantics, such as C or C++. In the Ptolemy system [BHLM94], it can also consist of a quantum of computation specified with any of several models of computation, such as FSMs, a synchronous/reactive subsystem, or a discrete-event subsystem [CHL96].

A useful formal device is to constrain the operation of a firing to be functional, i.e., a simple, stateless mapping from input values to output values. Note, however, that this does not constrain the process to be stateless, since it can maintain state in a self-loop: an output that is connected back to one of its inputs. An initial token on this self-loop provides the initial value for the state.

Many possibilities have been explored for precise semantics of dataflow coordination languages, including Karp and Miller’s computation graphs [KM66], Lee and Messerschmitt’s synchronous dataflow graphs [LM87], Lauwereins et al.’s cyclo-static dataflow model [LWAP94, BELP94], Kaplan et al.’s Processing Graph Method (PGM) [K+87], Granular Lucid [Jag92], and others [Ack82, CG89, CH71, SBKB90]. Many of these limit expressiveness in exchange for formal properties (e.g., provable liveness and bounded memory).

Figure 4: (a) A dataflow process network. (b) A single-processor static schedule for it.

Synchronous dataflow (SDF) and cyclo-static dataflow require processes to consume and produce a fixed number of tokens for each firing. Both have the useful property that a finite static schedule can always be found that will return the graph to its original state. This allows for extremely efficient implementations [BML96]. For more general dataflow models, it is undecidable whether such a schedule exists [Buc93].
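The existence of such a schedule for SDF comes from the balance equations: for each arc from actor `src` to actor `dst` that produces p tokens and consumes c tokens per firing, a repeating schedule must satisfy reps[src] * p == reps[dst] * c. The sketch below (a simplified solver, assuming a connected graph with a consistent solution; actor and arc names are invented) computes the smallest integer repetitions vector:

```python
# Sketch: solving SDF balance equations for the repetitions vector.
# For each arc (src, dst, produced, consumed), a periodic schedule needs
#     reps[src] * produced == reps[dst] * consumed.
import math
from fractions import Fraction

def repetitions(actors, arcs):
    """Simplified solver: assumes the graph is connected and consistent."""
    reps = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, dst, p, c in arcs:
            if src in reps and dst not in reps:
                reps[dst] = reps[src] * p / c
                changed = True
            elif dst in reps and src not in reps:
                reps[src] = reps[dst] * c / p
                changed = True
    # Scale the rational solution to the smallest integer one.
    lcm_den = 1
    for f in reps.values():
        lcm_den = lcm_den * f.denominator // math.gcd(lcm_den, f.denominator)
    return {a: int(f * lcm_den) for a, f in reps.items()}

# A produces 2 tokens per firing, B consumes 3 per firing:
# A must fire 3 times for every 2 firings of B.
print(repetitions(["A", "B"], [("A", "B", 2, 3)]))  # {'A': 3, 'B': 2}
```

An inconsistent graph has no nonzero solution to these equations, which is how an SDF compiler rejects graphs with unbounded token accumulation.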

A looser model of dataflow is the tagged-token model, in which the partial order of tokens is explicitly carried with the tokens [AG82]. A significant advantage of this model is that while it logically preserves the FIFO semantics of the channels, it permits out-of-order execution.

Some examples of graphical dataflow programming environments intended for signal processing (including image processing) are Khoros [RW91] and Ptolemy [BHLM94].

2.2.5 Other Models

Another commonly used partially ordered concurrency model is based on rendezvous. Two or more concurrent sequential processes proceed autonomously, but at certain points in their control flow, coordinate so that they are simultaneously at specified points. Rendezvous has been developed into elaborate process calculi (e.g., Hoare’s CSP [Hoa78] and Milner’s CCS [Mil89]). It has also been implemented in the Occam and Lotos programming languages. Ada also uses rendezvous, although the implementation is stylistically quite different, using remote procedure calls rather than more elementary synchronization primitives.

Rendezvous-based models of computation are often called synchronous. However, by the definition we have given, they are not synchronous. Events are partially ordered, not totally ordered, with rendezvous points imposing the partial ordering constraints.

No discussion of concurrent models of computation would be complete without mentioning Petri nets [Pet81, Rei85]. Petri nets are, in their basic form, neither Turing complete nor finite state. They are interesting as an uninterpreted model for several very different classes of problems, including some relevant to embedded system design (e.g., process control, asynchronous communication, scheduling). Many questions about Petri nets can be answered in finite time. Moreover, a large user community has developed a large body of theoretical results and practical design aids and methods based on them. In particular, partial-order-based verification methods (e.g., [Val92], [God90], [McM93]) are one possible answer to the state explosion problem plaguing FSM-based verification techniques.
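The basic token-game semantics of a Petri net fits in a few lines. The following is a minimal sketch (the net, place names, and transition are invented for illustration): a transition is enabled when every input place holds enough tokens, and firing it moves tokens from input places to output places:

```python
# Sketch: a marked Petri net. A marking maps places to token counts; a
# transition is a pair (input-place weights, output-place weights).
def enabled(marking, transition):
    inputs, _ = transition
    return all(marking.get(p, 0) >= n for p, n in inputs.items())

def fire(marking, transition):
    """Return the new marking after firing an enabled transition."""
    inputs, outputs = transition
    m = dict(marking)
    for p, n in inputs.items():
        m[p] -= n                    # consume tokens from input places
    for p, n in outputs.items():
        m[p] = m.get(p, 0) + n       # deposit tokens in output places
    return m

# Hypothetical net: transition t moves one token from place p1 to place p2.
t = ({"p1": 1}, {"p2": 1})
m0 = {"p1": 2, "p2": 0}
m1 = fire(m0, t)
print(m1)  # {'p1': 1, 'p2': 1}
```

Because markings are unbounded non-negative integers, the reachable state space need not be finite, which is the sense in which Petri nets sit between finite-state and Turing-complete models.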

2.3 Languages

The distinction between a language and its underlying model of computation is important. The same model of computation can give rise to fairly different languages (e.g., the imperative Algol-like languages C, C++, Pascal, and FORTRAN). Some languages, such as VHDL and Verilog, support two or more models of computation. (They directly support the imperative model within a process and the discrete-event model among processes; they can also support extended finite state machines under suitable restrictions known as the “synthesizable subset.”)

The model of computation affects the expressiveness of a language, that is, which behaviors can be described in the language, whereas the syntax affects compactness, modularity, and reusability. Thus, for example, object-oriented properties of imperative languages like C++ are more a matter of syntax than a model of computation.

The expressiveness of a language is an important issue. At one extreme, a language that is not expressive enough to specify a particular behavior is clearly unsuitable, but the other extreme also raises problems. A language that is too expressive often raises the complexity of analysis and synthesis. In fact, for very expressive languages, many analysis and synthesis problems become undecidable: no algorithm will solve all problem instances in finite time.

A language in which a desired behavior cannot be represented succinctly is also problematic. The difficulty of solving analysis and synthesis problems is at least linear in the size of the problem description, and can be as bad as several times exponential, so choosing a language in which the desired behavior of the system is compact can be critical.

A language may be very incomplete and/or very abstract. For example, it may specify only the interaction between computational modules, and not the computation performed by the modules. Instead, it provides an interface to a host language that specifies the computation, and is called a coordination language (examples include Linda [CG89], Granular Lucid [Jag92], and Ptolemy domains [BHLM94]). Or the language may specify only the causality constraints of the interactions, without detailing the interactions themselves or providing an interface to a host language. In this case, the language is used as a tool to prove properties of systems, as done, for example, in process calculi [Hoa78, Mil89] and Petri nets [Pet81, Rei85]. In still more abstract modeling, components in the system are replaced with nondeterminate specifications that give constraints on the behavior, but not the behavior itself. Such abstraction provides useful simplifications that help formal verification.

2.4 Heterogeneous Models of Computation

The variety of models of computation that have been developed is only partially due to immaturity in the field. It appears that different models fundamentally have different strengths and weaknesses, and that attempts to find their common features result in models that are very low level and difficult to use. These low-level models (such as Dijkstra’s P/V systems [Dij68]) provide a good theoretical foundation, but not a good basis for design.

Thus we are faced with two alternatives in designing complex, heterogeneoussystems. We can either use a single unified approach and suffer the consequences,or we can mix approaches. To use the unified approach today we could choosebetween VHDL and C for a mixed hardware and software design, doing the entiredesign in one or the other (i.e. specifying the software in VHDL or the hardware


in C). Or worse, we could further bloat the VHDL language by including a subset designed for software specification (e.g., by making Ada a subset of VHDL). In the alternative that we advocate, we mix approaches while keeping them conceptually distinct, for example by using both VHDL and C in a mixed hardware/software design.

The key problem in the mixed approach, then, is to define the semantics of the interaction of fundamentally different models of computation. It is not simply a problem of interfacing languages. It is easy, for example, to provide a mechanism for calling C procedures from VHDL. But what does it mean if two concurrent VHDL entities call C procedures that interact? The problem is exacerbated by the lack of agreed-upon semantics for C or VHDL.

Studying the interaction semantics of mixed models of computation is the main objective of the Ptolemy project [BHLM94]. There, a hierarchical framework is used, where a specification in one model of computation can contain a primitive that is internally implemented using another model of computation. The object-oriented principle of information hiding is used to isolate the models from one another as much as possible.

3 Validation

Validation loosely refers to the process of determining that a design is correct. Simulation remains the main tool to validate a model, but the importance of formal verification is growing, especially for safety-critical embedded systems. Although still in its infancy, it shows more promise than verification of arbitrary systems, such as generic software programs, because embedded systems are often specified in a more restricted way. For example, they are often finite-state.

Many safety properties (including deadlock detection) can be detected in a time-independent way using existing model checking and language containment methods (see, e.g., Kurshan [Kur94] and Burch et al. [BCMD90]). Unfortunately, verifying most temporal properties is much more difficult (Alur and Henzinger [AH92] provide a good summary). Much more research is needed before this is practical.


3.1 Simulation

Simulating embedded systems is challenging because they are heterogeneous. In particular, most contain both software and hardware components that must be simulated at the same time. This is the co-simulation problem.

The basic co-simulation problem is reconciling two apparently conflicting requirements:

- to execute the software as fast as possible, often on a host machine that may be faster than the final embedded CPU, and certainly is very different from it; and

- to keep the hardware and software simulations synchronized, so that they interact just as they will in the target system.

One approach, often taken in practice, is to use a general-purpose software simulator (based, e.g., on VHDL or Verilog) to simulate a model of the target CPU, executing the software program on this simulation model. Different models can be employed, with a tradeoff between accuracy and performance:

- Gate-level models

These are viable only for small validation problems, where either the processor is a simple one, or very little code needs to be run on it, or both.

- Instruction-set architecture (ISA) models augmented with hardware interfaces

An ISA model is a standard processor simulator (often written in C) augmented with hardware interface information for coupling to a standard logic simulator.

- Bus-functional models

These are hardware models only of the processor interface; they cannot run any software. Instead, they are configured (programmed) to make the interface appear as if software were running on the processor. A stochastic model of the processor and of the program can be used to determine the mix of bus transactions.

- Translation-based models


paper                     hardware simul.     software simul.      synchronization mechanism
Gupta [GJM92]             logic, custom       bus-cycle, custom    single simul.
Rowson [Row94]            logic, commercial   host-compiled        handshake
Wilson [Wil94]            logic, commercial   host-compiled        handshake
Thomas [TAS93]            logic, commercial   host-compiled        handshake
ten Hagen (1) [tHM93]     logic, commercial   host-compiled        handshake
ten Hagen (2) [tHM93]     cycle-based         cycle-counting       tagged messages
Kalavade (1) [KL92]       logic, custom       host-compiled        single simul.
Kalavade (2) [KL92]       logic, custom       ISA                  single simul.
Lee [KL92]                logic, custom       host-compiled        single simul.
Sutarwala [SP94]          logic, commercial   ISA on HW simul.     single simul.

Table 2: A comparison of co-simulation methods.

These convert the code to be executed on a processor into code that can be executed natively on the computer doing the simulation. Preserving timing information and coupling the translated code to a hardware simulator are the major challenges.

When more accuracy is required, and acceptable simulation performance is not achievable on standard computers, designers sometimes resort to emulation. In this case, configurable hardware emulates the behavior of the system being designed.

Another problem is the accurate modeling of a controlled electromechanical system, which is generally governed by a set of differential equations. This often requires interfacing to an entirely different kind of simulator.

3.1.1 Co-simulation Methods

In this section, we present a survey of some representative co-simulation methods, summarized in Table 2. A unified approach, where the entire system is translated into a form suitable for a single simulator, is conceptually simple but computationally inefficient. Making better use of computational resources often means distributing the simulation, but synchronization of the processes then becomes a challenge.

The method proposed by Gupta et al. [GJM92] is typical of the unified approach to co-simulation. It relies on a single custom simulator for hardware and


software that uses a single event queue and a high-level, bus-cycle model of the target CPU.

Rowson [Row94] takes a more distributed approach that loosely links a hardware simulator with a software process, synchronizing them with the standard interprocess communication mechanisms offered by the host operating system. One of the problems with this approach is that the relative clocks of the software and hardware simulations are not synchronized. This requires the use of handshaking protocols, which may impose an undue burden on the implementation: the real hardware runs much faster than its simulation, so the implemented hardware and software might not need such handshaking at all.

Wilson [Wil94] describes the use of a commercial hardware simulator. In this approach, the simulator and software compiled on the host processor interact via a bus-cycle emulator inside the hardware simulator. The software and hardware simulator execute in separate processes, and the two communicate via UNIX pipes. Thomas et al. [TAS93] take a similar approach.

Another approach keeps track of time in software and hardware independently, using various mechanisms to synchronize them periodically. For example, ten Hagen et al. [tHM93] describe a two-level co-simulation environment that combines a timed and an untimed level. The untimed level is used to verify time-independent properties of the system, such as functional correctness. At this level, software and hardware run independent of each other, passing messages whenever needed. This allows the simulation to run at maximum speed, while taking full advantage of the native debugging environments both for software and for hardware. The timed level is used to verify time-dependent properties, requiring the definition of time in hardware and software. In hardware, time can be measured either on the basis of clock cycles (cycle-based simulation, assuming synchronous operation) for maximum performance, or on the basis of estimated or extracted timing information for maximum precision. In software, on the other hand, time can be measured either by profiling or clock cycle counting for maximum performance, or by executing a model of the CPU for maximum precision. The authors propose two basic mechanisms for synchronizing time in hardware and software.

1. Software is the master and hardware is the slave. In this case, software decides when to send a message, tagged with the current software clock cycle, to the hardware simulator. Depending on the relation between software and hardware time, the hardware simulator can either continue simulation until software time or back up the simulation to software time (the latter requires checkpointing capabilities, which few hardware simulators currently have).

2. Hardware is the master and software is the slave. In this case, the hardware simulator directly calls communication procedures which, in turn, call user software code.

Kalavade and Lee [KL92] and Lee and Rabaey [LR93] take a similar approach. The simulation and design environment Ptolemy [BHLM94] is used to provide an interfacing mechanism between different domains. In Ptolemy, objects described at different levels of abstraction and using different semantic models are composed hierarchically. Each abstraction level, with its own semantic model, is a "domain" (e.g., dataflow, discrete-event). Atomic objects (called "stars") are the primitives of the domain (e.g., dataflow operators, logic gates). They can be used either in simulation mode (reacting to events by producing events) or in synthesis mode (producing software or a hardware description). "Galaxies" are collections of instances of stars or other galaxies. An instantiated galaxy can belong to a domain different than the instantiating domain. Each domain includes a scheduler, which decides the order in which stars are executed, both in simulation and in synthesis. For synthesis, it must be possible to construct the schedule statically. Whenever a galaxy instantiates a galaxy belonging to another domain (typical in co-simulation), Ptolemy provides a mechanism called a "wormhole" for the two schedulers to communicate. The simplest form of communication is to pass time-stamped events across the interface between domains, with the appropriate data-type conversion.

Kalavade and Lee [KL92] perform co-simulation at the specification level by using a dataflow model, and at the implementation level by using an ISA processor model augmented with the interfaces within a hardware simulator, both built within Ptolemy.

Lee and Rabaey [LR93] simulate the specification by using concurrent processes communicating via queues within a timed model (the Ptolemy communicating processes domain). The same message exchanging mechanism is retained in the implementation (using a mix of microprocessor-based boards, DSPs, and ASICs), thus permitting co-simulation of one part of the implementation with a simulation model of the rest. For example, the software running on the microprocessor can also be run on a host computer, while the DSP software runs on the DSP itself.


Sutarwala and Paulin [SP94] describe an environment coupled with a retargetable compiler [LMP94b] for cycle-based simulation of a user-definable DSP architecture. The user only provides a description of the DSP structure and functionality, while the environment generates a behavioral bus-cycle VHDL model for it, which can then be used to run the code on a standard hardware simulator.

3.2 Formal Verification

Formal verification is the process of mathematically checking that the behavior of a system, described using a formal model, satisfies a given property, also described using a formal model. The two models may or may not be the same, but they must share a common semantic interpretation. The ability to carry out formal verification is strongly affected by the model of computation, which determines decidability and complexity bounds. Two distinct types of verification arise:

- Specification verification: checking an abstract property of a high-level model. An example: checking whether a protocol modeled as a network of communicating FSMs can ever deadlock.

- Implementation verification: checking whether a relatively low-level model correctly implements a higher-level model or satisfies some implementation-dependent property. For example: checking whether a piece of hardware correctly implements a given FSM, or whether a given dataflow network implementation on a given DSP completely processes an input sample before the next one arrives.

Implementation verification for hardware is a relatively well-developed area, with the first industrial-strength products beginning to appear. For example, most logic synthesis systems have a mechanism to verify a gate-level implementation against a set of Boolean equations or an FSM, to detect bugs in the synthesis software.

While simulation could fall under these definitions (if the property is "the behavior under this stimulus is as expected"), the term formal verification is usually reserved for checking properties of the system that must hold for all, or a broad class of, inputs. The properties are traditionally broken into two classes:

- Safety properties, which state that no matter what inputs are given, and no matter how non-deterministic choices are resolved inside the system model, the system will not get into a specific undesirable configuration (e.g., deadlock, emission of undesired outputs, etc.).

- Liveness properties, which state that some desired configuration will be visited eventually or infinitely often (e.g., the expected response to an input, etc.).

More complex checks, such as the correct implementation of a specification, can usually be expressed in terms of these basic properties. For example, Dill [Dil88] describes a method to define and check correct implementation for asynchronous logic circuits in an automata-theoretic framework.

In this section we only summarize the major approaches that have been or can be applied to embedded system verification. These can be roughly divided into the following classes:

- Theorem proving methods provide an environment that assists the designer in carrying out a formal proof of specification or implementation correctness. The assistance can be either in the form of checking the correctness of the proof, or in performing some steps of the proof automatically (e.g., Gordon and Melham's HOL [GM92], the Boyer-Moore system [BKM95], and PVS [ORS92]). The main problems with this approach are the undecidability of some higher-order logics and the large size of the search space even for decidable logics.

- Finite automata methods restrict the power of the model in order to automate proofs. A finite automaton, in its simplest form, consists of a set of states connected by a set of edges labeled with symbols from an alphabet. Various criteria can be used to define which finite or infinite sequences of symbols are "accepted" by the automaton. The set of accepted sequences is generally called the language of the automaton. The main verification methods used in this case are language containment and model checking.

– In language containment, both the system and the property to be verified are described as a synchronous composition of automata. The proof is carried out by testing whether the language of one is contained in the language of the other (Kurshan's approach is typical [Kur94]). One particularly simple case occurs when comparing a synchronous FSM with its hardware implementation. Then both automata are on finite strings, and the proof of equivalence can be performed by traversing the state space of their product [CBM89].


– Simulation relations are an efficient sufficient (i.e., conservative) criterion to establish language containment properties between automata, originating from the process algebraic community ([Mil89, Hoa78]). Informally, a simulation relation is a relation R between the states of the two automata such that for each pair of states (p, q) in R, and for each symbol labeling an edge from p, the pair of next states under that symbol is also in R. This relation can be computed much more quickly than the exact language containment test (which, in the case of non-deterministic automata, requires an exponential determinization step), and hence can be used as a fast heuristic check.

If two automata simulate each other, then there exists a bisimulation relation between them, which is also a sufficient criterion for language equivalence. This criterion can be used, for example, to heuristically minimize non-deterministic automata (while the exact procedure is again exponential).

– In model checking (see, e.g., [CES86, QS82, BCMD90, McM93]), the system is modeled as a synchronous or asynchronous composition of automata, and the property is described as a formula in some temporal logic [Pnu77, MP92]. The proof is again carried out by traversing the state space of the automaton and marking the states that satisfy the formula.

- Infinite automata methods can deal with infinite state spaces when some minimization to a finite form is possible. One example of this class is the so-called timed automata ([AD90]), in which a set of real-valued clocks is used to measure time. Severe restrictions are applied in order to make this model decidable: clocks can only be tested, started, and reset as part of the edge labels of a finite automaton, and they can only be compared against, and initialized to, integer values. In this case, it is possible to show that a finite set of equivalence class representatives suffices to represent exactly the behavior of the timed automaton ([CC77, AD90]). McManis and Varaiya [MV94] introduced the notion of suspension, which extends the class of systems that can be modeled with variations of timed automata. It is then possible, in principle, to verify timing constraint satisfaction under preemptive scheduling, which allows a low-priority process to be stopped in the middle of a computation by a high-priority one.


The main obstacles to the widespread application of finite automata-based methods are the inherent complexity of the problem, and the difficulty, for designers generally accustomed to simulation-based models, of formally modeling the system or its properties. The synchronous composition of automata, which is the basis of all known automata-based methods, is inherently sensitive to the number of states in the component automata, since the size of the total state space is the product of the sizes of the component state spaces.

Abstraction is the most promising technique to tackle this problem, generally known as state-space explosion. Abstraction replaces (generally with extensive user intervention) some system components with simpler versions exhibiting nondeterministic behavior. Nondeterminism is used to reduce the size of the state space without losing the possibility of verifying the desired property. The basic idea is to build provably conservative approximations of the exact behavior of the system model, such that the complexity of the verification is lower, but no false positive results are possible. That is, the verification system may say that the approximate model does not satisfy the property when the original one did, thus requiring a better approximation, but it will never say that the approximate model satisfies the property when the original one did not [Bur92, CGL94].

In particular, the systematic application of formal verification techniques from the early stages of a design may lead to a new definition of "optimal" size for a module (apart from those currently in use, which are generally related to human understanding, synthesis, or compilation). A "good" leaf-level module must be small enough to admit verification, and large enough to possess interesting verifiable properties. The possibility of meaningfully applying abstraction would also determine the appropriate size and contents of modules at the upper levels of the hierarchy.

Another interesting family of formal verification techniques, useful for heterogeneous systems with multiple concurrent agents, is based on the notion of partial ordering between computations in an execution of a process network. Direct use of available concurrency information can reduce the number of explicitly explored states during verification ([McM93, God90, Val92]). Some such methods are based on the so-called "Mazurkiewicz traces," in which a "trace" is an equivalence class of sequences of state transitions where concurrent transitions are permuted [Maz84, dSdS95].

Model checking and language containment have been especially useful in verifying the correctness of protocols, which are particularly well-suited to the finite automaton model due to their relative data independence. One may claim that


these two (closely related) paradigms represent about the only solutions to the specification verification problem that are currently close to industrial applicability, thanks to:

- The development of extremely efficient implicit representation methods for the state space, based on Binary Decision Diagrams ([Bry86], [CBM89]), which do not require representing and storing every reachable state of the modeled system explicitly.

- The good degree of automation, at least of the property satisfaction or language containment checks themselves (once a suitable abstraction has been found by hand).

- The good match between the underlying semantics (state-transition objects) and the finite-state behavior of digital systems.

The verification problem becomes much more difficult when one must take into account either the actual values of data and the operations performed on them, or the timing properties of the system. The first problem can be tackled by first assuming equality of arithmetic functions with the same name used at different levels of modeling (e.g., specification and implementation; see Burch and Dill [BD94]), and then separately verifying that a given piece of hardware correctly implements a given arithmetic function (see Bryant [BC95]). The timing verification problem for sequential systems, on the other hand, still needs to be formulated in a way that permits the solution of practical problems in a reasonable amount of space and time. One possibility, proposed almost simultaneously by [BSV92] and [AIKY93], is to incrementally add timing constraints to an initially untimed model, rather than immediately building the full-blown timed automaton. This addition is done iteratively, to gradually eliminate all "false" violations of the desired properties that arise because some timing properties of the model have been ignored. The iteration can be shown to converge, but the speed of convergence still depends heavily on the ingenuity of the designer in providing "hints" to the verification system about the next timing information to consider.

As with many young technologies, optimism about verification techniques initially led to excessive claims about their potential, particularly in the area of software verification, where the term "proving programs" was broadly touted. For many reasons, including the undecidability of many verification problems and the fact that verification can only be as good as the properties the designer specifies,


this optimism was misplaced. Berry has suggested using the term "automatic bug detection" in place of "verification" to underscore that it is too much to hope for a conclusive proof of any nontrivial design. Instead, the goal of verification should be a technology that helps designers prevent problems in deployed systems.

4 Synthesis

By "synthesis," we broadly mean a stage in the design refinement where a more abstract specification is translated into a less abstract specification, as suggested in Figure 2. For embedded systems, synthesis is a combination of manual and automatic processes, and is often divided into three stages: mapping to architecture, in which the general structure of an implementation is chosen; partitioning, in which the sections of a specification are bound to the architectural units; and hardware and software synthesis, in which the details of the units are filled out.

We informally distinguish between software synthesis and software compilation according to the type of input specification. The term software compilation is generally associated with an input specification using C- or Pascal-like imperative, generally non-concurrent, languages. These languages have a syntax and semantics very close to those of the implementation (assembly or executable code); in some sense, they already describe, at a fairly detailed level, the desired implementation of the software. We use the term software synthesis to denote an optimized translation process from a high-level specification that describes the function that must be performed, rather than the way in which it must be implemented. Examples of software synthesis include the C or assembly code generation capabilities of digital signal processing graphical programming environments such as Ptolemy ([BHLM90]), of graphical FSM design environments such as StateCharts ([HLN+90a]), and of synchronous programming environments such as Esterel, Lustre, and Signal ([Hal93]).

Recently, higher and higher level synthesis approaches have started to appear. One particularly promising technique for embedded systems is supervisory control, pioneered by Ramadge and Wonham ([RW89]). While most synthesis methods start from an explicit model of how the system being designed must behave, supervisory control describes what it must achieve. It cleverly combines a classical control system view of the world with automata-theoretic techniques to synthesize a control algorithm that is, in some sense, optimal.


Supervisory control distinguishes between the plant (an abstraction of the physical system that must be controlled) and the controller (the embedded system that must be synthesized). Given a finite-automaton model of the plant (possibly including limitations on what a controller can do) and of the expected behavior of the complete system (plant plus controller), it is possible to determine:

- whether a finite-state controller satisfying that specification exists, and

- a "best" finite-state controller, under some cost function (e.g., minimum estimated implementation cost).

Recent papers dealing with variations on this problem include, for example, [HWT92, BSSV95].

4.1 Mapping from Specification to Architecture

The problem of architecture selection and/or design is one of the key aspects of embedded system design. Supporting the designer in choosing the right mix of components and implementation technologies is essential to the success of the final product, and hence of the methodology used to design it. Generally speaking, the mapping problem takes as input a functional specification and produces as output an architecture and an assignment of functions to architectural units.

An architecture is generally composed of:

- hardware components (e.g., microprocessors, microcontrollers, memories, I/O devices, ASICs, and FPGAs),

- software components (e.g., an operating system, device drivers, procedures, and concurrent programs), and

- interconnection media (e.g., abstract channels, busses, and shared memories).

Partitioning determines which parts of the specification will be implemented on these components, while their actual implementation will be created by software and hardware synthesis.

The cost function optimized by the mapping process includes a mixture of time, area, component cost, and power consumption, where the relative importance of each depends heavily on the type of application. Time cost may be measured


either as execution time for an algorithm, or as missed deadlines for a soft real-time system*. Area cost may be measured as chip, board, or memory size. The components of the cost function may take the form of a hard constraint or of a quantity to be minimized.

Current synthesis-based methods almost invariably impose some restrictions on the target architecture in order to make the mapping problem manageable. For example, the architecture may be limited to a library of pre-defined components due to vendor restrictions or interfacing constraints. Few papers have been published on automating the design of, say, a memory hierarchy or an I/O subsystem based on standard components. Notable exceptions to this rule are papers dealing with retargetable compilation (e.g., Theissinger et al. [TSV94]), or with a very abstract formulation of partitioning for co-design (e.g., Kumar et al. [KAJW92, KAJW93], Prakash and Parker [PP91], and Vahid and Gajski [VG92]). The structure of the application-specific hardware components, on the other hand, is generally much less constrained.

Often, the communication mechanisms are also standardized for a given methodology. Few choices, often closely tied to the communication mechanism used at the specification level, are offered to the designer. Nonetheless, some work has been done on the design of interfaces (e.g., Chou et al. [CWB94]).

4.2 Partitioning

Partitioning is a problem in any design using more than one component. It is a particularly interesting problem for embedded systems because of the heterogeneous hardware/software mixture. Partitioning methods can be classified, as shown in Table 3, according to four main characteristics:

- the specification model(s) supported,

- the granularity,

- the cost function, and

- the algorithm.

*Real-time systems, and individual timing constraints within such systems, are classified as soft or hard according to whether missing a deadline just degrades the system performance or causes a catastrophic failure.


Explored algorithm classes include greedy heuristics, clustering methods, iterative improvement, and mathematical programming.

So far, there seems to be no clear winner among partitioning methods, partly due to the early stage of research in this area, and partly due to the intrinsic complexity of the problem, which seems to preclude an exact formulation with a realistic cost function in the general case.

Ernst et al. [EH92, HBE93, HEHB94] use a graph-based model, with nodes corresponding to elementary operations (statements in C*, a C-like language extended with concurrency). The cost is derived:

- by profiling, aimed at discovering the bottlenecks that can be eliminated from the initial, all-software partition by moving some operations to hardware;

- by estimating the closeness between operations, including control locality (the distance in number of control nodes between activations of the same operation in the control flow graph), data locality (the number of common variables among operations), and operator closeness (the similarities; e.g., an add and a subtract are close); and

- by estimating the communication overhead incurred when blocks are moved across partitions. This is approximated by the (static) number of data items exchanged among partitions, assuming a simple memory-mapped communication mechanism between hardware and software.

Partitioning is done in two loops. The inner loop uses simulated annealing, with a quick estimate of the gain derived by moving an operation between hardware and software, to improve an initial partition. The outer loop uses synthesis to refine the estimates used in the inner loop.

Olokutun et al. [OHLR94] perform performance-driven partitioning working on a block-by-block basis. The specification model is a hardware description language. This allows them to use synthesis for hardware cost estimation, and profiling of a compiled-code simulator for software cost estimation. Partitioning is done together with scheduling, since the overall goal is to minimize response time in the context of using emulation to speed up simulation. An initial partition is obtained by classifying blocks according to whether or not they are synthesizable, and whether or not the communication overhead justifies a hardware implementation. This determines some blocks which must either go into software

paper                   model                granularity  cost function            algorithm
Henkel [HEHB94]         CDFG (C*)            operation    profiling (SW),          hand (outer),
                                                          synthesis and            sim. anneal. (inner)
                                                          similarity (HW),
                                                          communication cost
Olokutun [OHLR94]       HDL                  task         profiling (SW),          Kernighan and Lin
                                                          synthesis (HW)
Kumar [KAJW93]          set-based            task         profiling                math. prog.
Hu [HDMT94]             task list            task         profiling,               branch and bound
                                                          sched. analysis
Vahid [VG92]            acyclic DFG          operation    profiling (SW),          Mixed Integer-
                                                          processor cost (HW),     Linear Prog.
                                                          communication cost
Barros (1) [BRX93]      Unity (HDL)          operation    similarity,              clustering
                                                          concurr./sequenc.
Barros (2) [BS94]       Occam                operation    similarity,              clustering
                                                          concurr./sequenc.,
                                                          hierarchy
Kalavade [KL94]         acyclic DFG          operation    schedulability           heuristic with
                                                                                   look-ahead
Adams [AST93]           HDL (?)              task         profiling (SW),          hand
                                                          synthesis (HW)
Eles [EPD94]            VHDL                 task         profiling                sim. anneal.
Luk [LW94]              Ruby (HDL)           operation    rate matching,           hand
                                                          hierarchy
Steinhausen [SCG+93]    CDFG (HDL, C)        operation    profiling                hand
Ben Ismail [IAJ94]      communicating procs  task         ?                        hand
Antoniazzi [ABFS94]     FSMs                 task         ?                        hand
Chou [CWB94]            timing diagram       operation    time (SW), area (HW)     min-cut
Gupta [GJM94]           CDFG (HDL)           operation    time                     heuristic

Table 3: A comparison of partitioning methods.

or hardware. Uncommitted blocks are assigned to hardware or software starting from the block which has most to gain from a specific choice. The initial partition is then improved by a Kernighan and Lin-like iterative swapping procedure.

Kumar et al. [KAJW92, KAJW93], on the other hand, consider partitioning in a very general and abstract form. They use a complex, set-based representation of the system, its various implementation choices and the various costs associated with them. Cost attributes are determined mainly by profiling. The system being designed is represented by four sets: available software functions; hardware resources; communications between the (software and/or hardware) units; and functions to be implemented, each of which can be assigned a set of software functions, hardware resources and communications. This means that the given software runs on the given hardware and uses the given communications to implement the function. The partitioning process is followed by a decomposition of each function into virtual instruction sets, followed by design of an implementation for the set using the available resources, and followed again by an evaluation phase.

D’Ambrosio et al. [DH94a, HDMT94] tackle the problem of choosing a set of processors on which a set of cooperating tasks can be executed while meeting real-time constraints. They also use a mathematical formulation, but provide an optimal solution procedure by using branch-and-bound. The cost of a software partition is estimated as a lower and an upper bound on processor utilization. The upper bound is obtained by rate-monotonic analysis (see Liu and Layland [LL73]), while the lower bound is obtained by various refinements of the sum of task computation times divided by task periods. The branch-and-bound procedure uses the bounds to prune the search space, while looking for optimal assignments of functions to components, and satisfying the timing constraints. Other optimization criteria can be included besides schedulability, such as response times to tasks with soft deadlines, hardware costs, and expandability, which favors software solutions.
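
The pruning role of the utilization lower bound can be sketched as follows. This is only an illustration of the idea, not D’Ambrosio et al.'s actual formulation; the data layout is an assumption.

```python
def lower_bound_util(tasks):
    """Optimistic utilization bound for one processor: sum of C_i / T_i,
    where each task is a (computation_time, period) pair."""
    return sum(c / t for c, t in tasks)

def prune(assignment):
    """A branch-and-bound node can be discarded as soon as any processor's
    lower-bound utilization exceeds 1: no schedule can exist there,
    regardless of how the remaining tasks are assigned.

    assignment: dict processor -> list of (C, T) tasks bound so far."""
    return any(lower_bound_util(ts) > 1.0 for ts in assignment.values())
```

Since the bound is optimistic, passing the test does not guarantee schedulability; the upper (rate-monotonic) bound is still needed for that.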

Barros et al. [BRX93] use a graph-based fine-grained representation, with each unit corresponding to a simple statement in the Unity specification language. They cluster units according to a variety of sometimes vague criteria: similarity between units, based on concurrency (control and data independence), sequencing (control or data dependence), mutual exclusion, and vectorization of a sequence of related assignments. They cluster the units to minimize the cost of cuts in the clustering tree, and then improve the clustering by considering pipelining opportunities, allocations done at the previous stage, and cost savings due to resource sharing.

Kalavade and Lee [KL94] use an acyclic dependency graph derived from a dataflow graph to simultaneously map each node (task) to software or hardware and schedule the execution of the tasks. The approach is heuristic, and can give an approximate solution to very large problem instances. To guide the search process, it uses both critical path information and the suitability of a node to hardware or software. For example, bit manipulations are better suited to hardware while random accesses to a data structure are better suited to software.

Vahid, Gajski et al. [VG92, GNRV96] perform graph-based partitioning of a variable-grained specification. The specification language is SpecCharts, a hierarchical model in which the leaves are “states” of a hierarchical Statecharts-like finite state machine. These “states” can contain arbitrarily complex behavioral VHDL processes, written in a high-level specification style. Cost function estimation is done at the leaf level. Each leaf is assigned an estimated number of I/O pins, an estimated area (based on performing behavioral, RTL and logic synthesis in isolation), and an estimated execution time (obtained by simulating that initial implementation, and considering communication delay as well). The area estimate can be changed if more leaves are mapped onto the same physical entity, due to potential sharing. The cost model is attached to a graph, in which nodes represent leaves and edges represent control (activation/deactivation) and data (communication) dependencies. Classical clustering and partitioning algorithms are then applied, followed by a refinement phase. During refinement, each partition is synthesized, to get better area and timing estimates, and “peripheral” graph nodes are moved among partitions greedily to reduce the overall cost. The cost of a given partition is a simple weighted sum of area, pin, chip count, and performance constraint satisfaction measures.

Steinhausen et al. [SCG+93, TSV94, WCR94] describe a complete co-synthesis environment in which a CDFG representation is derived from an array of specification formats, such as Verilog, VHDL and C. The CDFG is partitioned by hand, based on the results of profiling, and then mapped onto an architecture that can include general-purpose microprocessors, ASIPs (application-specific instruction processors, software-programmable components designed ad hoc for an application), and ASICs (application-specific integrated circuits). An interesting aspect of this approach is that the architecture itself is not fixed, but synthesis is driven by a user-defined structural description. ASIC synthesis is done with a commercial tool, while software synthesis, both for general-purpose and specialized processors, is done with an existing retargetable compiler developed by Hoogerbrugge et al. [HC94].

Ben Ismail et al. [IAJ94] and Voss et al. [VIJK94] start from a system specification described in SDL ([SSR89]). The specification is then translated into the Solar internal representation, based on a hierarchical interconnection of communicating processes. Processes can be merged and split, and the hierarchy can be changed by splitting, moving and clustering of subunits. The sequencing of these operations is currently done by the user.

Finally, Chou et al. [CWB94] and Walkup and Borriello [WB93] describe a specialized, scheduling-based algorithm for interface partitioning. The algorithm is based on a graph model derived from a formalized timing diagram. Nodes represent low-level events in the interface specification. Edges represent constraints, and can either be derived from causality links in the specification, or be added during the partitioning process (for example to represent events that occur on the same wire, and hence should be moved together). The cost function is time for software and area for hardware. The algorithm is based on a min-cut procedure applied to the graph, in order to reduce congestion. Congestion in this case is defined as software being required to produce events more rapidly than the target processor can do, which implies the need for some hardware assistance.

4.3 Hardware and Software Synthesis

After partitioning (and sometimes before partitioning, in order to provide cost estimates) the hardware and software components of the embedded system must be implemented. The inputs to the problem are a specification, a set of resources and possibly a mapping onto an architecture. The objective is to realize the specification with the minimum cost.

Generally speaking, the constraints and optimization criteria for this step are the same as those used during partitioning. Area and code size must be traded off against performance, which often dominates due to the real-time characteristics of many embedded systems. Cost considerations generally suggest the use of software running on off-the-shelf processors, whenever possible. This choice, among other things, allows one to separate the software from the hardware synthesis process, relying on some form of pre-designed or customized interfacing mechanism.

One exception to this rule is the work of authors who propose the simultaneous design of a computer architecture and of the program that must run on it (e.g., Menez et al. [MABC92], Marwedel [Mar93], and Wilberg et al. [WCR94]). Since the designers of general-purpose CPUs face different problems than the designers of embedded systems, we will only consider those authors who synthesize an Application-Specific Instruction Processor (ASIP, [Pau95]) and the micro-code that runs on it. The designer of a general-purpose CPU must worry about backward compatibility, compiler support, and optimal performance for a wide variety of applications, whereas the embedded system designer must worry about addition of new functionality in the future, user interaction, and satisfaction of a specific set of timing constraints.

Note that by using an ASIP rather than a standard Application-Specific Integrated Circuit (ASIC), which generally has very limited programming capabilities, the embedded system designer can couple some of the advantages of hardware and software. For example, performance and power consumption can be improved with respect to a software implementation on a general-purpose micro-controller or DSP, while flexibility can be improved with respect to a hardware implementation. Another method to achieve the same goal is to use reprogrammable hardware, such as Field Programmable Gate Arrays (FPGAs). FPGAs can be reprogrammed either off-line (just like embedded software is upgraded by changing a ROM), or even on-line (to speed up the algorithm that is currently being executed).

The hardware synthesis task for ASICs used in embedded systems (whether they are implemented on FPGAs or not) is generally performed according to the classical high-level and logic synthesis methods. These techniques have been worked on extensively; for example, recent books by De Micheli [Mic94], Devadas, Ghosh and Keutzer [DGK94], and Camposano and Wolf [CW91] describe them in detail. Marwedel and Goossens [MG95] present a good overview of code generation strategies for DSPs and ASIPs.

The software synthesis task for embedded systems, on the other hand, is a relatively new problem. Traditionally, software synthesis has been regarded with suspicion, mainly due to excessive claims made during its infancy. In fact, the problem is much more constrained for embedded systems compared to general-purpose computing. For example, embedded software often cannot use virtual memory, due to physical constraints (e.g., the absence of a swapping device), to real-time constraints, and to the need to partition the specification between software and hardware. This severely limits the applicability of dynamic task creation and memory allocation. For some highly critical applications even the use of a stack may be forbidden, and everything must be dealt with by polling and static variables. Algorithms also tend to be simpler, with a clear division into cooperating tasks, each solving one specific problem (e.g., digital filtering of a given input source, protocol handling over a channel, and so on). In particular, the problem of translating cooperating finite-state machines into software has been solved in a number of ways.

Software synthesis methods proposed in the literature can be classified, as shown in Table 4, according to the following general lines:

- the specification formalism,

- interfacing mechanisms (at the specification and the implementation levels),

- when the scheduling is done, and

- the scheduling method.

Almost all software synthesis methods perform some sort of scheduling: sequencing the execution of a set of originally concurrent tasks. Concurrent tasks are an excellent specification mechanism, but cannot be implemented as such on a standard CPU. The scheduling problem (reviewed e.g. by Halang and Stoyenko [HS91]) amounts to finding a linear execution order for the elementary operations composing the tasks, so that all the timing constraints are satisfied. Depending on how and when this linearization is performed, scheduling algorithms can be classified as:

- Static, where all scheduling decisions are made at design- or compile-time.

- Quasi-static, where some scheduling decisions are made at run-time, some at compile-time.

- Dynamic, where all decisions are made at run-time.

Dynamic schedulers take many forms, but in particular they are distinguished as preemptive or non-preemptive, depending on whether a task can be interrupted at arbitrary points. For embedded systems, there are compelling motivations for using static or quasi-static scheduling, or at least for minimizing preemptive scheduling, in order to minimize scheduling overhead and to improve reliability and predictability. There are, of course, cases in which preemption cannot be avoided, because it is the only feasible solution to the problem instance ([HS91]), but such cases should be carefully analyzed to limit preemption to a minimum.

paper                model      interface    constraint   scheduling
                                             granularity  algorithm
Cochran [Coc92]      task list  none         task         RMA (runtime)
Chou [CWB94]         task list  synthesized  task,        heuristic (static)
                                             operation
Gupta [GJM94]        CDFG       ?            operation    heuristic with look-ahead
                                                          (static + runtime)
Chiodo [CGH+94b]     task list  synthesized  task         RMA (runtime)
Menez [MABC92]       CDFG       ?            operation    exhaustive

Table 4: A comparison of software scheduling methods.

Many static scheduling methods have been developed. Most somehow construct a precedence graph and then apply or adapt classical methods. We refer the reader to Bhattacharyya et al. [BML96] and Sih and Lee [SL93b, SL93a] as a starting point for scheduling of dataflow graphs.

Many approaches to software synthesis for embedded systems divide the computation into cooperating tasks that are scheduled at run time. This scheduling can be done

1. either by using classical scheduling algorithms,

2. or by developing new techniques based on a better knowledge of the domain. Embedded systems with fairly restricted specification paradigms are an easier target for specialized scheduling techniques than fully general algorithms written in an arbitrary high-level language.

The former approach uses, for example, Rate Monotonic Analysis (RMA [LL73]) to perform schedulability analysis. In the pure RMA model, tasks are invoked periodically, can be preempted, have deadlines equal to their invocation period, and system overhead (context switching, interrupt response time, and so on) is negligible. The basic result by Liu and Layland states that under these hypotheses, if a given set of tasks can be successfully scheduled by a static priority algorithm, then it can be successfully scheduled by sorting tasks by invocation period, with the highest priority given to the task with the shortest period.
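
The rate-monotonic priority assignment, together with Liu and Layland's sufficient utilization test U <= n(2^(1/n) - 1), can be sketched as follows; the task-tuple layout is an assumption for illustration.

```python
def rate_monotonic(tasks):
    """Assign static priorities by period: shortest period = highest priority.

    tasks: list of (name, C, T) with computation time C and period T.
    Returns task names sorted highest-priority first."""
    return [name for name, _, t in sorted(tasks, key=lambda x: x[2])]

def liu_layland_ok(tasks):
    """Sufficient (not necessary) schedulability test for preemptive RMA:
    total utilization must not exceed n * (2**(1/n) - 1)."""
    n = len(tasks)
    u = sum(c / t for _, c, t in tasks)
    return u <= n * (2 ** (1.0 / n) - 1)
```

For two tasks the bound is about 0.83, and it decreases toward ln 2 (about 0.69) as the number of tasks grows; task sets above the bound may still be schedulable, but require an exact analysis.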

The basic RMA model must be augmented to be practical. Several results from the real-time scheduling literature can be used to develop a scheduling environment supporting process synchronization, interrupt service routines, context switching time, deadlines different from the task invocation period, mode changes (which may cause a change in the number and/or deadlines of tasks), and parallel processors. Parallel processor support generally consists of analyzing the schedulability of a given assignment of tasks to processors, providing the designer with feedback about potential bottlenecks and sources of deadlocks.

Chou et al. [CWB94] advocate developing new techniques based on a better knowledge of the domain. The problem they consider is to find a valid schedule of processes specified in Verilog under given timing constraints. This approach, like that of Gupta et al. described below, and unlike classical task-based scheduling methods, can take into account both fine-grained and coarse-grained timing constraints. The specification style chosen by the authors uses Verilog constructs that provide structured concurrency with watchdog-style preemption. In this style, multiple computation branches are started in parallel, and some of them (the watchdogs) can “kill” others upon occurrence of a given condition. A set of “safe recovery points” is defined for each branch, and preemption is allowed only at those points. Timing constraints are specified by using modes, which represent different “states” for the computation as in SpecCharts, e.g., initialization, normal operation and error recovery. Constraints on the minimum and maximum time separation between events (even of the same type, to describe occurrence rates) can be defined either within a mode or among events in different modes. Scheduling is performed within each mode by finding a cyclic order of operations which preserves I/O rates and timing constraints. Each mode is transformed into an acyclic partial order by unrolling, and possibly splitting (if it contains parallel loops with harmonically unrelated repetition counts). Then the partial order is linearized by using a longest-path algorithm to check feasibility and assign start times to the operations.
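
A standard way to check min/max separation constraints with longest paths (a textbook encoding, not necessarily Chou et al.'s exact algorithm) is to build a constraint graph: a minimum separation t[v] - t[u] >= w becomes a forward edge of weight w, and a maximum separation t[v] - t[u] <= M becomes a backward edge (v, u, -M). The constraints are feasible exactly when the graph has no positive cycle, which a Bellman-Ford-style longest-path pass detects:

```python
def longest_path_feasible(n_events, edges):
    """Check separation constraints on events 0 .. n_events-1.

    edges: list of (u, v, w) meaning t[v] - t[u] >= w.
    Returns (feasible, start_times): longest-path distances from event 0
    serve as earliest start times; infeasible iff a positive cycle exists.
    """
    dist = [0.0] * n_events
    # Relax all edges n-1 times (Bellman-Ford, maximizing instead of minimizing).
    for _ in range(n_events - 1):
        for u, v, w in edges:
            if dist[u] + w > dist[v]:
                dist[v] = dist[u] + w
    # One extra pass: any further improvement exposes a positive cycle,
    # i.e. mutually unsatisfiable min/max constraints.
    for u, v, w in edges:
        if dist[u] + w > dist[v]:
            return False, None
    return True, dist
```

For example, requiring at least 2 time units between events 0 and 1, at least 3 between 1 and 2, and at most 10 between 0 and 2 is feasible; tightening the maximum to 4 creates a positive cycle and the check fails.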

The same group describes in [CB94] a technique for device driver synthesis, targeted towards microcontrollers with specialized I/O ports. It takes as input a specification of the system to be implemented, a description of the function and structure of each I/O port (a list of bits and directions), and a list of communication instructions. It can also exploit specialized functions such as parallel/serial and serial/parallel conversion capabilities. The algorithm assigns communications in the specification to physical entities in the micro-controller. It first tries to use special functions, then assigns I/O ports, and finally resorts to the more expensive memory-mapped I/O for overflow communications. It takes into account resource conflicts (e.g. among different bits of the same port), and allocates hardware components to support memory-mapped I/O. The output of the algorithm is a netlist of hardware components, initialization routines and I/O driver routines that can be called by the software generation procedure whenever a communication between software and hardware must take place.

Gupta et al. [GJM92, GJM94, GM94] develop a scheduling method for reactive real-time systems. The input specification is a Control and DataFlow Graph (CDFG) derived from a C-like HDL called HardwareC. The cost model takes into account the processor type, the memory model, and the instruction execution time. The latter is derived bottom-up from the CDFG by assigning a processor- and memory-dependent cost to each leaf operation in the CDFG. Some operations have an unbounded execution time, because they are either data-dependent loops or synchronization (I/O) operations. Timing constraints are basically data rate constraints on externally visible Input/Output operations. Bounded-time operations within a process are linearized by a heuristic method (the problem is known to be NP-complete). The linearization procedure selects the next operation to be executed among those whose predecessors have all been scheduled, according to whether or not their immediate selection for scheduling can cause some timing constraint to be missed, and a measure of “urgency” that performs some limited timing constraint lookahead. Unbounded-time operations, on the other hand, are implemented by a call to the runtime scheduler, which may cause a context switch in favor of another more urgent thread.
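
A bare-bones version of such urgency-driven linearization might look like the following sketch. It is a simplification: Gupta et al.'s heuristic also performs constraint lookahead, while this one only ranks ready operations by slack to their deadline (all names and the data layout are illustrative assumptions).

```python
def linearize(ops, deps, length, deadline):
    """Greedy linearization of bounded-time operations.

    ops      : list of operation ids
    deps     : dict op -> set of predecessor ops
    length   : dict op -> execution time
    deadline : dict op -> latest allowed finish time (optional per op)
    Repeatedly picks, among ready operations (all predecessors scheduled),
    the most urgent one: smallest slack between deadline and current time.
    Raises if the greedy order misses a constraint.
    """
    done, order, now = set(), [], 0
    while len(done) < len(ops):
        ready = [o for o in ops if o not in done and deps.get(o, set()) <= done]
        # unconstrained operations get infinite slack, i.e. lowest urgency
        o = min(ready, key=lambda o: deadline.get(o, float('inf')) - now)
        now += length[o]
        if now > deadline.get(o, float('inf')):
            raise ValueError(f'timing constraint on {o} missed')
        order.append(o)
        done.add(o)
    return order
```

Being greedy and lookahead-free, this sketch can fail on instances that a smarter ordering would satisfy, which is precisely the gap the "urgency" lookahead in the real procedure tries to close.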

Chiodo et al. [CGH+95] also propose a software synthesis method from extended asynchronous Finite State Machines (called Co-design Finite State Machines, CFSMs). The method takes advantage of optimization techniques from the hardware synthesis domain. It uses a model based on multiple asynchronously communicating CFSMs, rather than a single FSM, enabling it to handle systems with widely varying data rates and response time requirements. Tasks are organized with different priority levels, and scheduled according to classical run-time algorithms like RMA. The software synthesis technique is based on a very simple CDFG, representing the state transition and output functions of the CFSM. The nodes of the CDFG can only be of two types: TEST nodes, which evaluate an expression and branch according to its result, and ASSIGN nodes, which evaluate an expression and assign its result to a variable. The authors develop a mapping from a representation of the state transition and output functions using Binary Decision Diagrams ([Bry86]) to the CDFG form, and can thus use a body of well-developed optimization techniques to minimize memory occupation and/or execution time. The simple CDFG form also permits an easy and relatively accurate prediction of software cost and performance, based on cost assignment to each CDFG node ([SSV96]). The cost (code and data memory occupation) and performance (clock cycles) of each node type can be evaluated with a good degree of accuracy, based on a handful of system-specific parameters (e.g., the cost of a variable assignment, of an addition, of a branch). These parameters can be derived by compiling and running a few carefully designed benchmarks on the target processor, or on a cycle-accurate emulator or simulator.
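
The translation from a two-node-type CDFG to code can be illustrated with a toy emitter: TEST nodes become branches, ASSIGN nodes become assignments. This sketch works on a plain decision tree rather than a shared BDD, and the tuple encoding is an assumption for illustration.

```python
def emit(node, indent=0):
    """Emit C-like code from a tiny TEST/ASSIGN decision tree.

    node is either ('TEST', var, low, high) - branch on var, taking
    `high` when var is true - or ('ASSIGN', target, value).
    """
    pad = '    ' * indent
    if node[0] == 'ASSIGN':
        _, target, value = node
        return f'{pad}{target} = {value};\n'
    _, var, low, high = node          # a TEST node
    return (f'{pad}if ({var}) {{\n'
            + emit(high, indent + 1)
            + f'{pad}}} else {{\n'
            + emit(low, indent + 1)
            + f'{pad}}}\n')
```

Because every path through the tree has a statically known sequence of TESTs and ASSIGNs, summing per-node costs along the longest path gives the kind of cycle-count prediction described above.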

Liem et al. [LMP94b] tackle a very different problem, that of retargetable compilation for a generic processor architecture. They focus their optimization techniques towards highly asymmetric processors, such as commercial Digital Signal Processors (in which, for example, one register may only be used for multiplication, another one only for memory addressing, and so on). Their register assignment scheme is based on the notion of classes of registers, describing which type of operation can use which register. This information is used during CDFG covering with processor instructions [LMP94a] to minimize the number of moves required to save registers into temporary locations.

Marwedel [Mar93] also uses a similar CDFG covering approach. The source specification can be written in VHDL or in the Pascal-like language Mimola. The purpose is micro-code generation for Very Long Instruction Word (VLIW) processors, and in this case the instruction set has not been defined yet. Rather, a minimum encoding of the control word is generated for each control step. Control steps are allocated using an As Soon As Possible policy (ASAP, meaning that each micro-operation is scheduled to occur as soon as its operands have been computed, compatibly with resource utilization conflicts). The control word contains all the bits necessary to steer the execution units in the specified architecture to perform all the micro-operations in each step. Register allocation is done in order to minimize the number of temporary locations in memory due to register spills.

Tiwari et al. [TMW94] describe a software analysis (rather than synthesis) method aimed at estimating the power consumption of a program on a given processor. Their power consumption model is based on the analysis of single instructions, addressing modes, and instruction pairs (a simple way of modeling the effect of the processor state). The model is evaluated by running benchmark programs for each of these characteristics, and measuring the current flow to and from the power and ground pins.
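
Once per-instruction base costs and pairwise overheads have been measured, applying the model to a trace is a simple summation. The sketch below captures only the spirit of such instruction-level models; the numbers and table layout are illustrative, not Tiwari et al.'s measured data.

```python
def estimate_energy(trace, base, pair_overhead):
    """Instruction-level energy estimate for an executed trace.

    trace         : list of executed instruction mnemonics, in order
    base          : dict mnemonic -> base energy cost per execution
    pair_overhead : dict (prev, next) -> extra circuit-state cost when
                    `next` immediately follows `prev` (0 if unlisted)
    """
    e = sum(base[i] for i in trace)
    e += sum(pair_overhead.get((a, b), 0.0) for a, b in zip(trace, trace[1:]))
    return e
```

Average power then follows by dividing the total energy by the trace's execution time on the target processor.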

5 Conclusions

In this paper we outlined some important aspects of the design process for embedded systems, including specification models and languages, simulation, formal verification, partitioning, and hardware and software synthesis.

The design process is iterative: a design is transformed from an informal description into a detailed specification usable for manufacturing. The specification problem is concerned with the representation of the design at each of these steps; the validation problem is to check that the representation is consistent both within a step and between steps; and the synthesis problem is to transform the design between steps.

We argued that formal models are necessary at each step of a design, and that there is a distinction between the language in which the design is specified and its underlying model of computation. Many models of computation have been defined, due not just to the immaturity of the field but also to fundamental differences: the best model is a function of the design. The heterogeneous nature of most embedded systems makes multiple models of computation a necessity. Many models of computation are built by combining three largely orthogonal aspects: sequential behavior, concurrency, and communication.

We presented an outline of the tagged-signal model [LSV96], a framework developed by two of the authors to contrast different models of computation. The fundamental entity in the model is an event (a value/tag pair). Tags usually denote temporal behavior, and different models of time appear as structure imposed on the set of all possible tags. Processes appear as relations between signals (sets of events). The character of such a relation follows from the type of process it describes.

Simulation and formal verification are two key validation techniques. Most embedded systems contain both hardware and software components, and it is a challenge to efficiently simulate both components simultaneously. Using separate simulators for each is often more efficient, but synchronization becomes a challenge.

Formal verification can be roughly divided into theorem proving methods, finite automata methods, and infinite automata methods. Theorem provers generally assist designers in constructing a proof, rather than being fully automatic, but are able to deal with very powerful languages. Finite-automata schemes represent (either explicitly or implicitly) all states of the system and check properties on this representation. Infinite-automata schemes usually build finite partitions of the state space, often by severely restricting the input language.

In this paper, synthesis refers to a step in the design refinement process where the design representation is made more detailed. This can be manual and/or automated, and is often divided into mapping to architecture, partitioning, and component synthesis. Automated architecture mapping, where the overall system structure is defined, often restricts the result to make the problem manageable. Partitioning, where sections of the design are bound to different parts of the system architecture, is particularly challenging for embedded systems because of the elaborate cost functions due to their heterogeneity. Assigning an execution order to concurrent modules, and finding a sequence of instructions implementing a functional module, are the primary challenges in software synthesis for embedded systems.

6 Acknowledgements

Edwards and Lee participated in this study as part of the Ptolemy project, which is supported by the Advanced Research Projects Agency and the U.S. Air Force (under the RASSP program, contract F33615-93-C-1317), the State of California MICRO program, and the following companies: Cadence, Dolby, Hitachi, LG Electronics, Mitsubishi, Motorola, NEC, Philips, and Rockwell. Lavagno and Sangiovanni-Vincentelli were partially supported by grants from Cadence, Magneti Marelli, Daimler-Benz, Hitachi, Consiglio Nazionale delle Ricerche, the MICRO program, and SRC.

References

[ABFS94] S. Antoniazzi, A. Balboni, W. Fornaciari, and D. Sciuto. A methodology for control-dominated systems codesign. In Proceedings of the International Workshop on Hardware-Software Codesign, 1994.

[Ack82] W. B. Ackerman. Data flow languages. Computer, 15(2), 1982.

[AD90] R. Alur and D. Dill. Automata for modeling real-time systems. In Automata, Languages and Programming: 17th Annual Colloquium, volume 443 of Lecture Notes in Computer Science, pages 322–335, 1990. Warwick University, July 16-20.

[AG82] Arvind and K. P. Gostelow. The U-Interpreter. Computer, 15(2), 1982.

[AH92] R. Alur and T.A. Henzinger. Logics and models of real time: A survey. In J.W. de Bakker, C. Huizing, W.P. de Roever, and G. Rozenberg, editors, Real-Time: Theory in Practice. REX Workshop Proceedings, 1992.

[AIKY93] R. Alur, A. Itai, R. Kurshan, and M. Yannakakis. Timing verification by successive approximation. In Proceedings of the Computer Aided Verification Workshop, pages 137–150, 1993.

[AST93] J.K. Adams, H. Schmitt, and D.E. Thomas. A model and methodology for hardware-software codesign. In Proceedings of the International Workshop on Hardware-Software Codesign, October 1993.

[BB91] A. Benveniste and G. Berry. The synchronous approach to reactive and real-time systems. Proceedings of the IEEE, 79(9):1270–1282, 1991.

[BC95] R.E. Bryant and Y-A. Chen. Verification of arithmetic circuits with Binary Moment Diagrams. In Proceedings of the Design Automation Conference, pages 535–541, 1995.

[BCMD90] J. Burch, E. Clarke, K. McMillan, and D. Dill. Sequential circuit verification using symbolic model checking. In Proceedings of the Design Automation Conference, pages 46–51, 1990.

[BD94] J.R. Burch and D.L. Dill. Automatic verification of pipelined microprocessor control. In Proceedings of the Sixth Workshop on Computer-Aided Verification, pages 68–80, 1994.

[BELP94] G. Bilsen, M. Engels, R. Lauwereins, and J. A. Peperstraete. Static scheduling of multi-rate and cyclo-static DSP applications. In Proc. 1994 Workshop on VLSI Signal Processing. IEEE Press, 1994.

[Ber89] G. Berry. Information Processing, volume 89, chapter Real Time Programming: Special Purpose or General Purpose Languages, pages 11–17. North Holland-Elsevier Science Publishers, 1989.


[Ber91] G. Berry. A hardware implementation of pure Esterel. In Proceedings of the International Workshop on Formal Methods in VLSI Design, January 1991.

[BG90] A. Benveniste and P. Le Guernic. Hybrid dynamical systems theory and the SIGNAL language. IEEE Transactions on Automatic Control, 35(5):525–546, May 1990.

[BHLM90] J. Buck, S. Ha, E. A. Lee, and D. G. Messerschmitt. Ptolemy: a framework for simulating and prototyping heterogeneous systems. International Journal of Computer Simulation, special issue on Simulation Software Development, January 1990.

[BHLM94] J. T. Buck, S. Ha, E. A. Lee, and D. G. Messerschmitt. Ptolemy: A framework for simulating and prototyping heterogeneous systems. Int. Journal of Computer Simulation, 4:155–182, April 1994. Special issue on simulation software development. http://ptolemy.eecs.berkeley.edu/papers/JEurSim.ps.Z.

[BKM95] R. S. Boyer, M. Kaufmann, and J. S. Moore. The Boyer-Moore theorem prover and its interactive enhancement. Computers & Mathematics with Applications, pages 27–62, January 1995.

[BML96] S. S. Bhattacharyya, P. K. Murthy, and E. A. Lee. Software Synthesis from Dataflow Graphs. Kluwer Academic Press, Norwood, Mass., 1996.

[BRX93] E. Barros, W. Rosenstiel, and X. Xiong. Hardware/software partitioning with UNITY. In Proceedings of the International Workshop on Hardware-Software Codesign, October 1993.

[Bry86] R. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, C-35(8):677–691, August 1986.

[BS91] F. Boussinot and R. De Simone. The ESTEREL language. Proceedings of the IEEE, 79(9), 1991.

[BS94] E. Barros and A. Sampaio. Towards provably correct hardware/software partitioning using OCCAM. In Proceedings of the International Workshop on Hardware-Software Codesign, October 1994.

[BSSV95] M. Di Benedetto, A. Saldanha, and A. Sangiovanni-Vincentelli. Strong model matching for finite state machines. In Proceedings of the Third European Control Conference, September 1995.

[BSV92] F. Balarin and A. Sangiovanni-Vincentelli. A verification strategy for timing-constrained systems. In Proceedings of the Fourth Workshop on Computer-Aided Verification, pages 148–163, 1992.

[Buc93] J. T. Buck. Scheduling Dynamic Dataflow Graphs with Bounded Memory Using the Token Flow Model. PhD thesis, University of California, Berkeley, 1993. Dept. of EECS, Tech. Report UCB/ERL 93/69.

[Bur92] J. R. Burch. Automatic Symbolic Verification of Real-Time Concurrent Systems. PhD thesis, Carnegie Mellon University, August 1992.

[CB94] P. Chou and G. Borriello. Software scheduling in the co-synthesis of reactive real-time systems. In Proceedings of the Design Automation Conference, June 1994.

[CBM89] O. Coudert, C. Berthet, and J. C. Madre. Verification of Sequential Machines Using Boolean Functional Vectors. In IMEC-IFIP Int'l Workshop on Applied Formal Methods for Correct VLSI Design, pages 111–128, November 1989.

[CC77] P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In 4th ACM Symp. on Principles of Programming Languages, Los Angeles, January 1977.

[CES86] E. M. Clarke, E. A. Emerson, and A. P. Sistla. Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM TOPLAS, 8(2), 1986.

[CG89] N. Carriero and D. Gelernter. Linda in context. Comm. of the ACM, 32(4):444–458, April 1989.


[CGH+94a] M. Chiodo, P. Giusto, H. Hsieh, A. Jurecska, L. Lavagno, and A. Sangiovanni-Vincentelli. A formal methodology for hardware/software codesign of embedded systems. IEEE Micro, August 1994.

[CGH+94b] M. Chiodo, P. Giusto, H. Hsieh, A. Jurecska, L. Lavagno, and A. Sangiovanni-Vincentelli. Hardware/software codesign of embedded systems. IEEE Micro, 14(4):26–36, August 1994.

[CGH+95] M. Chiodo, P. Giusto, H. Hsieh, A. Jurecska, L. Lavagno, and A. Sangiovanni-Vincentelli. Synthesis of software programs from CFSM specifications. In Proceedings of the Design Automation Conference, June 1995.

[CGL94] E. Clarke, O. Grumberg, and D. Long. Model checking and abstraction. ACM Transactions on Programming Languages and Systems, 16(5):1512–1542, September 1994.

[CH71] F. Commoner and A. W. Holt. Marked directed graphs. Journal of Computer and System Sciences, 5:511–523, 1971.

[CHL96] W.-T. Chang, S.-H. Ha, and E. A. Lee. Heterogeneous simulation - mixing discrete-event models with dataflow. J. on VLSI Signal Processing, 1996. To appear.

[CKL95] W.-T. Chang, A. Kalavade, and E. A. Lee. Effective heterogeneous design and cosimulation. In NATO Advanced Study Institute Workshop on Hardware/Software Codesign, Lake Como, Italy, June 1995. http://ptolemy.eecs.berkeley.edu/papers/effective.

[Coc92] M. Cochran. Using the rate monotonic analysis to analyze the schedulability of ADARTS real-time software designs. In Proceedings of the International Workshop on Hardware-Software Codesign, September 1992.

[CW91] R. Camposano and W. Wolf, editors. High-level VLSI synthesis. Kluwer Academic Publishers, 1991.

[CWB94] P. Chou, E. A. Walkup, and G. Borriello. Scheduling for reactive real-time systems. IEEE Micro, 14(4):37–47, August 1994.


[Dav92] A. Davie. An introduction to functional programming systems using Haskell. Cambridge University Press, 1992.

[DGK94] S. Devadas, A. Ghosh, and K. Keutzer. Logic synthesis. McGraw-Hill, 1994.

[DH94a] J. G. D'Ambrosio and X. B. Hu. Configuration-level hardware/software partitioning for real-time embedded systems. In Proceedings of the International Workshop on Hardware-Software Codesign, 1994.

[DH94b] D. Drusinski and D. Harel. On the power of bounded concurrency. I. Finite automata. Journal of the Association for Computing Machinery, 41(3):517–539, May 1994.

[Dij68] E. Dijkstra. Cooperating sequential processes. In F. Genuys, editor, Programming Languages. Academic Press, New York, 1968.

[Dil88] D. L. Dill. Trace Theory for Automatic Hierarchical Verification of Speed-Independent Circuits. The MIT Press, Cambridge, Mass., 1988. An ACM Distinguished Dissertation 1988.

[dSdS95] M. L. de Souza and R. de Simone. Using partial orders for verifying behavioral equivalences. In Proceedings of CONCUR '95, 1995.

[EH92] R. Ernst and J. Henkel. Hardware-software codesign of embedded controllers based on hardware extraction. In Proceedings of the International Workshop on Hardware-Software Codesign, September 1992.

[EPD94] P. Eles, Z. Peng, and A. Doboli. VHDL system-level specification and partitioning in a hardware/software cosynthesis environment. In Proceedings of the International Workshop on Hardware-Software Codesign, September 1994.

[GJM92] R. K. Gupta, C. N. Coelho Jr., and G. De Micheli. Synthesis and simulation of digital systems containing interacting hardware and software components. In Proceedings of the Design Automation Conference, June 1992.


[GJM94] R. K. Gupta, C. N. Coelho Jr., and G. De Micheli. Program implementation schemes for hardware-software systems. IEEE Computer, pages 48–55, January 1994.

[GM92] M. J. C. Gordon and T. F. Melham, editors. Introduction to HOL: a theorem proving environment for higher order logic. Cambridge University Press, 1992.

[GM94] R. K. Gupta and G. De Micheli. Constrained software generation for hardware-software systems. In Proceedings of the International Workshop on Hardware-Software Codesign, 1994.

[GNRV96] D. D. Gajski, S. Narayan, L. Ramachandran, and F. Vahid. System design methodologies: aiming at the 100 h design cycle. IEEE Transactions on VLSI, 4(1), March 1996.

[God90] P. Godefroid. Using partial orders to improve automatic verification methods. In E. M. Clarke and R. P. Kurshan, editors, Proceedings of the Computer Aided Verification Workshop, 1990. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 1991, pages 321–340.

[Hal93] N. Halbwachs. Synchronous Programming of Reactive Systems. Kluwer Academic Publishers, 1993.

[Har87] D. Harel. Statecharts: A visual formalism for complex systems. Sci. Comput. Program., 8:231–274, 1987.

[HBE93] J. Henkel, T. Benner, and R. Ernst. Hardware generation and partitioning effects in the COSYMA system. In Proceedings of the International Workshop on Hardware-Software Codesign, October 1993.

[HC94] J. Hoogerbrugge and H. Corporaal. Transport-triggering vs. operation-triggering. In 5th International Conference on Compiler Construction, April 1994.

[HCRP91] N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud. The synchronous data flow programming language LUSTRE. Proceedings of the IEEE, 79(9):1305–1319, 1991.


[HDMT94] X. Hu, J. G. D'Ambrosio, B. T. Murray, and D.-L. Tang. Codesign of architectures for powertrain modules. IEEE Micro, 14(4):48–58, August 1994.

[HEHB94] J. Henkel, R. Ernst, U. Holtmann, and T. Benner. Adaptation of partitioning and high-level synthesis in hardware/software co-synthesis. In Proceedings of the International Conference on Computer-Aided Design, November 1994.

[HLN+90a] D. Harel, H. Lachover, A. Naamad, A. Pnueli, et al. STATEMATE: a working environment for the development of complex reactive systems. IEEE Transactions on Software Engineering, 16(4), April 1990.

[HLN+90b] D. Harel, H. Lachover, A. Naamad, A. Pnueli, M. Politi, R. Sherman, A. Shtull-Trauring, and M. Trakhtenbrot. Statemate: A working environment for the development of complex reactive systems. IEEE Trans. on Software Engineering, 16(4), April 1990.

[Hoa78] C. A. R. Hoare. Communicating sequential processes. Communications of the ACM, 21(8), 1978.

[HS91] W. A. Halang and A. D. Stoyenko. Constructing predictable real time systems. Kluwer Academic Publishers, 1991.

[HU79] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 1979.

[HWT92] G. Hoffmann and H. Wong-Toi. Symbolic synthesis of supervisory controllers. In American Control Conference, Chicago, June 1992.

[IAJ94] T. B. Ismail, M. Abid, and A. A. Jerraya. COSMOS: a codesign approach for communicating systems. In Proceedings of the International Workshop on Hardware-Software Codesign, 1994.

[Jag92] R. Jagannathan. Parallel execution of GLU programs. In 2nd International Workshop on Dataflow Computing, Hamilton Island, Queensland, Australia, May 1992.


[K+87] D. J. Kaplan et al. Processing graph method specification version 1.0. The Naval Research Laboratory, Washington D.C., December 1987.

[Kah74] G. Kahn. The semantics of a simple language for parallel programming. In Proc. of the IFIP Congress 74. North-Holland Publishing Co., 1974.

[KAJW92] S. Kumar, J. H. Aylor, B. W. Johnson, and W. A. Wulf. A framework for hardware/software codesign. In Proceedings of the International Workshop on Hardware-Software Codesign, September 1992.

[KAJW93] S. Kumar, J. H. Aylor, B. Johnson, and W. Wulf. Exploring hardware/software abstractions and alternatives for codesign. In Proceedings of the International Workshop on Hardware-Software Codesign, October 1993.

[KL92] A. Kalavade and E. A. Lee. Hardware/software co-design using Ptolemy – a case study. In Proceedings of the International Workshop on Hardware-Software Codesign, September 1992.

[KL94] A. Kalavade and E. A. Lee. A global criticality/local phase driven algorithm for the constrained hardware/software partitioning problem. In Proceedings of the International Workshop on Hardware-Software Codesign, 1994.

[KM66] R. M. Karp and R. E. Miller. Properties of a model for parallel computations: Determinacy, termination, queueing. SIAM Journal, 14:1390–1411, November 1966.

[Kur94] R. P. Kurshan. Automata-Theoretic Verification of Coordinating Processes. Princeton University Press, 1994.

[LL73] C. Liu and J. W. Layland. Scheduling algorithms for multiprogramming in a hard real-time environment. Journal of the ACM, 20(1):44–61, January 1973.

[LM87] E. A. Lee and D. G. Messerschmitt. Synchronous data flow. IEEE Proceedings, September 1987.


[LMP94a] C. Liem, T. May, and P. Paulin. Instruction set matching and selection for DSP and ASIP code generation. In European Design and Test Conference, February 1994.

[LMP94b] C. Liem, T. May, and P. Paulin. Register assignment through resource classification for ASIP microcode generation. In Proceedings of the International Conference on Computer-Aided Design, November 1994.

[LP95] E. A. Lee and T. M. Parks. Dataflow process networks. Proceedings of the IEEE, May 1995. http://ptolemy.eecs.berkeley.edu/papers/processNets.

[LR93] S. Lee and J. M. Rabaey. A hardware-software co-simulation environment. In Proceedings of the International Workshop on Hardware-Software Codesign, October 1993.

[LSV96] E. A. Lee and A. Sangiovanni-Vincentelli. The tagged signal model - a preliminary version of a denotational framework for comparing models of computation. Technical report, Electronics Research Laboratory, University of California, Berkeley, CA 94720, May 1996.

[LW94] W. Luk and T. Wu. Towards a declarative framework for hardware-software codesign. In Proceedings of the International Workshop on Hardware-Software Codesign, 1994.

[LWAP94] R. Lauwereins, P. Wauters, M. Ade, and J. A. Peperstraete. Geometric parallelism and cyclostatic dataflow in GRAPE-II. In Proc. 5th Int. Workshop on Rapid System Prototyping, Grenoble, France, June 1994.

[MABC92] G. Menez, M. Auguin, F. Boeri, and C. Carriere. A partitioning algorithm for system-level synthesis. In Proceedings of the International Conference on Computer-Aided Design, November 1992.

[Mar91] F. Maraninchi. The Argos language: Graphical representation of automata and description of reactive systems. In Proc. of the IEEE Workshop on Visual Languages, Kobe, Japan, October 1991.


[Mar93] P. Marwedel. Tree-based mapping of algorithms to predefined structures. In Proceedings of the International Conference on Computer-Aided Design, November 1993.

[Maz84] A. Mazurkiewicz. Traces, histories, graphs: Instances of a process monoid. In M. P. Chytil and V. Koubek, editors, Proc. Conf. on Mathematical Foundations of Computer Science, volume 176 of LNCS. Springer-Verlag, 1984.

[McM93] K. McMillan. Symbolic model checking. Kluwer Academic, 1993.

[MG95] P. Marwedel and G. Goossens, editors. Code generation for embedded processors. Kluwer Academic Publishers, 1995.

[Mic94] G. De Micheli. Synthesis and optimization of digital circuits. McGraw-Hill, 1994.

[Mil89] R. Milner. Communication and Concurrency. Prentice-Hall, Englewood Cliffs, NJ, 1989.

[MP92] Z. Manna and A. Pnueli. The temporal logic of reactive and concurrent systems. Springer-Verlag, 1992.

[MTH90] R. Milner, M. Tofte, and R. Harper. The definition of Standard ML. MIT Press, 1990.

[MV94] J. McManis and P. Varaiya. Suspension automata: a decidable class of hybrid automata. In Proceedings of the Sixth Workshop on Computer-Aided Verification, pages 105–117, 1994.

[OHLR94] K. Olukotun, R. Helaihel, J. Levitt, and R. Ramirez. A software-hardware cosynthesis approach to digital system simulation. IEEE Micro, 14(4):48–58, August 1994.

[ORS92] S. Owre, J. M. Rushby, and N. Shankar. PVS: a prototype verification system. In 11th International Conference on Automated Deduction. Springer-Verlag, June 1992.

[Par95] T. M. Parks. Bounded Scheduling of Process Networks. PhD thesis, University of California, Berkeley, December 1995. Dept. of EECS, Tech. Report UCB/ERL 95/105.


[Pau95] P. Paulin. DSP design tool requirements for embedded systems: a telecommunications industrial perspective. Journal of VLSI Signal Processing, 9(1-2):22–47, January 1995.

[Pet81] J. L. Peterson. Petri Net Theory and the Modeling of Systems. Prentice-Hall Inc., Englewood Cliffs, NJ, 1981.

[Pnu77] A. Pnueli. The temporal logic of programs. In Proceedings of the 18th Annual Symposium on Foundations of Computer Science. IEEE Press, May 1977.

[PP91] S. Prakash and A. Parker. Synthesis of application-specific multiprocessor architectures. In Proceedings of the Design Automation Conference, June 1991.

[QS82] J. P. Queille and J. Sifakis. Specification and verification of concurrent systems in Cesar. In International Symposium on Programming. LNCS 137, Springer-Verlag, April 1982.

[Rei85] W. Reisig. Petri Nets: An Introduction. Springer-Verlag, 1985.

[RH92] F. Rocheteau and N. Halbwachs. Implementing reactive programs on circuits: A hardware implementation of LUSTRE. In Real-Time, Theory in Practice, REX Workshop Proceedings, volume 600 of LNCS, pages 195–208, Mook, Netherlands, June 1992. Springer-Verlag.

[Row94] J. Rowson. Hardware/software co-simulation. In Proceedings of the Design Automation Conference, pages 439–440, 1994.

[RW89] P. J. Ramadge and W. M. Wonham. The control of discrete event systems. Proceedings of the IEEE, 77(1), January 1989.

[RW91] J. Rasure and C. S. Williams. An integrated visual language and software development environment. Journal of Visual Languages and Computing, 2:217–246, 1991.

[SBKB90] P. A. Suhler, J. Biswas, K. M. Korner, and J. C. Browne. TDFL: A task-level dataflow language. J. on Parallel and Distributed Systems, 9(2), June 1990.


[SBT96] T. R. Shiple, G. Berry, and H. Touati. Constructive analysis of cyclic circuits. In Proceedings of the European Design and Test Conference, March 1996.

[SCG+93] U. Steinhausen, R. Camposano, H. Gunther, P. Ploger, M. Theissinger, et al. System synthesis using hardware/software codesign. In Proceedings of the International Workshop on Hardware-Software Codesign, October 1993.

[SL93a] G. C. Sih and E. A. Lee. A compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architectures. IEEE Trans. on Parallel and Distributed Systems, 4(2), February 1993.

[SL93b] G. C. Sih and E. A. Lee. Declustering: A new multiprocessor scheduling technique. IEEE Trans. on Parallel and Distributed Systems, 4(6):625–637, June 1993.

[SP94] S. Sutarwala and P. Paulin. Flexible modeling environment for embedded systems design. In Proceedings of the International Workshop on Hardware-Software Codesign, 1994.

[SS63] J. C. Shepherdson and H. E. Sturgis. Computability of recursive functions. Journal of the ACM, 10(2):217–255, 1963.

[SSR89] S. Saracco, J. R. W. Smith, and R. Reed. Telecommunications Systems Engineering Using SDL. North-Holland - Elsevier, 1989.

[SSV96] K. Suzuki and A. Sangiovanni-Vincentelli. Efficient software performance estimation methods for hardware/software codesign. In Proceedings of the Design Automation Conference, 1996.

[Sto77] J. E. Stoy. Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory. The MIT Press, Cambridge, MA, 1977.

[TAS93] D. E. Thomas, J. K. Adams, and H. Schmitt. A model and methodology for hardware-software codesign. IEEE Design and Test of Computers, 10(3):6–15, September 1993.


[tHM93] K. ten Hagen and H. Meyr. Timed and untimed hardware/software cosimulation: application and efficient implementation. In Proceedings of the International Workshop on Hardware-Software Codesign, October 1993.

[TMW94] V. Tiwari, S. Malik, and A. Wolfe. Power analysis of embedded software: a first step towards software power minimization. IEEE Transactions on VLSI Systems, 2(4):437–445, December 1994.

[TSV94] M. Theissinger, P. Stravers, and H. Veit. CASTLE: an interactive environment for hardware-software co-design. In Proceedings of the International Workshop on Hardware-Software Codesign, 1994.

[TW95] A. Takach and W. Wolf. An automaton model for scheduling constraints in synchronous machines. IEEE Tr. on Computers, 44(1):1–12, January 1995.

[Val92] A. Valmari. A stubborn attack on state explosion. Formal Methods in System Design, 1(4):297–322, 1992.

[vdB94] M. von der Beeck. A comparison of statecharts variants. In Proc. of Formal Techniques in Real Time and Fault Tolerant Systems, volume 863 of LNCS, pages 128–148. Springer-Verlag, 1994.

[VG92] F. Vahid and D. G. Gajski. Specification partitioning for system design. In Proceedings of the Design Automation Conference, June 1992.

[VIJK94] M. Voss, T. Ben Ismail, A. A. Jerraya, and K.-H. Kapp. Towards a theory for hardware-software codesign. In Proceedings of the International Workshop on Hardware-Software Codesign, 1994.

[WA85] W. Wadge and E. A. Ashcroft. Lucid, the dataflow programming language. Academic Press, 1985.

[WB93] E. Walkup and G. Borriello. Automatic synthesis of device drivers for hardware-software codesign. In Proceedings of the International Workshop on Hardware-Software Codesign, October 1993.


[WCR94] J. Wilberg, R. Camposano, and W. Rosenstiel. Design flow for hardware/software cosynthesis of a video compression system. In Proceedings of the International Workshop on Hardware-Software Codesign, 1994.

[Wil94] J. Wilson. Hardware/software selected cycle solution. In Proceedings of the International Workshop on Hardware-Software Codesign, 1994.
