UNIVERSITY OF CALIFORNIA
Los Angeles
AMPIRE:
Asynchronous Microprocessor with Instruction Retry
A thesis submitted in partial satisfaction of the
requirements for the degree Master of Science
in Computer Science
by
Chia-Chi Chao
1995
© Copyright by
Chia-Chi Chao
1995
The thesis of Chia-Chi Chao is approved.
Milos D. Ercegovac
David A. Rennels
Yuval Tamir, Committee Chair
University of California, Los Angeles
1995
Table of Contents
Chapter One: Introduction
  1.1. Asynchronous Design
  1.2. Fault Tolerance
  1.3. Scope of Thesis
Chapter Two: Previous Work
  2.1. Handshake Protocols
  2.2. Asynchronous Processors and Circuits
  2.3. Micro Rollback
Chapter Three: Processor Architecture
  3.1. Instruction Set
  3.2. Processor Overview
  3.3. Normal Operation
    3.3.1. Instruction Fetch
    3.3.2. Instruction Issue
    3.3.3. Instruction Execution
      3.3.3.1. ALU
      3.3.3.2. Data Memory Load/Store
      3.3.3.3. Flow Control
    3.3.4. Register File Access
  3.4. Fault Tolerance
    3.4.1. Sequence Number and Check Vector
    3.4.2. Error Detection
    3.4.3. Instruction Log and Validation
    3.4.4. Rollback
    3.4.5. Delayed Write Buffer
  3.5. Synchronization
    3.5.1. Module Level
    3.5.2. Processor Level
Chapter Four: Behavioral Modeling
  4.1. Building Blocks
    4.1.1. Tri-State Bus
    4.1.2. Muller C-Element
    4.1.3. Buffer
    4.1.4. Reset
    4.1.5. Arbiter
    4.1.6. Rollback
  4.2. Processor Modules
    4.2.1. Queues
    4.2.2. Checkers
    4.2.3. Memories
    4.2.4. Program Counter
    4.2.5. Arithmetic Logic Unit
    4.2.6. Controller
      4.2.6.1. Instruction Issuing Unit
      4.2.6.2. Reservation Table
    4.2.7. Delayed Write Buffers
      4.2.7.1. REGDWB
      4.2.7.2. MEMDWB
    4.2.8. Instruction Log
  4.3. Putting the Processor Together
    4.3.1. Parameters and Timing
    4.3.2. Top-Level Wiring and Testing Module
Chapter Five: Behavioral Simulation
  5.1. Instruction Fetch/Issue and Queue Operations
  5.2. Delayed Write Buffers and Checker Arbitration
  5.3. Arbitration for R_bus and Out-of-Order Completion
  5.4. Fault Detection and Rollback
  5.5. Running a Real Program
Chapter Six: Hardware Design and Gate-Level Simulation
  6.1. C-Elements
  6.2. Arbiter
  6.3. Register Completion Detector
  6.4. Instruction Queue
  6.5. Data Queue
    6.5.1. Sequence Number Comparator
    6.5.2. DQ Control
    6.5.3. DQ Simulation
  6.6. Fault Simulation with Gate-Level Modules
Chapter Seven: Conclusion
Appendix A: Verilog Simulation Code
Appendix B: AMPIRE Assembler
  B.1. Assembly Code Format
  B.2. Assembler Source Code
Bibliography
List of Figures
2.1. 2-Phase Handshake Protocol
2.2. 4-Phase Handshake Protocol
2.3. Delay-Insensitive 4-Phase Handshake Protocol
2.4. Request Generation for Double-Rail Data
2.5. MIPS as a Function of Supply Voltage for Caltech Processor
2.6. Handshake Model for Berkeley DSP
2.7. Micro Rollback, Restoring a Saved Snapshot
2.8. Register File with Support for Micro Rollback
3.1. AMPIRE Instruction Format
3.2. AMPIRE Block Diagram
3.3. Sequence Number Windows
3.4. Check Vector for ALU Operation
3.5. Validation and Rollback Signal Flow Diagram
3.6. Asynchronous Delayed Write Buffer for Register File
3.7. Halt and Rollback Sequence
4.1. Data Transfer Sequence
5.1. Test 1 with Slow IMEM
5.2. Test 1 with Fast IMEM
5.3. Test 1 with Fast IMEM, Close-Up
5.4. Test 1, Detailed Activities
5.5. Test 2 with All Delays=1
5.6. Test 2 with Long DMEM Write Cycle
5.7. Test 2 with Slow Memory Checker
5.8. Test 2, Showing Arbitration of Checkers
5.9. Test 3 with Slow DMEM
5.10. Test 3 with Fast DMEM
5.11. Test 4, All Faults
5.12. Test 4, CKI Fault
5.13. Test 4, CKF Fault
5.14. Test 4, ALU CKR Fault
5.15. Test 4, CKM Fault
5.16. Test 4, DMEM CKR Fault
5.17. Test 4, Rollback Sequence
5.18. Test 5, Find the Largest Number
6.1. 3-Input C-Element
6.2. 14-Input C-Element
6.3. 2-Input C-Element with Clear
6.4. Arbiter for R_bus
6.5. Gate-Level Arbiter
6.6. D-Latch
6.7. D-Latch with Set/Reset
6.8. Register Completion Detector
6.9. Control for IQ Buffer
6.10. Gate-Level IQ
6.11. Sequence Number Comparator
6.12. Borrow Circuits, Full and Half
6.13. XOR Gate
6.14. Difference (SUB) Circuit
6.15. Control for DQ Buffer
6.16. Test 6, Gate-Level DQ with DLY_CKI_CHK=48
6.17. Test 6, Gate-Level DQ with DLY_CKI_CHK=52
6.18. Test 7, Gate-Level DQ with DLY_DMEM_RD=8
6.19. Test 7, Gate-Level DQ with DLY_DMEM_RD=1
6.20. Fault Simulation with Gate-Level Modules
List of Tables
2.1. Average Instruction Periods for Berkeley DSP
3.1. AMPIRE Instruction Set
3.2. Processor Modules
4.1. Frequently Used Verilog Keywords
4.2. DWB Bits in the Instruction Log
A.1. Verilog Modules
ABSTRACT OF THE THESIS
AMPIRE:
Asynchronous Microprocessor with Instruction Retry
by
Chia-Chi Chao
Master of Science in Computer Science
University of California, Los Angeles, 1995
Professor Yuval Tamir, Chair
As we build faster digital circuits, the clock skew problem becomes a major
limiting factor in large-scale synchronous systems. Asynchronous design has the
advantage of not requiring a global clock, and the high modularity that can be achieved
is especially beneficial for large systems. Reliability of a system is an important issue
in many applications, and hardware fault-tolerant techniques can be applied for fast
recovery from transient faults.
This thesis reports the design of an asynchronous microprocessor that supports
concurrent error detection and software-transparent fault recovery. On-chip parity
checkers operate in parallel with other functional units, and when an error is detected,
the state of the processor is restored to the instruction in which the error occurred. A
Verilog behavioral model of the architecture has been written and simulated, and some
gate-level self-timed circuits have also been designed.
Chapter One
Introduction
1.1. Asynchronous Design
Because of the advances in semiconductor technology, VLSI circuit density and
speed have increased dramatically over the years. In a synchronous design, one of the
limitations on system speed is clock skew, which is the phase difference of the clock
signals at different locations of the system. As the feature size decreases, wire delays
and loadings become proportionally more significant, and the clock skew problem
becomes worse. Clock skew can be minimized with proper distribution, but clock
balancing cannot be done until the chip layout is complete and loadings characterized.
More robust circuits can be designed to be less sensitive to clock skew, but most likely
at the expense of speed and chip area. Furthermore, the maximum clock frequency has
to accommodate the critical paths in the system, even though some stages may be able
to operate at a much faster rate.
Instead of having a global clock for synchronization, each block in an
asynchronous system decides when it is ready to accept a new set of data to be
processed. Various handshake protocols may be used to coordinate the communication,
and they will be discussed in the next chapter. A properly designed self-timed system
can be made very modular. For example, if an application requires a faster adder, a
ripple-carry adder may be directly replaced with a carry-lookahead one. Since the
circuit is self-timed, the rest of the system does not need to be modified; it will just run
faster when add operations are performed. Conversely, any module may be substituted
with a slower version to save power, and there is no clock to be adjusted to compensate
for the increase in critical path.
The following examples illustrate some advantages of asynchronous systems. The
torus routing chip developed at Caltech [Dall86] is a self-timed circuit for
multiprocessor interconnections. Due to an oversight in the critical path, the first
silicon could be run only at 4 MHz rather than the expected 20 MHz. However, the
chip still functioned correctly because of the self-timed design. In the DEC PDP-6
computer, an asynchronous adder was used to take advantage of the higher average
performance [Bell78]. It was observed by von Neumann and others that the average
number of carries is log2(width of data). Therefore, for the 36-bit word size of the PDP-6,
the average performance was increased by a factor of about 7 because the number of carries is
about 5.2 on average rather than the worst case of 36. In a recent RISC architecture, the 100
MHz HP PA-RISC CPU employs a self-timed floating point coprocessor due to speed
and area considerations [Wils92]. The multiplier can compute a double-precision result
in 20 ns and consumes only 0.07 cm² chip area.
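The carry argument for the PDP-6 adder above can be checked with a few lines of Python. This is a minimal sketch and not part of the thesis: the helper name longest_carry_chain and the generate/propagate/kill model of a ripple adder are assumptions introduced for illustration, and the average chain length is simply taken to be log2 of the word width, as stated in the text.

```python
import math


def longest_carry_chain(a, b, width):
    """Length of the longest carry chain when adding a and b (hypothetical helper).

    Each bit position either generates a carry (both bits 1), propagates an
    incoming carry (exactly one bit 1), or kills it (both bits 0).
    """
    longest = run = 0
    for i in range(width):
        abit, bbit = (a >> i) & 1, (b >> i) & 1
        if abit and bbit:
            run = 1                      # a new carry starts at this position
        elif abit != bbit:
            run = run + 1 if run else 0  # a live carry moves one position further
        else:
            run = 0                      # any live carry is absorbed here
        longest = max(longest, run)
    return longest


WIDTH = 36                          # PDP-6 word size, from the text
average_carries = math.log2(WIDTH)  # average chain length quoted in the text
speedup = WIDTH / average_carries   # worst case divided by average case

print(f"average carries: {average_carries:.2f}")
print(f"speedup over worst case: {speedup:.1f}")
print(longest_carry_chain(0b0111, 0b0001, 4))
```

The first two values come out close to the 5.2 carries and the factor of 7 quoted above. A self-timed adder benefits because its completion time follows the chain length of the actual operands rather than the full word width.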
However, by not having a global clock, additional handshake circuits must be
added to handle inter-block communication, and self-timed circuits themselves are
more complex, in general. These timing and area overheads for handshaking must be
considered when choosing between a synchronous and an asynchronous approach. Also,
designing circuits for proper handshake sequencing is not a trivial task. Several
approaches have been proposed for synthesizing hardware from high-level
specifications [Burn87, Meng89, Moln85].
The difficulty with metastability may seem to be a major drawback for
asynchronous systems. However, as pointed out in [Meng89], asynchronous design
does not introduce any additional undecidable timing problems. For latching data, the clock in a
synchronous system is made slow enough so that data arrives before the clock edge. In
a properly designed, fully asynchronous system, the latching signal is guaranteed to
arrive after the data, based on the handshake methodology. An arbiter, on the other
hand, is inherently metastable because more than one request may become active
simultaneously [Meng89, Seit80]. Asynchronous systems may be more dependent on
arbiters because their timing is inherently unpredictable. However, if fair mutual exclusion is
to be incorporated in a synchronous design, then metastable circuits are usually
necessary there as well.
1.2. Fault Tolerance
In applications where reliability is a major concern, fault tolerant features should
be designed into a system to minimize down time. Since transient faults occur much
more frequently than permanent faults [Cast82], fast recovery from transient faults is a
key to improving system performance. In environments where a high rate of transient
faults is expected (due to radiation, noise, etc.), software-driven corrections may be too
slow and impractical, and hardware fault-tolerant features must be added to assist
recovery. The RH32 radiation-hard, 5-chip computer under development by TRW,
McDonnell Douglas, and United Technologies is an example of a fault-tolerant
processor [TRW92]. Various degrees of fault tolerance may be achieved by running the
CPU stand-alone, with other chips in the family, or as a processor/checker pair with two
CPUs. Many faults can be detected and corrected in under 2 µs (25 MHz clock) without
any software intervention.
To achieve a high degree of fault tolerance, it is often necessary to detect errors as
soon as they occur so that corrupted data is not spread throughout the system. One way
is to check the data in series by delaying execution until verification is complete.
However, performance is reduced because either the cycle time or the number of
pipeline stages must be increased. To minimize performance degradation, checkers can
operate in parallel with the functional units. The next pipeline stage can start execution
while data verification is carried out, and the checking result will be available some
time later. This largely solves the problem with delay through the checker, but fault
recovery becomes more complicated because all computations on the corrupted data
must also be discarded. Micro rollback is a technique that utilizes concurrent
verification, and the system is rolled back several cycles in response to a delayed error
signal [Tami88, Tami90a]. Micro rollback is briefly described in the next chapter;
for a complete discussion, see the listed references.
1.3. Scope of Thesis
As the title of this thesis suggests, AMPIRE is an asynchronous fault-tolerant
microprocessor. The DLX architecture [Henn90] is the basis for the design because it is
well-known, and its load-store RISC architecture allows ease of implementation in an
academic environment. Asynchronous processors have been built and tested [Jaco90,
Mart89], and a fault-tolerant processor capable of micro rollback has been designed and
implemented [Tami90b]. Therefore, the original goal of this work was to design an
asynchronous processor that supports micro rollback. Due to unexpected complexities
that will be discussed in Chapter 7, AMPIRE rolls back to instruction boundaries only,
not to the sub-instruction level as in micro rollback. However, many fault-tolerant ideas are
still based on the research results of micro rollback.
In the next chapter, previously published work related to self-timed design and
micro rollback will be reviewed. The AMPIRE architecture and some design decisions
will be discussed in Chapter 3. In order to check if the design is logically correct, a
Verilog model of the processor has been written and simulated. The behavioral model
will be described in Chapter 4, and simulation results will be shown in Chapter 5.
Since this processor is not physically implemented, circuit diagrams will be presented
in Chapter 6 to justify some high-level constructs used in the behavioral model.
Chapter 7 will then discuss some issues found in the design process, and what future
work can be done as extensions of this research. The full Verilog code and the source
code for AMPIRE assembler are listed in the appendices.
Chapter Two
Previous Work
2.1. Handshake Protocols
Figure 2.1: 2-Phase Handshake Protocol (waveforms of data, req, and ack)
Figure 2.2: 4-Phase Handshake Protocol (waveforms of data, req, and ack)
Whereas a synchronous system initiates a task cycle with clocks, an asynchronous
system uses handshake signals to start and stop an activity. The 2-phase and 4-phase
self-timed signaling conventions in [Seit80] are presented in Figures 2.1 and 2.2 in a
slightly modified form. For both cases, the request is sent after data becomes stable.
After the receiver finishes processing the data, an acknowledgement is sent back, and
then data is released. 4-phase handshake is level-sensitive, and it may be slower
because an extra trip is required to disable the req and ack signals. However, 2-phase
handshake needs edge detectors, which results in more complicated circuitry.
Therefore, 4-phase signaling is generally used for local communication, leaving
2-phase signaling for long-distance transactions. In terms of power, 2-phase has the advantage
of using half as many signal transitions, and the energy savings may be significant if the
interconnection wires are long.
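The difference in signal activity between the two conventions can be made concrete with a small model. The following is only a sketch under stated assumptions: the Channel class and its method names are hypothetical, data wires are not modeled, and each handshake step is treated as an instantaneous event.

```python
class Channel:
    """Request/acknowledge wire pair with a transition counter (illustrative only)."""

    def __init__(self):
        self.req = 0
        self.ack = 0
        self.transitions = 0

    def _set(self, name, level):
        # Count a transition only when the wire actually changes level.
        if getattr(self, name) != level:
            setattr(self, name, level)
            self.transitions += 1

    def four_phase_transfer(self):
        # Level-sensitive: both wires return to zero after every transfer.
        self._set("req", 1)  # sender: data is stable, raise request
        self._set("ack", 1)  # receiver: data consumed, raise acknowledge
        self._set("req", 0)  # sender: release request
        self._set("ack", 0)  # receiver: release acknowledge

    def two_phase_transfer(self):
        # Transition signaling: any edge on req or ack is an event.
        self._set("req", 1 - self.req)  # sender toggles request
        self._set("ack", 1 - self.ack)  # receiver toggles acknowledge


def count_transitions(method_name, transfers):
    """Run the given number of transfers and return the total wire transitions."""
    channel = Channel()
    for _ in range(transfers):
        getattr(channel, method_name)()
    return channel.transitions


print(count_transitions("four_phase_transfer", 10))  # 40
print(count_transitions("two_phase_transfer", 10))   # 20
```

Ten transfers cost 40 handshake transitions with 4-phase signaling and 20 with 2-phase signaling, which is the factor of two mentioned above.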
Note that in Figures 2.1 and 2.2, data must be stable before the request signal can
be activated. This is true at the transmitter, but if data and handshake signals are routed
differently, this ordering may not be preserved at the receiver, thereby violating
the protocol. This type of handshake is valid only in equipotential regions, unless
routing and delays can be carefully controlled. An equipotential region is an area small
enough such that delay through the wire is small compared to signal rise and fall
times [Seit80]. For communication outside of an equipotential region, a delay-
insensitive protocol should be used to maintain correct self-timed operations.
Delay-insensitive design is a subset of the class of self-timed circuits. One way to
achieve delay-insensitivity is by using double-rail encoding [Seit80], in which the
(data, data-bar) pair is 00 for undefined, 10 for one, and 01 for zero. The value 11 is not allowed, and
only one of the two signals may switch at any time. A handshake protocol using this
encoding is shown in Figure 2.3. The first cycle transfers a one, and the second cycle
transfers a zero. If a request signal is needed at the receiver end, it can be easily
generated by ORing the double-rail data lines. Figure 2.4 shows a circuit that uses a
C-element to merge the multiple internal request signals. Muller C-elements are often
used in self-timed systems and have the following characteristics: when all inputs are 1,
the output becomes 1, and when all inputs are 0, the output becomes 0. Otherwise, the
output remains in its previous state. C-element implementations will be discussed in
section 6.1.
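The double-rail encoding and the C-element behavior described above can be sketched in software as follows. This is an illustrative model only, assuming zero-delay gates; the names CElement, encode, bit_request, and word_request are hypothetical and do not come from the thesis.

```python
class CElement:
    """Muller C-element: the output follows the inputs only when they all agree."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, *inputs):
        if all(v == 1 for v in inputs):
            self.out = 1
        elif all(v == 0 for v in inputs):
            self.out = 0
        # Otherwise the output holds its previous state.
        return self.out


def encode(bit):
    """Double-rail (data, data-bar) pair: 10 for one, 01 for zero."""
    return (1, 0) if bit else (0, 1)


UNDEFINED = (0, 0)  # no data on the wires; the pair (1, 1) is never allowed


def bit_request(pair):
    """Per-bit request, generated by ORing the two rails."""
    return pair[0] | pair[1]


def word_request(c_element, word):
    """Merge the per-bit requests of a word with a C-element."""
    return c_element.update(*(bit_request(pair) for pair in word))


merge = CElement()
word = [UNDEFINED, UNDEFINED]
trace = [word_request(merge, word)]      # nothing valid yet
word[0] = encode(1)
trace.append(word_request(merge, word))  # only one bit has arrived
word[1] = encode(0)
trace.append(word_request(merge, word))  # all bits valid
word[0] = UNDEFINED
trace.append(word_request(merge, word))  # held until every bit is released
word[1] = UNDEFINED
trace.append(word_request(merge, word))  # all bits back to undefined
print(trace)  # [0, 0, 1, 1, 0]
```

The merged request rises only after every bit is valid and falls only after every bit has returned to the undefined state, regardless of the order in which the individual bits arrive.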
A delay-insensitive circuit removes a timing constraint, but it also carries a high
price tag. For single-rail plus request and acknowledgement protocols, a data bus of
any width only has to add two wires for handshaking. The double-rail method, while
more reliable, requires twice as many signal lines. Therefore, the trade-off is based on
how much control the designer has on signal delays and wire/block placement.
Conversion between single-rail and double-rail signaling is quite straightforward, and a
circuit can be found in [Seit80, p. 257].
Figure 2.3: Delay-Insensitive 4-Phase Handshake Protocol (waveforms of data, data-bar, and ack)
Figure 2.4: Request Generation for Double-Rail Data (rail pairs in 1 and in 2 produce req 1 and req 2, which a C-element merges into req)
2.2. Asynchronous Processors and Circuits
The asynchronous processor developed at Caltech is entirely delay-insensitive,
with the exception of isochronic forks [Mart89]. An isochronic fork is the distribution
of a signal to several receivers, where the differences in delays are assumed to be small
compared to gate delays, as in an equipotential region. The processor is a general-purpose 16-bit
microprocessor with a load-store architecture and separate instruction and data memories,
and it uses double-rail encoding with 4-phase handshaking as the communication protocol. The
performance profile for the 2 µm version is shown in Figure 2.5. At room temperature,
the chip is functional with the supply voltage as low as 0.35 V, and the speed reaches 30
MIPS at 12 V when the chip is submerged in liquid nitrogen. All these performance
variations occur without any clock adjustments, since there is no clock.
Figure 2.5: MIPS as a Function of Supply Voltage for Caltech Processor (x-axis: supply voltage, 0 to 12 volts; y-axis: MIPS, 0 to 30; curves at 300°K and 77°K)
Another fully asynchronous chip is a digital signal processor (DSP) designed at
UC Berkeley [Jaco90]. A ripple-carry adder and an iterative multiplier are used to save
chip area and to take advantage of the self-timed circuits. Since an instruction cannot
utilize both the shifter and the multiplier at the same time, these two units are placed in
the same pipeline stage. Therefore, the instruction cycle time is highly dependent on the
instruction and the data being executed. Table 2.1 lists the average instruction cycle
time at various supply voltages.
Vdd      Shift     Multiply
3.6 V    105 ns    440 ns
5.0 V    73 ns     337 ns
7.0 V    55 ns     260 ns
Table 2.1: Average Instruction Periods for Berkeley DSP
The DSP chip is self-timed, but not delay-insensitive like the Caltech processor.
Single-rail data and request signals are used, as shown in the processor handshake
model in Figure 2.6. After the request is received, the register is clocked to latch the
data. Since there is no feedback from the register about its completion, an assumption
is made on the delay before the signal I (initialize) is raised to start an evaluation cycle.
Figure 2.6: Handshake Model for Berkeley DSP (blocks: Interconnect Circuit, Reg, Self-Timed Logic; signals: data, req, ack, I, DV)
After the self-timed logic block is finished, the data valid DV signal is sent back to
notify the interconnect circuit. A problem was encountered in the initial layout that
resulted in a very long wire for the register clock. The additional delay was long
enough to cause the logic block to start evaluation before the data bits were settled, but
a change in the floorplan solved the problem. This case demonstrates that by using the
more efficient, delay-dependent circuits, some freedom of block placement is
sacrificed.
As mentioned in Chapter 1, handshake circuits are difficult to design manually
because all events have to occur in the correct sequence without, ironically, a clock.
Both processors discussed in this section started with high-level signal descriptions, and
then the handshake/control circuits were synthesized with CAD tools. Even with
synchronous systems, the control blocks in most processors nowadays are built through
hardware synthesis. The various methods are too complex to be covered in this thesis;
see [Burn87, Mart85, Mart86, Meng89, Moln85] for details.
An asynchronous system is not complete without self-timed memory. A self-
timed static RAM is discussed in [Fran83]. The memory array uses conventional six-
transistor static RAM cells, but circuitry is added to support the additional handshake
requirements. Since the RAM cell and sense amplifier already have differential bit
lines, similar to double-rail encoding discussed earlier, generating the completion signal
is a very natural extension. The external interface has additional request and
acknowledge lines to be connected to other asynchronous devices, such as a processor.
Only 5.2% of the total chip area is occupied by the self-timed completion detectors.
2.3. Micro Rollback
Micro rollback (in a synchronous system) works by taking snapshots of the state of
a module, and when an error is detected, a valid state is restored using the saved
information. Figure 2.7 illustrates micro rollback [Tami90a]. The error is detected
a few cycles after it occurs because checkers operate in parallel with the functional units, in order
to minimize performance degradation due to data verification. Micro rollback differs
from instruction retry [Ciac81] because it is based on clock cycles rather than full
instructions. As a result, micro rollback can be independently implemented in each
module that uses the clock to advance its state, regardless of its function or the pipeline
structure.
Figure 2.7: Micro Rollback, Restoring a Saved Snapshot (time axis labeled cycle 11 through cycle 17, with snapshots saved along it and markers for "error occurs", "error detected", and "Micro Rollback")
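The snapshot mechanism illustrated in Figure 2.7 can be sketched in software for a single register. This is a minimal model under stated assumptions, not the hardware described in [Tami90a]: the class name RollbackRegister, its methods, and the use of a bounded deque to stand in for the chain of saved register elements are all hypothetical.

```python
from collections import deque


class RollbackRegister:
    """A register that keeps its last `depth` values so it can be rolled back."""

    def __init__(self, depth, value=0):
        self.value = value
        # Bounded history: the oldest snapshot is dropped when the deque is full.
        self.history = deque(maxlen=depth)

    def clock(self, new_value):
        """Advance one cycle: save the current state, then load the new value."""
        self.history.append(self.value)
        self.value = new_value

    def rollback(self, cycles):
        """Undo the last `cycles` clock edges by restoring a saved snapshot."""
        if not 1 <= cycles <= len(self.history):
            raise ValueError("cannot roll back that many cycles")
        for _ in range(cycles):
            self.value = self.history.pop()


reg = RollbackRegister(depth=4)
for v in [11, 12, 13, 14, 15]:
    reg.clock(v)

reg.rollback(3)   # an error is detected three cycles after it occurred
print(reg.value)  # 12
```

The depth of the history bounds how late an error signal may arrive, which is why the checkers must finish within a fixed number of cycles.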
Storing the state of a simple register can be accomplished by adding a controller
and connecting several register elements in a FIFO fashion. However, for a large
register file, it is not practical to duplicate the entire block several times to allow rolling
back multiple cycles. Since only one register (or a few registers, depending on the
instruction set) can be written per clock cycle, a delayed write buffer (DWB) is used to
hold the data targeted for the register file until checking is complete, as shown in Figure
2.8 [Tami90a].
Figure 2.8: Register File with Support for Micro Rollback (blocks: Register File, Decoder, and a DWB containing a CAM of register addresses, valid bits (v), a FIFO, and a Priority Circuit; connections: write signals, Bus 1, Bus 2)
The content-addressable memory (CAM) contains the destination register
addresses, and a valid bit indicates that the corresponding FIFO buffer holds valid data
to be written to the register file. If a clock cycle does not update the register file, the
DWB is simply shifted to the left without setting the valid bit, and when valid data is
shifted out of the DWB, it is committed to the register file. Therefore, the depth of
the DWB determines the maximum number of cycles that can be rolled back, and
verification must be done within that time. When an error is detected, the appropriate
valid bits are cleared, and rollback is achieved because the state changes never reached
the register file. The priority circuit is necessary to retrieve the most recent register
data, even if it has not been written to the register file yet, so that other instructions
dependent on the data are not blocked from execution.
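The behaviour of the synchronous DWB described above can be sketched in software. The following Python model is only an illustration under assumed names (DelayedWriteBuffer, clock, rollback, and read are hypothetical and do not come from [Tami90a]); it models one buffer slot per clock cycle, commitment when a valid entry is shifted out, rollback by clearing valid bits, and the priority read of the most recent pending value.

```python
class DelayedWriteBuffer:
    """Toy model of a synchronous DWB in front of a register file."""

    def __init__(self, depth, num_regs=32):
        self.regfile = [0] * num_regs
        # One slot per cycle, oldest first; None means the valid bit is clear.
        self.slots = [None] * depth

    def clock(self, write=None):
        """Advance one cycle; `write` is (reg, value) or None."""
        oldest = self.slots.pop(0)
        if oldest is not None:          # valid entry shifted out: commit it
            reg, value = oldest
            self.regfile[reg] = value
        self.slots.append(write)

    def rollback(self, cycles):
        """Undo the last `cycles` cycles by clearing their valid bits."""
        assert 0 < cycles <= len(self.slots)
        for i in range(len(self.slots) - cycles, len(self.slots)):
            self.slots[i] = None

    def read(self, reg):
        """Priority read: the newest pending write wins over the register file."""
        for entry in reversed(self.slots):
            if entry is not None and entry[0] == reg:
                return entry[1]
        return self.regfile[reg]
```

With a depth of three, a write becomes permanent only after three further calls to clock, so an error detected within that window can still be undone.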
The UCLA Mirror Processor [Tami90b] is a fault-tolerant RISC microprocessor
that is capable of micro rollback. In addition to the on-chip parity checkers and DWBs
for error detection and recovery, two processors can operate in lock-step, one master
and one slave, comparing both external signals and internal signatures. It is very
expensive to route tens if not hundreds of internal signals to the pins. Therefore,
interleaved parity bits of the desired signals, called signatures, are generated with
chains of switching XOR cells [Trem89], and the condensed data is then used for
comparison. When a mismatch is found, both processors are rolled back the same
number of cycles. However, certain transient errors, such as a fault in the register file,
cannot be corrected with DWBs alone. Under these conditions, the faulty data in one
processor is replaced with the correct one from the other processor. If both processors
have errors in the same location, then a higher level recovery scheme is necessary.
AMPIRE supports instruction retry, not micro rollback, but DWBs are still used to
postpone write operations to the register file and the data memory, as will be seen later
in this report. All components of the DWB are present: FIFO, CAM, and the priority
circuit, except that they are replaced by their asynchronous counterparts.
Chapter Three
Processor Architecture
3.1. Instruction Set
A subset of the DLX instruction set as presented in [Henn90] has been chosen for
AMPIRE. The DLX RISC architecture is now widely studied in computer architecture
classes, and it allows ease of implementation. Verilog models of DLX have been built
at CMU [Siew92], and a VLSI implementation has been designed at Montana State
University with the Berkeley OCT tools and fabricated through MOSIS [Wint92].
R-type:  opcode (6) | rs (5) | rt (5) | rd (5) | function (11)
I-type:  opcode (6) | rs (5) | rd (5) | immediate (16)
J-type:  opcode (6) | offset (26)
(Field widths in bits are given in parentheses; fields are listed from bit 31 on the left to bit 0 on the right.)
Figure 3.1: AMPIRE Instruction Format
The AMPIRE instruction format is shown in Figure 3.1, with minor notational
changes from DLX [Henn90, p. 166]. All register-to-register ALU instructions share a
single opcode number, and the specific ALU operations are encoded in the function
field. The AMPIRE instruction set appears in Table 3.1, and the opcode/function
code assignments can be found in the parameter listing in Appendix A. The opcodes
for the DLX instructions were obtained from [Host91]. The new instructions for fault
simulation are placed in slots unused by the DLX.
Data Transfer:
  LHI rd, imm           Load high (upper half of register) with immediate
  LW rd, imm(rs)        Load word from Mem[rs+imm]
  SW imm(rs), rd        Store word to Mem[rs+imm]
Arithmetic/Logical (register):
  ADDU rd, rs, rt       Add unsigned
  SUBU rd, rs, rt       Subtract unsigned (rs - rt)
  AND rd, rs, rt        Bitwise AND
  OR rd, rs, rt         Bitwise OR
  XOR rd, rs, rt        Bitwise XOR
  SLL rd, rs, rt        Shift left logical by (rt mod 32) bits
  SRL rd, rs, rt        Shift right logical by (rt mod 32) bits
  SRA rd, rs, rt        Shift right arithmetic by (rt mod 32) bits
  SEQ rd, rs, rt        Set if (rs == rt)
  SNE rd, rs, rt        Set if (rs != rt)
  SLT rd, rs, rt        Set if (rs < rt)
  SGT rd, rs, rt        Set if (rs > rt)
  SLE rd, rs, rt        Set if (rs <= rt)
  SGE rd, rs, rt        Set if (rs >= rt)
Arithmetic/Logical (immediate):
  ADDUI rd, rs, imm     Add unsigned immediate
  SUBUI rd, rs, imm     Subtract unsigned immediate (rs - imm)
  ANDI rd, rs, imm      Bitwise AND immediate
  ORI rd, rs, imm       Bitwise OR immediate
  XORI rd, rs, imm      Bitwise XOR immediate
  SLLI rd, rs, imm      Shift left logical by (imm mod 32) bits
  SRLI rd, rs, imm      Shift right logical by (imm mod 32) bits
  SRAI rd, rs, imm      Shift right arithmetic by (imm mod 32) bits
  SEQI rd, rs, imm      Set if (rs == imm)
  SNEI rd, rs, imm      Set if (rs != imm)
  SLTI rd, rs, imm      Set if (rs < imm)
  SGTI rd, rs, imm      Set if (rs > imm)
  SLEI rd, rs, imm      Set if (rs <= imm)
  SGEI rd, rs, imm      Set if (rs >= imm)
Flow Control:
  BEQZ rs, imm          Branch to (PC+4+imm) if (rs == 0)
  BNEZ rs, imm          Branch to (PC+4+imm) if (rs != 0)
  J offset              Jump to (PC+4+offset)
  JR rs                 Jump to address in rs
  JAL offset            Jump to (PC+4+offset); store (PC+4) in R31
  JALR rs               Jump to address in rs; store (PC+4) in R31
Miscellaneous:
  NOP                   No operation
  ADDUF rd, rs, rt      Add unsigned with fault (bad parity)
  ADDUIF rd, rs, imm    Add unsigned immediate with fault
  JRF rs                Jump to address in rs with fault
  SWF imm(rs), rd       Store word to Mem[rs+imm] with fault
  TRAP offset           Special simulation function
Table 3.1: AMPIRE Instruction Set
The immediate and offset values are sign-extended to 32 bits before the
instructions are executed. The TRAP instruction is normally reserved for exception
handling, but it is used by the AMPIRE simulator to print out register values and to
terminate the simulation. Details will be discussed at the end of section 4.2.6.1.
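As an illustration of the I-type layout in Figure 3.1 and of the sign extension just described, a minimal Python sketch follows. The function names sign_extend and decode_i_type are hypothetical, the shift amounts are derived only from the field widths (6, 5, 5, 16) counted down from bit 31, and the opcode value in the usage lines is made up rather than taken from Appendix A.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_i_type(word):
    """Split a 32-bit I-type word into opcode(6), rs(5), rd(5), immediate(16)."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rd = (word >> 16) & 0x1F
    imm = sign_extend(word & 0xFFFF, 16)   # immediate is sign-extended
    return opcode, rs, rd, imm


# Usage with an assumed opcode value of 9 (hypothetical, for illustration only).
word = (9 << 26) | (2 << 21) | (3 << 16) | 0xFFFC
fields = decode_i_type(word)
```

The same sign_extend helper applies to the 26-bit J-type offset by passing bits=26.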
3.2. Processor Overview
The block diagram of the processor is shown in Figure 3.2, with brief descriptions
of the modules in Table 3.2. Delayed write buffers (DWBs) are used to postpone
commitment to the register file and the data memory until all required checks are
finished. Since each module has a different completion time based on the function
being performed, queues are used to improve concurrency. Queues are transparent to
the functional elements so that the number of buffers can be changed without
modifying other parts of the processor.
3.3. Normal Operation
The asynchronous operations without fault tolerance features will be discussed
first. The order of presentation will follow the flow of instructions through the various
modules.
3.3.1. Instruction Fetch
When the processor is reset, the PC is set to 0, and an instruction fetch cycle is
started. As soon as a memory cycle is completed (acknowledged), the PC is
automatically incremented to start another cycle. This process continues until a branch
occurs. Even though AMPIRE only supports 32-bit read/write, the unit of address is in
bytes to be compatible with DLX. Therefore, the PC is incremented by four each time.
Module      Description
ALU         Arithmetic logic unit. Except for branch/jump computations, all other
            arithmetic and logic operations are handled by this module.
ARBK        Arbiter for accessing the K_bus (not shown). It controls outputs from
            the four checkers.
ARBR        Arbiter for accessing the R_bus.
BIGC        Big C-element (not shown) for rollback synchronization.
CKF         Checker for REGDWB outputs to A_bus and B_bus.
CKI         Checker for instructions executed by the IIU.
CKM         Checker for data written to MEMDWB.
CKR         Checker for data written to REGDWB.
DMEM        Data memory, single-port, organized as 32-bit words.
DQ          Data queue for memory-to-register transfers.
IIU         Instruction issuing unit. It is the main controller that decodes
            instructions, reads data from the register file, and dispatches operation
            requests to other modules.
IMEM        Instruction memory, read-only, organized as 32-bit words.
IQ          Instruction queue.
LOG         Instruction log, where uncommitted instructions are kept. All checkers
            send their results to the log, and the log issues validation signals or
            initiates rollback.
MEMDWB      Delayed write buffer for data memory operations.
PC          Program counter. It starts at 0 when the processor is reset, and it is
            automatically incremented after each instruction fetch.
REGDWB      Delayed write buffer for register file operations.
REGFILE     Register file, with two read ports and one write port. There are
            thirty-two 32-bit general purpose registers, with R0 being a constant of zero.
RESTABLE    Register reservation table. All registers being read or written must be
            cleared by the reservation table first.
Table 3.2: Processor Modules
After an instruction is retrieved from memory, IMEM sends a write request to the
instruction queue. The time required to read an instruction from memory does not
change much, even for real self-timed memory, unless it actually contains multiple
elements with different access times. On the other hand, the instruction cycles are
variable, from very short NOP to long delay for reservation clearance (discussed in the
next section). With the IQ, multiple instructions can be pre-fetched during long
instruction cycles, and new instructions can be made available quickly following short
cycles.
[Figure 3.2 diagram: PC, IMEM, IQ (2), IIU, RESTABLE, REGFILE, REGDWB (4), ALU (2), ARBR, DQ (2), MEMDWB (4), DMEM, LOG (8), and the checkers CKI (2), CKF (2), CKR (2), and CKM (2), connected by I_bus, A_bus, B_bus, R_bus, D_bus, and K_bus, with signal labels SEQ, RS, RT, RD, validate, halt, and rollback. (#) = number of buffers.]
Figure 3.2: AMPIRE Block Diagram
When a branch occurs, the pre-fetched instructions have to be invalidated. The
IIU disables the output buffer of the IQ and sends a new address to the PC via I_bus.
The PC_load signal also causes IMEM and IQ to drop all current transactions, and the
IQ is cleared. The IQ size has been chosen to be two so that the latency through IQ is
not excessive after a branch is taken.
Since the speed of IMEM and the size of IQ are both unknown to the rest of the
processor, the value of the PC module cannot be used to determine the address of an
instruction being executed. One solution, as used in AMPIRE, is to add a separate
program counter inside the IIU for instruction logging and branch computations.
Whereas the PC module is incremented after every IMEM access, this internal counter
is incremented when an instruction is read from the IQ, just as a synchronous processor
increments its program counter in every instruction fetch stage. In this respect, the PC module is very similar to the
remote program counter as discussed in [Patt83]. When a branch occurs, the value of
the internal program counter is used to calculate the destination address, and both
program counters are loaded with the same new address.
An alternative to having a program counter in the IIU is to store the instructions
and their addresses in IQ. This way, the IIU simply reads both the instruction and its
address at the same time. However, since the IQ can be arbitrarily long, the costs of
additional storage elements for the buffers and wires for routing 32-bit addresses can be
quite high.
3.3.2. Instruction Issue
An instruction issue cycle is started when the IIU accepts an instruction presented
by the IQ. Part of the instruction decoding process determines which registers need to
be read or written, and then they are sent to the reservation table for clearance. Because
the processor is asynchronous, the time required to execute an instruction cannot be
pre-determined, and even the order of instruction completion is unknown. Therefore, a
compiler may not be able to schedule instructions correctly, and data hazard avoidance
has to be handled by the processor.
Before an instruction IY that writes to R1 can be issued, there must not be another
instruction IX that also modifies R1 in the pipeline. Otherwise, IY may be completed
before IX and cause a write-after-write error. Read operations are similar; any
previous instruction that writes to R1 must be completed before R1 can be accessed
again. In fact, all instructions being executed at the same time must be independent of
each other. Note that completion does not imply commitment to the register file.
When a reservation request is received at the RESTABLE, the reservation bits for
all source and destination registers are checked. If all of them are clear, then the
destination register is reserved, and the reservation request is acknowledged.
Otherwise, the acknowledgement is delayed until the appropriate bits are cleared by the
REGDWB. Since R0 is a zero constant, it is always available.
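The reservation check can be summarized with a small Python sketch. The class and method names below are hypothetical, and the model returns False where the hardware would instead delay the acknowledgement until the REGDWB clears the relevant bits.

```python
class ReservationTable:
    """Toy model of RESTABLE: one reservation bit per register."""

    def __init__(self, num_regs=32):
        self.reserved = [False] * num_regs

    def request(self, sources, dest=None):
        """Acknowledge (True) and reserve `dest` only if every register is free.

        Returning False stands in for a delayed acknowledgement.
        """
        regs = list(sources) + ([dest] if dest is not None else [])
        # R0 is a hard-wired zero, so it is always available.
        if any(self.reserved[r] for r in regs if r != 0):
            return False
        if dest is not None and dest != 0:
            self.reserved[dest] = True
        return True

    def clear(self, reg):
        """Called once the REGDWB has received the data for `reg`."""
        self.reserved[reg] = False
```

Checking the destination as well as the sources is what prevents the write-after-write case described above.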
Even though PC and data memory are also state elements, reservations are not
needed. Each PC operation is executed directly and immediately by the IIU, and since
AMPIRE is a load-store machine, memory transfers occur between the data memory
and the register file only. Memory write operations are carried out sequentially through
the MEMDWB, and memory read operations have register destinations which are
checked by the normal reservation process.
After reservation, the source registers need to be read from the register file. Even
though the IIU actually communicates with the REGDWB, the existence of REGDWB
is transparent to the IIU. The operations within REGDWB will be discussed later.
Instructions can be divided into three categories: ALU, data memory, and flow
control. ALU and data memory instructions are sent to ALU and MEMDWB,
respectively, and branches/jumps are executed by the IIU itself. Sign extension of all
immediate and offset values is also performed by the IIU because they are required for
branch and data memory address computations.
3.3.3. Instruction Execution
3.3.3.1. ALU
ALU operations are straightforward. Two operands (only one for LHI
and PASS) and a function code are used to produce a result that is to be written to a
destination register. No condition code is used in DLX nor in AMPIRE.
Unlike an ALU for a synchronous processor, there is no pre-determined time limit
for each operation. For example, a bitwise OR can be done very quickly, but a simple
ripple-carry adder needs much longer time for carry propagation. Even though a
synchronous processor can allocate multiple cycles for time consuming operations,
other components in the pipeline must be designed to accommodate it.
3.3.3.2. Data Memory Load/Store
All data memory read and write operations are controlled by the MEMDWB.
Address and data (for SW instructions) are sent from the IIU via A_bus and B_bus,
respectively. Based on today’s technology, a memory device is generally slower than
a processor. After a data memory request is accepted by the MEMDWB,
the IIU can start working on the next instruction instead of waiting for the DMEM.
Therefore, MEMDWB is useful even if fault tolerance is not needed.
Memory writes are queued in the buffers (first-in, first-out) until DMEM is ready
for the next transaction. Each MEMDWB entry contains the data to be written and its
destination address. For read operations, the address to be read is compared with the
ones in the buffers. If a match is found, data is retrieved from the appropriate buffer
and sent to the DQ, without accessing DMEM. If there is more than one match, a
priority decoder selects the most recent value. If there is no match, then data is read
from DMEM in the next memory cycle, even if other write operations are waiting in the
queue. Memory reads are given priority over memory writes so that the reserved
destination registers are cleared as quickly as possible to improve concurrency. Each
DQ buffer entry includes the memory data and its target register number. The register
number is passed directly from the MEMDWB controller to the DQ, without going
through the MEMDWB buffer elements and DMEM.
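The read path through the MEMDWB can be illustrated with a short Python sketch. The function name memdwb_read and the use of a list and a dictionary to stand in for the queue buffers and DMEM are assumptions made for this example only.

```python
def memdwb_read(address, pending_writes, dmem):
    """Return (data, source) for a load.

    pending_writes: list of (address, data) in FIFO order, oldest first.
    dmem: dict mapping address -> data, standing in for DMEM.
    """
    # Search newest-first so the most recent matching write wins,
    # which is the role of the priority decoder.
    for addr, data in reversed(pending_writes):
        if addr == address:
            return data, "MEMDWB"
    # No match: the load goes to DMEM ahead of any queued writes.
    return dmem[address], "DMEM"
```

In hardware the comparisons are made in parallel by the associative comparators; the loop here only reproduces the selection result.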
A drawback for this process is the need for associative comparators. With 32-bit
addresses, the overhead can be quite high. Another penalty is that before comparisons
can be made, the propagation of data in the queue buffers must be stopped, which may
increase the latency through the queue. This side effect will be discussed in detail in
section 3.5.
If associative comparators are not desirable, an alternative is to put both memory
read and write requests in the same queue, and process them in the same order as they
are issued. Because there is no comparator to check whether the memory data is
already in the MEMDWB, memory read cannot be assigned a higher priority than
memory write. Since multiple read requests can be accumulated in the queue, each
MEMDWB buffer also has to store two additional pieces of data: destination register
number and a read/write mode bit. The IIU can still go to the next instruction, but
memory read operations have to wait until all previous memory requests have been
processed by DMEM. As will be discussed in the fault tolerance section, memory
writes are held in the MEMDWB until the data have been validated, which means
longer delay for both read and write operations. This method results in simpler
hardware because address comparators are not used, but performance is lower because
of longer reservation waiting period for memory reads, and higher memory traffic.
3.3.3.3. Flow Control
Flow control instructions include conditional branches and unconditional jumps.
These are handled by the IIU directly so that the ALU can be dedicated to perform
"real" computations. This is similar to how branches are handled in the Berkeley
DSP [Jaco90]. Because branches are based on register comparisons instead of condition
codes, they are independent of previous ALU operations except when the registers are
reserved. The IIU does require an internal adder for address calculations.
It is not clear whether the DLX architecture has delayed branch, but for simplicity,
AMPIRE does not support it. Delayed branch is generally used in a synchronous
processor to reduce branch penalties by minimizing pipeline stalls. Although an
asynchronous processor may also benefit from having delayed branch, it makes
rollback more difficult because looking at the instruction in the delay slot alone does
not reveal any information about the branch instruction associated with it. Furthermore,
statistics show that less than 50% of the delay slots are usefully filled [Henn90, p. 276].
However, since delayed branch usually improves performance [Henn90, p. 277], a
method to support it in AMPIRE is briefly described here. The reader should go
through section 3.4 on fault tolerance before reading this and the next paragraph. One
way to handle both delayed branch and rollback is to maintain two sets of sequence
number and check vector for each branch in the instruction log. For example, a branch
INSTb has sequence SEQb and check vector Vb, and the delay slot INSTd has sequence
SEQd and check vector Vd. The instruction log has to store both sequence numbers
(SEQb, SEQd) and both check vectors (Vb, Vd) with the branch instruction INSTb. The
entry for the delay slot INSTd is not changed. When a checker clears a bit in vector Vd,
two bits are cleared in the LOG, one for each instruction INSTb and INSTd. This way,
the branch instruction cannot be validated until all the check bits for its delay slot are
also cleared. The delay slot is already prevented from being validated before its branch
by the FIFO nature of the LOG.
When either instruction INSTb or INSTd is to be invalidated, the processor has to
be rolled back to the branch INSTb. This means the instruction log has to include a
priority circuit to send out the address for INSTb, not for INSTd. The LOG also needs
to have more buffer elements and comparators to maintain the additional sequence
number and check vector for each log entry. In addition, the IIU and/or the LOG
control circuit becomes more complicated because the check vector for INSTd is not
available until that instruction is fetched and decoded, but it must be stored with INSTb
in the LOG.
3.3.4. Register File Access
After an ALU computation or memory retrieval is completed, the R_bus is used to
transfer data to the register file. Since both ALU and DQ may have data ready at any
time, the ARBR arbiter is needed to coordinate access to the R_bus. If only one request
is detected, then that request is granted, and the module can start its transaction with the
REGDWB. If both requests are received at the same time, the one which just had the
bus access has to wait until the other module finishes one transaction.
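The arbitration rule can be written as a small decision function. This Python sketch is a simplification with hypothetical names; it ignores the circuit-level issues of a real asynchronous arbiter, and the choice made when both modules request the bus and neither has used it yet is an arbitrary assumption.

```python
def arbr_grant(alu_req, dq_req, last_granted):
    """Pick the next R_bus owner: 'ALU', 'DQ', or None."""
    if alu_req and dq_req:
        # Simultaneous requests: the module that just had the bus waits.
        # If neither has had it yet, the ALU is chosen (assumed tie-break).
        return "DQ" if last_granted == "ALU" else "ALU"
    if alu_req:
        return "ALU"
    if dq_req:
        return "DQ"
    return None
```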
The DQ helps to reduce the DMEM read cycle time by separating the R_bus
arbitration stage from the data memory. Since the R_bus may be busy when the data
read from DMEM is ready, the DQ buffer allows the read transaction to complete so
that the DMEM can start the next memory operation. This is important because
memory access is likely to be the slowest activity in a processor. The overall memory
read time (from IIU to REGDWB) also depends on the depth of DQ. While the DQ
reduces the DMEM read cycle time, it also adds propagation delay through the queue.
The penalty is highest for an isolated DMEM read operation, but for multiple
DMEM reads in a row, the average performance may be improved due to concurrency
between DMEM and DQ. The depth of DQ needs to be selected based on the expected
memory activity, but in general it should be kept short so that the worst-case penalty is
not excessive.
After a piece of data is written to the REGDWB, the corresponding register
reservation is cleared by sending the register number to RESTABLE. This can be done
before the actual register is updated because the REGDWB controls both read and write
operations. When a read request is initiated by the IIU, the REGFILE is read while the
REGDWB is searched for a match. This is different from MEMDWB because DMEM
is read only if there is a miss in MEMDWB. The register file has two read ports and
one write port, and it is designed to be read and written every instruction cycle. This is
not practical for a large and slow memory like DMEM, so MEMDWB is optimized to
reduce memory traffic.
Since the same register in the REGFILE can be read and written at the same time,
and these two operations occur asynchronously, care must be taken to prevent wrong
(partially written) data from being presented to the requester. Because the data in the
REGDWB is not erased until its write cycle is completed, reading the register still
being written to the REGFILE will also result in a match in the REGDWB. By sorting
all matches in REGDWB and REGFILE through a priority decoder, with REGFILE
having the lowest priority, the correct and most recent data is given to the IIU. The
data read from REGFILE is effectively ignored in this case.
3.4. Fault Tolerance
In this section, the instruction rollback process and the hardware features needed
to support it will be described.
3.4.1. Sequence Number and Check Vector
Because each module in AMPIRE runs at its own rate, the sequence of events,
such as the order of completion, is not predictable. Therefore, a sequence number has
to be assigned to each instruction when it is issued by the IIU. All storage elements for
uncommitted data, including intermediate buffers, the RESTABLE, and DWBs, must
store the sequence numbers along with their data. For example, the instruction
R3 = R1 + R2 has the sequence number SEQx. When the values of R1 and R2 are sent
to the ALU, SEQx is sent there, too. When the result for R3 is delivered to REGDWB,
the same sequence number SEQx is stored in its buffer, since that data is related to the
same instruction. When a rollback is to be done, the processor can determine which
instructions have to be invalidated by checking their sequence numbers. The number of
bits that needs to be assigned to the sequence number is determined by the maximum
number of uncommitted instructions that is desired:
$\text{uncommitted}_{\max} = 2^{(\text{number of sequence bits} - 1)}$
An extra bit is used so that a quick comparison can be made to determine whether
an instruction X comes before or after instruction Y . For example, if 3 bits are used for
the sequence number, then there can be up to 4 outstanding instructions, or half of 3-bit
combinations. Figure 3.3 contains graphical illustrations of the sequence number
"windows".
[Figure 3.3 diagram: two circles of the eight 3-bit sequence numbers 0 to 7, one without rollover and one with rollover, each marking the current sequence number and the forbidden region.]
Figure 3.3: Sequence Number Windows
If $n$ is the number of bits allocated for the sequence number, then there are $N = 2^n$
possible sequence numbers, and the maximum number of uncommitted
instructions is $N/2$. At any point in time, the sequence numbers of the most recent
$N/2$ instructions are:
$k,\ (k+1) \bmod N,\ (k+2) \bmod N,\ \ldots,\ (k + N/2 - 1) \bmod N$, where $0 \le k < N$.
An instruction INSTx is the same as, or newer than, INSTy if and only if:
$\mathrm{SEQ}_x = (\mathrm{SEQ}_y + j) \bmod N$, where $0 \le j \le N/2 - 1$.
Hence, $j = (\mathrm{SEQ}_x - \mathrm{SEQ}_y) \bmod N$, which can be obtained using an $n$-bit subtractor.
The condition $0 \le j \le N/2 - 1$ holds true if and only if the most significant bit (MSb) of $j$ is
0. Therefore, if the MSb of the subtraction result is 0, INSTx is the same as or newer
than INSTy. Otherwise, INSTx precedes INSTy. This result is used to determine if an
instruction should be validated or not in the rollback process.
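The comparison rule can be checked with a few lines of Python. The function name is_same_or_newer is hypothetical; the modulo operation stands in for the wrap-around of an n-bit hardware subtractor.

```python
def is_same_or_newer(seq_x, seq_y, n_bits):
    """True if INSTx is the same as, or newer than, INSTy."""
    n = 1 << n_bits                      # N = 2**n possible sequence numbers
    j = (seq_x - seq_y) % n              # n-bit subtraction, wraps modulo N
    msb = (j >> (n_bits - 1)) & 1        # most significant bit of j
    return msb == 0
```

For example, with 3 bits, sequence number 1 is newer than 7 because (1 - 7) mod 8 = 2, whose most significant bit is 0, while 7 precedes 1 because (7 - 1) mod 8 = 6, whose most significant bit is 1.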
Another piece of information that has to be determined when the instruction is
issued is the verification steps it must go through before that instruction can be declared
valid. For example, all instructions have to pass the instruction checker CKI, but only
memory operations need to be checked by CKM. A check vector is the collection of
check bits, with each bit identifying a particular checker. An example of a check vector
for ALU operation is shown in Figure 3.4. When all the required checks are verified,
that instruction can be validated. The check vectors are stored and updated in the
instruction log.
[Figure 3.4 diagram: a four-bit check vector with value 1 0 1 1, one bit each for the CKM, CKF, CKI, and CKR checkers.]
Figure 3.4: Check Vector for ALU Operation
3.4.2. Error Detection
Each data word and address in the AMPIRE is protected by an even parity bit.
This simple mechanism is used only to show how rollback is done when an error is
detected. If more protection is desired, better error detection or correction codes can be
used instead. For storage elements such as memories and queue buffers, the parity bit is
simply passed along with the data to the destination. It is up to the receiver to
determine the data integrity. When an instruction or the register file is read, the IIU
sends a request to the CKI or CKF checker, respectively. The two other checkers,
CKM and CKR, are used by MEMDWB and REGDWB to verify the data requests
received by them.
When a module performs a computation, a new parity bit is generated. This is
done in the PC each time the address is incremented, in the IIU for address calculation
and sign extension, and in the ALU before each result is sent to the register file. The
checkers decide whether there is an error by XORing all the bits. Since even parity
is used, a result of 1 would indicate an error. Again, more elaborate schemes can be
integrated into the processor.
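The parity generation and checking steps can be sketched as follows. The function names are hypothetical, and a 32-bit word is assumed, matching the data width used in AMPIRE.

```python
def even_parity_bit(word):
    """Parity bit that makes the total number of 1s (word plus parity) even."""
    return bin(word & 0xFFFFFFFF).count("1") & 1


def has_parity_error(word, parity):
    """XOR all 32 data bits together with the parity bit; 1 indicates an error."""
    x = (word & 0xFFFFFFFF) ^ (parity & 1)
    # Fold the word onto itself so that bit 0 ends up as the XOR of all bits.
    x ^= x >> 16
    x ^= x >> 8
    x ^= x >> 4
    x ^= x >> 2
    x ^= x >> 1
    return (x & 1) == 1
```

A single flipped bit in either the data or the parity bit makes the XOR result 1; an even number of flipped bits goes undetected, which is why the text notes that stronger codes can be substituted.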
All checkers share the K_bus for reporting their results to the instruction log, and
the ARBK arbiter is used to grant bus access to each checker in a round-robin fashion.
The components of the K_bus are the sequence number, the checker identification
number, and the pass/fail signal.
3.4.3. Instruction Log and Validation
Uncommitted instructions are stored here, and all instruction validations and
rollbacks are determined by this module. The log contains 8 entries, so 4 bits have to
be used for the sequence number, as discussed in section 3.4.1. Each log entry has a set
of data for an instruction: sequence number, instruction address, and check vector. The
instructions themselves are not stored in the log because they are read from IMEM
again in the event of rollback.
When a checker reports a "pass", the appropriate check bit for an instruction is
cleared. When all of the check bits for the oldest entry in the log become zeros, then
that instruction can be validated and deleted from the log. The first-in first-out
sequence must be maintained because once an instruction is validated, it cannot be
un-validated. Because of the asynchrony, INSTx+1 may pass all its checkpoints before
INSTx does, but if an error occurs in INSTx, INSTx+1 must be undone.
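The in-order validation rule can be illustrated with a short Python model. The class InstructionLog and its methods are hypothetical names, and the check vector is represented as a set of pending checker names rather than as bits.

```python
from collections import deque


class InstructionLog:
    """Toy model of the LOG: FIFO of uncommitted instructions."""

    def __init__(self):
        self.entries = deque()           # oldest entry at the left

    def issue(self, seq, address, checkers):
        """Log an instruction with one pending check per required checker."""
        self.entries.append(
            {"seq": seq, "addr": address, "pending": set(checkers)}
        )

    def report_pass(self, seq, checker):
        """Clear one check bit, then validate from the head while possible."""
        for entry in self.entries:
            if entry["seq"] == seq:
                entry["pending"].discard(checker)
        validated = []
        # Only the oldest entry may be validated, which keeps FIFO order.
        while self.entries and not self.entries[0]["pending"]:
            validated.append(self.entries.popleft()["seq"])
        return validated
```

A younger instruction whose checks all pass early simply waits in the log until every older entry has been validated.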
An instruction is validated by sending the sequence number to the DWB(s) so that
the data waiting in the buffer can be written to its permanent location. If there is a
piece of data ahead of it in the queue, whether validated or not, then the commitment
process has to be delayed so that a write-after-write hazard does not occur. If an
instruction does not modify a storage element, then no validation signal needs to be
sent. In AMPIRE, an instruction can modify at most one register or memory location.
Therefore, the LOG only needs to send the validation signal to one DWB and not
disturb other modules. The added cost is an extra handshaking wire, but the other
DWB not directly involved in the validation is not slowed down for unnecessary
comparisons. This performance decrease will be addressed in section 3.5. However, if
there are many DWBs in the processor, the trade-offs may have to be re-considered.
It is important to note that when the validation is issued by the LOG, the entry to
be validated must already be in the appropriate DWB. This is required because
buffering a validation request and waiting for the appropriate data would make the design
more complicated than necessary. This sequence is ensured by sending the checker
request after the data is placed in the DWB queue. Without a "pass" signal from the
checker, the instruction cannot be validated. If there is no checker associated with a
DWB, then a check bit is still used so that the LOG cannot validate an instruction
without receiving an "arrived" signal from the DWB. The CKI checker is placed on the
B_bus instead of the I_bus for the same reason. The IIU first logs an instruction before
sending it to the CKI so that when the checker notifies the LOG, the entry is already in
the LOG to be processed.
3.4.4. Rollback
If a checker reports an error, then that instruction, and all instructions issued after
it, have to be invalidated. The instruction at which the error occurred is re-executed by
loading the PC with its address from the LOG. The rollback request is sent by the LOG
to all modules except PC, IMEM, and IQ, since they can be easily cleared by the IIU as
if a branch is taken. The high-level signal flow diagram of validation and rollback is
shown in Figure 3.5.
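[Figure 3.5 diagram: the checkers CKI, CKF, CKR, and CKM report to the LOG, which sends a validate signal to a specific DWB and an invalidate signal to all modules.]
Figure 3.5: Validation and Rollback Signal Flow Diagram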
The instructions and operations which have to be invalidated are determined by
subtracting the error sequence number from the sequence numbers in all of the buffers.
If the high bit of a result is 0, then that entry is either the erroneous one or a more recent
one, and it is erased. If the high bit is 1, then that operation is not affected by the error.
An invalidation request is sent to all buffer elements at the same time, and the
comparison process is done in parallel. After each buffer is finished, the
acknowledgement signals from the individual elements are grouped together by one or
more C-elements, from which a module-level acknowledgement signal is generated and
sent back to the LOG.
Since the IIU and all transactions with the IIU (such as reading the register file)
are always working on the most recent instruction, they can be invalidated without any
comparison with the error sequence number. The reservation table is a special case that
deserves attention, since there is no DWB for it. When a reservation is made, the
sequence number is stored in the corresponding entry in the table. Rolling back the
RESTABLE simply means clearing the appropriate reservation bits based on the same
sequence number comparisons.
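The invalidation of buffer entries during rollback can be sketched in Python as follows. The function name rollback_buffer and the list-of-tuples representation are assumptions; in hardware every buffer element performs its own subtraction in parallel.

```python
def rollback_buffer(entries, error_seq, n_bits):
    """Keep only entries older than the erroneous instruction.

    entries: list of (seq, payload) pairs held in a queue or table.
    error_seq: sequence number reported by the failing checker.
    n_bits: width of the sequence number.
    """
    mask = (1 << n_bits) - 1
    high_bit = 1 << (n_bits - 1)
    kept = []
    for seq, payload in entries:
        diff = (seq - error_seq) & mask
        if diff & high_bit:
            # High bit 1: the entry is older and is not affected by the error.
            kept.append((seq, payload))
        # High bit 0: the erroneous entry or a more recent one, so it is erased.
    return kept
```

The same test applies to the RESTABLE, where erasing an entry means clearing its reservation bit.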
Before the rollback can actually take place, all components in the processor must
be synchronized. This is the topic of section 3.5. Further details of the rollback process
will be addressed in section 4.1.6 when the Verilog code is discussed.
3.4.5. Delayed Write Buffer
Since the DWB is an important element that allows rollback to be done, its
hardware structure is summarized here. The DWB for REGFILE is shown in Figure
3.6, with an expanded view of the DWB buffer element. It is an extension of the
synchronous DWB shown in Figure 2.8, with two additional fields: sequence number
and the wait bit.
[Diagram labels: buffer element fields data, reg, v, seq, w; priority circuit; REGFILE; buses DATA 1, DATA 2, REG 1, REG 2, and SEQLOG; comparators X and subtractor S. v = valid bit, w = wait bit.]
Figure 3.6: Asynchronous Delayed Write Buffer for Register File
The X elements are comparators, and the S element is a subtractor as described
before. The bus SEQLOG is driven by the instruction log during validation and rollback
cycles. Since these two activities are mutually exclusive, only one bus is needed.
When an instruction is validated, if seq = SEQLOG, then the wait bit is cleared, and the
data can be written to the register file if all the entries ahead of it are also cleared.
During the rollback cycle, the subtractor S determines if that entry should be
invalidated by controlling the valid bit.
The two X comparators in squares already exist in the synchronous DWB, in the
CAM section shown in Figure 2.8. For MEMDWB, the two DATA buses are reduced
to one, and the two REG buses are replaced by a single memory address bus.
3.5. Synchronization
Because all of the operations in AMPIRE are asynchronous, care must be taken
when a transaction requires more than two components to cooperate.
3.5.1. Module Level
AMPIRE contains many buffer elements to balance the different processing
speeds of the various modules, and data can be transferred from one buffer to another at
any time. Many operations involve searching all the buffers in a queue, such as
matching a memory address in the MEMDWB. A comparison cannot be made when a
piece of data is "moving", and this is true for both synchronous and asynchronous
systems.
A comparison cycle is started by requesting all buffers in a queue to suspend their
transactions. After all buffers have acknowledged, then another request is sent to
collect the comparison results, and the interrupted transactions may continue. This
additional delay is the reason that unnecessary search requests should be avoided, as
done in the instruction validation scheme.
3.5.2. Processor Level
Global synchronization is required for the rollback process because all modules
must invalidate the erroneous data. When an error is detected, the LOG first sends out
a halt request to all the modules, except the ones in the instruction fetch stage (PC,
IMEM, and IQ). Each module is responsible for making sure all of its data transactions
are suspended before the acknowledgement is returned. After all modules have
responded, another signal is sent by the LOG for invalidation, performing the real
rollback. The processor is restarted only when the LOG explicitly releases the halt
signal, after all rollback steps are completed.
The sequence of halt and rollback events is shown in Figure 3.7. Each request
signal is a single wire from the LOG routed to all modules. Each group of the
acknowledgement signals is fed into the inputs of a large C-element (BIGC), and the
single output goes back to the LOG. Therefore, the LOG has no knowledge of the
number of modules in the processor. Since the C-element function is associative, the
BIGC can be physically distributed as smaller C-elements.
[Timing diagram signals: Halt_REQ, Halt_ACK, Rollback_REQ, Rollback_ACK. Annotated events: stop outputs, terminate handshaking, invalidate entries, load new PC, restart.]
Figure 3.7: Halt and Rollback Sequence
Chapter Four
Behavioral Modeling
In order to verify that the AMPIRE design is architecturally correct, a behavioral
model of the processor has been built with the Cadence Verilog hardware description
language. Some selected modules and submodules have also been designed and
simulated at the gate level to show how they may be implemented in VLSI, and they
will be discussed in Chapter 6.
The complete Verilog code can be found in Appendix A. Verilog resembles
procedural programming languages such as C and Pascal, with extensions for hardware
simulation. Some frequently used statements and expressions are listed in Table 4.1 for
reference. For detailed information, please see [Cade91].
Keyword      Explanation
#            Delay execution for a specified time.
@            Delay execution until an event occurs (edge sensitive).
always       Statement is executed repeatedly until simulation is terminated.
disable      Abruptly terminate a block of statements.
fork-join    Statements within the structure are executed in parallel.
initial      Statement is executed once when simulation is started.
reg          A register variable that holds value.
wait         Delay execution until expression becomes true (level sensitive).
wire         A net for signal declaration and connection.
Table 4.1: Frequently Used Verilog Keywords
4.1. Building Blocks
The basic code elements and techniques which are widely used in the various
processor modules will be presented in this section.
4.1.1. Tri-State Bus
An output can be tri-stated by assigning a high-impedance value z to the output
variable, and the following is the syntax for conditional assignment:
variable = expression ? value if true : value if false;
Therefore, a tri-state bus connection (assumed to be 33 bits wide) can be specified as a
Verilog continuous assignment:
    assign bus_variable = enable_signal ? output_value : 33'bz;
This way, multiple sources can be connected to the same bus, with the restriction that at
most one may be enabled at any time.
4.1.2. Muller C-Element
The following code simulates a 2-input C-element:
    always @(input_1 or input_2)
        if (input_1 & input_2)
            output = 1;
        else if (~input_1 & ~input_2)
            output = 0;
The first line activates the always block only when either input changes its value.
The rest of the code simply follows the behavior of the C-element. When both inputs
are high, the output is high; when both inputs are low, the output becomes low. The
code segment can be easily expanded to handle C-elements with more than two inputs.
An example is the bigc.v that has 14 input ports. Since the C-element contains state
information, its state (output) has to be initialized in the reset routine.
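The C-element behavior, together with the remark in section 3.5.2 that a large C-element can be distributed as smaller ones, can be illustrated with a short Python model. The class and function names are hypothetical and are not taken from bigc.v. The trace assumes handshake-style inputs, in which all inputs rise before any of them falls again.

```python
class CElement:
    """Muller C-element: the output changes only when all inputs agree."""

    def __init__(self, n_inputs: int = 2):
        self.n_inputs = n_inputs
        self.out = 0  # state is initialized, as the reset routine would do

    def update(self, *inputs: int) -> int:
        assert len(inputs) == self.n_inputs
        if all(inputs):
            self.out = 1
        elif not any(inputs):
            self.out = 0
        return self.out  # otherwise the previous value is held


def tree_of_two_input(c_ab, c_cd, c_top, a, b, c, d):
    """A 4-input C-element built from three 2-input C-elements."""
    return c_top.update(c_ab.update(a, b), c_cd.update(c, d))


# Inputs rise one by one, then fall one by one.
trace = [(1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1),
         (0, 1, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1), (0, 0, 0, 0)]

big = CElement(4)
c_ab, c_cd, c_top = CElement(), CElement(), CElement()
flat = [big.update(*v) for v in trace]
tree = [tree_of_two_input(c_ab, c_cd, c_top, *v) for v in trace]
```

For this trace the single 4-input element and the tree of 2-input elements produce the same output sequence: the output rises only after the last input rises and falls only after the last input falls.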
4.1.3. Buffer
Many buffers are used in AMPIRE to improve concurrency because each module
runs at its own speed. Buffers are also needed to store uncommitted instructions and
data. The basic buffer structure is shown below, divided into input and output sections.
    always wait (req_in & ~valid)
    begin : input_cycle
        #1;
        data_out = data_in;
        ack_in = 1;
        valid = 1;
        wait (~req_in);
        #1;
        ack_in = 0;
    end

    always wait (valid)
    begin : output_cycle
        #1;
        req_out = 1;
        wait (ack_out);
        #1;
        valid = 0;
        req_out = 0;
        wait (~ack_out);
    end
The input and output cycles each follow the 4-phase handshake convention discussed in
section 2.1, and each one can be considered as a process executed in parallel. The input
cycle is started by receiving an input request REQin. When data is latched, the input is
acknowledged, and the output cycle is initiated by setting the valid flag. The input
cycle is also guarded by valid so that a new input would not be accepted until the
previous data has been sent to another unit. The #1 statements simulate the delays
between the input and output signals, and they allow the Verilog simulation clock to
advance so that the sequence of events can be observed.
A simple asynchronous queue can be built by stacking the buffer elements. A
piece of data added to the queue would ripple toward the other end until it hits another
data entry. The queues used in AMPIRE require additional controls because of the
need to support rollback and other functions, and they will be discussed in separate
sections. As with the C-element, the internal variable valid and the external output
signals have to be initialized when the processor is reset.
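The ripple behavior of a simple queue of stacked buffers can be pictured with the following Python sketch. It abstracts each 4-phase transfer into a single step and is not a timing-accurate model; the function name and the use of None for an empty buffer (valid = 0) are assumptions made for the illustration.

```python
def ripple_step(slots):
    """Move each entry one buffer toward the output end if that buffer is free.

    slots[0] is the input end and slots[-1] is the output end.
    None marks an empty buffer (valid = 0).
    """
    nxt = list(slots)
    # Scan from the output end so an entry moves at most one buffer per step.
    for i in range(len(nxt) - 2, -1, -1):
        if nxt[i] is not None and nxt[i + 1] is None:
            nxt[i + 1], nxt[i] = nxt[i], None
    return nxt


queue = ["b", None, None, "a"]  # "a" is already waiting at the output end
history = [queue]
for _ in range(3):
    queue = ripple_step(queue)
    history.append(queue)
```

Here the new entry "b" ripples forward until it is stopped behind "a", after which the queue no longer changes.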
4.1.4. Reset
As in most digital systems, some state elements have to be initialized when the
system is reset. The following reset routine provides the necessary initialization steps
for both the C-element and the buffer discussed in the previous two sections:
    always wait (reset)
    begin
        disable input_cycle;
        disable output_cycle;
        valid = 0;
        ack_in = 0;
        req_out = 0;
        output = 0; // for C-element
        wait (~reset);
    end
Disable is a convenient way for an interrupt handler, such as this reset routine, to
terminate other concurrent procedures. These disable commands are needed because a
reset may occur in the middle of a data transaction, not just when "powered-on". Since
the various modules may receive the reset signal at different times, and the time
required to complete the reset process may be different, the disabled routines must not
continue until the reset signal is withdrawn. This can be done by adding the negated
reset signal as one of the enabling conditions:
    always wait (req_in & ~valid & ~reset)
    always wait (valid & ~reset)
Since no handshake is involved in the reset process, the reset signal must be
applied long enough for all modules to recognize it and complete the initialization. For
convenience, all reset routines in the behavioral level are executed in zero simulation
time. However, gate-level modules do have minimum reset pulse-width requirements,
which will be discussed in Chapter 6.
4.1.5. Arbiter
An arbiter is needed whenever two or more devices may want to access the same
resource at the same time, which can happen because of the asynchronous nature of
AMPIRE. As described before, the two bus arbiters ARBK and ARBR control K_bus
and R_bus, respectively. Some modules, such as LOG and MEMDWB, also require
internal arbiters for sequencing control.
Arbiters can be divided into two categories: prioritized and non-prioritized. A
non-prioritized arbiter is fair and operates in a round-robin fashion. It can be represented
by the following code:
    always wait (req1 | req2 | req3)
    begin
        found = 0;
        while (~found)
        begin
            grant = grant + 1;
            if (grant >= 4)
                grant = 1;
            case (grant)
                1: if (req1) found = 1;
                2: if (req2) found = 1;
                3: if (req3) found = 1;
            endcase
        end
        [process grant handshake]
    end
If there are only two request lines, the code can be simplified as shown in the
arbr.v listing in Appendix A. Sometimes a prioritized arbiter should be used because
of system requirements, or just for better system performance, as done in log.v and
memdwb.v. Some simple nested if-else statements can be used to describe its behavior:
    always wait (req1 | req2 | req3)
    begin
        if (req1)
            [process request 1]
        else if (req2)
            [process request 2]
        else
            [process request 3]
    end
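The difference between the two arbitration policies can be seen in a small Python model of the grant selection alone. The handshake is omitted, and the function names and the dictionary of request lines are assumptions made for this sketch.

```python
def round_robin(grant: int, reqs: dict) -> int:
    """Return the next grant (1..3), searching upward from the last grant."""
    assert any(reqs.values()), "called only while a request is pending"
    while True:
        grant += 1
        if grant >= 4:
            grant = 1
        if reqs[grant]:
            return grant


def prioritized(reqs: dict) -> int:
    """Return the lowest-numbered pending request (line 1 has top priority)."""
    for line in (1, 2, 3):
        if reqs[line]:
            return line
    raise ValueError("no request pending")


# All three lines keep requesting.
reqs = {1: True, 2: True, 3: True}
grant, order = 0, []
for _ in range(4):
    grant = round_robin(grant, reqs)
    order.append(grant)
```

With every line requesting continuously, the round-robin arbiter grants lines 1, 2, 3, and then 1 again, whereas the prioritized arbiter would grant line 1 every time.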
4.1.6. Rollback
The rollback scheme was introduced in the last chapter, but some details were left
out because the behavioral code provides more insight into how the
rollback works. The following is the basic structure:
    always wait (halt_req)
    begin : rollback_cycle
        #1;
        halt_ack = 1;
        wait (rollback_req);
        #1;
        disable input_cycle;
        disable output_cycle;
        ack_in = 0;
        req_out = 0;

        diff = sequence_num - sequence_error;
        if (~diff[high_bit])
            valid = 0;

        rollback_ack = 1;
        [finish halt and rollback transactions]
    end
The rollback cycle follows the sequence of events shown in Figure 3.7. Each
module (and sub-module) that needs to roll back must provide its own rollback handler
that supports this protocol. After the halt request is received, the current data
transactions are stopped by disabling the handshake outputs. The halt
acknowledgement is sent without any feedback from the I/O routine/circuitry because
local signal propagation can be easily controlled. The actual instruction invalidation is
not performed until the rollback request is given, which means all modules in the
processor have acknowledged the halt request issued by the instruction log.
Whether a data entry should be deleted is determined by the sequence number
comparison, as discussed in sections 3.4.1 and 3.4.4. If the high bit of the result is zero,
then the entry is invalidated. After the rollback of this module is completed, an
acknowledgement is sent, and the rest of the code just finishes the 4-phase halt and
rollback handshake cycles.
Suspending input and output cycles as part of the rollback process can be
accomplished by adding wait (~halt_req) statements before each group of output
commands, as shown below:
    always wait (req_in & ~valid & ~halt_req)
    begin : input_cycle
        #1;
        wait (~halt_req);
        data_out = data_in;
        ack_in = 1;
        valid = 1;
        wait (~req_in);
        #1;
        wait (~halt_req);
        ack_in = 0;
    end

    always wait (valid & ~halt_req)
    begin : output_cycle
        #1;
        wait (~halt_req);
        req_out = 1;
        wait (ack_out);
        #1;
        valid = 0;
        wait (~halt_req);
        req_out = 0;
        wait (~ack_out);
    end
As can be seen in the code or in the 4-phase handshake diagram shown in Figure
2.2, the input cycle alternates between waiting for request and sending
acknowledgement, and vice versa for the output cycle. Once the output cycle issues a
request, it cannot drop the request abruptly without receiving an acknowledgement first,
or the handshake protocol would be violated. The only exceptions are: (1) when both
parties of the transaction are notified before the cancellation, as in the case of rollback,
or (2) when a local or global reset occurs, as for the instruction fetch modules (PC,
IMEM, and IQ) during a branch.
The rollback process deletes all the data associated with the erroneous and more
recent instructions. There is no problem with that because every data entry is tagged
with a sequence number. However, care also must be taken to ensure that the processor
is in a correct state for the data not invalidated in the rollback. Specifically, there must
not be any deadlock, data loss, or data duplication. The three cases are discussed
below:
There is no deadlock because all transactions are cleared during rollback and
restarted afterwards if necessary. Therefore, no module is left in the middle of a
handshake cycle waiting for a signal that will never come.
There is no data loss because all data transfers follow the sequence shown below.
The possibility of duplication in the second phase is eliminated by the next step.
                 Phase 1    Phase 2    Phase 3
    Receiver:    invalid    valid      valid
    Sender:      valid      valid      invalid
Figure 4.1: Data Transfer Sequence
In the receiving end, the valid bit is set when acknowledgement is sent, as can be
seen in the code for the input cycle. For the sender, the valid bit is reset when
acknowledgement is received. In order to avoid duplication, the sender must wait
for and respond to the acknowledgement signal after the transfer request is sent.
This is consistent with the 4-phase handshake protocol. Note that there is no
wait (~halt_req) blocking the statements between req_out=1 and valid=0 in the
output cycle. If the receiver detects the halt request before acknowledgement is
sent, the sender still has the valid bit set, and the transaction will be restarted after
the rollback cycle, if data is not invalidated. Otherwise, the transfer is treated as
completed, even if the 4-phase handshake protocol is not used to release the
request and acknowledgement signals.
4.2. Processor Modules
Now that we know how the basic elements work, the complete modules will be
discussed by their functional groups. Only code segments will be shown here as
needed. Please see Appendix A for the full Verilog model.
4.2.1. Queues
There are two dedicated queues: instruction queue (IQ) and data queue (DQ).
Other internal queues are very similar. The IQ contains only two buffers, but it also
needs to support local reset to invalidate pre-fetched instructions whenever a branch
occurs. The cancellation cycle is similar to the reset routine, except that the
cancellation is performed with the 4-phase handshake protocol to notify the IIU that
invalidation is finished. A C-element is used to combine the two cancellation
acknowledgement signals from the buffers, as can be seen in iq.v, at line 35. When a
rollback occurs, the IQ is simply cleared with the same signal, so a separate rollback
routine is not needed.
The DQ is a combination of a queue and a bus-access handler. The queue buffer
includes a rollback routine to selectively invalidate data entries. The bus handler sends
a request to the ARBR arbiter and then proceeds with the actual transfer after grant is
given. The bus handler also needs to support rollback because data is latched before the
bus cycle is started.
4.2.2. Checkers
All four checkers are functionally and structurally identical, and they differ only in
the number of data bits being checked and in their identification numbers. Each
checker has an internal queue with depth of two, and of course, all components of a
checker must support rollback. Errors are detected by XORing all the bits in a word
along with its parity bit, as in the following code:
    parity = 0;
    for (loop=0; loop<1+DATA_WIDTH; loop=loop+1)
        parity = parity ^ data_buf[loop];
If the resulting parity is one, then an error has occurred because even parity is
used. For checkers with two data words, the two parity bits are generated individually.
After a bus grant is given by ARBK, the checker sends the result to LOG for validation
or invalidation.
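The even-parity check can be tried out with a small Python model of the loop above. The 32-bit DATA_WIDTH, the placement of the parity bit just above the data bits, and the helper with_even_parity are assumptions made for this illustration only.

```python
DATA_WIDTH = 32  # assumed word size; the checkers differ in the number of bits


def parity_error(data_buf: int, width: int = DATA_WIDTH) -> bool:
    """XOR all data bits together with the parity bit (assumed at bit `width`)."""
    parity = 0
    for loop in range(1 + width):
        parity ^= (data_buf >> loop) & 1
    return parity == 1  # even parity: a result of 1 flags an error


def with_even_parity(word: int, width: int = DATA_WIDTH) -> int:
    """Attach an even-parity bit above the data bits (hypothetical helper)."""
    data = word & ((1 << width) - 1)
    p = bin(data).count("1") & 1
    return (p << width) | data


good = with_even_parity(0b1011)       # three data bits set, so parity bit = 1
bad_data = good ^ 1                   # flip one data bit
bad_parity = good ^ (1 << DATA_WIDTH) # flip the parity bit itself
```

A single flipped bit, whether in the data or in the parity bit, makes the XOR result one and is therefore reported as an error. An even number of flipped bits would go undetected by this scheme.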
4.2.3. Memories
The three memory modules are: IMEM, DMEM, and REGFILE, which all use the
Verilog memory structure. The IMEM is read-only and pipelined so that the PC can
start its next cycle when the instruction is sent to IQ. The pipelined handshake is
handled here by a fork-join structure rather than separate input and output cycles as in
the buffer:
    always wait (in_req)
    begin
        data = imemory [addr[ADDR_WIDTH-1:ADDR_IGNORE]];
        fork
            begin
                in_ack = 1;
                wait (~in_req);
                in_ack = 0;
            end
            begin
                out_req = 1;
                wait (out_ack);
                out_req = 0;
                wait (~out_ack);
            end
        join
    end
Because the unit of address is the byte, but the memory is organized as 32-bit
words, the ADDR_IGNORE parameter is used to eliminate the two lower address bits.
DMEM is not pipelined in the read mode because the destination register and the
sequence number from MEMDWB are routed directly to DQ and not through DMEM.
This way, the data from MEMDWB are not released until they are latched in DQ, along
with the data read from DMEM. The following is the non-pipelined handshake, in
contrast to the pipelined version used by IMEM:
    always wait (in_req)
    begin
        data_out = dmemory [addr[ADDR_WIDTH-1:ADDR_IGNORE]];
        out_req = 1;
        wait (out_ack);
        in_ack = 1;
        out_req = ZZZ; // release request
        wait (~in_req);
        in_ack = 0;
        wait (~out_ack);
    end
The input is acknowledged after the output acknowledgement is received. Note that
the request signal is released by tri-stating it because it is shared with MEMDWB. That
line is pulled down by an external source.
Since the IMEM is read-only, the program is loaded from the instruction file only
when the simulation is started, in an initial block. The same file is also loaded into
DMEM so that memory data defined in the program can be accessed. However, the
loading is done in the reset routine because DMEM has to be initialized when a reset
occurs. Both IMEM and DMEM have "error correction" capability to re-calculate the
parity bit after a rollback, to simulate transient memory faults. The correction is
triggered by the retry flag from IIU.
While IMEM and DMEM are single-ported memory, the REGFILE has three
ports, supporting two reads and one write simultaneously. Even though the register R0
is a zero constant, operations that write to it are carried through as usual, but reading
that register always results in zeroes, as determined by these two statements:
    rd_data1 = rd_reg1 ? rfile [rd_reg1] : 0;
    rd_data2 = rd_reg2 ? rfile [rd_reg2] : 0;
4.2.4. Program Counter
The PC module is a loadable incrementing counter. During normal operations, the
counter is incremented by four after each instruction fetch, and a new parity bit is
generated. When a branch occurs, the load cycle interrupts the increment cycle and
obtains the new address from IIU.
4.2.5. Arithmetic Logic Unit
The ALU itself is very simple because Verilog operators are directly used for all
arithmetic and logic functions except SRA (shift right arithmetic), which is
implemented by:
    for (loop=0; loop<(data2 % DATA_WIDTH); loop=loop+1)
        out_data = {data1[DATA_WIDTH-1], data1[DATA_WIDTH-1:1]};
The whole word is shifted to the right, but the highest bit, or sign bit, is kept the
same. After each computation, a new parity bit is generated to accompany the data to
its destination. For fault simulation, if the requested function is ADDUF (add unsigned
with fault), then the parity bit is initialized to 1, which would result in an incorrect
parity bit:
parity = (func == FUNC_ADDUF);
The rest of the ALU module contains an input queue and routines for handshake and
rollback.
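The intended effect of the SRA loop and of the ADDUF parity initialization can be sketched in Python. The sketch assumes a 32-bit DATA_WIDTH, assumes that each loop iteration shifts the running result by one more bit, and assumes that the parity bit is the XOR of the result bits starting from the initial value. The function names are hypothetical.

```python
DATA_WIDTH = 32  # assumed word size


def sra(data1: int, data2: int, width: int = DATA_WIDTH) -> int:
    """Shift right arithmetic, one bit position per loop iteration."""
    out_data = data1 & ((1 << width) - 1)
    for _ in range(data2 % width):
        sign = out_data >> (width - 1)  # the sign bit is kept the same
        out_data = (sign << (width - 1)) | (out_data >> 1)
    return out_data


def result_parity(result: int, faulty: bool = False, width: int = DATA_WIDTH) -> int:
    """Parity bit for a result; starting from 1 (as for ADDUF) inverts it."""
    parity = 1 if faulty else 0
    for i in range(width):
        parity ^= (result >> i) & 1
    return parity


negative = sra(0x80000000, 4)  # the sign bit is replicated into the top bits
positive = sra(0x00000010, 4)
```

Because the fault-injecting initialization inverts the generated parity bit, a checker that later verifies even parity on the result would flag an error.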
4.2.6. Controller
4.2.6.1. Instruction Issuing Unit
The AMPIRE controller consists of the instruction issuing unit (IIU) and the
reservation table (RESTABLE). In addition to performing the steps between
instruction fetch and execution, the IIU also handles branches and jumps. Within IIU,
the general sequence is: receive the instruction, decode, log, notify CKI, check
reservation, read registers, notify CKF, and send request to an execution unit.
The "fetch cycle" actually just accepts the instruction from the IQ. The instruction
and its address from the internal PC are latched in the two bus output buffers before the
internal PC is incremented. Then parity bit for the new PC is calculated, and the
go_decode flag is set to start the decoding routine.
One of the steps in the decoding cycle is determining the check vector so that
instruction can be logged. This is done with the case structure at line 163 of iiu.v. As
discussed in section 3.4.3, the instruction must be accepted by LOG before it can be
sent to the CKI checker. Therefore, the IIU waits for the log acknowledgement at line
186 before starting the CKI handshake in the fork-join block.
Depending on the opcode, different tasks, or subroutines, are called to perform the
appropriate functions. All register-to-register ALU operations are assigned a single
opcode number (0), and they have to be decoded further using the function field, at line
244. Note that NOP (binary 0) is also decoded in this group because its opcode field is
the same as the special opcode. All ALU instructions are processed by the ALU
control task, which reserves and retrieves the necessary registers. Before these two
steps can be taken, the three output variables reg_rs, reg_rt, and reg_rd must be set up
properly with data from the instruction word. Any unused field is assigned a zero since
R0 is always available for the reservation process. The two source operands are latched
in the bus output buffers by these two statements:
    abus_out = abus_in;
    bbus_out = (opcode == OP_SPECIAL) ? bbus_in :
               {{(DATA_WIDTH-IMM_WIDTH){imm[IMM_WIDTH-1]}}, imm};
Abus_in and bbus_in are the bus input registers containing values read from the
register file. The first operand (abus_out) is always a register value as determined by
the instruction set. The second operand (bbus_out) is either a register value or a sign-
extended immediate data from the instruction word. A conditional assignment based on
the opcode is used to determine the proper selection. Finally, the ALU handler is in
charge of generating the parity bits and carrying out communication with the ALU
module.
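The selection of the second operand, including the sign extension performed by the replication operator, can be mirrored in a short Python sketch. The 32-bit data width and 16-bit immediate width are assumed values, and the function names are hypothetical.

```python
DATA_WIDTH = 32  # assumed
IMM_WIDTH = 16   # assumed immediate field size


def sign_extend(imm: int, imm_width: int = IMM_WIDTH,
                data_width: int = DATA_WIDTH) -> int:
    """Replicate the top bit of the immediate into the upper data bits."""
    imm &= (1 << imm_width) - 1
    sign = (imm >> (imm_width - 1)) & 1
    upper = ((1 << (data_width - imm_width)) - 1) if sign else 0
    return (upper << imm_width) | imm


def second_operand(is_special: bool, bbus_in: int, imm: int) -> int:
    """Register value for register-to-register operations, else the immediate."""
    return bbus_in if is_special else sign_extend(imm)


minus_one = sign_extend(0xFFFF)  # 16-bit -1 becomes 32-bit -1
largest = sign_extend(0x7FFF)    # positive values are unchanged
```

A negative immediate therefore arrives at the ALU with all of its upper bits set, while a positive immediate is only zero-padded.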
The two tasks for memory and branch operations are straightforward.
For jumps, the JAL and JALR instructions also need to have the previous PC stored in
R31. Therefore, PC is stored in the output buffer at line 341 before the new PC is
calculated. After the new internal PC is loaded into the PC module, the previous PC is
sent to ALU and then passed to the register file.
The PC handler updates the PC module and invalidates the pre-fetched
instructions in IQ. When the inst_en line is pulled low, the output buffer of IQ is
disabled, and the internal PC is gated to I_bus. The request signal that initiates the PC
module loading cycle also cancels all current transactions in IMEM and IQ. The
acknowledgement signals from all three modules have to be received by the IIU before
the next handshake step is taken. This is simply another C-element.
The rollback routine for IIU is slightly different from other ones because it has to
initiate recovery after rollback. After the halt request is issued, it has to wait for any
current PC loading cycle to finish before it can proceed, which is accomplished by line
94:
    wait (~newpc_req & ~pc_ack & ~imem_ack & ~iq_ack);
The PC handler must be allowed to finish its 4-phase handshake cycle because the
same interface is to be used for loading the address being rolled back, and the three
instruction fetch modules must be left in a usable state. After all modules have rolled
back, as indicated by the LOG releasing the roll_req signal, IIU updates the sequence
number and the internal PC from data stored in the LOG:
    seq_num = val_seq;
    int_pc = a_bus;
Going back to the fetch cycle at line 140, there is an "extra" enabling condition
inst_en. That is necessary to prevent the fetch cycle from being activated before a
valid instruction is available from IQ after rollback. The inst_en signal is reset in the
rollback cycle and enabled in the PC handler, after the rollback address is loaded in the
PC module.
A few fault simulation instructions have been added to test the fault-tolerant
portions of the processor. However, since the processor is rolled back to the instruction
where the error occurred, simply executing it again would cause the same error and result
in an infinite loop. A retry flag is set in the rollback cycle to indicate that the next
instruction to be executed is the second attempt. That flag is then used to "correct" the
error so that rollback would not be triggered again for the same instruction. The code
can be found at lines 275, 306, and 352.
Another instruction used for simulation is TRAP (as is done in the DLX simulator)
that is supported by the task at line 375. It is normally used to call exception handling
routines in the DLX, but AMPIRE does not support interrupt/exception. Traps from 0
to 31 cause the simulator to print the register values in decimal, hexadecimal, and
binary formats. Trap BLANK (90) simply prints a blank line for output formatting.
Finally, trap STOP (100) is used to signal the end of program. No new instruction will
be issued afterwards, but the ones being executed are allowed to finish within the time
pre-determined in the test setup.
Note that the sequence of executing multiple handshake cycles in the IIU is not
optimal for concurrency. For example, the reservation cycle is completed before the
register file can be accessed because these two routines are in separate tasks. In reality,
the register file request can be sent as soon as the reservation is acknowledged, and the
rest of the handshake cycle can be finished in parallel. However, doing so in the
already complex IIU module would make the code less readable and less modular.
Higher concurrency can be more easily explored in the actual implementation, or in
terms of signal transition specifications [Meng89].
4.2.6.2. Reservation Table
The need for register reservation was discussed in section 3.3.2. The reservation
table uses two Verilog memory structures, one for the reservation bits, and another one
for the storage of sequence numbers, both addressed by the register number. When a
reservation request is issued, the first step is to wait until all three reservation bits are
cleared:
    wait (~res_table [reg_w] & ~res_table [reg_r1] &
          ~res_table [reg_r2]);
Even though any of the reservation bits may be cleared at any time, there is no
problem with glitches because any change must be from 1 (reserved) to 0 (not
reserved). Furthermore, only one bit can be changed at any time because there is only
one clearance port. Then the destination register is reserved by setting its reservation
bit and storing the sequence number in the corresponding slot:
    res_table [reg_w] = 1 & (reg_w != 0);
    seq_table [reg_w] = seq_num;
The first line above also makes sure that R0 is never reserved because an
instruction writing to R0 does not have any effect on the register file. When a rollback
occurs, each sequence number is compared with the error sequence number, just like
other storage elements. A for-loop cycles through all of the slots sequentially, but in
reality, associative comparators should be used. Invalidation is accomplished by
clearing the appropriate reservation bits.
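The reservation and rollback steps described in this section can be summarized in a small Python model. It is a sketch only: the class and method names, the 32-entry table size, and the 4-bit sequence number width are assumptions, and the blocking wait is replaced by a ready() query.

```python
SEQ_BITS = 4  # assumed sequence number width


class ResTable:
    def __init__(self, n_regs: int = 32):
        self.res = [0] * n_regs  # reservation bits
        self.seq = [0] * n_regs  # sequence number of the reserving instruction

    def ready(self, reg_w: int, reg_r1: int, reg_r2: int) -> bool:
        """True when none of the three registers is reserved."""
        return not (self.res[reg_w] or self.res[reg_r1] or self.res[reg_r2])

    def reserve(self, reg_w: int, seq_num: int) -> None:
        self.res[reg_w] = 1 if reg_w != 0 else 0  # R0 is never reserved
        self.seq[reg_w] = seq_num

    def rollback(self, error_seq: int) -> None:
        """Clear reservations made by the erroneous or more recent instructions."""
        for r in range(len(self.res)):  # sequential here, associative in hardware
            diff = (self.seq[r] - error_seq) % (1 << SEQ_BITS)
            if self.res[r] and not (diff >> (SEQ_BITS - 1)):
                self.res[r] = 0


table = ResTable()
table.reserve(5, 3)  # instruction 3 will write R5
table.reserve(6, 4)  # instruction 4 will write R6
table.reserve(0, 5)  # a write to R0 reserves nothing
blocked = not table.ready(7, 5, 0)  # an instruction reading R5 must wait
table.rollback(4)    # the error is detected at instruction 4
```

After the rollback, the reservation made by instruction 3 is still in place, while the one made by instruction 4 has been cleared.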
4.2.7. Delayed Write Buffers
A DWB provides a temporary storage for data not yet committed to its permanent
target so that if an error occurs, rollback can be made by deleting the appropriate data
from DWB. The two DWBs are the two largest modules in AMPIRE in terms of code
size, but a lot of the code is related to submodule declarations and connections.
4.2.7.1. REGDWB
In the REGDWB controller module, there are five main functional cycles in
addition to the standard reset and rollback routines: input, check, clear, commit, and
read. The input cycle directs the queue to accept data and then enables the check and
clear cycles, which request CKR to check the data and clear the reservation at
RESTABLE, respectively. The input routine is guarded with the go_check and
go_clear signals to prevent it from starting another cycle before the previous one is
completed.
When the oldest entry in the queue reaches the last buffer, its output request would
start the commit cycle. However, it does not mean that data is ready to be written to the
register file unless the wait bit has been cleared by LOG. After that bit is cleared, a
write request is sent to the register file. This step is not pipelined because data in the
queue cannot be deleted until it is written to the register file. This is done so that a read
operation always has valid data to read. If the commit cycle is pipelined, then the
most recent data may be held in a buffer that cannot be searched, or data is just being
written to the register file and not stable enough to be read properly. Also, the two
com_ack transitions are guarded with the rd_req signal so that data cannot be dropped
in the middle of a read operation.
The read request from IIU goes directly to the REGDWB queue and REGFILE,
but the two acknowledgement signals do not return to IIU but are used to activate the
read cycle in the controller instead. If there is a match in the buffer, then that data is
gated to the bus. Otherwise, data from the register file is selected. Multiple matches in
the queue are resolved by a priority circuit, but that is transparent to the controller. The
read cycle also supports fault simulation by inverting the parity bit if the sim_f signal
from IIU is active.
The regdwbq module connects the four buffers in a usable fashion with four C-
elements to handle four sets of acknowledgement signals and a priority routine to
deliver correct data to the REGDWB controller. When multiple matches are found, the
casex structures select the latest entries based on the match results. The two source
registers are handled separately.
In an individual buffer, the validate cycle clears the appropriate wait bit when its
corresponding instruction is validated by the LOG. The find cycle is responsible for
checking if its buffer contains the data for the register being read. Note that a match
signal for R0 must be suppressed because that register cannot be overwritten, although
any instruction may try to do so. Otherwise, a non-zero value may be returned for R0
because data in the buffer has higher priority than the register file.
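The read path through the REGDWB, including the priority among multiple matches and the suppression of matches for R0, can be pictured with the following Python sketch. It is a sequential stand-in for the associative search and priority circuit, and the tuple layout of a buffer entry is an assumption made for the example.

```python
def read_register(reg: int, dwb_entries, rfile) -> int:
    """Return the value seen by a read of register `reg`.

    dwb_entries is an oldest-first list of (valid, reg, data) tuples,
    a hypothetical layout; rfile is the committed register file.
    """
    if reg == 0:
        return 0  # a match for R0 is suppressed; R0 always reads as zero
    for valid, entry_reg, data in reversed(dwb_entries):  # latest entry wins
        if valid and entry_reg == reg:
            return data
    return rfile[reg]  # no match in the buffer: use the register file


rfile = [0] * 32
rfile[3] = 7
rfile[4] = 42
dwb = [(1, 3, 10), (1, 3, 11), (0, 3, 12), (1, 0, 99)]

r3 = read_register(3, dwb, rfile)  # two valid matches; the later one is used
r4 = read_register(4, dwb, rfile)  # no match, so the register file is read
r0 = read_register(0, dwb, rfile)  # the buffered write to R0 is ignored
```

In this example the read of R3 returns the most recent valid buffered value rather than the committed one, and the buffered write to R0 never becomes visible.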
The input and output cycles are the main handshake routines for data transfers,
similar to the ones in section 4.1.6. The major difference is that the DWB buffer has to
pause when the validate and find operations are active. Both of these two functions are
associative searches, and the validate cycle also modifies a bit in the buffer. Before a
comparison can be made, the input and output routines must be stopped to keep the data
stationary. Therefore, each output group is guarded by this statement:
    wait (~val_req & ~val_ack & ~read_req & ~halt_req);
Because of this requirement, each search may slow down the normal flow of data in the
queue. This penalty was discussed in section 3.5.1.
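The effect of the guard can be seen in a small stand-alone model. In this sketch only the guard expression is taken from the text; the output handshake signals out_req and out_ack, the delays, and the stimulus are hypothetical.

```verilog
// Sketch only: the handshake signals, delays, and stimulus are hypothetical.
module guarded_output_tb;
    reg  val_req, val_ack, read_req, halt_req;   // search and halt activity
    reg  out_req;
    wire out_ack;

    // Stand-in for the next stage: acknowledges after one time unit.
    assign #1 out_ack = out_req;

    // Output cycle of one buffer: each group of statements first waits
    // until no validate, find, or halt operation is in progress.
    always begin
        wait (~val_req & ~val_ack & ~read_req & ~halt_req);
        #1 out_req = 1;
        wait (out_ack);
        wait (~val_req & ~val_ack & ~read_req & ~halt_req);
        #1 out_req = 0;
        wait (~out_ack);
    end

    initial begin
        val_req = 0; val_ack = 0; read_req = 0; halt_req = 0;
        out_req = 0;
        $monitor("%0t: read_req=%b out_req=%b out_ack=%b",
                 $time, read_req, out_req, out_ack);
        #7  read_req = 1;   // a register read starts a search
        #10 read_req = 0;   // search done, the queue may move again
        #10 $finish;
    end
endmodule
```

While read_req is high, out_req stops toggling; the handshake resumes only after the search is released, which is the slowdown described above.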
4.2.7.2. MEMDWB
The queue for MEMDWB is basically the same as the one for REGDWB, except
that the register numbers are substituted with memory addresses for data tracking and
searching. The controller has to be modified to handle a single-ported memory rather
than a three-ported register file.
Each memory transaction can be either a read or write, and the input cycle has to
process the data accordingly. Memory writes are queued in DWB, but memory reads
are not buffered in order to reduce reservation waiting time, as explained in section
3.3.3.2. For a read operation, the read routine is enabled by setting the go_read flag,
and then a check request is sent to CKM. For a write operation, data is sent to the
DWB queue.
After the read cycle is activated, the first step is to search DWB for any data
destined to the same memory location. If there is a match, then data is retrieved from
the buffer and sent to DQ directly, and a memory access is saved. If there is no match,
then DMEM has to be read. Since DMEM is a single-ported memory, only one read or
write operation can be done at once. Therefore, an internal arbiter is used for mutual
exclusion, with priority given to the read request for higher performance, as registers
are reserved for loading from memory. Even though both MEMDWB and DMEM may
send data to DQ, there is at most one outstanding memory read operation at any time,
so no arbitration is required for accessing the DQ.
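The mutual exclusion with read priority can be modeled behaviorally as in the sketch below. It is not the arbiter from memdwb.v: the two-bit request and grant vectors and all names are hypothetical, and a grant in progress is assumed to run to completion before the other side is served.

```verilog
// Sketch only: a non-preemptive two-way arbiter that favors reads.
module dmem_arbiter (
    input  wire [1:0] req,   // req[1] = read request, req[0] = write request
    output reg  [1:0] gnt    // gnt[1] = read grant,   gnt[0] = write grant
);
    initial gnt = 2'b00;

    always @(req) begin
        gnt = gnt & req;                  // a grant ends when its request is withdrawn
        if (gnt == 2'b00) begin           // memory is free: issue at most one grant
            if (req[1])      gnt = 2'b10; // reads win ties
            else if (req[0]) gnt = 2'b01;
        end
    end
endmodule

module dmem_arbiter_tb;
    reg  [1:0] req;
    wire [1:0] gnt;

    dmem_arbiter dut (.req(req), .gnt(gnt));

    initial begin
        $monitor("%0t: req=%b gnt=%b", $time, req, gnt);
        req = 2'b00;
        #5 req = 2'b11;   // simultaneous requests: the read is served first
        #5 req = 2'b01;   // read done: the waiting write gets the memory
        #5 req = 2'b11;   // a new read must wait for the write to finish
        #5 req = 2'b10;   // write done: the read is served
        #5 req = 2'b00;
        #5 $finish;
    end
endmodule
```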
Interactions between the read and commit cycles need some explanation. The read
cycle issues a search request and then waits for arbitration. At the same time, the
commit cycle may be writing to DMEM and will then acknowledge the queue to delete
the committed entry. However, as explained in the last section with REGDWB, the
input and output routines must be suspended when the queue is searched. The question
is whether there is a possibility of deadlock since the commit cycle is not able to delete
the entry from the queue. The following code is taken from the commit_cycle in
memdwb.v, but not as a continuous block:
    wait (out_ack);
    com_ack = 1;
    out_req = 0;
    arbwr_req = 0;
    wait (~com_req);
After DMEM has completed the write cycle (out_ack becomes 1), an
acknowledgement com_ack is sent back to the queue for deletion. At this time, both
the DMEM and arbitration requests are released, so the read cycle is able to continue
after the arbiter gives its grant. The commit routine cannot finish its cycle until the
queue search is completed, but that does not interfere with the read operation.
4.2.8. Instruction Log
The instruction log stores every instruction issued by the IIU until each one is
completely checked and validated. If an error occurs, the LOG contains information
necessary to roll back the processor and re-execute the offending instruction. The LOG
can be split into a controller and an actual storage log that is organized as a queue. The
IIU logs an instruction by communicating directly with the log, not through the
controller.
The controller has two main functions: validation and rollback. Each checker
reports to LOG with the checker ID, the sequence number, and the pass/fail signal. If
there is no error, then the appropriate check bit in the log is cleared. If the check failed,
a rollback cycle is activated, using the sequence discussed before. Each module that
needs to roll back must monitor these signals and respond accordingly. An arbiter is
needed because the sequence number output port is shared between the validation and
rollback routines.
When the check bits of the oldest entry in the log are all cleared, then that
instruction can be validated. Instruction validation involves notifying one of the DWBs
and deleting its entry from the log. Data from the check vector, format shown in Figure
3.4, is used to determine which DWB contains the waiting data. Since there are two
DWBs in AMPIRE, the CKR and CKM check bits are stored separately as DWB bits,
which are not reset by the check bit clearance process. The four possible combinations
are shown in Table 4.2.
<CKR,CKM>   Target DWB   Instruction Examples
00          None         NOP, Conditional Branches
01          MEMDWB       SW (store word)
10          REGDWB       LHI, All ALU Instructions
11          REGDWB       LW (load word)
Table 4.2: DWB Bits in the Instruction Log
Cases 01 and 10 should be obvious. The LW instruction has to go through
both checkers CKM and CKR, but the data must be in REGDWB when the instruction
is validated. By sending the validation signal to only one DWB, as indicated by the
DWB bits, unnecessary queue searches are avoided. If the DWB bits are 00, that
instruction does not cause any changes to the permanent storage modules, and it can be
simply deleted from the log.
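The decoding in Table 4.2 amounts to a four-way case statement. The sketch below is only an illustration: the module and its input names are hypothetical, and the use of val_reg and val_mem as the two notification outputs is an assumption.

```verilog
// Sketch only: decodes the stored DWB bits into at most one validation signal.
module dwb_select (
    input  wire ckr_bit,   // DWB bit saved for CKR
    input  wire ckm_bit,   // DWB bit saved for CKM
    output reg  val_reg,   // notify REGDWB
    output reg  val_mem    // notify MEMDWB
);
    always @(ckr_bit or ckm_bit) begin
        case ({ckr_bit, ckm_bit})
            2'b01:   {val_reg, val_mem} = 2'b01;   // SW
            2'b10,
            2'b11:   {val_reg, val_mem} = 2'b10;   // LHI, ALU instructions, LW
            default: {val_reg, val_mem} = 2'b00;   // NOP, conditional branches
        endcase
    end
endmodule

module dwb_select_tb;
    reg  [1:0] bits;
    wire       val_reg, val_mem;
    integer    i;

    dwb_select dut (.ckr_bit(bits[1]), .ckm_bit(bits[0]),
                    .val_reg(val_reg), .val_mem(val_mem));

    initial begin
        #1;   // let the decoder start waiting before the first stimulus
        for (i = 0; i < 4; i = i + 1) begin
            bits = i;
            #1 $display("<CKR,CKM>=%b -> val_reg=%b val_mem=%b",
                        bits, val_reg, val_mem);
        end
        $finish;
    end
endmodule
```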
When the processor is rolled back, the instruction address matching the error
sequence number is retrieved from the log and sent to IIU via A_bus. That is done in
the rollback cycle of the log buffer, at line 354 of log.v:
    if (valid & (out_s == clr_s))
        tri_addr = out_d;
There is no conflict with multiple matching because each valid log entry has a unique
sequence number within the current "window". The tri_addr register is returned to the
high-impedance state after the rollback cycle.
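The way several log entries can share A_bus without conflict can be illustrated with two entries driving one wire. This is a sketch under assumptions: the rollback input, the 4-bit sequence numbers, the 16-bit address, and the module itself are hypothetical, and only the comparison and the tri_addr name follow the line quoted from log.v.

```verilog
// Sketch only: one log entry that drives the shared bus on a sequence match.
module log_entry (
    input  wire        rollback,   // hypothetical: rollback cycle active
    input  wire        valid,
    input  wire [3:0]  out_s,      // sequence number stored in this entry
    input  wire [3:0]  clr_s,      // sequence number of the failed instruction
    input  wire [15:0] out_d,      // instruction address stored in this entry
    output wire [15:0] A_bus
);
    reg [15:0] tri_addr;
    assign A_bus = tri_addr;
    initial tri_addr = 16'bz;

    always @(rollback or valid or out_s or clr_s or out_d) begin
        if (rollback & valid & (out_s == clr_s))
            tri_addr = out_d;
        else
            tri_addr = 16'bz;      // release the bus
    end
endmodule

module log_entry_tb;
    reg         rollback;
    reg  [3:0]  clr_s;
    wire [15:0] A_bus;

    // Two valid entries with different sequence numbers share the bus.
    log_entry e0 (.rollback(rollback), .valid(1'b1), .out_s(4'd1),
                  .clr_s(clr_s), .out_d(16'h0004), .A_bus(A_bus));
    log_entry e1 (.rollback(rollback), .valid(1'b1), .out_s(4'd2),
                  .clr_s(clr_s), .out_d(16'h0008), .A_bus(A_bus));

    initial begin
        rollback = 0; clr_s = 4'd2;
        #1 $display("idle:     A_bus=%h", A_bus);
        rollback = 1;
        #1 $display("rollback: A_bus=%h", A_bus);
        rollback = 0;
        #1 $display("released: A_bus=%h", A_bus);
        $finish;
    end
endmodule
```

Because the sequence numbers are unique, at most one entry drives the bus at a time and the others stay at high impedance.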
4.3. Putting the Processor Together
4.3.1. Parameters and Timing
The parameter file declares the constants used by all processor modules,
including the opcode assignments, instruction field widths, module delay factors, etc.
The DLX instruction opcodes are taken from the documentation in [Host91] so that the
same instructions would have identical binary code. Most importantly, the behavior of
the processor can be modified by adjusting the delay parameters. For example, the
order of instruction completion can be changed by increasing the access time of the
DMEM module, which will be shown in the next chapter with simulation results.
Whether the processor is delay-insensitive or just self-timed depends on the actual
implementation at the transistor level. The behavioral model of AMPIRE is self-timed
in the sense that the delay values in all modules may be changed without affecting the
processor’s functionality and correctness, but it is not delay-insensitive because single-
rail data and requests are used, as discussed in section 2.1. Of course, delays cannot be
placed anywhere because logical grouping of some statements must be observed. Most
delays are set at 1 to minimize idle simulation time while allowing the Verilog clock to
advance.
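The role of the delay parameters can be illustrated with two independent units that answer the same request. The sketch below is not taken from the parameter file: the module, the parameter names, and the values are hypothetical.

```verilog
// Sketch only: a unit whose response time is set by a delay parameter.
module delay_unit #(parameter DLY = 1) (
    input  wire req,
    output reg  ack
);
    initial ack = 0;
    always @(req) #DLY ack = req;
endmodule

module delay_order_tb;
    parameter DLY_ALU = 20, DLY_DMEM = 1;   // hypothetical values
    reg  req;
    wire alu_ack, dmem_ack;

    delay_unit #(.DLY(DLY_ALU))  alu  (.req(req), .ack(alu_ack));
    delay_unit #(.DLY(DLY_DMEM)) dmem (.req(req), .ack(dmem_ack));

    always @(posedge alu_ack)  $display("%0t: ALU done",  $time);
    always @(posedge dmem_ack) $display("%0t: DMEM done", $time);

    initial begin
        req = 0;
        #30 req = 1;   // both units start at the same time
        #50 $finish;
    end
endmodule
```

Exchanging the two parameter values reverses the order of the two messages, while both units still complete; only the timing changes, not the result.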
4.3.2. Top-Level Wiring and Testing Module
Ampire.v is the file that connects all processor modules together with a large number of
wires. The buses are declared to be trireg to simulate the capacitance in wires by
keeping the signals at the last driven values when all drivers are in the high-impedance
state. This only reduces the number of signal transitions and is not required to satisfy
any timing conditions in AMPIRE. Passive pulldowns are used to keep the shared
handshaking lines from floating because all handshake transitions are significant in a
real asynchronous system, and noise should be minimized.
The test setup routine appears toward the end of the ampire.v listing. A large
initial block organizes the signals for generating simulation waveforms that will be
shown in the next chapter. The other initial block starts the processor by issuing a
global reset. After that, the processor runs at its own pace until the stop instruction is
encountered or until the preset simulation time is exceeded. The #100 delay before
stopping Verilog allows instructions still running in the processor to be completed. The
high delay value for abnormal termination is mainly used for catching infinite loops
created by software and "hardware" bugs.
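A test setup of this shape can be sketched as a few initial blocks. The example is not the routine from ampire.v: the stop flag, the time limit, and the stand-in for the processor are hypothetical, and only the global reset and the #100 drain delay follow the description above.

```verilog
// Sketch only: reset, normal termination, and a time limit for runaway programs.
module harness_tb;
    parameter MAX_TIME = 5000;   // hypothetical limit
    reg reset;
    reg stop;                    // hypothetical flag raised by the stop instruction

    // Stand-in for the processor: "runs" for 200 time units after reset.
    initial begin
        stop = 0;
        @(negedge reset);
        #200 stop = 1;
    end

    // Start the processor with a global reset pulse.
    initial begin
        reset = 1;
        #10 reset = 0;
    end

    // Normal termination: let instructions still in flight complete first.
    initial begin
        wait (stop);
        #100 $display("%0t: stopped normally", $time);
        $finish;
    end

    // Abnormal termination: catches infinite loops.
    initial begin
        #MAX_TIME $display("%0t: time limit exceeded", $time);
        $finish;
    end
endmodule
```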
Chapter Five
Behavioral Simulation
5.1. Instruction Fetch/Issue and Queue Operations
In this chapter, several test programs and their simulation waveforms will be
presented. Figures 5.1 and 5.2 show the effects of running the following program with
slow and fast instruction memories, respectively. Figure 5.3 displays only the first 85
time steps of Figure 5.2 so that the sequence of events can be easily observed.
// ; test1.a --
// ; jumps, IF stage, reservation, ALU & REGDWB queues
//
// ; DLY_ALU_ADD=20, DLY_ALU_OR=30, DLY_ALU_XOR=10
// ; DLY_ALU_PASS=5, DLY_IMEM_RD=1 and 10
//
000000000 // 00 nop
00C000018 // 04 jal sub
//
034000001 // 08 ori r0, r0, 1 ; write R0
03401000F // 0C ori r1, r0, 15
038020005 // 10 xori r2, r0, 5
000221821 // 14 addu r3, r1, r2 ; read after write
050230004 // 18 slli r3, r1, 4 ; write after write
144000064 // 1C trap 100 ; stop
//
14BE00000 // 20 sub: jr r31 ; return
Im_req is the request signal from PC to IMEM to start a memory access cycle,
and im_ack is the corresponding acknowledgement. Iq_req from IMEM notifies IQ
that an instruction has been retrieved, and then IQ sends iiu_req to IIU when the
instruction is ready at the output of the queue. Pc_load is active when a branch is
taken, and the two pc_load pulses represent the two jumps in the test program. Please
see Figure 3.2 for direction of data flow.
[Waveform test1a1 with DLY_IMEM_RD=10, time scale 0 to 346. Signals: pc_load, pc_out, im_req, im_ack, im_inst, iq_req, iq_ack, I_bus, iiu_req, iiu_ack, seq_num.]
Figure 5.1: Test 1 with Slow IMEM
[Waveform test1a2 with DLY_IMEM_RD=1, full time scale 0 to 320. Signals: pc_load, pc_out, im_req, im_ack, im_inst, iq_req, iq_ack, I_bus, iiu_req, iiu_ack, seq_num.]
Figure 5.2: Test 1 with Fast IMEM
[Waveform test1a3 with DLY_IMEM_RD=1, close-up time scale 0 to 85. Signals: pc_load, pc_out, im_req, im_ack, im_inst, iq_req, iq_ack, I_bus, iiu_req, iiu_ack, seq_num; points A, B, and C marked.]
Figure 5.3: Test 1 with Fast IMEM, Close-Up
Since the instruction cycle continues until IQ fills up or a branch occurs, the
number of pre-fetched instructions depends on the speed and depth of IQ and the access
time of IMEM. Looking at the im_ack signal before the first pc_load pulse, 3 memory
cycles completed in Figure 5.1, compared with 5 cycles in Figure 5.2. Furthermore,
looking at point A in Figure 5.3, the response time of im_ack is significantly longer
than the previous four. At that time, the IQ is full and cannot accept more data until an
entry is removed from its output at point B.
As discussed in section 2.1, the 4-phase handshake protocol requires that an
activated request signal stay high until its acknowledgement is received. Checking
these waveforms would reveal many apparent violations; point C in Figure 5.3 is an example.
However, these are caused by local resets of the IF stage as initiated by pc_load and,
therefore, are not real violations of the protocol.
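For reference, a complete 4-phase cycle without any local reset can be sketched as a sender and a receiver. The signal names, the delays, and the data values below are hypothetical and are not taken from the IF stage.

```verilog
// Sketch only: two full 4-phase handshake cycles with bundled data.
module handshake_tb;
    reg       req, ack;
    reg [7:0] data;

    // Receiver: acknowledges each request, then returns to zero.
    always begin
        wait (req);
        #2 $display("%0t: received %h", $time, data);
        ack = 1;
        wait (~req);
        #1 ack = 0;
    end

    // Sender: one complete cycle per data word.
    task send(input [7:0] d);
        begin
            data = d;
            #1 req = 1;    // data is stable before req rises
            wait (ack);
            #1 req = 0;    // req stays high until ack has been received
            wait (~ack);   // cycle ends when ack returns to zero
        end
    endtask

    initial begin
        req = 0; ack = 0;
        send(8'hA5);
        send(8'h3C);
        #5 $finish;
    end
endmodule
```

A local reset such as the one triggered by pc_load would drop req before ack arrives, which is why the waveform at point C only looks like a violation.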
Figure 5.4 shows the simulation result of the same test program, displaying signals
from other parts of the processor. The JAL instruction needs to store the current PC in
R31, and that register is reserved at point A. The next instruction JR R31 checks for
register availability at point B, but it is held until R31 is cleared at point J. Note that
R31 is not written to the register file until point N, after that instruction is validated at
point R. This means that R31 is read from REGDWB instead of the actual register file.
The two ORI instructions demonstrate that "writing" R0 does not block other
instructions from using it, since it is a zero constant. R0 with the first ORI is not
cleared until point K, but reservation clearance for the second ORI is already completed
at point C. It may be argued that the register clearance at point K and the write
operation at point O are not necessary, since R0 cannot be written anyway. However,
eliminating these steps would need at least an additional zero detector to handle this
special case, which is another performance and cost trade-off.
[Waveform test1b with DLY_IMEM_RD=1, time scale 0 to 320. Signals: pc_load, pc_out, im_req, im_inst, iq_req, I_bus, iiu_req, iiu_ack, seq_num, reg_rs, reg_rt, reg_rd, log_req, cki_req, res_req, rf_req, func, alu_req, R_seq, R_data, R_dest, alu_arb, clr_req, wrf_reg, wrf_req, K_seq, K_chkid, val_seq, val_reg; points A to R marked.]
Figure 5.4: Test 1, Detailed Activities
The waveforms at points (D,L) and (E,M) indicate the dependencies for
read-after-write and write-after-write, respectively. Instructions are held by the reservation
table until the required registers are available. The signals at points (F,H) and (G,I)
show the input-output timing for the ALU. The ALU is able to accept a new request
(pulse G) before finishing its current operation (pulse H) because of its internal buffer.
Similarly, (H,P) and (I,Q) are the input-output pairs for REGDWB, indicating queuing
inside that block.
[Waveform test2a1 with All Delays=1, time scale 0 to 198. Signals: I_bus, iiu_ack, seq_num, reg_rs, reg_rt, reg_rd, log_req, res_req, rf_req, alu_req, mem_req, dmem_rq, dq_req, R_seq, R_data, R_dest, alu_arb, dq_arb, clr_req, wrf_reg, wrf_req, val_seq, val_reg, val_mem; points A to F marked, with S and L labels on mem_req, dmem_rq, and dq_req.]
Figure 5.5: Test 2 with All Delays=1
5.2. Delayed Write Buffers and Checker Arbitration
Figure 5.5 is the simulation output of test program #2 with all delays=1.
Mem_req is the DMEM request signal from IIU to MEMDWB, and then MEMDWB
communicates with DMEM through dmem_rq as needed. Dq_req is driven either by
DMEM or MEMDWB, depending on where the data is actually retrieved. These three
waveforms are labeled S and L for store and load, for ease of reference. The ORI
instruction is another example of reading data from REGDWB, as indicated by the
timing of pulse A occurring before pulse E.
[Waveform test2a2 with DLY_DMEM_WR=15, time scale 0 to 197. Signals: I_bus, iiu_ack, seq_num, reg_rs, reg_rt, reg_rd, log_req, res_req, rf_req, alu_req, mem_req, dmem_rq, dq_req, R_seq, R_data, R_dest, alu_arb, dq_arb, clr_req, wrf_reg, wrf_req, val_seq, val_reg, val_mem; points X and Y marked.]
Figure 5.6: Test 2 with Long DMEM Write Cycle
// ; test2.a -- MEMDWB, REGDWB, checker arbitration
//
// ; one at a time: DLY_DMEM_WR=15, DLY_CKM_CHK=10
// ; DLY_CKF_CHK=10
//
1AC000040 // 00 sw 40h(r0), r0
18C010040 // 04 lw r1, 40h(r0) ; read from MEMDWB
034220007 // 08 ori r2, r1, 7 ; read from REGDWB
08C030040 // 0C lw r3, 40h(r0) ; read from memory
144000064 // 10 trap 100
The SW instruction is sent to MEMDWB at point B, but the memory write is not
done until point D because it has to be validated at point F first. When the write is
completed, that entry is deleted from MEMDWB, and the LW instruction at point C has
to retrieve the data from DMEM. With a long memory write cycle, as shown in Figure
5.6, the load request may arrive before the write is finished. Data is transferred directly
from MEMDWB to DQ at point Y, saving a DMEM read cycle. For the second LW
instruction, data is still read from DMEM at point X, as expected.
[Waveform test2a3 with DLY_CKM_CHK=10, time scale 0 to 196. Signals: I_bus, iiu_ack, seq_num, reg_rs, reg_rt, reg_rd, log_req, res_req, rf_req, alu_req, mem_req, dmem_rq, dq_req, R_seq, R_data, R_dest, alu_arb, dq_arb, clr_req, wrf_reg, wrf_req, val_seq, val_reg, val_mem.]
Figure 5.7: Test 2 with Slow Memory Checker
Because a memory store has to be validated before initiating the write cycle, a
long delay through the memory checker CKM may have an effect similar to the previous
case. In Figure 5.7, the read/search cycle for MEMDWB is already started when the
write operation takes place. Even though read has higher priority than write, the
external write cycle must be allowed to complete. The internal queue is already halted
by the read request, preventing data from being deleted due to write completion.
Therefore, data is retrieved from the buffer and sent to DQ.
[Waveform test2b with DLY_CKF_CHK=3, time scale 0 to 200. Signals: seq_num, log_req, log_ack, K_seq, cki_arb, ckf_arb, ckm_arb, ckr_arb, chk_ack; points A to D marked.]
Figure 5.8: Test 2, Showing Arbitration of Checkers
Figure 5.8 shows the activities between the four checkers and the instruction log.
The K_seq bus is shared by all checkers to send the sequence number being verified to
LOG. The single chk_ack line from LOG is routed to all checkers, but it is meaningful
only to the checker that has been granted access to the K_bus. Two arbitration requests
are overlapped at points B and D. Pulse B is actually active one time step ahead of
pulse D, and therefore, CKI is given access while CKR must wait.
This figure also indicates that the log_ack at point A has a longer delay than other
cycles. That is caused by communication with CKF at point C because the queue in
LOG must be kept from advancing while the appropriate check bit is searched and
updated in the queue.
5.3. Arbitration for R_bus and Out-of-Order Completion
As with the K_bus, an arbiter is needed for the R_bus to control data flow
from ALU and DQ. The following program is run with different DMEM delays, and
the results are shown in Figures 5.9 and 5.10.
[Waveform test3a1 with DLY_DMEM_RD=23, time scale 0 to 210. Signals: I_bus, iiu_ack, seq_num, reg_rd, log_req, res_req, rf_req, alu_req, mem_req, dmem_rq, dq_req, R_seq, R_data, R_dest, alu_arb, dq_arb, clr_req, wrf_reg, wrf_req, val_seq, val_reg; points D, E, F, W, X, Y, and Z marked.]
Figure 5.9: Test 3 with Slow DMEM
// ; test3.a -- R_bus arbitration
//
// ; DLY_DMEM_RD=1 and 23
//
08C010000 // 00 lw r1, 0(r0)
08C010000 // 04 lw r1, 0(r0) ;completed after next instruction
024020002 // 08 addui r2, r0, 2
024030003 // 0C addui r3, r0, 3
144000064 // 10 trap 100
[Waveform test3a2 with All Delays=1, time scale 0 to 187. Signals: I_bus, iiu_ack, seq_num, reg_rd, log_req, res_req, rf_req, alu_req, mem_req, dmem_rq, dq_req, R_seq, R_data, R_dest, alu_arb, dq_arb, clr_req, wrf_reg, wrf_req, val_seq, val_reg; points A, B, C, W, X, Y, and Z marked.]
Figure 5.10: Test 3 with Fast DMEM
In Figure 5.9, arbitration requests D and F occur simultaneously. Access is
granted to ALU because the previous bus cycle was taken by DQ at point E. As a
consequence, the result for the third instruction (ADDUI) is written to REGDWB before
the second instruction (LW). Since REGDWB is a FIFO queue, data are also
committed to the register file in reverse order, as can be seen by the W and X pulses in
the two figures. Even though instructions may complete out of order, they are always
validated in the same sequence as issued, at points Y and Z.
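The grant decision described for simultaneous requests, where the side that did not have the previous bus cycle is served, can be modeled as an alternating two-way arbiter. This is a behavioral sketch only: the vector ports, the grant outputs, and the last register are hypothetical and are not taken from the AMPIRE source.

```verilog
// Sketch only: a two-way arbiter that alternates on simultaneous requests.
module rbus_arbiter (
    input  wire [1:0] req,   // req[0] = alu_arb, req[1] = dq_arb
    output reg  [1:0] gnt    // hypothetical grant lines, same order
);
    reg last;                // index of the requester served most recently

    initial begin
        gnt  = 2'b00;
        last = 1'b0;
    end

    always @(req) begin
        gnt = gnt & req;                     // drop a grant whose request is gone
        if (gnt == 2'b00 && req != 2'b00) begin
            if (req == 2'b11) last = ~last;  // both waiting: serve the other side
            else              last = req[1]; // one waiting: serve it
            gnt[last] = 1'b1;
        end
    end
endmodule

module rbus_arbiter_tb;
    reg  [1:0] req;
    wire [1:0] gnt;

    rbus_arbiter dut (.req(req), .gnt(gnt));

    initial begin
        $monitor("%0t: req=%b gnt=%b", $time, req, gnt);
        req = 2'b00;
        #5 req = 2'b10;   // DQ alone takes the bus
        #5 req = 2'b00;
        #5 req = 2'b11;   // simultaneous requests: ALU is served, DQ went last
        #5 req = 2'b10;   // ALU releases, DQ is served next
        #5 req = 2'b00;
        #5 $finish;
    end
endmodule
```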
Figure 5.10 also has two other results worth noting. The DQ-to-REGDWB cycle
at C is delayed because of the REGFILE access at A. At that time, the queue in
REGDWB is stopped so that uncommitted register data may be searched and retrieved.
For the same reason, the write cycle at W takes longer to complete because of the
rf_req signal at B.
[Waveform test4a: All Faults, time scale 0 to 530. Signals: pc_load, pc_out, I_bus, iiu_req, iiu_ack, seq_num, reg_rd, log_req, res_req, rf_req, func, alu_req, mem_req, dmem_rq, dq_req, R_seq, R_data, R_dest, alu_arb, dq_arb, clr_req, wrf_reg, wrf_req, K_seq, K_chkid, K_err, val_seq, val_reg, val_mem, hlt_req.]
Figure 5.11: Test 4, All Faults
5.4. Fault Detection and Rollback
All of the simulation results presented thus far are free of faults (at least from the
processor’s point of view). In this section, we will see how the rollback process works
when faults are detected. Figure 5.11 is an overview of running test program #4 with
all faults shown. The five halt request hlt_req pulses at the bottom correspond to the
five faults in the program. Each one will be displayed and discussed in greater detail.
[Waveform test4b: CKI Fault, time scale 7 to 106. Signals: pc_load, pc_out, I_bus, iiu_req, iiu_ack, seq_num, reg_rd, log_req, res_req, rf_req, func, alu_req, R_seq, R_data, R_dest, alu_arb, clr_req, wrf_reg, wrf_req, K_seq, K_chkid, K_err, val_seq, val_reg, hlt_req; points A to F marked.]
Figure 5.12: Test 4, CKI Fault
// ; test4.a -- faults, rollback
//
03401000C // 00 ori r1, r0, here
100000000 // 04 *nop ; cki fault
0C8200000 // 08 jrf r1 ; ckf fault
1C4020001 // 0C here: adduif r2, r0, 1 ; ckr fault
1CC000040 // 10 swf 40h(r0), r0 ; ckm fault
08C03001C // 14 lw r3, badata(r0) ; ckr fault
144000064 // 18 trap 100
//
100000000 // 1C badata: *.word 0
The first fault is caused by the instruction NOP retrieved from IMEM, and it is
detected at point D in Figure 5.12. The three K_bus components indicate that there is
an error for sequence number 1, and it is detected by checker 0 (CKI). The instruction
log controller then initiates a rollback cycle by activating hlt_req at point F, and the
program counter is updated at point A. Before the rollback starts, the result for the first
instruction ORI is already written to REGDWB at point B. Since it is not dependent on
the erroneous instruction, data is kept in the REGDWB, and after it is validated at point
E, it is written to the REGFILE at point C.
The second fault is from the JRF instruction. Register R1 written by ORI is
correct, but JRF forces a bad parity bit when R1 is read, and the fault is detected by the
register file checker CKF. The pc_load signal at point G in Figure 5.13 is the first JRF
attempt. A new address is loaded at point H for rollback, and the final branch occurs at
point I.
[Waveform test4c: CKF Fault, time scale 78 to 168. Signals: pc_load, pc_out, I_bus, iiu_req, iiu_ack, seq_num, reg_rd, log_req, res_req, rf_req, K_seq, K_chkid, K_err, val_seq, hlt_req; points G, H, and I marked.]
Figure 5.13: Test 4, CKF Fault
The next instruction ADDUIF tells the ALU to perform an addition with bad
parity. The results can be seen on the R_data bus in Figure 5.14, before the rollback at
point J, and after the rollback at point K. Both data words are written to REGDWB, but
the incorrect one is invalidated in the rollback process, and only one word is actually
committed to the register file at point L. Similarly, the SWF instruction causes
erroneous data to be written to the MEMDWB at point M in Figure 5.15. The rollback
cycle makes the correction, and the data is sent to DMEM at point N.
[Waveform test4d: ALU CKR Fault, time scale 169 to 313. Signals: pc_load, pc_out, I_bus, iiu_req, iiu_ack, seq_num, reg_rd, log_req, res_req, rf_req, func, alu_req, R_seq, R_data, R_dest, alu_arb, clr_req, wrf_reg, wrf_req, K_seq, K_chkid, K_err, val_seq, val_reg, hlt_req; points J, K, and L marked.]
Figure 5.14: Test 4, ALU CKR Fault
[Waveform test4e: CKM Fault, time scale 272 to 393. Signals: pc_load, pc_out, I_bus, iiu_req, iiu_ack, seq_num, reg_rd, log_req, res_req, rf_req, mem_req, dmem_rq, K_seq, K_chkid, K_err, val_seq, val_mem, hlt_req; points M and N marked.]
Figure 5.15: Test 4, CKM Fault
Another CKR fault takes place when the LW instruction tries to load a bad word
from DMEM. Since the error is in DMEM and not in MEMDWB, two DMEM read
cycles occur in Figure 5.16, at points O and P. The rest of the events are similar to the
CKR fault caused by ADDUIF because both instructions modify the register file. Note
that the pulse at N is from the previous SWF instruction.
Figure 5.17 takes the first fault at NOP as an example to show the detailed
interactions between IIU and LOG. The four halt and rollback signals are discussed in
section 3.5.2 and also shown in Figure 3.7. At point Y, the IIU sends the sequence
number on seq_num, the instruction address on A_bus_L, and the instruction on B_bus
to the LOG. A_bus_L is just the lower bits of A_bus that are significant for address
formation. Before the rol_req signal is released at point X, the rollback address is
retrieved from LOG and sent to IIU on A_bus_L. The IIU then updates its internal
program counter and loads it into PC at point Z.
[Waveform test4f: DMEM CKR Fault, time scale 369 to 530. Signals: pc_load, pc_out, I_bus, iiu_req, iiu_ack, seq_num, reg_rd, log_req, res_req, rf_req, mem_req, dmem_rq, dq_req, R_seq, R_data, R_dest, dq_arb, clr_req, wrf_req, K_seq, K_chkid, K_err, val_seq, val_reg, hlt_req; points N, O, and P marked.]
Figure 5.16: Test 4, DMEM CKR Fault
[Waveform test4g: Rollback Sequence, time scale 8 to 70. Signals: val_seq, hlt_req, hlt_ack, rol_req, rol_ack, seq_num, A_bus_L, B_bus, log_req, cki_req, pc_load, pc_out; points X, Y, and Z marked.]
Figure 5.17: Test 4, Rollback Sequence
5.5. Running a Real Program
A simple, useful program is shown below to exercise various parts of the
processor while causing three different faults. The program should be self-explanatory,
and the simulation waveforms are displayed in Figure 5.18. Even though the details are
too small to be traced, some properties, such as the variations in register reservation
delays res_req and occurrences of rollback hlt_req, can still be observed.
// ; test5.a -- find the largest number
//
13401003C // 00 *ori r1, r0, first
02402004C // 04 addui r2, r0, last
100001825 // 08 or r3, r0, r0 ;current largest number
//
18C240000 // 0C loop: lw r4, 0(r1) ; load from memory
10083282B // 10 sgt r5, r4, r3 ; r5 = 1 if r4 > r3
010A00004 // 14 beqz r5, smallr
000041825 // 18 or r3, r0, r4 ; found larger number
000223028 // 1C smallr: seq r6, r1, r2 ; check if done
114C00008 // 20 bnez r6, done
124210004 // 24 addui r1, r1, 4 ; increment pointer
00BFFFFE0 // 28 j loop
//
034070050 // 2C done: ori r7, r0, largst
1CCE30000 // 30 swf 0(r7), r3 ; store in memory
044000003 // 34 trap 3 ; print the number
144000064 // 38 trap 100 ; stop
//
00000000A // 3C first: .word 10
100000005 // 40 *.word 5
10000000D // 44 .word 13
100000002 // 48 .word 2
000000009 // 4C last: .word 9
//
000000000 // 50 largst: .word 0
[Waveform test5a: Find the Largest Number, time scale 0 to 1543. Signals: pc_load, pc_out, I_bus, iiu_req, iiu_ack, seq_num, log_req, res_req, rf_req, func, alu_req, mem_req, dmem_rq, dq_req, R_seq, R_dest, alu_arb, dq_arb, clr_req, wrf_reg, wrf_req, cki_arb, ckf_arb, ckm_arb, ckr_arb, K_seq, K_chkid, K_err, val_seq, val_reg, val_mem, hlt_req.]
Figure 5.18: Test 5, Find the Largest Number
Chapter Six
Hardware Design and Gate-Level Simulation
Gates.v in Appendix A is a collection of gate-level components, from which more
complex circuits are built in other Verilog modules. The main purpose of this chapter
is to show how some structures used in the behavioral Verilog model may be built in
hardware. Some circuit diagrams contain high level logic gates for simplicity and
clarity, but more efficient implementations may be found at the full transistor level.
6.1. C-Elements
The C-element is a very basic unit in a self-timed system, and an NMOS 2-input
C-element appears in [Seit80, p. 255]. The number of inputs may be extended by adding
transistors to the parallel and series structures, as shown in Figure 6.1. However, the
number of transistors in series should be limited because of the body effect.
[Circuit diagram: inputs in 1, in 2, and in 3; output out; transistors M1 and M2 marked.]
Figure 6.1: 3-Input C-Element
When a C-element with a large number of inputs is required, it can be easily built
from smaller C-elements because its function is associative. The 14-input C-element
used in bigc.v is shown in Figure 6.2. Sometimes a C-element needs to be reset to clear
its state, and that can be accomplished by forcing the inputs to zero. Figure 6.3
represents the muller 2c module in gates.v. Other C-element implementations can be
found in [Berk91, Jaco90, Suth89].
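The two ideas above, clearing a C-element by forcing its inputs to zero and building a wide C-element from smaller ones, can be sketched behaviorally. The modules below are illustrations with hypothetical names and a 4-input tree; they are not the gate-level modules from gates.v or bigc.v.

```verilog
// Sketch only: behavioral 2-input C-element with clear.
module c2_clear (
    input  wire in1, in2, clear,
    output reg  out
);
    wire a = in1 & ~clear;   // clear forces both inputs to zero
    wire b = in2 & ~clear;

    initial out = 0;

    always @(a or b) begin
        if (a & b)        out = 1;   // all inputs high: output rises
        else if (~a & ~b) out = 0;   // all inputs low: output falls
        // otherwise the output holds its previous value
    end
endmodule

// 4-input C-element built as a tree of 2-input C-elements.
module c4 (
    input  wire [3:0] in,
    input  wire       clear,
    output wire       out
);
    wire lo, hi;
    c2_clear u_lo  (.in1(in[0]), .in2(in[1]), .clear(clear), .out(lo));
    c2_clear u_hi  (.in1(in[2]), .in2(in[3]), .clear(clear), .out(hi));
    c2_clear u_top (.in1(lo),    .in2(hi),    .clear(clear), .out(out));
endmodule

module c4_tb;
    reg  [3:0] in;
    reg        clear;
    wire       out;

    c4 dut (.in(in), .clear(clear), .out(out));

    initial begin
        $monitor("%0t: in=%b clear=%b out=%b", $time, in, clear, out);
        in = 4'b0000; clear = 0;
        #5 in = 4'b0111;   // not all inputs high yet
        #5 in = 4'b1111;   // all inputs high
        #5 in = 4'b1000;   // not all inputs low yet
        #5 clear = 1;      // clear resets every stage of the tree
        #5 $finish;
    end
endmodule
```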
[Circuit diagram: tree of C-elements with inputs 1 to 14 and a single output.]
Figure 6.2: 14-Input C-Element
[Circuit diagram: C-element with inputs in 1, in 2, and clear; output out.]
Figure 6.3: 2-Input C-Element with Clear
All transistor and gate delays in the Verilog model are set to 1 unless there are
timing issues involved. This code segment comes from the muller 3 module:
    gates.v 29: nmos #2 m1 (out, gnd, b);
    gates.v 30: nmos #1 m2 (b, a, out);
Transistors M1 and M2 are marked in Figure 6.1. The cross-coupled structure needs to
be biased with unequal delays; otherwise, exactly matched delays would cause oscillation in
simulation when two or more inputs switch at the same time in opposite directions. This
kind of switching does not occur under the 4-phase handshake protocol, except during
local reset when the transaction is canceled anyway. In a real circuit, the C-element
should not oscillate.
6.2. Arbiter
Another circuit, an interlock element, is also presented in [Seit80, p. 261]. It is
shown in Figure 6.4 along with logic for rollback signals, which make up the arbiter for
R_bus. This arbiter is non-prioritized, and multiple modules can be connected in a
binary-tree fashion to support more than two arbitration requests.
[Circuit diagram: signals req 0, req 1, ack 0, ack 1, v 0, v 1, halt_req, halt_ack, roll_req, roll_ack.]
Figure 6.4: Arbiter for R_bus
Since the arbiter is not directly involved in transferring data, it is always reset
when a rollback occurs. As long as the actual data transmitter and receiver handle the
rollback sequence properly, the outputs of the arbiter do not have to be stopped before
sending halt_ack. This simplification allows the arbiter to run at full speed in normal
operations. The 3-input AND gate delays the generation of roll_ack until all arbitration
grants (acks) are released.
Figure 6.5 is the simulation result of running test program #3 (section 5.3) with the
gate-level arbiter. Alu_arb is connected to req 0, and dq_arb is connected to req 1.
When both requests become active at the same time, ALU wins because the circuit is
biased toward req 0 due to the cross-coupled AND gates. In an actual implementation,
a slight mismatch between cross-coupled pairs would help combat the problem of
metastability.
[Waveform arbr1 [test3] with DLY_DMEM_RD=23, time scale 103 to 220. Signals: R_seq, alu_arb, dq_arb, req0, v0, ack0, req1, v1, ack1.]
Figure 6.5: Gate-Level Arbiter
6.3. Register Completion Detector
Before the actual completion circuit is discussed, some latches need to be
introduced. Figure 6.6 is a simple latch made from inverters and pass gates. For
actual circuit implementation, it may be more desirable to use full CMOS pass gates to
reduce power consumption, but they do not make any difference in logic simulation.
Figure 6.7 adds the capability to set or reset the latch by connecting one of the S/R
transistors (but not both). Again, the NMOS pull-up is used only for simplicity.
[Circuit diagram: latch with inputs D and G and output Q.]
Figure 6.6: D-Latch
[Circuit diagram: latch with inputs D and G, output Q, and S/R transistors.]
Figure 6.7: D-Latch with Set/Reset
[Circuit diagram: components x1, x2, x3, and REGISTER; signals latch, reset, osc, q 1, q 2, same 1, same 2, done, regclk, complete.]
Figure 6.8: Register Completion Detector
After data is latched in a register, a signal has to be generated to notify other
circuits about its completion. A fully self-timed circuit requires a completion signal
generator (XNOR) for each bit of the register and a large C-element to merge them.
The overhead is very high, especially if the processor has a high proportion of register
elements.
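The per-bit scheme described above can be sketched as follows, which also shows where its cost comes from: one XNOR for every register bit. This is only an illustration with hypothetical names, widths, and delays; an AND reduction stands in for the large C-element, and this is not the completion scheme actually used in AMPIRE.

```verilog
// Sketch only: per-bit completion detection for a transparent-latch register.
module reg_complete #(parameter W = 8, parameter DLY = 2) (
    input  wire [W-1:0] d,
    input  wire         latch,     // gate of the latches
    output reg  [W-1:0] q,
    output wire         complete
);
    initial q = {W{1'b0}};

    // Transparent latches with a propagation delay of DLY time units.
    always @(latch or d)
        if (latch) q <= #DLY d;

    // One XNOR per bit reports that the bit has settled to its input.
    // The AND reduction stands in for the large C-element.
    assign complete = latch & (&(d ~^ q));
endmodule

module reg_complete_tb;
    reg  [7:0] d;
    reg        latch;
    wire [7:0] q;
    wire       complete;

    reg_complete dut (.d(d), .latch(latch), .q(q), .complete(complete));

    initial begin
        $monitor("%0t: latch=%b d=%h q=%h complete=%b",
                 $time, latch, d, q, complete);
        latch = 0; d = 8'h00;
        #5 d = 8'hA5;
        #1 latch = 1;   // open the gate; complete follows once q equals d
        #5 latch = 0;
        #5 $finish;
    end
endmodule
```

One weakness of this simple form is that it reports completion immediately when the new data equals the old contents, so it measures no delay in that case.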
The completion scheme used in AMPIRE is shown in Figure 6.8. Component x1
is a master-slave flip-flop made from two resettable D-latches. With an inverter
connected from its output back to the input, the output is flipped every time a latch
request is received. X2 and x3 model the delay of the actual register, and they are most
likely to be slower than the real register on the same die (compare Figures 6.6 and 6.7).
Together with the flip-flop "oscillator", the outputs q1 and q2 are switched every latch
cycle, modeling both rise and fall delays.
For the register, the gate is opened when the latch signal becomes active. After
the signal done becomes high, the gate is closed, protecting the register from further
changes at the input. Therefore, the latches in the register behave like edge-triggered
flip-flops without the additional hardware complexity. The register can be arbitrarily
long, but the completion circuit must be placed properly to take the wire delays into
account. Note that the reset signal is not necessarily the global processor reset.
Simulation waveforms will be shown with the gate-level instruction queue.
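The timing idea behind the detector can be sketched in Python under explicit assumptions: the wiring of Figure 6.8 is not reproduced here, so this model simply assumes that the two dummy latches receive the oscillator value and its complement, and that done rises once both have settled. The class name, the delay parameters, and the initial values are all hypothetical.

```python
class CompletionDetector:
    """Timing-only model of a matched-delay completion detector (assumed wiring)."""

    def __init__(self, rise_delay, fall_delay):
        self.rise_delay = rise_delay
        self.fall_delay = fall_delay
        self.osc = 0  # toggle flip-flop (x1)
        self.q1 = 0   # dummy latch assumed to follow osc
        self.q2 = 1   # dummy latch assumed to follow the complement of osc

    def latch(self, t):
        """Handle a latch request at time t and return the time done rises."""
        self.osc ^= 1
        targets = (self.osc, self.osc ^ 1)
        done_at = t
        for old, new in zip((self.q1, self.q2), targets):
            delay = self.rise_delay if new > old else self.fall_delay
            done_at = max(done_at, t + delay)
        self.q1, self.q2 = targets
        return done_at
```

Because one dummy output rises and the other falls in every cycle, done is always timed by the slower of the two transitions in this model, which is one way to read "modeling both rise and fall delays".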
6.4. Instruction Queue
The buffers in AMPIRE may be separated into two groups: the ones which are
always cleared during rollback or other conditions, and the ones with selective
invalidation. The IQ buffer falls into the former group and will be presented here, and
the latter type will be discussed in the next section. The actual storage elements are
simple latches that can be used with the register completion detector in the previous
section. What we need is additional circuitry to communicate with other modules or
sub-modules.
The IQ buffer control is shown in Figure 6.9. The two C-elements make up a 4-
phase full-handshake circuit, and they represent the input and output cycles in the
behavioral code. See [Meng89] for a detailed description of this handshake
circuit. Cancel_r is asserted when a rollback or branch occurs. The buffer for
cancel_a has to provide enough delay to reset the control circuit, including the register
completion unit. A better method should really be used to generate that
acknowledgment because matching the delays of a large circuit may be difficult and not
always reliable.
Figure 6.9: Control for IQ Buffer (schematic; two C-elements; signals in_r, in_a, out_r, out_a, cancel_r, cancel_a, clear, reset, latch, complete)
Figure 6.10: Gate-Level IQ (simulation waveforms: iq1 [test1], time scale 73 to 171; signals pc_load, pc_out, in_d, out_d, in_r, in_a, out_r, out_a, can_r, can_a, latch, osc, q1, q2, same1, same2, done, regclk, complet; marked points A through G)
Figure 6.10 is simulated with full gate-level buffers, running test program #1
(section 5.1). Pulses (A,B) and (C,D) form the 4-phase handshake pairs for input and
output, respectively. The output request C can be started before the input
acknowledgement B is finished because the data is already latched and ready for output.
Looking at the signals inside the register completion detector, the done signal at
point F becomes high after the input data has propagated to the output of the latch, as
checked by the two same signals. Then the register clock regclk is turned off at point
G. When a branch is taken, the queue is cleared in response to the cancel request
can_r , which is connected directly to pc_load . As a result, the signal out_r is dropped
at point E without receiving a corresponding acknowledgement.
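The ordering of a 4-phase handshake, and the way a cancelled request departs from it, can be checked with a short Python sketch. The names `FOUR_PHASE`, `next_state`, and `is_four_phase` are illustrative only; each sample is a (request, acknowledge) pair.

```python
# (req, ack) states of one four-phase cycle, in order:
# idle, request raised, acknowledge raised, request dropped.
FOUR_PHASE = [(0, 0), (1, 0), (1, 1), (0, 1)]


def next_state(req, ack):
    """Return the (req, ack) state that follows in a four-phase handshake."""
    i = FOUR_PHASE.index((req, ack))
    return FOUR_PHASE[(i + 1) % len(FOUR_PHASE)]


def is_four_phase(trace):
    """Check that consecutive (req, ack) samples follow the four-phase order."""
    return all(next_state(*a) == b for a, b in zip(trace, trace[1:]))
```

A complete cycle such as `[(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]` passes the check. The cancelled output request at point E corresponds to `[(0, 0), (1, 0), (0, 0)]`, which fails it because the request is withdrawn before any acknowledgement arrives.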
6.5. Data Queue
Unlike the IQ buffer, the buffers in DQ, DWBs, checkers, etc. have to be
selectively invalidated in the event of a rollback. Since invalidation is based on
sequence number comparison, the comparator hardware will be discussed next.
6.5.1. Sequence Number Comparator
The rollback code was introduced in section 4.1.6, and these lines describe what
needs to be done to determine if data should be invalidated:
    diff = sequence_num - sequence_error;
    if (~diff [high_bit])
        valid = 0;
The subtractor can be simplified because only the highest bit of the difference is used.
Four bits are allocated for the sequence number in AMPIRE, and therefore, we need
three bits of borrow (as opposed to carry) and one bit of difference circuits, as shown in
Figure 6.11. BRWEND is a borrow generator without borrow-in, for the least
significant bit. The logic equations for these blocks are (for A - B):
BRWEND: $Z = \bar{A} B$
BRW: $Z = \bar{A} B + \bar{A} X + B X = \bar{A} B + X (\bar{A} + B)$
SUB: $D = A \oplus B \oplus X$
Figure 6.11: Sequence Number Comparator (block diagram; blocks BRWEND, BRW and SUB for bits 0 to 3; signals req, ack, seq, error, invalid)
Figure 6.12: Borrow Circuits, Full and Half (transistor schematics; inputs A, B, X and req, differential nodes Y, differential outputs Z)
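The comparator can be checked against ordinary modular subtraction with a bit-level Python sketch. It uses the standard subtractor borrow terms for A - B (A complemented in the borrow logic); the function names and the helper `bit` are hypothetical, and `SEQ_WIDTH` mirrors the parameter of the same name in Appendix A.

```python
SEQ_WIDTH = 4  # bits in a sequence number


def brwend(a, b):
    """Borrow out of the least significant bit (no borrow-in)."""
    return (1 - a) & b


def brw(a, b, x):
    """Borrow out of a middle bit with borrow-in x."""
    return ((1 - a) & b) | (x & ((1 - a) | b))


def sub(a, b, x):
    """Difference bit for the most significant position."""
    return a ^ b ^ x


def bit(value, i):
    """Extract bit i of value."""
    return (value >> i) & 1


def is_invalid(sequence_num, sequence_error):
    """Return 1 when the high bit of (sequence_num - sequence_error) is 0."""
    x = brwend(bit(sequence_num, 0), bit(sequence_error, 0))
    for i in range(1, SEQ_WIDTH - 1):
        x = brw(bit(sequence_num, i), bit(sequence_error, i), x)
    top = SEQ_WIDTH - 1
    d = sub(bit(sequence_num, top), bit(sequence_error, top), x)
    return 1 - d
```

Only the top difference bit is computed, so three borrow stages and one difference stage suffice, matching the block count given in the text.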
The self-timed components appear in Figures 6.12 through 6.14, which are
designed using a technique called DCVSL, differential cascode voltage switch
logic [Jaco90, Meng89]. The PMOS transistors precharge the $Y$ and $\bar{Y}$ nodes when req
is low, and the NMOS transistor that is gated by req prevents premature discharge.
When the request signal is activated, only one of the precharged nodes is pulled low
because the two NMOS trees are complementary. The output inverters are added, as
done in Figure 6.13, so that the voltage level is low when the circuit is inactive or not
ready. The differential outputs can then be fed directly into another DCVSL circuit. At
the final stage of the chain (SUB), the outputs are ORed together to generate the
completion/acknowledgement signal. Because the circuit is self-timed, completion
time varies with the data being compared.
Figure 6.13: XOR Gate (transistor schematic; inputs A, B and req, differential nodes Y, differential outputs Z)
Figure 6.14: Difference (SUB) Circuit (two cascaded XOR blocks; inputs A, B, X and req, differential outputs D, output ack)
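The precharge/evaluate behavior and the OR-based completion signal can be summarized in a logic-level Python sketch. It ignores transistor-level timing entirely, and the function names are hypothetical.

```python
def dcvsl_xor(req, a, b):
    """Return the differential outputs (z, z_bar) of a precharged XOR stage."""
    if not req:
        # Precharge: both internal nodes are high, so both inverted outputs are low.
        return 0, 0
    # Evaluate: exactly one pull-down tree conducts, so exactly one output rises.
    z = a ^ b
    return z, 1 - z


def completion(z, z_bar):
    """Completion/acknowledgement signal: OR of the differential outputs."""
    return z | z_bar
```

While req is low, both outputs are low and `completion` returns 0. Once req is raised, exactly one output goes high, so `completion` returns 1 regardless of the data value.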
6.5.2. DQ Control
The control circuit for the DQ buffer is shown in Figure 6.15. The register
completion detector is interfaced through latch and complete , like the IQ buffer. A
sequence number comparator cycle is started with roll_req , and the acknowledgement
cmp_done is raised when the result invalid is available.
Figure 6.15: Control for DQ Buffer (schematic; C-elements and gates g8, g9; signals in_r, in_a, out_r, out_a, latch, complete, cout1, cout2, clear1, clear2, invalid, cmp_done, roll_req, roll_ack, halt_req, halt_ack, reset)
The square part with an equal sign is just a standard resettable D-latch, with its gate
connected to halt_req (not shown) and its reset connected to clear1. In normal
operation, it is in the transparent mode (output = input). When a rollback cycle is
started, the inputs are cut off because halt_req becomes low, thus accomplishing
wait (~halt_req) in the behavioral code. However, complete cannot be
interrupted because if data is already latched, the sender must be notified, or data
duplication may occur when normal operation is resumed (see section 4.1.6). The delay
buffer in the halt_ack circuit is provided for that purpose.
The cout2 signal is equivalent to the valid variable in the behavioral code. It is
cleared in the rollback process only if the corresponding register is invalidated, in
which case clear2 is activated. The AND gates g8 and g9 partially determine when the
invalidation process is finished. If invalidation is not necessary, then it can proceed
once the comparison is done (g9). Otherwise, roll_ack cannot be sent until the cout2
signal is reset (g8).
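The two conditions can be written as a small Boolean sketch in Python. The exact gate inputs of Figure 6.15 are not reproduced here; the expressions below are an assumed reading of the two cases described in the text, and the function name is hypothetical. Since g8 and g9 only partially determine the end of invalidation, other conditions in the real circuit are left out.

```python
def roll_ack_ready(cmp_done, invalid, cout2):
    """Assumed conditions under which g8 or g9 allows roll_ack to proceed."""
    # g9: comparison finished and no invalidation is needed.
    g9 = cmp_done and not invalid
    # g8: comparison finished, invalidation needed, and cout2 already reset.
    g8 = cmp_done and invalid and not cout2
    return bool(g8 or g9)
```

In this reading, an invalidated buffer holds back its rollback acknowledgement until cout2 has actually been cleared.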
6.5.3. DQ Simulation
A couple of test programs will be run to show the internal signals of the DQ buffer
and the sequence number comparator. Both programs read data from DMEM, which
are sent to the REGDWB through DQ. The first program causes a CKI fault, but it does
not invalidate LW.
// ; test6.a -- gate level DQ, no invalidation
//
// ; DLY_DMEM_RD=50, DLY_CKI_CHK=48 and 52
//
08C010000 // 00 lw r1, 0(r0)
100000000 // 04 *nop
144000001 // 08 trap 1
144000064 // 0C trap 100
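The 33-bit words in the listing can be unpacked with a short Python sketch. The field widths follow the parameter file in Appendix A; the field order (op, rs, rt, immediate) is an assumed I-type layout, and the reading of the extra top bit as an even-parity bit is an inference from the listed words (only the starred nop violates it), not something stated in this section.

```python
OP_WIDTH = 6
REG_WIDTH = 5
IMM_WIDTH = 16   # REG_WIDTH + EXTRA_WIDTH in the parameter file
DATA_WIDTH = 32  # OP_WIDTH + 2 * REG_WIDTH + IMM_WIDTH


def decode(word):
    """Split a 33-bit listing word into its fields (assumed I-type layout)."""
    body = word & ((1 << DATA_WIDTH) - 1)
    reg_mask = (1 << REG_WIDTH) - 1
    return {
        "extra": word >> DATA_WIDTH,
        "op": body >> (DATA_WIDTH - OP_WIDTH),
        "rs": (body >> (IMM_WIDTH + REG_WIDTH)) & reg_mask,
        "rt": (body >> IMM_WIDTH) & reg_mask,
        "imm": body & ((1 << IMM_WIDTH) - 1),
    }


def even_parity_ok(word):
    """True when the 33-bit word has an even number of 1 bits (assumption)."""
    return bin(word).count("1") % 2 == 0
```

For example, `decode(0x08C010000)` yields op 35 (OP_LW) with rs 0 and rt 1, and `decode(0x144000064)` yields op 17 (OP_TRAP) with immediate 100.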
In Figure 6.16, the output request out_r is set at point A, but the halt request is
also activated at the same time. Out_r is cleared by clear1 at point C, but since cout2
retains its value through the rollback process (not invalidated), another handshake cycle
is started at point B.
The signals in the lower half of the figure belong to the sequence number
comparator, showing 0 - 1. (_a,na) through (d,inv) are the differential output pairs of the
four sub-blocks in Figure 6.11. The waveforms are staggered because each one is
dependent on the result from the preceding stage.
Figure 6.16: Test 6, Gate-Level DQ with DLY_CKI_CHK=48 (simulation waveforms: dq1 [test6], time scale 127 to 269; signals in_r, cout1, latch, in_a, a, cout2, out_r, out_a, h_req, h_ack, r_req, clear1, clear2, r_ack, seq, error, req, _a, na, b, nb, c, nc, d, inv, ack; marked points A, B, C)
By delaying the rollback cycle a little, we get Figure 6.17. By the time h_req
becomes high at point G, the receiver has already started latching its register. That is
completed at point F, and the acknowledgment clears cout2 at point D. The signal
out_r is held constant by a closed latch, and therefore, it is not dropped until point E,
after clear1 is raised.
The next program causes LW to be invalidated because of an error in the
preceding instruction.
Figure 6.17: Test 6, Gate-Level DQ with DLY_CKI_CHK=52 (simulation waveforms: dq2 [test6], time scale 127 to 224; same signals as Figure 6.16; marked points D, E, F, G)
// ; test7.a -- gate level DQ, invalidation
//
// ; DLY_CKI_CHK=15, DLY_DMEM_RD=8 and 1
//
000000000 // 00 nop
000000000 // 04 nop
000000000 // 08 nop
100000000 // 0C *nop
08C010000 // 10 lw r1, 0(r0)
144000001 // 14 trap 1
144000064 // 18 trap 100
In Figure 6.18, an input request in_r is received and causes cout1 to become high.
However, that signal is not propagated to latch because the path is already closed by
h_req. Since the data is not received by this buffer, it is invalidated somewhere else.
In Figure 6.19, the sequence number comparison results in the inv signal being high at
point I. Then clear2 invalidates the data by resetting cout2 at point H. Notice that
there is no activity on the out_r line during this period of time.
Figure 6.18: Test 7, Gate-Level DQ with DLY_DMEM_RD=8 (simulation waveforms: dq3 [test7], time scale 133 to 190; same signals as Figure 6.16)
The sequence number comparisons performed in Figures 6.18 and 6.19 are 0 - 3
and 4 - 3 (0000 - 0011 and 0100 - 0011 in binary). For both cases, the borrow generation
for bit 1 is not dependent on the result from bit 0. Therefore, _a and b become high at
the same time, reducing the time for borrow propagation.
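Why bit 1 does not wait for bit 0 in these two cases follows from the borrow equation: when the two operand bits differ, the borrow-out is fixed by the operands alone, and only equal bits pass the borrow-in through. A short Python sketch (function names hypothetical) lists the middle bit positions that can resolve early:

```python
def borrow_needs_input(a_bit, b_bit):
    """A BRW stage depends on its borrow-in only when A and B are equal."""
    return a_bit == b_bit


def early_borrow_bits(a, b, width=4):
    """List the middle bit positions whose borrow-out ignores the borrow-in."""
    return [
        i
        for i in range(1, width - 1)
        if not borrow_needs_input((a >> i) & 1, (b >> i) & 1)
    ]
```

For 0 - 3 the result is `[1]` and for 4 - 3 it is `[1, 2]`, so bit 1 is independent of bit 0 in both comparisons. For 0 - 1 the list is empty and the borrow must ripple through every stage.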
Figure 6.19: Test 7, Gate-Level DQ with DLY_DMEM_RD=1 (simulation waveforms: dq4 [test7], time scale 127 to 196; same signals as Figure 6.16; marked points H, I)
6.6. Fault Simulation with Gate-Level Modules
Four modules in AMPIRE have gate-level models: BIGC, ARBR, IQ, and DQ.
Each one can be switched through declarations in the parameter file. Figure 6.20 is the
result of running test program #4 (section 5.4) with all gate-level models enabled. The
waveforms are very similar to the ones in Figure 5.11, which is from the behavioral
model. Figure 6.20 does have an extra REGFILE write request pulse at point A. The
data is not actually written at that time because it is interrupted by the rollback cycle at
point C. After the processor is restarted, the transaction is repeated at point B.
Figure 6.20: Fault Simulation with Gate-Level Modules (simulation waveforms: allgates [test4a], all faults, time scale 0 to 1405; signals pc_load, pc_out, I_bus, iiu_req, iiu_ack, seq_num, reg_rd, log_req, res_req, rf_req, func, alu_req, mem_req, dmem_rq, dq_req, R_seq, R_data, R_dest, alu_arb, dq_arb, clr_req, wrf_reg, wrf_req, K_seq, K_chkid, K_err, val_seq, val_reg, val_mem, hlt_req; marked points A, B, C)
Chapter Seven
Conclusion
This thesis has demonstrated a fault-tolerance method for an asynchronous
processor. The register reservation mechanism guarantees mutual exclusion on the
registers and enables concurrency for independent instructions. The instruction log and
delayed write buffers provide temporary storage for un-validated data so that permanent
state elements are not affected until verification is finished. Because execution of
instructions may complete out of order in an asynchronous environment, sequence
numbers are used to track dependencies at the instruction level, and instructions which
need to be rolled back can be properly identified and undone.
The performance of a synchronous system is penalized because the clock rate is
limited by the longest path of all pipeline stages, and time may be wasted in many
stages. An asynchronous system requires only as much time as necessary to complete a
task, and therefore, it achieves an average processing speed, not the worst case.
Asynchronous design also allows the system to be very modular because there is no
global constraint like the clock. Blocks of different speeds may be mixed and matched
without altering the functionality of a system, as long as the handshake protocol is not
violated. However, some resources may be well under-utilized if the system is not
balanced properly, and the overhead for handshaking must be considered.
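The trade-off between worst-case clocking and average-case completion, including the handshake overhead, can be illustrated with simple arithmetic. The Python sketch below uses a deliberately simplified sequential model (no pipeline overlap), and the delay values in `EXAMPLE` are hypothetical numbers chosen for illustration, not measurements from AMPIRE.

```python
def synchronous_time(op_delays):
    """Total time when every step takes one clock period.

    The period is set by the slowest step anywhere in the workload.
    """
    period = max(d for op in op_delays for d in op)
    steps = sum(len(op) for op in op_delays)
    return period * steps


def asynchronous_time(op_delays, handshake_overhead=0):
    """Total time when each step takes its own delay plus handshake overhead."""
    return sum(d + handshake_overhead for op in op_delays for d in op)


# Hypothetical per-step delays (arbitrary units) for two instructions.
EXAMPLE = [[2, 5, 3], [2, 1, 3]]
```

With these numbers the clocked total is 30 units and the self-timed total is 16 units when handshaking is free. A handshake overhead of 3 units per step raises the self-timed total to 34 units, which shows how the overhead can cancel the average-case advantage.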
One of the difficulties with a fully asynchronous design is generating the
handshake signals in the correct sequence. A state machine in the traditional
synchronous sense is not available because there is no clock. Certain sequencing can be
forced by introducing delay elements, as done in some circuits in Chapter 6, but their
accuracy and reliability in different conditions (electrical, thermal, and IC process
variations) are the drawbacks. Too short a delay makes the circuit non-functional,
but the performance suffers if the delay is longer than necessary. CAD tools should
really be used to assist the design of complex asynchronous systems, in terms of
handshake circuit synthesis and analysis.
As discussed in Chapter 2, delay-insensitive circuits guarantee correct
asynchronous operations because no assumption is made on wire delays. However, the
increased circuit complexity and chip area can be significant, and the larger physical
size does mean increased propagation delays. Since gate and wire delays can be
controlled locally without much difficulty, self-timed but delay-dependent circuits can
be applied in small blocks. Proper trade-offs between the two design methods should
lead to a better implementation than either one alone.
One of the goals at the beginning of this research was to design an equivalent of
micro rollback in an asynchronous environment. Micro rollback works at a finer
granularity than instruction retry, and as a result, the number of repeated operations is
reduced. For example, it is not necessary to fetch an instruction again if the error of the
same instruction occurs in the execution stage. In an asynchronous processor, however,
each unit cannot simply go back a few steps because each one operates at its own speed.
Dependency tracking probably has to be based on individual operations rather than
instruction sequence numbers, and if more than one dependency has to be kept [Stro85],
then it may become very difficult to manage. Furthermore, each functional unit (such
as the ALU) may need to keep a history log so that previous data can be played back for
rollback.
Some areas of this project can be improved and expanded. In AMPIRE, the
instruction and data memories are completely separate units so that they are easier to
control. In a real system, though, a single memory device usually holds both
instructions and data. An arbiter has to be added to coordinate instruction and data
access, and the instruction fetch process has to be changed because now the IMEM
portion can be modified. Also, many faults cannot be detected and corrected with a
single processor. If two asynchronous processors are to be connected as a master/slave
pair like the UCLA Mirror Processor, then an additional handshake mechanism has to be
added for synchronization, which is another layer of overhead. Since AMPIRE has not
been physically implemented, the impact of the addition of fault-tolerant features
cannot be quantitatively measured. However, based on the architecture of parallel data
verification, results should be similar to those of the Mirror Processor: high overhead in
chip area but insignificant reduction in performance.
Appendix A
Verilog Simulation Code
The Verilog modules for AMPIRE are listed in alphabetical order, except for
parameter and ampire.v, which appear first.
Module       Description
parameter    Parameter declarations.
ampire.v     Top-level wiring and test setup.
alu.v        Arithmetic logic unit.
arbk.v       Arbiter for K_bus.
arbr.v       Arbiter for R_bus.
bigc.v       Big C-element for rollback synchronization.
ckf.v        Checker for REGDWB outputs to A_bus and B_bus.
cki.v        Checker for instructions executed by the IIU.
ckm.v        Checker for data written to MEMDWB.
ckr.v        Checker for data written to REGDWB.
dmem.v       Data memory.
dq.v         Data queue for memory-to-register transfers.
gates.v      Gate-level components.
iiu.v        Instruction issuing unit.
imem.v       Instruction memory.
iq.v         Instruction queue.
log.v        Instruction log.
memdwb.v     Delayed write buffer for data memory.
pc.v         Program counter.
regdwb.v     Delayed write buffer for register file.
regfile.v    Register file.
restable.v   Register reservation table.
Table A.1: Verilog Modules
parameter
parameter ADDR_WIDTH = 8; // width of address bus (bytes)
parameter ADDR_IGNORE = 2; // number of address bits to ignore
parameter ADDR_INC = 4; // amount of address increment (2^IGNORE)
parameter SEQ_WIDTH = 4; // bits for sequence number
parameter FUNC_WIDTH = 5; // ALU function code width
parameter CHECKERS = 4; // number of checkers
parameter CHKID_WIDTH = 2; // number of bits for checker ID
parameter DWBS = 2; // number of DWBs (for validation)

parameter OP_WIDTH = 6; // bits for op code
parameter REG_WIDTH = 5; // bits for register number
parameter EXTRA_WIDTH = 11; // "extra" field, "func" in R-type

parameter REG_SIZE = 32; // number of registers (2^REG_WIDTH)
parameter IMEM_SIZE = 64; // number of instruction memory WORDS
parameter DMEM_SIZE = 64; // number of data memory WORDS

parameter IMM_WIDTH = REG_WIDTH + EXTRA_WIDTH; // immediate field
parameter OFFSET_WIDTH = 2 * REG_WIDTH + IMM_WIDTH; // offset field
parameter DATA_WIDTH = OP_WIDTH + OFFSET_WIDTH; // word size

parameter RESET_TIME = 1; // 1 for behavioral, 30 for gate-level
`define BEHAV_IQ // comment out for gate-level
`define BEHAV_DQ // *** CHANGE RESET_TIME ***
`define BEHAV_ARBR
`define BEHAV_BIGC

// `define WAVE_IQ // define to add signal waves
// `define WAVE_REGCMPL
// `define WAVE_DQ
// `define WAVE_SEQCMP
// `define WAVE_ARBR

parameter DLY_PC_INC = 1; // delay settings
parameter DLY_IMEM_RD = 1;
parameter DLY_IIU_PCINC = 1;
parameter DLY_IIU_DECOD = 1;
parameter DLY_IIU_ADD = 1;
parameter DLY_RT_RES = 1;
parameter DLY_RT_CLR = 1;
parameter DLY_RF_RD = 1;
parameter DLY_RF_WR = 1;
parameter DLY_ALU_ADD = 1;
parameter DLY_ALU_SUB = 1;
parameter DLY_ALU_AND = 1;
parameter DLY_ALU_OR = 1;
parameter DLY_ALU_XOR = 1;
parameter DLY_ALU_PASS = 1;
parameter DLY_ALU_SHIFT = 1;
parameter DLY_ALU_COMP = 1;
parameter DLY_DMEM_RD = 1;
parameter DLY_DMEM_WR = 1;
parameter DLY_SEQ_COMP = 1; // sequence comparison delay for all modules

parameter DLY_CKI_CHK = 1;
parameter DLY_CKF_CHK = 1;
parameter DLY_CKM_CHK = 1;
parameter DLY_CKR_CHK = 1;

parameter ID_CKI = 0; // checker ID numbers
parameter ID_CKF = 1;
parameter ID_CKM = 2;
parameter ID_CKR = 3;

parameter XXX = 33'bx; // to overcome default size of 32 bits
parameter ZZZ = 33'bz;

parameter OP_SPECIAL = 0; // reg-reg special ops
parameter OP_J = 2;
parameter OP_JAL = 3;
parameter OP_BEQZ = 4;
parameter OP_BNEZ = 5;
parameter OP_ADDUI = 9;
parameter OP_SUBUI = 11;
parameter OP_ANDI = 12;
parameter OP_ORI = 13;
parameter OP_XORI = 14;
parameter OP_LHI = 15; // shift by IMM_WIDTH bits
parameter OP_TRAP = 17; // used as simulator special service
parameter OP_JR = 18;
parameter OP_JALR = 19;
parameter OP_SLLI = 20;
parameter OP_SRLI = 22;
parameter OP_SRAI = 23;
parameter OP_SEQI = 24;
parameter OP_SNEI = 25;
parameter OP_SLTI = 26;
parameter OP_SGTI = 27;
parameter OP_SLEI = 28;
parameter OP_SGEI = 29;
parameter OP_LW = 35;
parameter OP_SW = 43;
parameter OP_ADDUIF = 49; // add immediate with fault
parameter OP_JRF = 50; // jump register with fault
parameter OP_SWF = 51; // store word with fault

parameter SOP_NOP = 0; // reg-reg special opcodes
parameter SOP_SLL = 4;
parameter SOP_SRL = 6;
parameter SOP_SRA = 7;
parameter SOP_ADDUF = 25; // add with fault
parameter SOP_ADDU = 33;
parameter SOP_SUBU = 35;
parameter SOP_AND = 36;
parameter SOP_OR = 37;
parameter SOP_XOR = 38;
parameter SOP_SEQ = 40;
parameter SOP_SNE = 41;
parameter SOP_SLT = 42;
parameter SOP_SGT = 43;
parameter SOP_SLE = 44;
parameter SOP_SGE = 45;

parameter FUNC_ADDU = 0; // ALU function codes
parameter FUNC_SUBU = 1;
parameter FUNC_AND = 2;
parameter FUNC_OR = 3;
parameter FUNC_XOR = 4;
parameter FUNC_LHI = 5;
parameter FUNC_SLL = 6;
parameter FUNC_SRL = 7;
parameter FUNC_SRA = 8;
parameter FUNC_SLT = 9;
parameter FUNC_SGT = 10;
parameter FUNC_SLE = 11;
parameter FUNC_SGE = 12;
parameter FUNC_SEQ = 13;
parameter FUNC_SNE = 14;
parameter FUNC_PASS = 15;
parameter FUNC_ADDUF = 16; // add with fault

parameter TRAP_BLANK = 90; // must be >= REG_SIZE
parameter TRAP_STOP = 100;
ampire.v
// AMPIRE Top-Level Wiring and Test Setup

module ampire;

`include "parameter"

reg reset;

// ====================================================================
// bus connections

trireg [DATA_WIDTH:0] i_bus,
        a_bus,
        b_bus,
        d_bus;

trireg [SEQ_WIDTH-1:0] r_bus_seq;
trireg [REG_WIDTH-1:0] r_bus_dest;
trireg [DATA_WIDTH:0] r_bus_data;
trireg r_bus_req;

trireg [SEQ_WIDTH-1:0] k_bus_seq;
trireg [CHKID_WIDTH-1:0] k_bus_chkid;
trireg k_bus_error,
        k_bus_req;

trireg tri_dq_req;

pulldown (r_bus_req); // do not float handshaking signals
pulldown (k_bus_req);
pulldown (tri_dq_req);

// ====================================================================
// component outputs

wire [ADDR_WIDTH:0]
        pc_out;
wire    pc_out_retry,
        pc_load_ack,
        pc_out_req;

wire [DATA_WIDTH:0]
        imem_data;
wire    imem_in_ack,
        imem_out_req,
        imem_cancel_ack;

wire    iq_in_ack,
        iq_out_req,
        iq_cancel_ack;

wire [CHECKERS-1:0]
        iiu_chkbits;
wire [SEQ_WIDTH-1:0]
        iiu_seq_num;
wire [REG_WIDTH-1:0]
        iiu_reg_rd,
        iiu_reg_rs,
        iiu_reg_rt;
wire [FUNC_WIDTH-1:0]
        iiu_alu_func;
wire    iiu_inst_en,
        iiu_sim_f,
        iiu_mem_rw,
        iiu_retry,
        stop;
wire    iiu_inst_ack,
        iiu_log_req,
        iiu_cki_req,
        iiu_newpc_req,
        iiu_res_req,
        iiu_rfile_req,
        iiu_ckf_req,
        iiu_alu_req,
        iiu_mem_req;

wire    restable_res_ack,
        restable_clr_ack;

wire    alu_comp_ack,
        alu_arb_req;

wire [ADDR_WIDTH:0]
        memdwb_out_addr;
wire [SEQ_WIDTH-1:0]
        memdwb_out_seq;
wire [REG_WIDTH-1:0]
        memdwb_out_reg;
wire    memdwb_rw_mode,
        memdwb_out_retry,
        memdwb_in_ack,
        memdwb_chk_req,
        memdwb_val_ack,
        memdwb_out_req;

wire    dmem_in_ack;

wire    dq_in_ack,
        dq_arb_req;

wire    arbr_ack0,
        arbr_ack1;

wire [SEQ_WIDTH-1:0]
        regdwb_out_seq;
wire [REG_WIDTH-1:0]
        regdwb_out_reg,
        regdwb_wrf_reg;
wire [DATA_WIDTH:0]
        regdwb_out_data,
        regdwb_wrf_data;
wire    regdwb_in_ack,
        regdwb_chk_req,
        regdwb_clr_req,
        regdwb_val_ack,
        regdwb_rd_ack,
        regdwb_rdf_req,
        regdwb_wrf_req;

wire [DATA_WIDTH:0]
        rfile_rd_data1,
        rfile_rd_data2;
wire    rfile_wr_ack,
        rfile_rd_ack;

wire    cki_in_ack,
        cki_arb_req;

wire    ckf_in_ack,
        ckf_arb_req;

wire    ckm_in_ack,
        ckm_arb_req;

wire    ckr_in_ack,
        ckr_arb_req;

wire    arbk_ack0,
        arbk_ack1,
        arbk_ack2,
        arbk_ack3;

wire [SEQ_WIDTH-1:0]
        val_seq;
wire    log_ack,
        log_chk_ack,
        log_valmem_req,
        log_valreg_req,
        halt_req,
        roll_req;

wire    ch_out;

wire    cr_out;

// ====================================================================
// component inputs

wire [ADDR_WIDTH:0]
        pc_load = i_bus;
wire    pc_in_retry = iiu_retry,
        pc_load_req = iiu_newpc_req,
        pc_out_ack = imem_in_ack;

wire [ADDR_WIDTH:0]
        imem_addr = pc_out;
wire    imem_retry = pc_out_retry,
        imem_in_req = pc_out_req,
        imem_out_ack = iq_in_ack,
        imem_cancel_req = iiu_newpc_req;

wire [DATA_WIDTH:0]
        iq_in_data = imem_data;
wire    iq_in_req = imem_out_req,
        iq_out_ack = iiu_inst_ack,
        iq_en_out = iiu_inst_en,
        iq_cancel_req = iiu_newpc_req;

wire    iiu_inst_req = iq_out_req,
        iiu_log_ack = log_ack,
        iiu_cki_ack = cki_in_ack,
        iiu_pc_ack = pc_load_ack,
        iiu_imem_ack = imem_cancel_ack,
        iiu_iq_ack = iq_cancel_ack,
        iiu_res_ack = restable_res_ack,
        iiu_rfile_ack = regdwb_rd_ack,
        iiu_ckf_ack = ckf_in_ack,
        iiu_alu_ack = alu_comp_ack,
        iiu_mem_ack = memdwb_in_ack;

wire [SEQ_WIDTH-1:0]
        restable_seq_num = iiu_seq_num;
wire [REG_WIDTH-1:0]
        restable_reg_w = iiu_reg_rd,
        restable_reg_r1 = iiu_reg_rs,
        restable_reg_r2 = iiu_reg_rt,
        restable_reg_clr = regdwb_out_reg;
wire    restable_res_req = iiu_res_req,
        restable_clr_req = regdwb_clr_req;

wire [SEQ_WIDTH-1:0]
        alu_in_seq = iiu_seq_num;
wire [REG_WIDTH-1:0]
        alu_in_dest = iiu_reg_rd;
wire [DATA_WIDTH:0]
        alu_in_d1 = a_bus,
        alu_in_d2 = b_bus;
wire [FUNC_WIDTH-1:0]
        alu_in_func = iiu_alu_func;
wire    alu_comp_req = iiu_alu_req,
        alu_arb_ack = arbr_ack0,
        alu_out_ack = regdwb_in_ack;

wire [SEQ_WIDTH-1:0]
        memdwb_in_seq = iiu_seq_num;
wire [REG_WIDTH-1:0]
        memdwb_in_reg = iiu_reg_rd;
wire [DATA_WIDTH:0]
        memdwb_in_addr = a_bus,
        memdwb_in_data = b_bus;
wire    memdwb_in_rw_mode = iiu_mem_rw,
        memdwb_in_retry = iiu_retry,
        memdwb_in_req = iiu_mem_req,
        memdwb_chk_ack = ckm_in_ack,
        memdwb_val_req = log_valmem_req,
        memdwb_out_ack = dmem_in_ack,
        memdwb_dq_ack = dq_in_ack;

wire [ADDR_WIDTH:0]
        dmem_addr = memdwb_out_addr;
wire    dmem_rw_mode = memdwb_rw_mode,
        dmem_retry = memdwb_out_retry,
        dmem_in_req = memdwb_out_req,
        dmem_out_ack = dq_in_ack;

wire [SEQ_WIDTH-1:0]
        dq_in_seq = memdwb_out_seq;
wire [REG_WIDTH-1:0]
        dq_in_reg = memdwb_out_reg;
wire [DATA_WIDTH:0]
        dq_in_data = d_bus;
wire    dq_in_req = tri_dq_req,
        dq_arb_ack = arbr_ack1,
        dq_out_ack = regdwb_in_ack;

wire    arbr_req0 = alu_arb_req,
        arbr_req1 = dq_arb_req;

wire [SEQ_WIDTH-1:0]
        regdwb_in_seq = r_bus_seq;
wire [REG_WIDTH-1:0]
        regdwb_in_reg = r_bus_dest,
        regdwb_rd_reg1 = iiu_reg_rs,
        regdwb_rd_reg2 = iiu_reg_rt;
wire [DATA_WIDTH:0]
        regdwb_in_data = r_bus_data,
        regdwb_rf_data1 = rfile_rd_data1,
        regdwb_rf_data2 = rfile_rd_data2;
wire    regdwb_sim_f = iiu_sim_f;
wire    regdwb_in_req = r_bus_req,
        regdwb_chk_ack = ckr_in_ack,
        regdwb_clr_ack = restable_clr_ack,
        regdwb_val_req = log_valreg_req,
        regdwb_rd_req = iiu_rfile_req,
        regdwb_rdf_ack = rfile_rd_ack,
        regdwb_wrf_ack = rfile_wr_ack;

wire [REG_WIDTH-1:0]
        rfile_wr_reg = regdwb_wrf_reg,
        rfile_rd_reg1 = iiu_reg_rs,
        rfile_rd_reg2 = iiu_reg_rt;
wire [DATA_WIDTH:0]
        rfile_wr_data = regdwb_wrf_data;
wire    rfile_wr_req = regdwb_wrf_req,
        rfile_rd_req = regdwb_rdf_req;

wire [SEQ_WIDTH-1:0]
        log_seq = iiu_seq_num,
        log_chk_seq = k_bus_seq;
wire [CHECKERS-1:0]
        log_chkbits = iiu_chkbits;
wire [CHKID_WIDTH-1:0]
        log_chk_chkid = k_bus_chkid;
wire    log_chk_error = k_bus_error;
wire    log_req = iiu_log_req,
        log_chk_req = k_bus_req,
        log_valmem_ack = memdwb_val_ack,
        log_valreg_ack = regdwb_val_ack,
        log_halt_ack = ch_out,
        log_roll_ack = cr_out;

wire [SEQ_WIDTH-1:0]
        cki_in_seq = iiu_seq_num;
wire [DATA_WIDTH:0]
        cki_in_data = b_bus;
wire    cki_in_req = iiu_cki_req,
        cki_arb_ack = arbk_ack0,
        cki_out_ack = log_chk_ack;

wire [SEQ_WIDTH-1:0]
        ckf_in_seq = iiu_seq_num;
wire [DATA_WIDTH:0]
        ckf_in_data1 = a_bus,
        ckf_in_data2 = b_bus;
wire    ckf_in_req = iiu_ckf_req,
        ckf_arb_ack = arbk_ack1,
        ckf_out_ack = log_chk_ack;

wire [SEQ_WIDTH-1:0]
        ckm_in_seq = iiu_seq_num;
wire [DATA_WIDTH:0]
        ckm_in_data1 = a_bus,
        ckm_in_data2 = b_bus;
wire    ckm_in_req = memdwb_chk_req,
        ckm_arb_ack = arbk_ack2,
        ckm_out_ack = log_chk_ack;

wire [SEQ_WIDTH-1:0]
        ckr_in_seq = regdwb_out_seq;
wire [DATA_WIDTH:0]
        ckr_in_data = regdwb_out_data;
wire    ckr_in_req = regdwb_chk_req,
        ckr_arb_ack = arbk_ack3,
        ckr_out_ack = log_chk_ack;

wire    arbk_req0 = cki_arb_req,
        arbk_req1 = ckf_arb_req,
        arbk_req2 = ckm_arb_req,
        arbk_req3 = ckr_arb_req;

// ====================================================================
// components

pc pc (pc_load, pc_in_retry, pc_load_req, pc_load_ack, pc_out, pc_out_retry,
        pc_out_req, pc_out_ack, reset);

imem imem (imem_addr, imem_retry, imem_in_req, imem_in_ack, imem_data,
        imem_out_req, imem_out_ack, imem_cancel_req, imem_cancel_ack, reset);

iq iq (iq_in_data, iq_in_req, iq_in_ack, i_bus, iq_out_req,
        iq_out_ack, iq_en_out, iq_cancel_req, iq_cancel_ack, reset);

iiu iiu (i_bus, iiu_inst_req, iiu_inst_ack, iiu_inst_en,
        iiu_chkbits, iiu_log_req, iiu_log_ack, iiu_cki_req, iiu_cki_ack,
        iiu_newpc_req, iiu_pc_ack, iiu_imem_ack, iiu_iq_ack, iiu_seq_num,
        iiu_reg_rd, iiu_reg_rs, iiu_reg_rt, iiu_res_req, iiu_res_ack,
        iiu_sim_f, a_bus, b_bus, iiu_rfile_req, iiu_rfile_ack,
        iiu_ckf_req, iiu_ckf_ack, iiu_alu_func,
        iiu_alu_req, iiu_alu_ack, iiu_mem_rw, iiu_mem_req, iiu_mem_ack,
        val_seq, halt_req, iiu_halt_ack, roll_req, iiu_roll_ack,
        iiu_retry, stop, reset);

restable restable (restable_seq_num, restable_reg_w, restable_reg_r1,
        restable_reg_r2, restable_res_req, restable_res_ack,
        restable_reg_clr, restable_clr_req, restable_clr_ack,
        val_seq, halt_req, restable_halt_ack, roll_req, restable_roll_ack,
        reset);

alu alu (alu_in_seq, alu_in_dest, alu_in_d1, alu_in_d2, alu_in_func,
        alu_comp_req, alu_comp_ack, alu_arb_req, alu_arb_ack, r_bus_seq,
        r_bus_dest, r_bus_data, r_bus_req, alu_out_ack,
        val_seq, halt_req, alu_halt_ack, roll_req, alu_roll_ack, reset);

memdwb memdwb (memdwb_in_seq, memdwb_in_reg, memdwb_in_addr, memdwb_in_data,
        memdwb_in_rw_mode, memdwb_in_retry, memdwb_in_req, memdwb_in_ack,
        memdwb_chk_req, memdwb_chk_ack, val_seq, memdwb_val_req,
        memdwb_val_ack, memdwb_out_addr, d_bus, memdwb_rw_mode,
        memdwb_out_retry, memdwb_out_req, memdwb_out_ack, memdwb_out_seq,
        memdwb_out_reg, tri_dq_req, memdwb_dq_ack,
        halt_req, memdwb_halt_ack, roll_req, memdwb_roll_ack, reset);

dmem dmem (dmem_addr, d_bus, dmem_rw_mode, dmem_retry, dmem_in_req,
        dmem_in_ack, tri_dq_req, dmem_out_ack,
        halt_req, dmem_halt_ack, roll_req, dmem_roll_ack, reset);

dq dq (dq_in_seq, dq_in_reg, dq_in_data, dq_in_req, dq_in_ack, dq_arb_req,
        dq_arb_ack, r_bus_seq, r_bus_dest, r_bus_data, r_bus_req, dq_out_ack,
        val_seq, halt_req, dq_halt_ack, roll_req, dq_roll_ack, reset);

arbr arbr (arbr_req0, arbr_ack0, arbr_req1, arbr_ack1,
        halt_req, arbr_halt_ack, roll_req, arbr_roll_ack, reset);

regdwb regdwb (regdwb_in_seq, regdwb_in_reg, regdwb_in_data, regdwb_in_req,
        regdwb_in_ack, regdwb_out_seq, regdwb_out_reg, regdwb_out_data,
        regdwb_chk_req, regdwb_chk_ack, regdwb_clr_req, regdwb_clr_ack,
        val_seq, regdwb_val_req, regdwb_val_ack,
        regdwb_rd_reg1, regdwb_rd_reg2, regdwb_sim_f, a_bus, b_bus,
        regdwb_rd_req, regdwb_rd_ack, regdwb_rf_data1, regdwb_rf_data2,
        regdwb_rdf_req, regdwb_rdf_ack, regdwb_wrf_reg, regdwb_wrf_data,
        regdwb_wrf_req, regdwb_wrf_ack,
        halt_req, regdwb_halt_ack, roll_req, regdwb_roll_ack, reset);

regfile regfile (rfile_wr_reg, rfile_wr_data, rfile_wr_req, rfile_wr_ack,
        rfile_rd_reg1, rfile_rd_reg2, rfile_rd_data1, rfile_rd_data2,
        rfile_rd_req, rfile_rd_ack,
        halt_req, rfile_halt_ack, roll_req, rfile_roll_ack, reset);

log log (log_seq, log_chkbits, a_bus, log_req, log_ack, log_chk_seq,
        log_chk_chkid, log_chk_error, log_chk_req, log_chk_ack,
        val_seq, log_valmem_req, log_valmem_ack, log_valreg_req,
        log_valreg_ack, halt_req, log_halt_ack, roll_req,
        log_roll_ack, reset);

cki cki (cki_in_seq, cki_in_data, cki_in_req, cki_in_ack, cki_arb_req,
        cki_arb_ack, k_bus_seq, k_bus_chkid, k_bus_error, k_bus_req,
        cki_out_ack, val_seq, halt_req, cki_halt_ack, roll_req, cki_roll_ack,
        reset);

ckf ckf (ckf_in_seq, ckf_in_data1, ckf_in_data2, ckf_in_req, ckf_in_ack,
        ckf_arb_req, ckf_arb_ack, k_bus_seq, k_bus_chkid, k_bus_error,
        k_bus_req, ckf_out_ack, val_seq, halt_req, ckf_halt_ack, roll_req,
        ckf_roll_ack, reset);

ckm ckm (ckm_in_seq, ckm_in_data1, ckm_in_data2, ckm_in_req, ckm_in_ack,
        ckm_arb_req, ckm_arb_ack, k_bus_seq, k_bus_chkid, k_bus_error,
        k_bus_req, ckm_out_ack, val_seq, halt_req, ckm_halt_ack, roll_req,
        ckm_roll_ack, reset);

ckr ckr (ckr_in_seq, ckr_in_data, ckr_in_req, ckr_in_ack, ckr_arb_req,
        ckr_arb_ack, k_bus_seq, k_bus_chkid, k_bus_error, k_bus_req,
        ckr_out_ack, val_seq, halt_req, ckr_halt_ack, roll_req, ckr_roll_ack,
        reset);

arbk arbk (arbk_req0, arbk_ack0, arbk_req1, arbk_ack1,
        arbk_req2, arbk_ack2, arbk_req3, arbk_ack3,
        halt_req, arbk_halt_ack, roll_req, arbk_roll_ack, reset);

  bigc c_halt (iiu_halt_ack, restable_halt_ack, alu_halt_ack, memdwb_halt_ack,
        dmem_halt_ack, dq_halt_ack, regdwb_halt_ack, rfile_halt_ack,
        cki_halt_ack, ckf_halt_ack, ckm_halt_ack, ckr_halt_ack,
        arbr_halt_ack, arbk_halt_ack, ch_out, reset);

  bigc c_roll (iiu_roll_ack, restable_roll_ack, alu_roll_ack, memdwb_roll_ack,
        dmem_roll_ack, dq_roll_ack, regdwb_roll_ack, rfile_roll_ack,
        cki_roll_ack, ckf_roll_ack, ckm_roll_ack, ckr_roll_ack,
        arbr_roll_ack, arbk_roll_ack, cr_out, reset);

  // ====================================================================
  // test setup

  initial
    begin
      $freeze_waves;                    // saves simulation time
      $gr_waves_memsize (1024 * 1024);
      $gr_position ("waves", 50, 0, 1050, 850);

      $gr_waves (
        "pc_load", pc_load_req,         // instruction fetch
        "pc_out", pc_out,
        "im_req", pc_out_req,
        "im_ack", pc_out_ack,
        "im_inst", imem_data,
        "iq_req", imem_out_req,
        "iq_ack", imem_out_ack,
        "I_bus", i_bus,
        "iiu_req", iiu_inst_req,
        "iiu_ack", iiu_inst_ack,

        "seq_num", iiu_seq_num,         // instruction preparation
        "reg_rs", iiu_reg_rs,
        "reg_rt", iiu_reg_rt,
        "reg_rd", iiu_reg_rd,
        "log_req", iiu_log_req,
        "log_ack", iiu_log_ack,
        "cki_req", iiu_cki_req,
        "res_req", iiu_res_req,
        "rf_req", iiu_rfile_req,

        "A_bus", a_bus,                 // instruction issue & exec
        "A_bus_L", a_bus [ADDR_WIDTH:0],
        "B_bus", b_bus,
        "func", iiu_alu_func,
        "alu_req", iiu_alu_req,
        "mem_req", iiu_mem_req,

        "dmem_rq", memdwb_out_req,
        "dq_req", dq_in_req,

        "R_seq", r_bus_seq,             // register file access
        "R_data", r_bus_data,
        "R_dest", r_bus_dest,
        "alu_arb", alu_arb_req,
        "dq_arb", dq_arb_req,

        "clr_req", regdwb_clr_req,
        "wrf_reg", rfile_wr_reg,
        "wrf_req", rfile_wr_req,

        "cki_arb", cki_arb_req,         // error detection
        "ckf_arb", ckf_arb_req,
        "ckm_arb", ckm_arb_req,
        "ckr_arb", ckr_arb_req,
        "chk_ack", log_chk_ack,
        "K_seq", k_bus_seq,
        "K_chkid", k_bus_chkid,
        "K_err", k_bus_error,

        "val_seq", val_seq,             // validation & rollback
        "val_mem", log_valmem_req,
        "val_reg", log_valreg_req,
        "hlt_req", halt_req,
        "hlt_ack", log_halt_ack,
        "rol_req", roll_req,
        "rol_ack", log_roll_ack);

      $define_group_waves (1, "tst1a",
        "pc_load", "pc_out", "im_req", "im_ack", "im_inst",
        "iq_req", "iq_ack", "I_bus", "iiu_req", "iiu_ack", "seq_num");

      $define_group_waves (2, "tst1b",
        "pc_load", "pc_out", "im_req", "im_inst", "iq_req", "I_bus",
        "iiu_req", "iiu_ack", "seq_num", "reg_rs", "reg_rt", "reg_rd",
        "log_req", "cki_req", "res_req", "rf_req", "func", "alu_req",
        "R_seq", "R_data", "R_dest", "alu_arb", "clr_req", "wrf_reg",
        "wrf_req", "K_seq", "K_chkid", "val_seq", "val_reg");

      $define_group_waves (3, "tst2a",
        "I_bus", "iiu_ack", "seq_num", "reg_rs", "reg_rt", "reg_rd",
        "log_req", "res_req", "rf_req", "alu_req", "mem_req", "dmem_rq",
        "dq_req", "R_seq", "R_data", "R_dest", "alu_arb", "dq_arb",
        "clr_req", "wrf_reg", "wrf_req", "val_seq", "val_reg", "val_mem");

      $define_group_waves (4, "tst2b",
        "seq_num", "log_req", "log_ack", "K_seq", "cki_arb", "ckf_arb",
        "ckm_arb", "ckr_arb", "chk_ack");

      $define_group_waves (5, "tst3a",
        "I_bus", "iiu_ack", "seq_num", "reg_rd", "log_req", "res_req",
        "rf_req", "alu_req", "mem_req", "dmem_rq", "dq_req", "R_seq",
        "R_data", "R_dest", "alu_arb", "dq_arb", "clr_req", "wrf_reg",
        "wrf_req", "val_seq", "val_reg");

      $define_group_waves (6, "tst4a",
        "pc_load", "pc_out", "I_bus", "iiu_req", "iiu_ack", "seq_num",
        "reg_rd", "log_req", "res_req", "rf_req", "func", "alu_req",
        "mem_req", "dmem_rq", "dq_req", "R_seq", "R_data", "R_dest",
        "alu_arb", "dq_arb", "clr_req", "wrf_reg", "wrf_req", "K_seq",
        "K_chkid", "K_err", "val_seq", "val_reg", "val_mem", "hlt_req");

      $define_group_waves (7, "tst4b",
        "pc_load", "pc_out", "I_bus", "iiu_req", "iiu_ack", "seq_num",
        "reg_rd", "log_req", "res_req", "rf_req", "func", "alu_req",
        "R_seq", "R_data", "R_dest", "alu_arb", "clr_req", "wrf_reg",
        "wrf_req", "K_seq", "K_chkid", "K_err", "val_seq", "val_reg",
        "hlt_req");

      $define_group_waves (8, "tst4c",
        "pc_load", "pc_out", "I_bus", "iiu_req", "iiu_ack", "seq_num",
        "reg_rd", "log_req", "res_req", "rf_req", "K_seq", "K_chkid",
        "K_err", "val_seq", "hlt_req");

      $define_group_waves (9, "tst4d",
        "pc_load", "pc_out", "I_bus", "iiu_req", "iiu_ack", "seq_num",
        "reg_rd", "log_req", "res_req", "rf_req", "func", "alu_req",
        "R_seq", "R_data", "R_dest", "alu_arb", "clr_req", "wrf_reg",
        "wrf_req", "K_seq", "K_chkid", "K_err", "val_seq", "val_reg",
        "hlt_req");

      $define_group_waves (10, "tst4e",
        "pc_load", "pc_out", "I_bus", "iiu_req", "iiu_ack", "seq_num",
        "reg_rd", "log_req", "res_req", "rf_req", "mem_req", "dmem_rq",
        "K_seq", "K_chkid", "K_err", "val_seq", "val_mem", "hlt_req");

      $define_group_waves (11, "tst4f",
        "pc_load", "pc_out", "I_bus", "iiu_req", "iiu_ack", "seq_num",
        "reg_rd", "log_req", "res_req", "rf_req", "mem_req", "dmem_rq",
        "dq_req", "R_seq", "R_data", "R_dest", "dq_arb", "clr_req",
        "wrf_req", "K_seq", "K_chkid", "K_err", "val_seq", "val_reg",
        "hlt_req");

      $define_group_waves (12, "tst4g",
        "val_seq", "hlt_req", "hlt_ack", "rol_req", "rol_ack", "seq_num",
        "A_bus_L", "B_bus", "log_req", "cki_req", "pc_load", "pc_out");

      $define_group_waves (13, "tst5a",
        "pc_load", "pc_out", "I_bus", "iiu_req", "iiu_ack", "seq_num",
        "log_req", "res_req", "rf_req", "func", "alu_req", "mem_req",
        "dmem_rq", "dq_req", "R_seq", "R_dest", "alu_arb", "dq_arb",
        "clr_req", "wrf_reg", "wrf_req", "cki_arb", "ckf_arb", "ckm_arb",
        "ckr_arb", "K_seq", "K_chkid", "K_err", "val_seq", "val_reg",
        "val_mem", "hlt_req");

      $define_group_waves (20, "all",
        "pc_load", "pc_out", "im_req", "im_ack", "im_inst", "iq_req",
        "iq_ack", "I_bus", "iiu_req", "iiu_ack", "seq_num", "reg_rs",
        "reg_rt", "reg_rd", "log_req", "log_ack", "cki_req", "res_req",
        "rf_req", "A_bus", "A_bus_L", "B_bus", "func", "alu_req", "mem_req",
        "dmem_rq", "dq_req", "R_seq", "R_data", "R_dest", "alu_arb",
        "dq_arb", "clr_req", "wrf_reg", "wrf_req", "cki_arb", "ckf_arb",
        "ckm_arb", "ckr_arb", "chk_ack", "K_seq", "K_chkid", "K_err",
        "val_seq", "val_mem", "val_reg", "hlt_req", "hlt_ack", "rol_req",
        "rol_ack");

      #1;                               // for #0 $gr_addwaves
      $define_group_waves (15, "arbr",
        "R_seq", "alu_arb", "dq_arb", "none",
        "req0", "v0", "ack0", "req1", "v1", "ack1");

      $define_group_waves (16, "iq",
        "pc_load", "pc_out", "none", "in_d", "out_d", "in_r", "in_a",
        "out_r", "out_a", "can_r", "can_a", "none", "latch", "osc", "q1",
        "q2", "same1", "same2", "done", "regclk", "complet");

      $define_group_waves (17, "dq",
        "in_r", "cout1", "latch", "in_a", "a", "cout2", "out_r", "out_a",
        "h_req", "h_ack", "r_req", "clear1", "clear2", "r_ack", "none",
        "seq", "error", "req", "_a", "na", "b", "nb", "c", "nc", "d",
        "inv", "ack");
    end

  initial
    begin
      reset = 1;
      #RESET_TIME;
      reset = 0;

      fork
        begin                           // normal termination
          wait (stop);
          #400 $stop;
        end
        begin                           // abnormal termination
          #5000;
          $display ("AMPIRE: exceeded maximum simulation time.");
          $stop;
        end
      join
    end

endmodule // ampire

`include "pc.v"
`include "imem.v"
`include "iq.v"
`include "iiu.v"
`include "restable.v"
`include "alu.v"
`include "memdwb.v"
`include "dmem.v"
`include "dq.v"
`include "arbr.v"
`include "regdwb.v"
`include "regfile.v"
`include "cki.v"
`include "ckf.v"
`include "ckm.v"
`include "ckr.v"
`include "arbk.v"
`include "log.v"
`include "bigc.v"
`include "gates.v"
alu.v
// ALU (combining ALUQ and ALUF)

module alu (in_seq, in_dest, in_d1, in_d2, in_func, comp_req, comp_ack,
        arb_req, arb_ack, tri_seq, tri_dest, tri_data, tri_req, out_ack,
        val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

  `include "parameter"

  input [SEQ_WIDTH-1:0] in_seq, val_seq;
  input [REG_WIDTH-1:0] in_dest;
  input [DATA_WIDTH:0] in_d1, in_d2;
  input [FUNC_WIDTH-1:0] in_func;
  output [SEQ_WIDTH-1:0] tri_seq;
  output [REG_WIDTH-1:0] tri_dest;
  output [DATA_WIDTH:0] tri_data;
  input comp_req, arb_ack, out_ack, halt_req, roll_req, reset;
  output comp_ack, arb_req, tri_req, halt_ack, roll_ack;

  reg halt_ack, roll_ack;

  wire [SEQ_WIDTH-1:0] mid_seq;
  wire [REG_WIDTH-1:0] mid_dest;
  wire [DATA_WIDTH:0] mid_d1, mid_d2;
  wire [FUNC_WIDTH-1:0] mid_func;

  aluq aluq (in_seq, in_dest, in_d1, in_d2, in_func, comp_req, comp_ack,
        mid_seq, mid_dest, mid_d1, mid_d2, mid_func, mid_req, mid_ack,
        val_seq, halt_req, haltq_ack, roll_req, rollq_ack, reset);
  aluf aluf (mid_seq, mid_dest, mid_d1, mid_d2, mid_func, mid_req, mid_ack,
        arb_req, arb_ack, tri_seq, tri_dest, tri_data, tri_req, out_ack,
        val_seq, halt_req, haltf_ack, roll_req, rollf_ack, reset);

  always wait (reset)
    begin
      halt_ack = 0;
      roll_ack = 0;
      wait (~reset);
    end

  always @(haltq_ack or haltf_ack)
    if (haltq_ack & haltf_ack)
      halt_ack = 1;
    else if (~haltq_ack & ~haltf_ack)
      halt_ack = 0;

  always @(rollq_ack or rollf_ack)
    if (rollq_ack & rollf_ack)
      roll_ack = 1;
    else if (~rollq_ack & ~rollf_ack)
      roll_ack = 0;

endmodule // alu

// ====================================================================
// ALUF (functional part)

module aluf (in_seq, in_dest, in_d1, in_d2, in_func, comp_req, comp_ack,
        arb_req, arb_ack, tri_seq, tri_dest, tri_data, tri_req, out_ack,
        val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

  `include "parameter"

  input [SEQ_WIDTH-1:0] in_seq, val_seq;
  input [REG_WIDTH-1:0] in_dest;
  input [DATA_WIDTH:0] in_d1, in_d2;
  input [FUNC_WIDTH-1:0] in_func;
  output [SEQ_WIDTH-1:0] tri_seq;
  output [REG_WIDTH-1:0] tri_dest;
  output [DATA_WIDTH:0] tri_data;
  input comp_req, arb_ack, out_ack, halt_req, roll_req, reset;
  output comp_ack, arb_req, tri_req, halt_ack, roll_ack;

  reg [SEQ_WIDTH-1:0] out_seq;
  reg [REG_WIDTH-1:0] out_dest;
  reg [DATA_WIDTH:0] out_data;
  reg comp_ack, arb_req, out_req, halt_ack, roll_ack;

  reg [DATA_WIDTH-1:0] data1, data2;
  reg [FUNC_WIDTH-1:0] func;
  reg [SEQ_WIDTH-1:0] diff;
  reg [5:0] loop;                       // DATA_WIDTH = 32 bits max
  reg valid, parity;

  assign tri_seq = arb_ack ? out_seq : ZZZ;
  assign tri_dest = arb_ack ? out_dest : ZZZ;
  assign tri_data = arb_ack ? out_data : ZZZ;
  assign tri_req = arb_ack ? out_req : ZZZ;

  always wait (reset)
    begin
      disable rollback_cycle;
      disable latch_cycle;
      disable output_cycle;
      out_seq = XXX;
      out_dest = XXX;
      out_data = XXX;
      comp_ack = 0;
      arb_req = 0;
      out_req = 0;
      halt_ack = 0;
      roll_ack = 0;
      valid = 0;
      wait (~reset);
    end

  always wait (halt_req & ~reset)
    begin :rollback_cycle
      #1;
      halt_ack = 1;
      wait (roll_req);
      #1;
      disable latch_cycle;
      disable output_cycle;
      comp_ack = 0;
      arb_req = 0;
      out_req = 0;

      #DLY_SEQ_COMP;
      diff = out_seq - val_seq;         // compare sequence numbers
      if (~diff [SEQ_WIDTH-1])
        begin
          valid = 0;
          out_seq = XXX;
          out_dest = XXX;
          out_data = XXX;
        end

      roll_ack = 1;
      fork
        begin
          wait (~roll_req);
          #1;
          roll_ack = 0;
        end
        begin
          wait (~halt_req);
          #1;
          halt_ack = 0;
        end
      join
    end

  always wait (comp_req & ~valid & ~halt_req & ~reset)
    begin :latch_cycle
      #1;
      wait (~halt_req);
      data1 = in_d1 [DATA_WIDTH-1:0];
      data2 = in_d2 [DATA_WIDTH-1:0];
      func = in_func;
      out_seq = in_seq;
      out_dest = in_dest;
      comp_ack = 1;
      valid = 1;
      wait (~comp_req);
      #1;
      wait (~halt_req);
      comp_ack = 0;
    end

  always wait (valid & ~halt_req & ~reset)
    begin :output_cycle
      #1;
      compute;
      arb_req = 1;
      wait (arb_ack);
      #1;
      wait (~halt_req);
      out_req = 1;
      wait (out_ack);
      #1;
      valid = 0;
      arb_req = 0;
      wait (~halt_req);
      out_req = 0;
      wait (~out_ack & ~arb_ack);
    end

  task compute;
    begin
      case (func)
        FUNC_ADDU, FUNC_ADDUF:
                    #DLY_ALU_ADD out_data = data1 + data2;
        FUNC_SUBU:  #DLY_ALU_SUB out_data = data1 - data2;

        FUNC_AND:   #DLY_ALU_AND out_data = data1 & data2;
        FUNC_OR:    #DLY_ALU_OR out_data = data1 | data2;
        FUNC_XOR:   #DLY_ALU_XOR out_data = data1 ^ data2;
        FUNC_PASS:  #DLY_ALU_PASS out_data = data2;
        FUNC_LHI:   #DLY_ALU_SHIFT out_data = data2 << IMM_WIDTH;
        FUNC_SLL:   #DLY_ALU_SHIFT out_data = data1 << (data2 % DATA_WIDTH);
        FUNC_SRL:   #DLY_ALU_SHIFT out_data = data1 >> (data2 % DATA_WIDTH);
        FUNC_SRA:   #DLY_ALU_SHIFT
          begin
            // Start from data1 and shift the running result one bit per
            // iteration, replicating the sign bit. Shifting data1 itself
            // on every pass would always yield a one-bit shift and would
            // leave out_data stale when the shift count is zero.
            out_data = data1;
            for (loop=0; loop<(data2 % DATA_WIDTH); loop=loop+1)
              out_data = {data1[DATA_WIDTH-1], out_data[DATA_WIDTH-1:1]};
          end

        FUNC_SLT:   #DLY_ALU_COMP out_data = (data1 < data2);
        FUNC_SGT:   #DLY_ALU_COMP out_data = (data1 > data2);
        FUNC_SLE:   #DLY_ALU_COMP out_data = (data1 <= data2);
        FUNC_SGE:   #DLY_ALU_COMP out_data = (data1 >= data2);
        FUNC_SEQ:   #DLY_ALU_COMP out_data = (data1 == data2);
        FUNC_SNE:   #DLY_ALU_COMP out_data = (data1 != data2);

        default:    out_data = XXX;
      endcase

      parity = (func == FUNC_ADDUF);    // bad parity
      for (loop=0; loop<DATA_WIDTH; loop=loop+1)  // calculate parity
        parity = parity ^ out_data [loop];
      out_data [DATA_WIDTH] = parity;
    end
  endtask

endmodule // aluf

// ====================================================================
// ALU Input Queue

module aluq (in_seq, in_dest, in_d1, in_d2, in_func, in_req, in_ack,
        out_seq, out_dest, out_d1, out_d2, out_func, out_req, out_ack,
        val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

  `include "parameter"

  input [SEQ_WIDTH-1:0] in_seq, val_seq;
  input [REG_WIDTH-1:0] in_dest;
  input [DATA_WIDTH:0] in_d1, in_d2;
  input [FUNC_WIDTH-1:0] in_func;
  output [SEQ_WIDTH-1:0] out_seq;
  output [REG_WIDTH-1:0] out_dest;
  output [DATA_WIDTH:0] out_d1, out_d2;
  output [FUNC_WIDTH-1:0] out_func;
  input in_req, out_ack, halt_req, roll_req, reset;
  output in_ack, out_req, halt_ack, roll_ack;

  reg halt_ack, roll_ack;

  wire [SEQ_WIDTH-1:0] seq1;
  wire [REG_WIDTH-1:0] dest1;
  wire [DATA_WIDTH:0] d1_1, d2_1;
  wire [FUNC_WIDTH-1:0] func1;

  aluq_buffer buf1 (in_seq, in_dest, in_d1, in_d2, in_func, in_req, in_ack,
        seq1, dest1, d1_1, d2_1, func1, req1, ack1,
        val_seq, halt_req, halt_a1, roll_req, roll_a1, reset);
  aluq_buffer buf2 (seq1, dest1, d1_1, d2_1, func1, req1, ack1, out_seq,
        out_dest, out_d1, out_d2, out_func, out_req, out_ack,
        val_seq, halt_req, halt_a2, roll_req, roll_a2, reset);

  always wait (reset)
    begin
      halt_ack = 0;
      roll_ack = 0;
      wait (~reset);
    end

  always @(halt_a1 or halt_a2)
    if (halt_a1 & halt_a2)
      halt_ack = 1;
    else if (~halt_a1 & ~halt_a2)
      halt_ack = 0;

  always @(roll_a1 or roll_a2)
    if (roll_a1 & roll_a2)
      roll_ack = 1;
    else if (~roll_a1 & ~roll_a2)
      roll_ack = 0;

endmodule // aluq

// ====================================================================

module aluq_buffer (in_s, in_ds, in_d1, in_d2, in_f, in_r, in_a,
        out_s, out_ds, out_d1, out_d2, out_f, out_r, out_a,
        val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

  `include "parameter"

  input [SEQ_WIDTH-1:0] in_s, val_seq;
  input [REG_WIDTH-1:0] in_ds;
  input [DATA_WIDTH:0] in_d1, in_d2;
  input [FUNC_WIDTH-1:0] in_f;
  output [SEQ_WIDTH-1:0] out_s;
  output [REG_WIDTH-1:0] out_ds;
  output [DATA_WIDTH:0] out_d1, out_d2;
  output [FUNC_WIDTH-1:0] out_f;
  input in_r, out_a, halt_req, roll_req, reset;
  output in_a, out_r, halt_ack, roll_ack;

  reg [SEQ_WIDTH-1:0] out_s;
  reg [REG_WIDTH-1:0] out_ds;
  reg [DATA_WIDTH:0] out_d1, out_d2;
  reg [FUNC_WIDTH-1:0] out_f;
  reg in_a, out_r, halt_ack, roll_ack;

  reg [SEQ_WIDTH-1:0] diff;
  reg valid;

  always wait (reset)
    begin
      disable rollback_cycle;
      disable input_cycle;
      disable output_cycle;
      out_s = XXX;
      out_ds = XXX;
      out_d1 = XXX;
      out_d2 = XXX;
      out_f = XXX;
      in_a = 0;
      out_r = 0;
      halt_ack = 0;
      roll_ack = 0;
      valid = 0;
      wait (~reset);
    end

  always wait (halt_req & ~reset)
    begin :rollback_cycle
      #1;
      halt_ack = 1;
      wait (roll_req);
      #1;
      disable input_cycle;
      disable output_cycle;
      in_a = 0;
      out_r = 0;

      #DLY_SEQ_COMP;
      diff = out_s - val_seq;           // compare sequence numbers
      if (~diff [SEQ_WIDTH-1])
        begin
          valid = 0;
          out_s = XXX;
          out_ds = XXX;
          out_d1 = XXX;
          out_d2 = XXX;
          out_f = XXX;
        end

      roll_ack = 1;
      fork
        begin
          wait (~roll_req);
          #1;
          roll_ack = 0;
        end
        begin
          wait (~halt_req);
          #1;
          halt_ack = 0;
        end
      join
    end

  always wait (in_r & ~valid & ~halt_req & ~reset)
    begin :input_cycle
      #1;
      wait (~halt_req);
      out_s = in_s;
      out_ds = in_ds;
      out_d1 = in_d1;
      out_d2 = in_d2;
      out_f = in_f;
      in_a = 1;
      valid = 1;
      wait (~in_r);
      #1;
      wait (~halt_req);
      in_a = 0;
    end

  always wait (valid & ~halt_req & ~reset)
    begin :output_cycle
      #1;
      wait (~halt_req);
      out_r = 1;
      wait (out_a);
      #1;
      valid = 0;
      wait (~halt_req);
      out_r = 0;
      wait (~out_a);
    end

endmodule // aluq_buffer
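
The rollback_cycle blocks in this file decide whether to discard a buffered entry by subtracting val_seq from the entry's sequence number and testing the most significant bit of the difference. The stand-alone sketch below is not part of the AMPIRE sources. It illustrates that test under an assumed SEQ_WIDTH of 4 (the real value comes from the "parameter" file, which is not shown here); the module name seq_cmp_sketch and the task name check are hypothetical.

```verilog
// Hypothetical illustration of the wraparound sequence-number test.
module seq_cmp_sketch;
  parameter SEQ_WIDTH = 4;              // assumed width, for illustration only

  reg [SEQ_WIDTH-1:0] out_seq, val_seq, diff;

  task check;
    begin
      diff = out_seq - val_seq;         // modular subtraction, as in rollback_cycle
      // The entry is discarded when the MSB of diff is clear, that is,
      // when out_seq is at or ahead of val_seq in wraparound order.
      $display ("out_seq=%d val_seq=%d discard=%b",
                out_seq, val_seq, ~diff [SEQ_WIDTH-1]);
    end
  endtask

  initial
    begin
      out_seq = 5;  val_seq = 3;   check;   // ahead of val_seq
      out_seq = 2;  val_seq = 3;   check;   // behind val_seq
      out_seq = 1;  val_seq = 14;  check;   // ahead across the wraparound
      $finish;
    end
endmodule
```

With a 4-bit sequence number, the test treats the eight values starting at val_seq as "at or ahead" and the other eight as "behind", so it is only meaningful while the live sequence numbers span less than half of the number space.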
arbk.v
// Arbiter for K_bus

module arbk (req0, ack0, req1, ack1, req2, ack2, req3, ack3,
        halt_req, halt_ack, roll_req, roll_ack, reset);

  `include "parameter"

  input req0, req1, req2, req3, halt_req, roll_req, reset;
  output ack0, ack1, ack2, ack3, halt_ack, roll_ack;

  reg ack0, ack1, ack2, ack3, halt_ack, roll_ack;

  reg [CHKID_WIDTH-1:0] grant;
  reg found;

  always wait (reset)
    begin
      disable rollback_cycle;
      disable arb_cycle;
      ack0 = 0;
      ack1 = 0;
      ack2 = 0;
      ack3 = 0;
      halt_ack = 0;
      roll_ack = 0;
      grant = 0;
      wait (~reset);
    end

  always wait (halt_req & ~reset)
    begin :rollback_cycle
      #1;
      halt_ack = 1;
      wait (roll_req);
      #1;
      disable arb_cycle;
      ack0 = 0;
      ack1 = 0;
      ack2 = 0;
      ack3 = 0;
      grant = 0;

      roll_ack = 1;
      fork
        begin
          wait (~roll_req);
          #1;
          roll_ack = 0;
        end
        begin
          wait (~halt_req);
          #1;
          halt_ack = 0;
        end
      join
    end

  always wait ((req0 | req1 | req2 | req3) & ~halt_req & ~reset)
    begin :arb_cycle
      found = 0;
      while (~found)
        begin
          grant = grant + 1;            // scan request signals
          if (grant >= CHECKERS)
            grant = 0;
          case (grant)
            0: if (req0) found = 1;
            1: if (req1) found = 1;
            2: if (req2) found = 1;
            3: if (req3) found = 1;
          endcase
        end

      #1;
      case (grant)
        0: begin
             ack0 = 1;
             wait (~req0);
             #1;
             ack0 = 0;
           end
        1: begin
             ack1 = 1;
             wait (~req1);
             #1;
             ack1 = 0;
           end
        2: begin
             ack2 = 1;
             wait (~req2);
             #1;
             ack2 = 0;
           end
        3: begin
             ack3 = 1;
             wait (~req3);
             #1;
             ack3 = 0;
           end
      endcase
    end

endmodule // arbk
arbr.v
// Arbiter for R_bus

module arbr (req0, ack0, req1, ack1,
        halt_req, halt_ack, roll_req, roll_ack, reset);

  `include "parameter"

  input req0, req1, halt_req, roll_req, reset;
  output ack0, ack1, halt_ack, roll_ack;

`ifdef BEHAV_ARBR // behavioral level

  reg ack0, ack1, halt_ack, roll_ack;

  reg grant;

  always wait (reset)
    begin
      disable rollback_cycle;
      disable arb_cycle;
      ack0 = 0;
      ack1 = 0;
      halt_ack = 0;
      roll_ack = 0;
      grant = 0;
      wait (~reset);
    end

  always wait (halt_req & ~reset)
    begin :rollback_cycle
      #1;
      halt_ack = 1;
      wait (roll_req);
      #1;
      disable arb_cycle;
      ack0 = 0;
      ack1 = 0;
      grant = 0;

      roll_ack = 1;
      fork
        begin
          wait (~roll_req);
          #1;
          roll_ack = 0;
        end
        begin
          wait (~halt_req);
          #1;
          halt_ack = 0;
        end
      join
    end

  always wait ((req0 | req1) & ~halt_req & ~reset)
    begin :arb_cycle
      if (req0 & req1)
        grant = ~grant;                 // switch to other one
      else
        grant = req1;                   // 0 if not req1

      #1;
      if (grant)
        begin
          ack1 = 1;
          wait (~req1);
          #1;
          ack1 = 0;
        end
      else
        begin
          ack0 = 1;
          wait (~req0);
          #1;
          ack0 = 0;
        end
    end

`else // gate level

  and #1 g1 (v0, req0, ~v1);
  and #2 g2 (v1, req1, ~v0);
  not #1 g3 (ack0, nack0);
  not #1 g4 (ack1, nack1);
  nmos #1 m1 (nack1, v0, v1);
  nmos #1 m2 (nack0, v1, v0);
  pullup (nack0), (nack1);

  buf #1 g5 (halt_ack, halt_req);
  and #1 g6 (roll_ack, roll_req, ~ack0, ~ack1);

`ifdef WAVE_ARBR
  initial
    #0 $gr_addwaves ("req0", req0, "v0", v0, "ack0", ack0,
                     "req1", req1, "v1", v1, "ack1", ack1);
`endif

`endif // BEHAV_ARBR

endmodule // arbr
bigc.v
// Big C-element: used for halt_ack and roll_ack

module bigc (in1, in2, in3, in4, in5, in6, in7, in8, in9, in10, in11, in12,
        in13, in14, out, reset);

  input in1, in2, in3, in4, in5, in6, in7, in8, in9, in10, in11, in12, in13,
        in14;
  output out;
  input reset;

`ifdef BEHAV_BIGC // behavioral level

  reg out;

  always wait (reset)
    begin
      out = 0;
      wait (~reset);
    end

  always @(in1 or in2 or in3 or in4 or in5 or in6 or in7 or in8 or in9 or
           in10 or in11 or in12 or in13 or in14)
    if (in1 & in2 & in3 & in4 & in5 & in6 & in7 & in8 & in9 & in10 &
        in11 & in12 & in13 & in14)
      out = 1;
    else if (~in1 & ~in2 & ~in3 & ~in4 & ~in5 & ~in6 & ~in7 & ~in8 &
             ~in9 & ~in10 & ~in11 & ~in12 & ~in13 & ~in14)
      out = 0;

`else // gate level

  muller4 x1 (in1, in2, in3, in4, out1);
  muller4 x2 (in5, in6, in7, in8, out2);
  muller3 x3 (in9, in10, in11, out3);
  muller3 x4 (in12, in13, in14, out4);
  muller4 x5 (out1, out2, out3, out4, out);

`endif

endmodule // bigc
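
The behavioral branch of bigc sets its output only when all fourteen inputs are high, clears it only when all are low, and otherwise holds the previous value. The same pattern merges the halt and roll acknowledgements inside alu, aluq, ckf, and cki. The sketch below is not part of the AMPIRE sources. It reduces the idea to two inputs so that the hold behavior is easy to observe; the module names c2_sketch and c2_sketch_tb are hypothetical.

```verilog
// Hypothetical two-input C-element, for illustration only.
module c2_sketch (a, b, out, reset);
  input a, b, reset;
  output out;
  reg out;

  always @(a or b or reset)
    if (reset)
      out = 0;
    else if (a & b)
      out = 1;                          // all inputs high: set
    else if (~a & ~b)
      out = 0;                          // all inputs low: clear
                                        // inputs disagree: hold

endmodule

// Small testbench that walks the inputs through one full cycle.
module c2_sketch_tb;
  reg a, b, reset;
  wire out;

  c2_sketch dut (a, b, out, reset);

  initial
    begin
      a = 0; b = 0; reset = 1;
      #1 reset = 0;
      #1 a = 1;                         // inputs disagree, output holds
      #1 $display ("a=%b b=%b out=%b", a, b, out);
      #1 b = 1;                         // both high
      #1 $display ("a=%b b=%b out=%b", a, b, out);
      #1 a = 0;                         // inputs disagree, output holds
      #1 $display ("a=%b b=%b out=%b", a, b, out);
      #1 b = 0;                         // both low
      #1 $display ("a=%b b=%b out=%b", a, b, out);
      $finish;
    end
endmodule
```

Because the output changes only after every input has changed, a single completion signal such as ch_out or cr_out in ampire.v can stand for "all units have acknowledged" on the rising edge and "all units have released" on the falling edge.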
ckf.v
// CKF (combining CKFQ and CKFF)

module ckf (in_seq, in_data1, in_data2, in_req, in_ack, arb_req, arb_ack,
        tri_seq, tri_chkid, tri_error, tri_req, out_ack,
        val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

  `include "parameter"

  input [SEQ_WIDTH-1:0] in_seq, val_seq;
  input [DATA_WIDTH:0] in_data1, in_data2;
  output [SEQ_WIDTH-1:0] tri_seq;
  output [CHKID_WIDTH-1:0] tri_chkid;
  input in_req, arb_ack, out_ack, halt_req, roll_req, reset;
  output in_ack, arb_req, tri_error, tri_req, halt_ack, roll_ack;

  reg halt_ack, roll_ack;

  wire [SEQ_WIDTH-1:0] mid_seq;
  wire [DATA_WIDTH:0] mid_data1, mid_data2;

  ckfq ckfq (in_seq, in_data1, in_data2, in_req, in_ack,
        mid_seq, mid_data1, mid_data2, mid_req, mid_ack,
        val_seq, halt_req, haltq_ack, roll_req, rollq_ack, reset);
  ckff ckff (mid_seq, mid_data1, mid_data2, mid_req, mid_ack, arb_req, arb_ack,
        tri_seq, tri_chkid, tri_error, tri_req, out_ack,
        val_seq, halt_req, haltf_ack, roll_req, rollf_ack, reset);

  always wait (reset)
    begin
      halt_ack = 0;
      roll_ack = 0;
      wait (~reset);
    end

  always @(haltq_ack or haltf_ack)
    if (haltq_ack & haltf_ack)
      halt_ack = 1;
    else if (~haltq_ack & ~haltf_ack)
      halt_ack = 0;

  always @(rollq_ack or rollf_ack)
    if (rollq_ack & rollf_ack)
      roll_ack = 1;
    else if (~rollq_ack & ~rollf_ack)
      roll_ack = 0;

endmodule // ckf

// ====================================================================
// Checker for Register File (functional part)

module ckff (in_seq, in_data1, in_data2, in_req, in_ack, arb_req, arb_ack,
        tri_seq, tri_chkid, tri_error, tri_req, out_ack,
        val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

  `include "parameter"

  input [SEQ_WIDTH-1:0] in_seq, val_seq;
  input [DATA_WIDTH:0] in_data1, in_data2;
  output [SEQ_WIDTH-1:0] tri_seq;
  output [CHKID_WIDTH-1:0] tri_chkid;
  input in_req, arb_ack, out_ack, halt_req, roll_req, reset;
  output in_ack, arb_req, tri_error, tri_req, halt_ack, roll_ack;

  reg [SEQ_WIDTH-1:0] out_seq;
  reg in_ack, arb_req, out_error, out_req, halt_ack, roll_ack;

  reg [SEQ_WIDTH-1:0] diff;
  reg [DATA_WIDTH:0] data1_buf, data2_buf;
  reg [5:0] loop;
  reg valid, parity1, parity2;

  assign tri_seq = arb_ack ? out_seq : ZZZ;
  assign tri_chkid = arb_ack ? ID_CKF : ZZZ;
  assign tri_error = arb_ack ? out_error : ZZZ;
  assign tri_req = arb_ack ? out_req : ZZZ;

  always wait (reset)
    begin
      disable rollback_cycle;
      disable latch_cycle;
      disable output_cycle;
      out_seq = XXX;
      in_ack = 0;
      arb_req = 0;
      out_error = XXX;
      out_req = 0;
      halt_ack = 0;
      roll_ack = 0;
      valid = 0;
      wait (~reset);
    end

  always wait (halt_req & ~reset)
    begin :rollback_cycle
      #1;
      halt_ack = 1;
      wait (roll_req);
      #1;
      disable latch_cycle;
      disable output_cycle;
      in_ack = 0;
      arb_req = 0;
      out_req = 0;

      #DLY_SEQ_COMP;
      diff = out_seq - val_seq;         // compare sequence numbers
      if (~diff [SEQ_WIDTH-1])
        begin
          valid = 0;
          out_seq = XXX;
          out_error = XXX;
        end

      roll_ack = 1;
      fork
        begin
          wait (~roll_req);
          #1;
          roll_ack = 0;
        end
        begin
          wait (~halt_req);
          #1;
          halt_ack = 0;
        end
      join
    end

  always wait (in_req & ~valid & ~halt_req & ~reset)
    begin :latch_cycle
      #1;
      wait (~halt_req);
      out_seq = in_seq;
      data1_buf = in_data1;
      data2_buf = in_data2;
      in_ack = 1;
      valid = 1;
      wait (~in_req);
      #1;
      wait (~halt_req);
      in_ack = 0;
    end

  always wait (valid & ~halt_req & ~reset)
    begin :output_cycle
      #1;
      parity1 = 0;
      parity2 = 0;
      for (loop=0; loop<1+DATA_WIDTH; loop=loop+1)  // calculate parity
        begin
          parity1 = parity1 ^ data1_buf [loop];
          parity2 = parity2 ^ data2_buf [loop];
        end
      if (parity1 === 1'bx) parity1 = 1;
      if (parity2 === 1'bx) parity2 = 1;

      #DLY_CKF_CHK;
      out_error = parity1 | parity2;
      arb_req = 1;
      wait (arb_ack);
      #1;
      wait (~halt_req);
      out_req = 1;
      wait (out_ack);
      #1;
      valid = 0;
      arb_req = 0;
      wait (~halt_req);
      out_req = 0;
      wait (~out_ack & ~arb_ack);
    end

endmodule // ckff

// ====================================================================
// Queue for Register File Checker

module ckfq (in_seq, in_data1, in_data2, in_req, in_ack,
        out_seq, out_data1, out_data2, out_req, out_ack,
        val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

  `include "parameter"

  input [SEQ_WIDTH-1:0] in_seq, val_seq;
  input [DATA_WIDTH:0] in_data1, in_data2;
  output [SEQ_WIDTH-1:0] out_seq;
  output [DATA_WIDTH:0] out_data1, out_data2;
  input in_req, out_ack, halt_req, roll_req, reset;
  output in_ack, out_req, halt_ack, roll_ack;

  reg halt_ack, roll_ack;

  wire [SEQ_WIDTH-1:0] seq1;
  wire [DATA_WIDTH:0] data11, data21;

  ckf_buf buf1 (in_seq, in_data1, in_data2, in_req, in_ack,
        seq1, data11, data21, req1, ack1,
        val_seq, halt_req, halt_a1, roll_req, roll_a1, reset);
  ckf_buf buf2 (seq1, data11, data21, req1, ack1,
        out_seq, out_data1, out_data2, out_req, out_ack,
        val_seq, halt_req, halt_a2, roll_req, roll_a2, reset);

  always wait (reset)
    begin
      halt_ack = 0;
      roll_ack = 0;
      wait (~reset);
    end

  always @(halt_a1 or halt_a2)
    if (halt_a1 & halt_a2)
      halt_ack = 1;
    else if (~halt_a1 & ~halt_a2)
      halt_ack = 0;

  always @(roll_a1 or roll_a2)
    if (roll_a1 & roll_a2)
      roll_ack = 1;
    else if (~roll_a1 & ~roll_a2)
      roll_ack = 0;

endmodule // ckfq

// ====================================================================

module ckf_buf (in_s, in_d1, in_d2, in_r, in_a,
        out_s, out_d1, out_d2, out_r, out_a,
        val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

  `include "parameter"

  input [SEQ_WIDTH-1:0] in_s, val_seq;
  input [DATA_WIDTH:0] in_d1, in_d2;
  output [SEQ_WIDTH-1:0] out_s;
  output [DATA_WIDTH:0] out_d1, out_d2;
  input in_r, out_a, halt_req, roll_req, reset;
  output in_a, out_r, halt_ack, roll_ack;

  reg [SEQ_WIDTH-1:0] out_s;
  reg [DATA_WIDTH:0] out_d1, out_d2;
  reg in_a, out_r, halt_ack, roll_ack;

  reg [SEQ_WIDTH-1:0] diff;
  reg valid;

  always wait (reset)
    begin
      disable rollback_cycle;
      disable input_cycle;
      disable output_cycle;
      out_s = XXX;
      out_d1 = XXX;
      out_d2 = XXX;
      in_a = 0;
      out_r = 0;
      halt_ack = 0;
      roll_ack = 0;
      valid = 0;
      wait (~reset);
    end

  always wait (halt_req & ~reset)
    begin :rollback_cycle
      #1;
      halt_ack = 1;
      wait (roll_req);
      #1;
      disable input_cycle;
      disable output_cycle;
      in_a = 0;
      out_r = 0;

      #DLY_SEQ_COMP;
      diff = out_s - val_seq;           // compare sequence numbers
      if (~diff [SEQ_WIDTH-1])
        begin
          valid = 0;
          out_s = XXX;
          out_d1 = XXX;
          out_d2 = XXX;
        end

      roll_ack = 1;
      fork
        begin
          wait (~roll_req);
          #1;
          roll_ack = 0;
        end
        begin
          wait (~halt_req);
          #1;
          halt_ack = 0;
        end
      join
    end

  always wait (in_r & ~valid & ~halt_req & ~reset)
    begin :input_cycle
      #1;
      wait (~halt_req);
      out_s = in_s;
      out_d1 = in_d1;
      out_d2 = in_d2;
      in_a = 1;
      valid = 1;
      wait (~in_r);
      #1;
      wait (~halt_req);
      in_a = 0;
    end

  always wait (valid & ~halt_req & ~reset)
    begin :output_cycle
      #1;
      wait (~halt_req);
      out_r = 1;
      wait (out_a);
      #1;
      valid = 0;
      wait (~halt_req);
      out_r = 0;
      wait (~out_a);
    end

endmodule // ckf_buf
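
The input_cycle and output_cycle blocks of ckf_buf follow a four-phase, return-to-zero handshake: the sender raises its request, the receiver latches the data and raises its acknowledge, the sender drops the request, and the receiver drops the acknowledge. The sketch below is not part of the AMPIRE sources. It isolates that protocol for a single transfer and leaves out the halt and rollback handling; the module name handshake_sketch, the 8-bit data width, and the value 8'h5a are all hypothetical.

```verilog
// Hypothetical single-transfer four-phase handshake, for illustration only.
module handshake_sketch;
  reg req, ack;
  reg [7:0] data, latched;

  // Receiver: latch on request, acknowledge, then return to zero.
  always wait (req)
    begin
      #1 latched = data;
      ack = 1;
      wait (~req);
      #1 ack = 0;
    end

  // Sender: drive one transfer through all four phases.
  initial
    begin
      req = 0; ack = 0; data = 8'h5a;
      #1 req = 1;                       // phase 1: request with valid data
      wait (ack);                       // phase 2: receiver has latched
      #1 req = 0;                       // phase 3: release request
      wait (~ack);                      // phase 4: receiver has released
      $display ("latched=%h", latched);
      $finish;
    end
endmodule
```

The data only has to remain stable between the rising request and the rising acknowledge, which is why the buffers above can overwrite their outputs as soon as the downstream acknowledge has been seen.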
cki.v
// CKI (combining CKIQ and CKIF)

module cki (in_seq, in_data, in_req, in_ack, arb_req, arb_ack,
        tri_seq, tri_chkid, tri_error, tri_req, out_ack,
        val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

  `include "parameter"

  input [SEQ_WIDTH-1:0] in_seq, val_seq;
  input [DATA_WIDTH:0] in_data;
  output [SEQ_WIDTH-1:0] tri_seq;
  output [CHKID_WIDTH-1:0] tri_chkid;
  input in_req, arb_ack, out_ack, halt_req, roll_req, reset;
  output in_ack, arb_req, tri_error, tri_req, halt_ack, roll_ack;

  reg halt_ack, roll_ack;

  wire [SEQ_WIDTH-1:0] mid_seq;
  wire [DATA_WIDTH:0] mid_data;

  ckiq ckiq (in_seq, in_data, in_req, in_ack,
        mid_seq, mid_data, mid_req, mid_ack,
        val_seq, halt_req, haltq_ack, roll_req, rollq_ack, reset);
  ckif ckif (mid_seq, mid_data, mid_req, mid_ack, arb_req, arb_ack,
        tri_seq, tri_chkid, tri_error, tri_req, out_ack,
        val_seq, halt_req, haltf_ack, roll_req, rollf_ack, reset);

  always wait (reset)
    begin
      halt_ack = 0;
      roll_ack = 0;
      wait (~reset);
    end

  always @(haltq_ack or haltf_ack)
    if (haltq_ack & haltf_ack)
      halt_ack = 1;
    else if (~haltq_ack & ~haltf_ack)
      halt_ack = 0;

  always @(rollq_ack or rollf_ack)
    if (rollq_ack & rollf_ack)
      roll_ack = 1;
    else if (~rollq_ack & ~rollf_ack)
      roll_ack = 0;

endmodule // cki

// ====================================================================
// Checker for I_bus (functional part)

module ckif (in_seq, in_data, in_req, in_ack, arb_req, arb_ack,
        tri_seq, tri_chkid, tri_error, tri_req, out_ack,
        val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

  `include "parameter"

  input [SEQ_WIDTH-1:0] in_seq, val_seq;
  input [DATA_WIDTH:0] in_data;
  output [SEQ_WIDTH-1:0] tri_seq;
  output [CHKID_WIDTH-1:0] tri_chkid;
  input in_req, arb_ack, out_ack, halt_req, roll_req, reset;
  output in_ack, arb_req, tri_error, tri_req, halt_ack, roll_ack;

  reg [SEQ_WIDTH-1:0] out_seq;
  reg in_ack, arb_req, out_error, out_req, halt_ack, roll_ack;

  reg [SEQ_WIDTH-1:0] diff;
  reg [DATA_WIDTH:0] data_buf;
  reg [5:0] loop;
  reg valid, parity;

assign tri_seq = arb_ack ? out_seq : ZZZ;
assign tri_chkid = arb_ack ? ID_CKI : ZZZ;
assign tri_error = arb_ack ? out_error : ZZZ;
assign tri_req = arb_ack ? out_req : ZZZ;

always wait (reset)
  begin
    disable rollback_cycle;
    disable latch_cycle;
    disable output_cycle;
    out_seq = XXX;
    in_ack = 0;
    arb_req = 0;
    out_error = XXX;
    out_req = 0;
    halt_ack = 0;
    roll_ack = 0;
    valid = 0;
    wait (~reset);
  end

always wait (halt_req & ~reset)
  begin :rollback_cycle
    #1;
    halt_ack = 1;
    wait (roll_req);
    #1;
    disable latch_cycle;
    disable output_cycle;
    in_ack = 0;
    arb_req = 0;
    out_req = 0;

    #DLY_SEQ_COMP;
    diff = out_seq - val_seq; // compare sequence numbers
    if (~diff [SEQ_WIDTH-1])
      begin
        valid = 0;
        out_seq = XXX;
        out_error = XXX;
      end

    roll_ack = 1;
    fork
      begin
        wait (~roll_req);
        #1;
        roll_ack = 0;
      end
      begin
        wait (~halt_req);
        #1;
        halt_ack = 0;
      end
    join
  end

always wait (in_req & ~valid & ~halt_req & ~reset)
  begin :latch_cycle
    #1;
    wait (~halt_req);
    out_seq = in_seq;
    data_buf = in_data;
    in_ack = 1;
    valid = 1;
    wait (~in_req);
    #1;
    wait (~halt_req);
    in_ack = 0;
  end

always wait (valid & ~halt_req & ~reset)
  begin :output_cycle
    #1;
    parity = 0;
    for (loop=0; loop<1+DATA_WIDTH; loop=loop+1) // calculate parity
      parity = parity ^ data_buf [loop];
    if (parity === 1'bx) parity = 1;

    #DLY_CKI_CHK;
    out_error = parity;
    arb_req = 1;
    wait (arb_ack);
    #1;
    wait (~halt_req);
    out_req = 1;
    wait (out_ack);
    #1;
    valid = 0;
    arb_req = 0;
    wait (~halt_req);
    out_req = 0;
    wait (~out_ack & ~arb_ack);
  end

endmodule // ckif

// ====================================================================
// Queue for I_bus Checker

module ckiq (in_seq, in_data, in_req, in_ack,
             out_seq, out_data, out_req, out_ack,
             val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

`include "parameter"

input [SEQ_WIDTH-1:0] in_seq, val_seq;
input [DATA_WIDTH:0] in_data;
output [SEQ_WIDTH-1:0] out_seq;
output [DATA_WIDTH:0] out_data;
input in_req, out_ack, halt_req, roll_req, reset;
output in_ack, out_req, halt_ack, roll_ack;

reg halt_ack, roll_ack;

wire [SEQ_WIDTH-1:0] seq1;
wire [DATA_WIDTH:0] data1;

cki_buf buf1 (in_seq, in_data, in_req, in_ack, seq1, data1, req1, ack1,
              val_seq, halt_req, halt_a1, roll_req, roll_a1, reset);
cki_buf buf2 (seq1, data1, req1, ack1, out_seq, out_data, out_req, out_ack,
              val_seq, halt_req, halt_a2, roll_req, roll_a2, reset);

always wait (reset)
  begin
    halt_ack = 0;
    roll_ack = 0;
    wait (~reset);
  end

always @(halt_a1 or halt_a2)
  if (halt_a1 & halt_a2)
    halt_ack = 1;
  else if (~halt_a1 & ~halt_a2)
    halt_ack = 0;

always @(roll_a1 or roll_a2)
  if (roll_a1 & roll_a2)
    roll_ack = 1;
  else if (~roll_a1 & ~roll_a2)
    roll_ack = 0;

endmodule // ckiq

// ====================================================================

module cki_buf (in_s, in_d, in_r, in_a, out_s, out_d, out_r, out_a,
                val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

`include "parameter"

input [SEQ_WIDTH-1:0] in_s, val_seq;
input [DATA_WIDTH:0] in_d;
output [SEQ_WIDTH-1:0] out_s;
output [DATA_WIDTH:0] out_d;
input in_r, out_a, halt_req, roll_req, reset;
output in_a, out_r, halt_ack, roll_ack;

reg [SEQ_WIDTH-1:0] out_s;
reg [DATA_WIDTH:0] out_d;
reg in_a, out_r, halt_ack, roll_ack;

reg [SEQ_WIDTH-1:0] diff;
reg valid;

always wait (reset)
  begin
    disable rollback_cycle;
    disable input_cycle;
    disable output_cycle;
    out_s = XXX;
    out_d = XXX;
    in_a = 0;
    out_r = 0;
    halt_ack = 0;
    roll_ack = 0;
    valid = 0;
    wait (~reset);
  end

always wait (halt_req & ~reset)
  begin :rollback_cycle
    #1;
    halt_ack = 1;
    wait (roll_req);
    #1;
    disable input_cycle;
    disable output_cycle;
    in_a = 0;
    out_r = 0;

    #DLY_SEQ_COMP;
    diff = out_s - val_seq; // compare sequence numbers
    if (~diff [SEQ_WIDTH-1])
      begin
        valid = 0;
        out_s = XXX;
        out_d = XXX;
      end

    roll_ack = 1;
    fork
      begin
        wait (~roll_req);
        #1;
        roll_ack = 0;
      end
      begin
        wait (~halt_req);
        #1;
        halt_ack = 0;
      end
    join
  end

always wait (in_r & ~valid & ~halt_req & ~reset)
  begin :input_cycle
    #1;
    wait (~halt_req);
    out_s = in_s;
    out_d = in_d;
    in_a = 1;
    valid = 1;
    wait (~in_r);
    #1;
    wait (~halt_req);
    in_a = 0;
  end

always wait (valid & ~halt_req & ~reset)
  begin :output_cycle
    #1;
    wait (~halt_req);
    out_r = 1;
    wait (out_a);
    #1;
    valid = 0;
    wait (~halt_req);
    out_r = 0;
    wait (~out_a);
  end

endmodule // cki_buf
ckm.v
// CKM (combining CKMQ and CKMF)

module ckm (in_seq, in_addr, in_data, in_req, in_ack, arb_req, arb_ack,
            tri_seq, tri_chkid, tri_error, tri_req, out_ack,
            val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

`include "parameter"

input [SEQ_WIDTH-1:0] in_seq, val_seq;
input [DATA_WIDTH:0] in_addr, in_data;
output [SEQ_WIDTH-1:0] tri_seq;
output [CHKID_WIDTH-1:0] tri_chkid;
input in_req, arb_ack, out_ack, halt_req, roll_req, reset;
output in_ack, arb_req, tri_error, tri_req, halt_ack, roll_ack;

reg halt_ack, roll_ack;

wire [ADDR_WIDTH:0] adpt_addr = in_addr; // addr "adaptor"
wire [SEQ_WIDTH-1:0] mid_seq;
wire [ADDR_WIDTH:0] mid_addr;
wire [DATA_WIDTH:0] mid_data;

ckmq ckmq (in_seq, adpt_addr, in_data, in_req, in_ack,
           mid_seq, mid_addr, mid_data, mid_req, mid_ack,
           val_seq, halt_req, haltq_ack, roll_req, rollq_ack, reset);
ckmf ckmf (mid_seq, mid_addr, mid_data, mid_req, mid_ack, arb_req, arb_ack,
           tri_seq, tri_chkid, tri_error, tri_req, out_ack,
           val_seq, halt_req, haltf_ack, roll_req, rollf_ack, reset);

always wait (reset)
  begin
    halt_ack = 0;
    roll_ack = 0;
    wait (~reset);
  end

always @(haltq_ack or haltf_ack)
  if (haltq_ack & haltf_ack)
    halt_ack = 1;
  else if (~haltq_ack & ~haltf_ack)
    halt_ack = 0;

always @(rollq_ack or rollf_ack)
  if (rollq_ack & rollf_ack)
    roll_ack = 1;
  else if (~rollq_ack & ~rollf_ack)
    roll_ack = 0;

endmodule // ckm

// ====================================================================
// Checker for Memory Operations (functional part)

module ckmf (in_seq, in_addr, in_data, in_req, in_ack, arb_req, arb_ack,
             tri_seq, tri_chkid, tri_error, tri_req, out_ack,
             val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

`include "parameter"

input [SEQ_WIDTH-1:0] in_seq, val_seq;
input [ADDR_WIDTH:0] in_addr;
input [DATA_WIDTH:0] in_data;
output [SEQ_WIDTH-1:0] tri_seq;
output [CHKID_WIDTH-1:0] tri_chkid;
input in_req, arb_ack, out_ack, halt_req, roll_req, reset;
output in_ack, arb_req, tri_error, tri_req, halt_ack, roll_ack;

reg [SEQ_WIDTH-1:0] out_seq;
reg in_ack, arb_req, out_error, out_req, halt_ack, roll_ack;

reg [SEQ_WIDTH-1:0] diff;
reg [ADDR_WIDTH:0] addr_buf;
reg [DATA_WIDTH:0] data_buf;
reg [5:0] loop;
reg valid, parity1, parity2;

assign tri_seq = arb_ack ? out_seq : ZZZ;
assign tri_chkid = arb_ack ? ID_CKM : ZZZ;
assign tri_error = arb_ack ? out_error : ZZZ;
assign tri_req = arb_ack ? out_req : ZZZ;

always wait (reset)
  begin
    disable rollback_cycle;
    disable latch_cycle;
    disable output_cycle;
    out_seq = XXX;
    in_ack = 0;
    arb_req = 0;
    out_error = XXX;
    out_req = 0;
    halt_ack = 0;
    roll_ack = 0;
    valid = 0;
    wait (~reset);
  end

always wait (halt_req & ~reset)
  begin :rollback_cycle
    #1;
    halt_ack = 1;
    wait (roll_req);
    #1;
    disable latch_cycle;
    disable output_cycle;
    in_ack = 0;
    arb_req = 0;
    out_req = 0;

    #DLY_SEQ_COMP;
    diff = out_seq - val_seq; // compare sequence numbers
    if (~diff [SEQ_WIDTH-1])
      begin
        valid = 0;
        out_seq = XXX;
        out_error = XXX;
      end

    roll_ack = 1;
    fork
      begin
        wait (~roll_req);
        #1;
        roll_ack = 0;
      end
      begin
        wait (~halt_req);
        #1;
        halt_ack = 0;
      end
    join
  end

always wait (in_req & ~valid & ~halt_req & ~reset)
  begin :latch_cycle
    #1;
    wait (~halt_req);
    out_seq = in_seq;
    addr_buf = in_addr;
    data_buf = in_data;
    in_ack = 1;
    valid = 1;
    wait (~in_req);
    #1;
    wait (~halt_req);
    in_ack = 0;
  end

always wait (valid & ~halt_req & ~reset)
  begin :output_cycle
    #1;
    parity1 = 0;
    for (loop=0; loop<1+ADDR_WIDTH; loop=loop+1) // calculate parity
      parity1 = parity1 ^ addr_buf [loop];
    if (parity1 === 1'bx) parity1 = 1;
    parity2 = 0;
    for (loop=0; loop<1+DATA_WIDTH; loop=loop+1) // calculate parity
      parity2 = parity2 ^ data_buf [loop];
    if (parity2 === 1'bx) parity2 = 1;

    #DLY_CKM_CHK;
    out_error = parity1 | parity2;
    arb_req = 1;
    wait (arb_ack);
    #1;
    wait (~halt_req);
    out_req = 1;
    wait (out_ack);
    #1;
    valid = 0;
    arb_req = 0;
    wait (~halt_req);
    out_req = 0;
    wait (~out_ack & ~arb_ack);
  end

endmodule // ckmf

// ====================================================================
// Queue for Memory Operations Checker

module ckmq (in_seq, in_addr, in_data, in_req, in_ack,
             out_seq, out_addr, out_data, out_req, out_ack,
             val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

`include "parameter"

input [SEQ_WIDTH-1:0] in_seq, val_seq;
input [ADDR_WIDTH:0] in_addr;
input [DATA_WIDTH:0] in_data;
output [SEQ_WIDTH-1:0] out_seq;
output [ADDR_WIDTH:0] out_addr;
output [DATA_WIDTH:0] out_data;
input in_req, out_ack, halt_req, roll_req, reset;
output in_ack, out_req, halt_ack, roll_ack;

reg halt_ack, roll_ack;

wire [SEQ_WIDTH-1:0] seq1;
wire [ADDR_WIDTH:0] addr1;
wire [DATA_WIDTH:0] data1;

ckm_buf buf1 (in_seq, in_addr, in_data, in_req, in_ack,
              seq1, addr1, data1, req1, ack1,
              val_seq, halt_req, halt_a1, roll_req, roll_a1, reset);
ckm_buf buf2 (seq1, addr1, data1, req1, ack1,
              out_seq, out_addr, out_data, out_req, out_ack,
              val_seq, halt_req, halt_a2, roll_req, roll_a2, reset);

always wait (reset)
  begin
    halt_ack = 0;
    roll_ack = 0;
    wait (~reset);
  end

always @(halt_a1 or halt_a2)
  if (halt_a1 & halt_a2)
    halt_ack = 1;
  else if (~halt_a1 & ~halt_a2)
    halt_ack = 0;

always @(roll_a1 or roll_a2)
  if (roll_a1 & roll_a2)
    roll_ack = 1;
  else if (~roll_a1 & ~roll_a2)
    roll_ack = 0;

endmodule // ckmq

// ====================================================================

module ckm_buf (in_s, in_d1, in_d2, in_r, in_a,
                out_s, out_d1, out_d2, out_r, out_a,
                val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

`include "parameter"

input [SEQ_WIDTH-1:0] in_s, val_seq;
input [ADDR_WIDTH:0] in_d1;
input [DATA_WIDTH:0] in_d2;
output [SEQ_WIDTH-1:0] out_s;
output [ADDR_WIDTH:0] out_d1;
output [DATA_WIDTH:0] out_d2;
input in_r, out_a, halt_req, roll_req, reset;
output in_a, out_r, halt_ack, roll_ack;

reg [SEQ_WIDTH-1:0] out_s;
reg [ADDR_WIDTH:0] out_d1;
reg [DATA_WIDTH:0] out_d2;
reg in_a, out_r, halt_ack, roll_ack;

reg [SEQ_WIDTH-1:0] diff;
reg valid;

always wait (reset)
  begin
    disable rollback_cycle;
    disable input_cycle;
    disable output_cycle;
    out_s = XXX;
    out_d1 = XXX;
    out_d2 = XXX;
    in_a = 0;
    out_r = 0;
    halt_ack = 0;
    roll_ack = 0;
    valid = 0;
    wait (~reset);
  end

always wait (halt_req & ~reset)
  begin :rollback_cycle
    #1;
    halt_ack = 1;
    wait (roll_req);
    #1;
    disable input_cycle;
    disable output_cycle;
    in_a = 0;
    out_r = 0;

    #DLY_SEQ_COMP;
    diff = out_s - val_seq; // compare sequence numbers
    if (~diff [SEQ_WIDTH-1])
      begin
        valid = 0;
        out_s = XXX;
        out_d1 = XXX;
        out_d2 = XXX;
      end

    roll_ack = 1;
    fork
      begin
        wait (~roll_req);
        #1;
        roll_ack = 0;
      end
      begin
        wait (~halt_req);
        #1;
        halt_ack = 0;
      end
    join
  end

always wait (in_r & ~valid & ~halt_req & ~reset)
  begin :input_cycle
    #1;
    wait (~halt_req);
    out_s = in_s;
    out_d1 = in_d1;
    out_d2 = in_d2;
    in_a = 1;
    valid = 1;
    wait (~in_r);
    #1;
    wait (~halt_req);
    in_a = 0;
  end

always wait (valid & ~halt_req & ~reset)
  begin :output_cycle
    #1;
    wait (~halt_req);
    out_r = 1;
    wait (out_a);
    #1;
    valid = 0;
    wait (~halt_req);
    out_r = 0;
    wait (~out_a);
  end

endmodule // ckm_buf
ckr.v
// CKR (combining CKRQ and CKRF)

module ckr (in_seq, in_data, in_req, in_ack, arb_req, arb_ack,
            tri_seq, tri_chkid, tri_error, tri_req, out_ack,
            val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

`include "parameter"

input [SEQ_WIDTH-1:0] in_seq, val_seq;
input [DATA_WIDTH:0] in_data;
output [SEQ_WIDTH-1:0] tri_seq;
output [CHKID_WIDTH-1:0] tri_chkid;
input in_req, arb_ack, out_ack, halt_req, roll_req, reset;
output in_ack, arb_req, tri_error, tri_req, halt_ack, roll_ack;

reg halt_ack, roll_ack;

wire [SEQ_WIDTH-1:0] mid_seq;
wire [DATA_WIDTH:0] mid_data;

ckrq ckrq (in_seq, in_data, in_req, in_ack,
           mid_seq, mid_data, mid_req, mid_ack,
           val_seq, halt_req, haltq_ack, roll_req, rollq_ack, reset);
ckrf ckrf (mid_seq, mid_data, mid_req, mid_ack, arb_req, arb_ack,
           tri_seq, tri_chkid, tri_error, tri_req, out_ack,
           val_seq, halt_req, haltf_ack, roll_req, rollf_ack, reset);

always wait (reset)
  begin
    halt_ack = 0;
    roll_ack = 0;
    wait (~reset);
  end

always @(haltq_ack or haltf_ack)
  if (haltq_ack & haltf_ack)
    halt_ack = 1;
  else if (~haltq_ack & ~haltf_ack)
    halt_ack = 0;

always @(rollq_ack or rollf_ack)
  if (rollq_ack & rollf_ack)
    roll_ack = 1;
  else if (~rollq_ack & ~rollf_ack)
    roll_ack = 0;

endmodule // ckr

// ====================================================================
// Checker for R_bus (functional part)

module ckrf (in_seq, in_data, in_req, in_ack, arb_req, arb_ack,
             tri_seq, tri_chkid, tri_error, tri_req, out_ack,
             val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

`include "parameter"

input [SEQ_WIDTH-1:0] in_seq, val_seq;
input [DATA_WIDTH:0] in_data;
output [SEQ_WIDTH-1:0] tri_seq;
output [CHKID_WIDTH-1:0] tri_chkid;
input in_req, arb_ack, out_ack, halt_req, roll_req, reset;
output in_ack, arb_req, tri_error, tri_req, halt_ack, roll_ack;

reg [SEQ_WIDTH-1:0] out_seq;
reg in_ack, arb_req, out_error, out_req, halt_ack, roll_ack;

reg [SEQ_WIDTH-1:0] diff;
reg [DATA_WIDTH:0] data_buf;
reg [5:0] loop;
reg valid, parity;

assign tri_seq = arb_ack ? out_seq : ZZZ;
assign tri_chkid = arb_ack ? ID_CKR : ZZZ;
assign tri_error = arb_ack ? out_error : ZZZ;
assign tri_req = arb_ack ? out_req : ZZZ;

always wait (reset)
  begin
    disable rollback_cycle;
    disable latch_cycle;
    disable output_cycle;
    out_seq = XXX;
    in_ack = 0;
    arb_req = 0;
    out_error = XXX;
    out_req = 0;
    halt_ack = 0;
    roll_ack = 0;
    valid = 0;
    wait (~reset);
  end

always wait (halt_req & ~reset)
  begin :rollback_cycle
    #1;
    halt_ack = 1;
    wait (roll_req);
    #1;
    disable latch_cycle;
    disable output_cycle;
    in_ack = 0;
    arb_req = 0;
    out_req = 0;

    #DLY_SEQ_COMP;
    diff = out_seq - val_seq; // compare sequence numbers
    if (~diff [SEQ_WIDTH-1])
      begin
        valid = 0;
        out_seq = XXX;
        out_error = XXX;
      end

    roll_ack = 1;
    fork
      begin
        wait (~roll_req);
        #1;
        roll_ack = 0;
      end
      begin
        wait (~halt_req);
        #1;
        halt_ack = 0;
      end
    join
  end

always wait (in_req & ~valid & ~halt_req & ~reset)
  begin :latch_cycle
    #1;
    wait (~halt_req);
    out_seq = in_seq;
    data_buf = in_data;
    in_ack = 1;
    valid = 1;
    wait (~in_req);
    #1;
    wait (~halt_req);
    in_ack = 0;
  end

always wait (valid & ~halt_req & ~reset)
  begin :output_cycle
    #1;
    parity = 0;
    for (loop=0; loop<1+DATA_WIDTH; loop=loop+1) // calculate parity
      parity = parity ^ data_buf [loop];
    if (parity === 1'bx) parity = 1;

    #DLY_CKR_CHK;
    out_error = parity;
    arb_req = 1;
    wait (arb_ack);
    #1;
    wait (~halt_req);
    out_req = 1;
    wait (out_ack);
    #1;
    valid = 0;
    arb_req = 0;
    wait (~halt_req);
    out_req = 0;
    wait (~out_ack & ~arb_ack);
  end

endmodule // ckrf

// ====================================================================
// Queue for R_bus Checker

module ckrq (in_seq, in_data, in_req, in_ack,
             out_seq, out_data, out_req, out_ack,
             val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

`include "parameter"

input [SEQ_WIDTH-1:0] in_seq, val_seq;
input [DATA_WIDTH:0] in_data;
output [SEQ_WIDTH-1:0] out_seq;
output [DATA_WIDTH:0] out_data;
input in_req, out_ack, halt_req, roll_req, reset;
output in_ack, out_req, halt_ack, roll_ack;

reg halt_ack, roll_ack;

wire [SEQ_WIDTH-1:0] seq1;
wire [DATA_WIDTH:0] data1;

ckr_buf buf1 (in_seq, in_data, in_req, in_ack, seq1, data1, req1, ack1,
              val_seq, halt_req, halt_a1, roll_req, roll_a1, reset);
ckr_buf buf2 (seq1, data1, req1, ack1, out_seq, out_data, out_req, out_ack,
              val_seq, halt_req, halt_a2, roll_req, roll_a2, reset);

always wait (reset)
  begin
    halt_ack = 0;
    roll_ack = 0;
    wait (~reset);
  end

always @(halt_a1 or halt_a2)
  if (halt_a1 & halt_a2)
    halt_ack = 1;
  else if (~halt_a1 & ~halt_a2)
    halt_ack = 0;

always @(roll_a1 or roll_a2)
  if (roll_a1 & roll_a2)
    roll_ack = 1;
  else if (~roll_a1 & ~roll_a2)
    roll_ack = 0;

endmodule // ckrq

// ====================================================================

module ckr_buf (in_s, in_d, in_r, in_a, out_s, out_d, out_r, out_a,
                val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

`include "parameter"

input [SEQ_WIDTH-1:0] in_s, val_seq;
input [DATA_WIDTH:0] in_d;
output [SEQ_WIDTH-1:0] out_s;
output [DATA_WIDTH:0] out_d;
input in_r, out_a, halt_req, roll_req, reset;
output in_a, out_r, halt_ack, roll_ack;

reg [SEQ_WIDTH-1:0] out_s;
reg [DATA_WIDTH:0] out_d;
reg in_a, out_r, halt_ack, roll_ack;

reg [SEQ_WIDTH-1:0] diff;
reg valid;

always wait (reset)
  begin
    disable rollback_cycle;
    disable input_cycle;
    disable output_cycle;
    out_s = XXX;
    out_d = XXX;
    in_a = 0;
    out_r = 0;
    halt_ack = 0;
    roll_ack = 0;
    valid = 0;
    wait (~reset);
  end

always wait (halt_req & ~reset)
  begin :rollback_cycle
    #1;
    halt_ack = 1;
    wait (roll_req);
    #1;
    disable input_cycle;
    disable output_cycle;
    in_a = 0;
    out_r = 0;

    #DLY_SEQ_COMP;
    diff = out_s - val_seq; // compare sequence numbers
    if (~diff [SEQ_WIDTH-1])
      begin
        valid = 0;
        out_s = XXX;
        out_d = XXX;
      end

    roll_ack = 1;
    fork
      begin
        wait (~roll_req);
        #1;
        roll_ack = 0;
      end
      begin
        wait (~halt_req);
        #1;
        halt_ack = 0;
      end
    join
  end

always wait (in_r & ~valid & ~halt_req & ~reset)
  begin :input_cycle
    #1;
    wait (~halt_req);
    out_s = in_s;
    out_d = in_d;
    in_a = 1;
    valid = 1;
    wait (~in_r);
    #1;
    wait (~halt_req);
    in_a = 0;
  end

always wait (valid & ~halt_req & ~reset)
  begin :output_cycle
    #1;
    wait (~halt_req);
    out_r = 1;
    wait (out_a);
    #1;
    valid = 0;
    wait (~halt_req);
    out_r = 0;
    wait (~out_a);
  end

endmodule // ckr_buf
dmem.v
// Data Memory

module dmem (addr, data, rw_mode, retry, in_req, in_ack,
             tri_out_req, out_ack,
             halt_req, halt_ack, roll_req, roll_ack, reset);

`include "parameter"

input [ADDR_WIDTH:0] addr;
inout [DATA_WIDTH:0] data;
input rw_mode, retry, in_req, out_ack, halt_req, roll_req, reset;
output in_ack, tri_out_req, halt_ack, roll_ack;

reg in_ack, tri_out_req, halt_ack, roll_ack;

reg [DATA_WIDTH:0] data_out, dmemory [0:DMEM_SIZE-1];
reg [ADDR_WIDTH:0] loop; // one extra bit for loop termination
reg [5:0] loop1; // DATA_WIDTH = 32 bits max
reg parity;

assign data = (rw_mode & in_req) ? data_out : ZZZ;

always wait (reset)
  begin
    disable rollback_cycle;
    disable memory_cycle;
    data_out = XXX;
    in_ack = 0;
    tri_out_req = ZZZ;
    halt_ack = 0;
    roll_ack = 0;
    for (loop = 0; loop < DMEM_SIZE; loop = loop + 1)
      dmemory [loop] = XXX;
    $readmemh ("inst.hex", dmemory, 0);
    wait (~reset);
  end

always wait (halt_req & ~reset)
  begin :rollback_cycle
    #1;
    halt_ack = 1;
    wait (roll_req);
    #1;
    disable memory_cycle;
    data_out = XXX;
    in_ack = 0;
    tri_out_req = ZZZ;

    roll_ack = 1;
    fork
      begin
        wait (~roll_req);
        #1;
        roll_ack = 0;
      end
      begin
        wait (~halt_req);
        #1;
        halt_ack = 0;
      end
    join
  end

// Input is not acknowledged until output is acknowledged so that
// register and sequence number can be routed from DWB to DQ without
// going through DMEM.

always wait (in_req & ~halt_req & ~reset)
  begin :memory_cycle
    if (rw_mode) // 1 for read
      begin
        #DLY_DMEM_RD;
        data_out = dmemory [addr[ADDR_WIDTH-1:ADDR_IGNORE]];

        if (retry) // correct parity error
          begin
            parity = 0;
            for (loop1=0; loop1<DATA_WIDTH; loop1=loop1+1)
              parity = parity ^ data_out [loop1];
            data_out [DATA_WIDTH] = parity;
            dmemory [addr[ADDR_WIDTH-1:ADDR_IGNORE]] = data_out;
          end

        #1;
        wait (~halt_req);
        tri_out_req = 1;
        wait (out_ack);
        #1;
        in_ack = 1;
        wait (~halt_req);
        tri_out_req = ZZZ; // tri-state it
        wait (~in_req);
        #1;
        in_ack = 0;
        wait (~out_ack);
      end
    else // 0 for write
      begin
        #DLY_DMEM_WR;
        wait (~halt_req);
        dmemory [addr[ADDR_WIDTH-1:ADDR_IGNORE]] = data;
        in_ack = 1;
        wait (~in_req);
        #1;
        wait (~halt_req);
        in_ack = 0;
      end
  end

endmodule // dmem
dq.v
// Data Queue (combining DQQ and DQBUS)

module dq (in_seq, in_reg, in_data, in_req, in_ack, arb_req, arb_ack,
           tri_seq, tri_reg, tri_data, tri_req, out_ack,
           val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

`include "parameter"

input [SEQ_WIDTH-1:0] in_seq, val_seq;
input [REG_WIDTH-1:0] in_reg;
input [DATA_WIDTH:0] in_data;
output [SEQ_WIDTH-1:0] tri_seq;
output [REG_WIDTH-1:0] tri_reg;
output [DATA_WIDTH:0] tri_data;
input in_req, arb_ack, out_ack, halt_req, roll_req, reset;
output in_ack, arb_req, tri_req, halt_ack, roll_ack;

reg halt_ack, roll_ack;

wire [SEQ_WIDTH-1:0] mid_seq;
wire [REG_WIDTH-1:0] mid_reg;
wire [DATA_WIDTH:0] mid_data;

dqq dqq (in_seq, in_reg, in_data, in_req, in_ack,
         mid_seq, mid_reg, mid_data, mid_req, mid_ack,
         val_seq, halt_req, haltq_ack, roll_req, rollq_ack, reset);
dqbus dqbus (mid_seq, mid_reg, mid_data, mid_req, mid_ack, arb_req, arb_ack,
             tri_seq, tri_reg, tri_data, tri_req, out_ack,
             val_seq, halt_req, haltb_ack, roll_req, rollb_ack, reset);

always wait (reset)
  begin
    halt_ack = 0;
    roll_ack = 0;
    wait (~reset);
  end

always @(haltq_ack or haltb_ack)
  if (haltq_ack & haltb_ack)
    halt_ack = 1;
  else if (~haltq_ack & ~haltb_ack)
    halt_ack = 0;

always @(rollq_ack or rollb_ack)
  if (rollq_ack & rollb_ack)
    roll_ack = 1;
  else if (~rollq_ack & ~rollb_ack)
    roll_ack = 0;

endmodule // dq

// ====================================================================
// Data Queue Bus Handler

module dqbus (in_seq, in_reg, in_data, in_req, in_ack, arb_req, arb_ack,
              tri_seq, tri_reg, tri_data, tri_req, out_ack,
              val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

`include "parameter"

input [SEQ_WIDTH-1:0] in_seq, val_seq;
input [REG_WIDTH-1:0] in_reg;
input [DATA_WIDTH:0] in_data;
output [SEQ_WIDTH-1:0] tri_seq;
output [REG_WIDTH-1:0] tri_reg;
output [DATA_WIDTH:0] tri_data;
input in_req, arb_ack, out_ack, halt_req, roll_req, reset;
output in_ack, arb_req, tri_req, halt_ack, roll_ack;

reg [SEQ_WIDTH-1:0] out_seq;
reg [REG_WIDTH-1:0] out_reg;
reg [DATA_WIDTH:0] out_data;
reg in_ack, arb_req, out_req, halt_ack, roll_ack;

reg [SEQ_WIDTH-1:0] diff;
reg valid;

assign tri_seq = arb_ack ? out_seq : ZZZ;
assign tri_reg = arb_ack ? out_reg : ZZZ;
assign tri_data = arb_ack ? out_data : ZZZ;
assign tri_req = arb_ack ? out_req : ZZZ;

always wait (reset)
  begin
    disable rollback_cycle;
    disable latch_cycle;
    disable output_cycle;
    out_seq = XXX;
    out_reg = XXX;
    out_data = XXX;
    in_ack = 0;
    arb_req = 0;
    out_req = 0;
    halt_ack = 0;
    roll_ack = 0;
    valid = 0;
    wait (~reset);
  end

always wait (halt_req & ~reset)
  begin :rollback_cycle
    #1;
    halt_ack = 1;
    wait (roll_req);
    #1;
    disable latch_cycle;
    disable output_cycle;
    in_ack = 0;
    arb_req = 0;
    out_req = 0;

    #DLY_SEQ_COMP;
    diff = out_seq - val_seq; // compare sequence numbers
    if (~diff [SEQ_WIDTH-1])
      begin
        valid = 0;
        out_seq = XXX;
        out_reg = XXX;
        out_data = XXX;
      end

    roll_ack = 1;
    fork
      begin
        wait (~roll_req);
        #1;
        roll_ack = 0;
      end
      begin
        wait (~halt_req);
        #1;
        halt_ack = 0;
      end
    join
  end

always wait (in_req & ~valid & ~halt_req & ~reset)
  begin :latch_cycle
    #1;
    wait (~halt_req);
    out_seq = in_seq;
    out_reg = in_reg;
    out_data = in_data;
    in_ack = 1;
    valid = 1;
    wait (~in_req);
    #1;
    wait (~halt_req);
    in_ack = 0;
  end

always wait (valid & ~halt_req & ~reset)
  begin :output_cycle
    #1;
    arb_req = 1;
    wait (arb_ack);
    #1;
    wait (~halt_req);
    out_req = 1;
    wait (out_ack);
    #1;
    valid = 0;
    arb_req = 0;
    wait (~halt_req);
    out_req = 0;
    wait (~out_ack & ~arb_ack);
  end

endmodule // dqbus

// ====================================================================
// Data Queue (real queue)

module dqq (in_seq, in_reg, in_data, in_req, in_ack,
            out_seq, out_reg, out_data, out_req, out_ack,
            val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

`include "parameter"

input [SEQ_WIDTH-1:0] in_seq, val_seq;
input [REG_WIDTH-1:0] in_reg;
input [DATA_WIDTH:0] in_data;
output [SEQ_WIDTH-1:0] out_seq;
output [REG_WIDTH-1:0] out_reg;
output [DATA_WIDTH:0] out_data;
input in_req, out_ack, halt_req, roll_req, reset;
output in_ack, out_req, halt_ack, roll_ack;

reg halt_ack, roll_ack;

wire [SEQ_WIDTH-1:0] seq1;
wire [REG_WIDTH-1:0] reg1;
wire [DATA_WIDTH:0] data1;

dq_buffer buf1 (in_seq, in_reg, in_data, in_req, in_ack,
                seq1, reg1, data1, req1, ack1,
                val_seq, halt_req, halt_a1, roll_req, roll_a1, reset);
dq_buffer buf2 (seq1, reg1, data1, req1, ack1,
                out_seq, out_reg, out_data, out_req, out_ack,
                val_seq, halt_req, halt_a2, roll_req, roll_a2, reset);

always wait (reset)
  begin
    halt_ack = 0;
    roll_ack = 0;
    wait (~reset);
  end

always @(halt_a1 or halt_a2)
  if (halt_a1 & halt_a2)
    halt_ack = 1;
  else if (~halt_a1 & ~halt_a2)
    halt_ack = 0;

always @(roll_a1 or roll_a2)
  if (roll_a1 & roll_a2)
    roll_ack = 1;
  else if (~roll_a1 & ~roll_a2)
    roll_ack = 0;

endmodule // dqq

// ====================================================================

module dq_buffer (in_s, in_g, in_d, in_r, in_a,
                  out_s, out_g, out_d, out_r, out_a,
                  val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

`include "parameter"

input [SEQ_WIDTH-1:0] in_s, val_seq;
input [REG_WIDTH-1:0] in_g;
input [DATA_WIDTH:0] in_d;
output [SEQ_WIDTH-1:0] out_s;
output [REG_WIDTH-1:0] out_g;
output [DATA_WIDTH:0] out_d;
input in_r, out_a, halt_req, roll_req, reset;
output in_a, out_r, halt_ack, roll_ack;

`ifdef BEHAV_DQ

// ====================================================================
// behavioral level

reg [SEQ_WIDTH-1:0] out_s;
reg [REG_WIDTH-1:0] out_g;
reg [DATA_WIDTH:0] out_d;
reg in_a, out_r, halt_ack, roll_ack;

reg [SEQ_WIDTH-1:0] diff;
reg valid;

always wait (reset)
  begin
    disable rollback_cycle;
    disable input_cycle;
    disable output_cycle;
    out_s = XXX;
    out_g = XXX;
    out_d = XXX;
    in_a = 0;
    out_r = 0;
    halt_ack = 0;
    roll_ack = 0;
    valid = 0;
    wait (~reset);
  end

always wait (halt_req & ~reset)
  begin :rollback_cycle
    #1;
    halt_ack = 1;
    wait (roll_req);
    #1;
    disable input_cycle;
    disable output_cycle;
    in_a = 0;
    out_r = 0;

    #DLY_SEQ_COMP;
    diff = out_s - val_seq; // compare sequence numbers
    if (~diff [SEQ_WIDTH-1])
      begin
        valid = 0;
        out_s = XXX;
        out_g = XXX;
        out_d = XXX;
      end

    roll_ack = 1;
    fork
      begin
        wait (~roll_req);
        #1;
        roll_ack = 0;
      end
      begin
        wait (~halt_req);
        #1;
        halt_ack = 0;
      end
    join
  end

always wait (in_r & ~valid & ~halt_req & ~reset)
  begin :input_cycle
    #1;
    wait (~halt_req);
    out_s = in_s;
    out_g = in_g;
    out_d = in_d;
    in_a = 1;
    valid = 1;
    wait (~in_r);
    #1;
    wait (~halt_req);
    in_a = 0;
  end

always wait (valid & ~halt_req & ~reset)
  begin :output_cycle
    #1;
    wait (~halt_req);
    out_r = 1;
    wait (out_a);
    #1;
    valid = 0;
    wait (~halt_req);
    out_r = 0;
    wait (~out_a);
  end

endmodule // dq_buffer_b (behavioral level)

`else

// ====================================================================
// gate level

muller2c x1 (in_r, ~cout2, clear1, cout1);
dlr x2 (cout1, ~halt_req, clear1, latch);
and #1 g1 (a, complete, ~roll_req);
muller2c x3 (a, ~out_a, clear2, cout2);
dlr x4 (cout2, ~halt_req, clear1, out_r);
regcmpl x5 (latch, reset, regclk, complete);
wire in_a = complete;

or #1 g2 (clear1, roll_req, reset);
and #1 g3 (b, inval, roll_req);
or #1 g4 (clear2, b, reset);

buf #18 g5 (dhalt, halt_req); // 4 (x2) + 14 (x5)
and #1 g6 (halt_ack, dhalt, halt_req);
and #1 g7 (c, roll_req, ~cout1, ~in_a, ~out_r);
and #1 g8 (d, clear2, ~cout2);
and #1 g9 (e, ~inval, cmp_done);
or #1 g10 (f, d, e);
muller3c x6 (c, f, cmp_done, reset, roll_ack);
seqcmp x7 (out_s, val_seq, roll_req, inval, cmp_done);

`ifdef WAVE_DQ
initial
  #0 $gr_addwaves ("in_r", in_r, "cout1", cout1, "latch", latch, "in_a", in_a,
                   "a", a, "cout2", cout2, "out_r", out_r, "out_a", out_a,
                   "h_req", halt_req, "h_ack", halt_ack, "r_req", roll_req,
                   "clear1", clear1, "clear2", clear2, "r_ack", roll_ack);
`endif

dlr b40 (in_s[0], regclk, reset, out_s[0]),
    b41 (in_s[1], regclk, reset, out_s[1]),
    b42 (in_s[2], regclk, reset, out_s[2]),
    b43 (in_s[3], regclk, reset, out_s[3]);

dl b50 (in_g[0], regclk, out_g[0]),
   b51 (in_g[1], regclk, out_g[1]),
   b52 (in_g[2], regclk, out_g[2]),
   b53 (in_g[3], regclk, out_g[3]),
   b54 (in_g[4], regclk, out_g[4]);

dl b0 (in_d[0], regclk, out_d[0]),
   b1 (in_d[1], regclk, out_d[1]),
   b2 (in_d[2], regclk, out_d[2]),
   b3 (in_d[3], regclk, out_d[3]),
   b4 (in_d[4], regclk, out_d[4]),
   b5 (in_d[5], regclk, out_d[5]),
   b6 (in_d[6], regclk, out_d[6]),
   b7 (in_d[7], regclk, out_d[7]),
   b8 (in_d[8], regclk, out_d[8]),
   b9 (in_d[9], regclk, out_d[9]),
   b10 (in_d[10], regclk, out_d[10]),
   b11 (in_d[11], regclk, out_d[11]),
   b12 (in_d[12], regclk, out_d[12]),
   b13 (in_d[13], regclk, out_d[13]),
   b14 (in_d[14], regclk, out_d[14]),
   b15 (in_d[15], regclk, out_d[15]),
   b16 (in_d[16], regclk, out_d[16]),
   b17 (in_d[17], regclk, out_d[17]),
   b18 (in_d[18], regclk, out_d[18]),
   b19 (in_d[19], regclk, out_d[19]),
   b20 (in_d[20], regclk, out_d[20]),
   b21 (in_d[21], regclk, out_d[21]),
   b22 (in_d[22], regclk, out_d[22]),
   b23 (in_d[23], regclk, out_d[23]),
   b24 (in_d[24], regclk, out_d[24]),
   b25 (in_d[25], regclk, out_d[25]),
   b26 (in_d[26], regclk, out_d[26]),
   b27 (in_d[27], regclk, out_d[27]),
   b28 (in_d[28], regclk, out_d[28]),
   b29 (in_d[29], regclk, out_d[29]),
   b30 (in_d[30], regclk, out_d[30]),
   b31 (in_d[31], regclk, out_d[31]),
   b32 (in_d[32], regclk, out_d[32]); // parity bit

endmodule // dq_buffer_g (gate level)

// ====================================================================

module seqcmp (seq, error, req, inv, ack);
input [3:0] seq, error;
input req;
output inv, ack;

brwend x1 (seq[0], error[0], req, a, na);
brw x2 (seq[1], error[1], a, na, req, b, nb);
brw x3 (seq[2], error[2], b, nb, req, c, nc);
sub x4 (seq[3], error[3], c, nc, req, d, inv, ack);

`ifdef WAVE_SEQCMP
  initial
    #0 $gr_addwaves ("seq", seq, "error", error, "req", req, "ack", ack,
                     "_a", a, "na", na, "b", b, "nb", nb, "c", c, "nc", nc, "d", d,
                     "inv", inv);
`endif

endmodule // seqcmp

`endif
gates.v
// Gate-Level Components

module muller2 (in1, in2, out); // 2-input C-element
  input in1, in2;
  output out;
  supply1 vcc;
  supply0 gnd;

  nmos #2 m1 (out, gnd, b);
  nmos #1 m2 (b, a, out);
  nmos #1 m3 (a, gnd, in1);
  nmos #1 m4 (a, gnd, in2);
  nmos #1 m6 (b, c, in1);
  nmos #1 m7 (c, gnd, in2);

  pullup (b);
  pullup (out);

endmodule // muller2

// ====================================================================

module muller3 (in1, in2, in3, out); // 3-input C-element
  input in1, in2, in3;
  output out;
  supply1 vcc;
  supply0 gnd;

  nmos #2 m1 (out, gnd, b);
  nmos #1 m2 (b, a, out);
  nmos #1 m3 (a, gnd, in1);
  nmos #1 m4 (a, gnd, in2);
  nmos #1 m5 (a, gnd, in3);
  nmos #1 m6 (b, c, in1);
  nmos #1 m7 (c, d, in2);
  nmos #1 m8 (d, gnd, in3);

  pullup (b);
  pullup (out);

endmodule // muller3

// ====================================================================

module muller4 (in1, in2, in3, in4, out); // 4-input C-element
  input in1, in2, in3, in4;
  output out;

  muller2 x1 (in1, in2, out1);
  muller2 x2 (in3, in4, out2);
  muller2 x3 (out1, out2, out);

endmodule // muller4

// ====================================================================

module muller2c (in1, in2, clear, out); // 2-input with clear
  input in1, in2, clear;
  output out;

  and #1 g1 (out1, in1, ~clear);
  and #1 g2 (out2, in2, ~clear);
  muller2 m1 (out1, out2, out);

endmodule // muller2c

// ====================================================================

module muller3c (in1, in2, in3, clear, out); // 3-input with clear
  input in1, in2, in3, clear;
  output out;

  and #1 g1 (out1, in1, ~clear);
  and #1 g2 (out2, in2, ~clear);
  and #1 g3 (out3, in3, ~clear);
  muller3 m1 (out1, out2, out3, out);

endmodule // muller3c

// ====================================================================

module dl (D, G, Q); // D-Latch
  input D, G;
  output Q;
  trireg a;

  nmos #1 m1 (a, D, G);
  pmos #1 m2 (a, Q, G);
  not #1 g1 (b, a);
  not #1 g2 (Q, b);

endmodule // dl

// ====================================================================

module dls (D, G, S, Q); // D-Latch with Set
  input D, G, S;
  output Q;
  supply1 vcc;
  trireg a;

  nmos #1 m1 (a, D, e);
  pmos #1 m2 (a, Q, f);
  nmos #1 m3 (a, vcc, S);
  not #1 g1 (b, a);
  not #1 g2 (Q, b);
  and #1 g3 (e, G, ~S);
  or #1 g4 (f, G, S);

endmodule // dls

// ====================================================================

module dlr (D, G, R, Q); // D-Latch with Reset
  input D, G, R;
  output Q;
  supply0 gnd;
  trireg a;

  nmos #1 m1 (a, D, e);
  pmos #1 m2 (a, Q, f);
  nmos #1 m3 (a, gnd, R);
  not #1 g1 (b, a);
  not #1 g2 (Q, b);
  and #1 g3 (e, G, ~R);
  or #1 g4 (f, G, R);

endmodule // dlr

// ====================================================================

module dffr (D, CLK, R, Q); // D-Flip Flop with Reset
  input D, CLK, R;
  output Q;

  dlr master (D, CLK, R, a);
  dlr slave (a, ~CLK, R, Q);

endmodule // dffr

// ====================================================================
// done->complete delay: (1 for g4)+(1 for dl disable)+(1 for safety)
// reset requires 5 time units, critical path in muller3c

module regcmpl (latch, reset, regclk, complete);
  input latch, reset;
  output regclk, complete;

  dffr x1 (nosc, latch, reset, osc);
  dls x2 (osc, latch, reset, q1);
  dlr x3 (nosc, latch, reset, q2);
  muller3c x4 (same1, same2, latch, reset, done);
  not #1 g1 (nosc, osc);
  xnor #1 g2 (same1, osc, q1);
  xnor #1 g3 (same2, nosc, q2);
  and #1 g4 (regclk, ~done, latch);
  and #3 g5 (complete, done, ~reset);

`ifdef WAVE_REGCMPL
  initial
    #0 $gr_addwaves ("latch", latch, "osc", osc, "q1", q1, "q2", q2,
                     "same1", same1, "same2", same2, "done", done, "regclk", regclk,
                     "complet", complete);
`endif

endmodule // regcmpl

// ====================================================================
// A minus B, X=borrow in, Z=borrow out, R=request

module brw (A, B, X, NX, R, Z, NZ); // full borrow circuit
  input A, B, X, NX, R;
  output Z, NZ;
  supply1 vcc;
  supply0 gnd;
  trireg Y, NY;

  not #1 g1 (NA, A);
  not #1 g2 (NB, B);
  buf #1 g3 (DR, R);
  not #1 g4 (Z, NY);
  not #1 g5 (NZ, Y);
  pmos #1 m1 (NY, vcc, DR);
  pmos #1 m2 (Y, vcc, DR);
  nmos #1 m3 (c, gnd, DR);

  nmos #1 m4 (NY, d, X);
  nmos #1 m5 (d, c, NA);
  nmos #1 m6 (d, c, B);
  nmos #1 m7 (NY, e, NA);
  nmos #1 m8 (e, c, B);
  nmos #1 m9 (Y, f, A);
  nmos #1 m10 (f, g, NB);
  nmos #1 m11 (g, c, A);
  nmos #1 m12 (Y, g, NX);
  nmos #1 m13 (g, c, NB);

endmodule // brw

// ====================================================================
// A minus B, Z=borrow out, R=request

module brwend (A, B, R, Z, NZ); // no borrow in
  input A, B, R;
  output Z, NZ;
  supply1 vcc;
  supply0 gnd;
  trireg Y, NY;

  not #1 g1 (NA, A);
  not #1 g2 (NB, B);
  buf #1 g3 (DR, R);
  not #1 g4 (Z, NY);
  not #1 g5 (NZ, Y);
  pmos #1 m1 (NY, vcc, DR);
  pmos #1 m2 (Y, vcc, DR);
  nmos #1 m3 (c, gnd, DR);

  nmos #1 m4 (NY, d, NA);
  nmos #1 m5 (d, c, B);
  nmos #1 m6 (Y, c, A);
  nmos #1 m7 (Y, c, NB);

endmodule // brwend

// ====================================================================
// use A/NA for slower signal

module stxor (A, NA, B, NB, R, Z, NZ); // self-timed XOR
  input A, NA, B, NB, R;
  output Z, NZ;
  supply1 vcc;
  supply0 gnd;
  trireg Y, NY;

  not #1 g1 (Z, NY);
  not #1 g2 (NZ, Y);
  pmos #1 m1 (NY, vcc, R);
  pmos #1 m2 (Y, vcc, R);
  nmos #1 m3 (c, gnd, R);

  nmos #1 m4 (NY, d, NA);
  nmos #1 m5 (d, c, B);
  nmos #1 m6 (NY, e, A);
  nmos #1 m7 (e, c, NB);
  nmos #1 m8 (Y, f, A);
  nmos #1 m9 (f, c, B);
  nmos #1 m10 (Y, f, NB);
  nmos #1 m11 (f, c, NA);

endmodule // stxor

// ====================================================================
// A minus B, X=borrow in, D=difference, R=request

module sub (A, B, X, NX, REQ, D, ND, ACK); // subtraction bitslice
  input A, B, X, NX, REQ;
  output D, ND, ACK;
  supply1 vcc;
  supply0 gnd;
  trireg D, ND;

  not #1 g1 (NA, A);
  not #1 g2 (NB, B);
  buf #1 g3 (DR, REQ);
  or #1 g4 (ACK, D, ND);

  stxor x1 (A, NA, B, NB, DR, c, nc);
  stxor x2 (c, nc, X, NX, DR, D, ND);

endmodule // sub
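
The transistor-level muller2 in gates.v may be easier to follow next to a behavioral model. The fragment below is only a sketch of the usual 2-input C-element behavior: the output rises when both inputs are high, falls when both are low, and otherwise holds its value. The names muller2_beh and muller2_beh_tb, the power-up value of the output, and the stimulus sequence are assumptions made for illustration; none of them appear in the thesis sources.

```verilog
// Hypothetical behavioral model of a 2-input Muller C-element.
// Illustrative only; not a replacement for the gate-level muller2.
module muller2_beh (in1, in2, out);
  input in1, in2;
  output out;
  reg out;

  initial out = 0;                 // assumed power-up state for simulation

  always @(in1 or in2)
    if (in1 & in2)                 // both inputs high: output rises
      out = 1;
    else if (~in1 & ~in2)          // both inputs low: output falls
      out = 0;
                                   // inputs disagree: output holds

endmodule // muller2_beh

// Small testbench that walks through one rise/hold/fall sequence.
module muller2_beh_tb;
  reg a, b;
  wire c;

  muller2_beh dut (a, b, c);

  initial
    begin
      a = 0; b = 0;
      #1 $display ("a=%b b=%b c=%b", a, b, c);
      a = 1;                       // only one input high
      #1 $display ("a=%b b=%b c=%b", a, b, c);
      b = 1;                       // both inputs high
      #1 $display ("a=%b b=%b c=%b", a, b, c);
      a = 0;                       // inputs disagree again
      #1 $display ("a=%b b=%b c=%b", a, b, c);
      b = 0;                       // both inputs low
      #1 $display ("a=%b b=%b c=%b", a, b, c);
      $finish;
    end

endmodule // muller2_beh_tb
```

In this sketch c should stay low while only a is high, rise once b is also high, remain high after a drops, and fall only when both inputs are low again. The same set/hold/clear pattern appears in the behavioral iq module, where cancel_ack is set when both cancel_a1 and cancel_a2 are high and cleared when both are low.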
iiu.v
// Instruction Issuing Unit

module iiu (inst_bus, inst_req, inst_ack, inst_en,
            chkbits, log_req, log_ack, cki_req, cki_ack,
            newpc_req, pc_ack, imem_ack, iq_ack,
            seq_num, reg_rd, reg_rs, reg_rt, res_req, res_ack,
            sim_f, a_bus, b_bus, rfile_req, rfile_ack, ckf_req, ckf_ack,
            alu_func, alu_req, alu_ack,
            mem_rw, mem_req, mem_ack,
            val_seq, halt_req, halt_ack, roll_req, roll_ack,
            retry, stop, reset);

`include "parameter"

  inout [DATA_WIDTH:0] inst_bus, a_bus, b_bus;
  output [CHECKERS-1:0] chkbits;
  output [SEQ_WIDTH-1:0] seq_num;
  output [REG_WIDTH-1:0] reg_rd, reg_rs, reg_rt;
  output [FUNC_WIDTH-1:0] alu_func;
  input [SEQ_WIDTH-1:0] val_seq;
  input halt_req, roll_req;
  output inst_en, sim_f, mem_rw, retry, stop;
  input inst_req, log_ack, cki_ack, pc_ack, imem_ack, iq_ack, res_ack,
        rfile_ack, ckf_ack, alu_ack, mem_ack, reset;
  output inst_ack, log_req, cki_req, newpc_req, res_req, rfile_req, ckf_req,
         alu_req, mem_req, halt_ack, roll_ack;

  reg [CHECKERS-1:0] chkbits;
  reg [SEQ_WIDTH-1:0] seq_num;
  reg [REG_WIDTH-1:0] reg_rd, reg_rs, reg_rt;
  reg [FUNC_WIDTH-1:0] alu_func;
  reg inst_en, sim_f, mem_rw, retry, stop;
  reg inst_ack, log_req, cki_req, newpc_req, res_req, rfile_req, ckf_req,
      alu_req, mem_req, halt_ack, roll_ack;

  reg [DATA_WIDTH:0] inst_buf, abus_in, abus_out, bbus_in, bbus_out;
  reg [ADDR_WIDTH:0] int_pc;
  reg [5:0] loop; // data/addr 32 bits max
  reg go_decode, parity;

  wire [OP_WIDTH-1:0] opcode = inst_buf [DATA_WIDTH-1:OFFSET_WIDTH];
  wire [REG_WIDTH-1:0] rs = inst_buf [OFFSET_WIDTH-1:OFFSET_WIDTH-REG_WIDTH];
  wire [REG_WIDTH-1:0] rt = inst_buf [OFFSET_WIDTH-REG_WIDTH-1:IMM_WIDTH];
  wire [REG_WIDTH-1:0] rd = inst_buf [IMM_WIDTH-1:IMM_WIDTH-REG_WIDTH];
  wire [OFFSET_WIDTH-1:0] offset = inst_buf [OFFSET_WIDTH-1:0];
  wire [IMM_WIDTH-1:0] imm = inst_buf [IMM_WIDTH-1:0];
  wire [EXTRA_WIDTH-1:0] extra = inst_buf [EXTRA_WIDTH-1:0];


  assign inst_bus = inst_en ? ZZZ : int_pc;
  assign a_bus = (rfile_req | halt_ack) ? ZZZ : abus_out;
  assign b_bus = rfile_req ? ZZZ : bbus_out;


  always wait (reset)
    begin
      disable rollback_cycle;
      disable fetch_cycle;
      disable decode_cycle;
      chkbits = XXX;
      seq_num = 0; // initial sequence number
      reg_rs = XXX;
      reg_rt = XXX;
      reg_rd = XXX;
      alu_func = XXX;
      inst_en = 1; // enable instructions
      sim_f = 0;
      mem_rw = XXX;
      retry = 0;
      stop = 0;
      inst_ack = 0; // handshaking signals
      log_req = 0;
      cki_req = 0;
      newpc_req = 0;
      res_req = 0;
      rfile_req = 0;
      ckf_req = 0;
      alu_req = 0;
      mem_req = 0;
      halt_ack = 0;
      roll_ack = 0;

      int_pc = 0; // initial internal PC
      abus_out = XXX;
      bbus_out = XXX;
      go_decode = 0;
      wait (~reset);
    end


  always wait (halt_req & ~reset)
    begin :rollback_cycle
      #1;
      wait (~newpc_req & ~pc_ack & ~imem_ack & ~iq_ack);
      halt_ack = 1;
      wait (roll_req);
      #1;
      disable fetch_cycle;
      disable decode_cycle;
      chkbits = XXX;
      reg_rs = XXX;
      reg_rt = XXX;
      reg_rd = XXX;
      alu_func = XXX;
      inst_en = 0;
      sim_f = 0;
      mem_rw = XXX;
      inst_ack = 0;
      log_req = 0;
      cki_req = 0;
      newpc_req = 0;
      res_req = 0;
      rfile_req = 0;
      ckf_req = 0;
      alu_req = 0;
      mem_req = 0;

      abus_out = XXX;
      bbus_out = XXX;
      go_decode = 0;
      retry = 1; // rollback flag

      roll_ack = 1;
      wait (~roll_req);
      #1;
      seq_num = val_seq; // sequence number with error
      int_pc = a_bus; // address with error
      fork
        begin
          roll_ack = 0;
          wait (~halt_req);
          #1;
          halt_ack = 0;
        end
        pc_handler;
      join
    end


  always wait (inst_req & ~go_decode & inst_en & ~halt_req & ~reset)
    begin :fetch_cycle
      #1;
      inst_buf = inst_bus;
      bbus_out = inst_bus; // for CKI
      abus_out = int_pc; // for LOG
      #DLY_IIU_PCINC;
      int_pc = int_pc + ADDR_INC;
      parity = 0;
      for (loop=0; loop < ADDR_WIDTH; loop=loop+1)
        parity = parity ^ int_pc [loop];
      int_pc [ADDR_WIDTH] = parity;
      inst_ack = 1;
      go_decode = 1;
      wait (~inst_req);
      #1;
      inst_ack = 0;
    end


  always wait (go_decode & ~halt_req & ~reset)
    begin :decode_cycle
      #DLY_IIU_DECOD;
      case (opcode) // determine check bits
        OP_SPECIAL:
          if (extra == SOP_NOP)
            chkbits = 4'b 0001;
          else
            chkbits = 4'b 1011;
        OP_J, OP_TRAP:
          chkbits = 4'b 0001;
        OP_BEQZ, OP_BNEZ, OP_JR, OP_JRF:
          chkbits = 4'b 0011;
        OP_SW, OP_SWF:
          chkbits = 4'b 0111;
        OP_LW:
          chkbits = 4'b 1111;
        OP_JAL:
          chkbits = 4'b 1001;
        default: // JALR, ALU immediate
          chkbits = 4'b 1011;
      endcase
      #1;

      wait (~halt_req);
      log_req = 1; // log instruction
      wait (log_ack);
      #1;
      fork
        begin
          wait (~halt_req);
          log_req = 0;
          wait (~log_ack);
        end
        begin
          wait (~halt_req);
          cki_req = 1; // check instruction
          wait (cki_ack);
          #1;
          wait (~halt_req);
          cki_req = 0;
          wait (~cki_ack);
        end
      join

      case (opcode)
        OP_SPECIAL: special_handler;
        OP_BEQZ, OP_BNEZ: branch_handler;
        OP_J, OP_JAL, OP_JR, OP_JALR, OP_JRF: jump_handler;
        OP_LW, OP_SW, OP_SWF: memory_handler;
        OP_TRAP: trap_handler;

        OP_ADDUI: begin alu_func=FUNC_ADDU; alu_control; end
        OP_SUBUI: begin alu_func=FUNC_SUBU; alu_control; end
        OP_ANDI: begin alu_func=FUNC_AND; alu_control; end
        OP_ORI: begin alu_func=FUNC_OR; alu_control; end
        OP_XORI: begin alu_func=FUNC_XOR; alu_control; end
        OP_LHI: begin alu_func=FUNC_LHI; alu_control; end
        OP_SLLI: begin alu_func=FUNC_SLL; alu_control; end
        OP_SRLI: begin alu_func=FUNC_SRL; alu_control; end
        OP_SRAI: begin alu_func=FUNC_SRA; alu_control; end
        OP_SEQI: begin alu_func=FUNC_SEQ; alu_control; end
        OP_SNEI: begin alu_func=FUNC_SNE; alu_control; end
        OP_SLTI: begin alu_func=FUNC_SLT; alu_control; end
        OP_SGTI: begin alu_func=FUNC_SGT; alu_control; end
        OP_SLEI: begin alu_func=FUNC_SLE; alu_control; end
        OP_SGEI: begin alu_func=FUNC_SGE; alu_control; end
        OP_ADDUIF: begin alu_func=FUNC_ADDUF; alu_control; end

        default:
          begin
            $display ("Invalid Opcode %h", inst_buf);
            stop = 1;
            wait (reset);
          end
      endcase
      retry = 0; // reset rollback flag
      go_decode = 0;
      seq_num = seq_num + 1;
    end


  task special_handler;
    begin
      case (extra)
        SOP_NOP: ;
        SOP_ADDU: begin alu_func=FUNC_ADDU; alu_control; end
        SOP_SUBU: begin alu_func=FUNC_SUBU; alu_control; end
        SOP_AND: begin alu_func=FUNC_AND; alu_control; end
        SOP_OR: begin alu_func=FUNC_OR; alu_control; end
        SOP_XOR: begin alu_func=FUNC_XOR; alu_control; end
        SOP_SLL: begin alu_func=FUNC_SLL; alu_control; end
        SOP_SRL: begin alu_func=FUNC_SRL; alu_control; end
        SOP_SRA: begin alu_func=FUNC_SRA; alu_control; end
        SOP_SEQ: begin alu_func=FUNC_SEQ; alu_control; end
        SOP_SNE: begin alu_func=FUNC_SNE; alu_control; end
        SOP_SLT: begin alu_func=FUNC_SLT; alu_control; end
        SOP_SGT: begin alu_func=FUNC_SGT; alu_control; end
        SOP_SLE: begin alu_func=FUNC_SLE; alu_control; end
        SOP_SGE: begin alu_func=FUNC_SGE; alu_control; end
        SOP_ADDUF: begin alu_func=FUNC_ADDUF; alu_control; end

        default:
          begin
            $display ("Invalid Special Opcode %h", inst_buf);
            stop = 1;
            wait (reset);
          end
      endcase
    end
  endtask


  task alu_control;
    begin
      if ((alu_func == FUNC_ADDUF) & retry) // no error if rollback
        alu_func = FUNC_ADDU;
      reg_rs = rs;
      reg_rt = (opcode == OP_SPECIAL) ? rt : 0;
      reg_rd = (opcode == OP_SPECIAL) ? rd : rt;
      reserve_handler;
      regfile_handler;
      abus_out = abus_in;
      bbus_out = (opcode == OP_SPECIAL) ? bbus_in :
                 {{(DATA_WIDTH-IMM_WIDTH){imm[IMM_WIDTH-1]}}, imm};
      alu_handler;
    end
  endtask


  task memory_handler;
    begin
      reg_rs = rs;
      reg_rt = (opcode == OP_LW) ? 0 : rt;
      reg_rd = (opcode == OP_LW) ? rt : 0;
      reserve_handler;
      regfile_handler;
      mem_rw = (opcode == OP_LW);
      #DLY_IIU_ADD;
      abus_out = abus_in +
                 {{(DATA_WIDTH-IMM_WIDTH){imm[IMM_WIDTH-1]}}, imm};
      parity = 0;
      for (loop=0; loop < ADDR_WIDTH; loop=loop+1)
        parity = parity ^ abus_out [loop];
      abus_out [ADDR_WIDTH] = parity;
      bbus_out = bbus_in;
      if ((opcode == OP_SWF) & ~retry) // simulate fault
        bbus_out [DATA_WIDTH] = ~bbus_out [DATA_WIDTH];
      #1;
      wait (~halt_req);
      mem_req = 1;
      wait (mem_ack);
      #1;
      wait (~halt_req);
      mem_req = 0;
      wait (~mem_ack);
    end
  endtask


  task branch_handler;
    begin
      reg_rs = rs;
      reg_rt = 0;
      reg_rd = 0;
      reserve_handler;
      regfile_handler;
      if (((opcode==OP_BEQZ) && (abus_in==0)) ||
          ((opcode==OP_BNEZ) && (abus_in!=0)))
        begin
          #DLY_IIU_ADD;
          int_pc = int_pc[ADDR_WIDTH-1:0] + {{(DATA_WIDTH-IMM_WIDTH)
                   {imm[IMM_WIDTH-1]}}, imm}; // sign ext.
          pc_handler;
        end
    end
  endtask


  task jump_handler;
    begin
      bbus_out = int_pc[ADDR_WIDTH-1:0]; // for JAL and JALR

      if ((opcode==OP_J) || (opcode==OP_JAL))
        begin
          #DLY_IIU_ADD;
          int_pc = int_pc[ADDR_WIDTH-1:0] + {{(DATA_WIDTH-OFFSET_WIDTH)
                   {offset[OFFSET_WIDTH-1]}}, offset};
          pc_handler;
        end
      else
        begin
          sim_f = (opcode == OP_JRF) & ~retry; // simulate fault
          reg_rs = rs;
          reg_rt = 0;
          reg_rd = 0;
          reserve_handler;
          regfile_handler;
          int_pc = abus_in;
          pc_handler;
        end

      if ((opcode==OP_JAL) || (opcode==OP_JALR))
        begin
          alu_func = FUNC_PASS;
          reg_rs = 0;
          reg_rt = 0;
          reg_rd = REG_SIZE - 1; // save to highest register
          reserve_handler;
          alu_handler;
        end
    end
  endtask


  task trap_handler;
    begin
      case (offset)
        TRAP_BLANK: $display();

        TRAP_STOP:
          begin
            $display ("Trap Stop %h", inst_buf);
            stop = 1;
            wait (reset);
          end

        default:
          if (offset < REG_SIZE)
            begin
              reg_rs = offset;
              reg_rt = 0;
              reg_rd = 0;
              reserve_handler;
              regfile_handler;
              $display ("Reg %d: Dec=%d Hex=%h Bin=%b",
                        reg_rs, abus_in [DATA_WIDTH-1:0],
                        abus_in , abus_in);
            end
          else
            begin
              $display ("Invalid Trap Service %h", inst_buf);
              stop = 1;
              wait (reset);
            end
      endcase
    end
  endtask


  task reserve_handler;
    begin
      #1;
      wait (~halt_req);
      res_req = 1;
      wait (res_ack);
      #1;
      wait (~halt_req);
      res_req = 0;
      wait (~res_ack);
      #1;
    end
  endtask


  task regfile_handler;
    begin
      rfile_req = 1;
      wait (rfile_ack);
      #1;
      wait (~halt_req);
      abus_in = a_bus;
      bbus_in = b_bus;
      ckf_req = 1; // notify checker
      wait (ckf_ack);
      #1;
      rfile_req = 0;
      wait (~halt_req);
      ckf_req = 0;
      wait (~ckf_ack & ~rfile_ack);
    end
  endtask


  task alu_handler;
    begin
      parity = 0;
      for (loop=0; loop < DATA_WIDTH; loop=loop+1)
        parity = parity ^ abus_out [loop];
      abus_out [DATA_WIDTH] = parity;
      parity = 0;
      for (loop=0; loop < DATA_WIDTH; loop=loop+1)
        parity = parity ^ bbus_out [loop];
      bbus_out [DATA_WIDTH] = parity;
      #1;
      wait (~halt_req);
      alu_req = 1;
      wait (alu_ack);
      #1;
      wait (~halt_req);
      alu_req = 0;
      wait (~alu_ack);
    end
  endtask


  task pc_handler; // not interrupted by rollback
    begin
      parity = 0;
      for (loop=0; loop < ADDR_WIDTH; loop=loop+1) // calculate parity
        parity = parity ^ int_pc [loop];
      int_pc [ADDR_WIDTH] = parity;

      wait (~inst_ack); // finish inst fetch cycle
      inst_en = 0;
      #1;
      newpc_req = 1;
      wait (pc_ack & imem_ack & iq_ack);
      #1;
      inst_en = 1;
      newpc_req = 0;
      wait (~pc_ack & ~imem_ack & ~iq_ack);
    end
  endtask

endmodule // iiu
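
Several tasks in iiu.v recompute a parity bit with the same loop: XOR the low WIDTH bits together and store the result in bit WIDTH, so that the full word carries an even number of ones. The fragment below is a small stand-alone sketch of that step, shown next to the equivalent reduction-XOR form. The module name parity_sketch, the task name set_parity_loop, and the value WIDTH = 32 are assumptions for illustration; the thesis takes its widths from the "parameter" include file, which is not reproduced here.

```verilog
// Hypothetical stand-alone sketch of the parity step used in iiu.v.
module parity_sketch;
  parameter WIDTH = 32;            // assumed width; not taken from "parameter"

  reg [WIDTH:0] word;              // bit WIDTH holds the parity bit
  reg [5:0] loop;
  reg parity;

  // Loop form, mirroring the listing: XOR bits 0..WIDTH-1 into parity.
  task set_parity_loop;
    begin
      parity = 0;
      for (loop = 0; loop < WIDTH; loop = loop + 1)
        parity = parity ^ word [loop];
      word [WIDTH] = parity;
    end
  endtask

  initial
    begin
      word = 33'h0_0000_0007;      // three 1 bits in the data field
      set_parity_loop;
      $display ("loop form:      parity=%b", word [WIDTH]);

      word [WIDTH] = ^word [WIDTH-1:0];   // reduction XOR over the same bits
      $display ("reduction form: parity=%b", word [WIDTH]);

      if (word [WIDTH] !== parity)
        $display ("forms disagree");
    end

endmodule // parity_sketch
```

With three data bits set, both forms should produce a parity bit of 1. The loop form matches the thesis code; the reduction operator is simply a shorter way to write the same XOR in a simulator that supports it.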
imem.v
// Instruction Memory

module imem (addr, retry, in_req, in_ack, data, out_req, out_ack,
             cancel_req, cancel_ack, reset);

`include "parameter"

  input [ADDR_WIDTH:0] addr;
  output [DATA_WIDTH:0] data;
  input retry, in_req, out_ack, cancel_req, reset;
  output in_ack, out_req, cancel_ack;

  reg [DATA_WIDTH:0] data, imemory [0:IMEM_SIZE-1];
  reg in_ack, out_req, cancel_ack;

  reg [5:0] loop; // DATA_WIDTH = 32 bits max
  reg parity;

  initial
    $readmemh ("inst.hex", imemory, 0);

  always wait (reset)
    begin
      disable memory_cycle;
      disable cancel_cycle;
      data = XXX;
      in_ack = 0;
      out_req = 0;
      cancel_ack = 0;
      wait (~reset);
    end

  always wait (cancel_req & ~reset)
    begin :cancel_cycle
      #1;
      disable memory_cycle;
      data = XXX;
      in_ack = 0;
      out_req = 0;
      cancel_ack = 1;
      wait (~cancel_req);
      #1;
      cancel_ack = 0;
    end

  always wait (in_req & ~cancel_req & ~reset)
    begin :memory_cycle
      #DLY_IMEM_RD;
      data = imemory [addr[ADDR_WIDTH-1:ADDR_IGNORE]];

      if (retry) // correct parity error
        begin
          parity = 0;
          for (loop=0; loop < DATA_WIDTH; loop=loop+1)
            parity = parity ^ data [loop];
          data [DATA_WIDTH] = parity;
          imemory [addr[ADDR_WIDTH-1:ADDR_IGNORE]] = data;
        end

      fork
        begin
          in_ack = 1;
          wait (~in_req);
          #1;
          in_ack = 0;
        end
        begin
          #1;
          out_req = 1;
          wait (out_ack);
          #1;
          out_req = 0;
          wait (~out_ack);
        end
      join
    end

endmodule // imem
iq.v
// Instruction Queue

module iq (in_data, in_req, in_ack, tri_data, out_req, out_ack, en_out,
           cancel_req, cancel_ack, reset);

`include "parameter"

  input [DATA_WIDTH:0] in_data;
  output [DATA_WIDTH:0] tri_data;
  input in_req, out_ack, en_out, cancel_req, reset;
  output in_ack, out_req, cancel_ack;

`ifdef BEHAV_IQ

// ====================================================================
// behavioral level

  wire [DATA_WIDTH:0] data1, out_data;

  iq_buffer_b buf1 (in_data, in_req, in_ack, data1, req1, ack1, cancel_req,
                    cancel_a1, reset);
  iq_buffer_b buf2 (data1, req1, ack1, out_data, out_req, out_ack, cancel_req,
                    cancel_a2, reset);

  assign tri_data = en_out ? out_data : ZZZ;

  reg cancel_ack;

  always wait (reset)
    begin
      cancel_ack = 0;
      wait (~reset);
    end

  always @(cancel_a1 or cancel_a2)
    if (cancel_a1 & cancel_a2)
      cancel_ack = 1;
    else if (~cancel_a1 & ~cancel_a2)
      cancel_ack = 0;

endmodule // iq (behavioral level)

// ====================================================================
// behavioral level

module iq_buffer_b (in_d, in_r, in_a, out_d, out_r, out_a, cancel_r,
                    cancel_a, reset);

`include "parameter"

  input [DATA_WIDTH:0] in_d;
  output [DATA_WIDTH:0] out_d;
  input in_r, out_a, cancel_r, reset;
  output in_a, out_r, cancel_a;

  reg [DATA_WIDTH:0] out_d;
  reg valid, in_a, out_r, cancel_a;

  always wait (reset)
    begin
      disable input_cycle;
      disable output_cycle;
      disable cancel_cycle;
      out_d = XXX;
      valid = 0;
      in_a = 0;
      out_r = 0;
      cancel_a = 0;
      wait (~reset);
    end

  always wait (cancel_r & ~reset)
    begin :cancel_cycle
      #1;
      disable input_cycle;
      disable output_cycle;
      out_d = XXX;
      valid = 0;
      in_a = 0;
      out_r = 0;
      cancel_a = 1;
      wait (~cancel_r);
      #1;
      cancel_a = 0;
    end

  always wait (in_r & ~valid & ~cancel_r & ~reset)
    begin :input_cycle
      #1;
      out_d = in_d;
      in_a = 1;
      valid = 1;
      wait (~in_r);
      #1;
      in_a = 0;
    end

  always wait (valid & ~cancel_r & ~reset)
    begin :output_cycle
      #1;
      out_r = 1;
      wait (out_a);
      #1;
      valid = 0;
      out_r = 0;
      wait (~out_a);
    end

endmodule // iq_buffer_b (behavioral level)

`else

// ====================================================================
// gate level

  wire [DATA_WIDTH:0] data1, out_data;

  iq_buffer_g buf1 (in_data, in_req, in_ack, data1, req1, ack1, cancel_req,
                    cancel_a1, reset);
  iq_buffer_g buf2 (data1, req1, ack1, out_data, out_req, out_ack, cancel_req,
                    cancel_a2, reset);

  muller2 m1 (cancel_a1, cancel_a2, cancel_ack);

  bufif1 #0
    (tri_data[0], out_data[0], en_out),
    (tri_data[1], out_data[1], en_out),
    (tri_data[2], out_data[2], en_out),
    (tri_data[3], out_data[3], en_out),
    (tri_data[4], out_data[4], en_out),
    (tri_data[5], out_data[5], en_out),
    (tri_data[6], out_data[6], en_out),
    (tri_data[7], out_data[7], en_out),
    (tri_data[8], out_data[8], en_out),
    (tri_data[9], out_data[9], en_out),
    (tri_data[10], out_data[10], en_out),
    (tri_data[11], out_data[11], en_out),
    (tri_data[12], out_data[12], en_out),
    (tri_data[13], out_data[13], en_out),
    (tri_data[14], out_data[14], en_out),
    (tri_data[15], out_data[15], en_out),
    (tri_data[16], out_data[16], en_out),
    (tri_data[17], out_data[17], en_out),
    (tri_data[18], out_data[18], en_out),
    (tri_data[19], out_data[19], en_out),
    (tri_data[20], out_data[20], en_out),
    (tri_data[21], out_data[21], en_out),
    (tri_data[22], out_data[22], en_out),
    (tri_data[23], out_data[23], en_out),
    (tri_data[24], out_data[24], en_out),
    (tri_data[25], out_data[25], en_out),
    (tri_data[26], out_data[26], en_out),
    (tri_data[27], out_data[27], en_out),
    (tri_data[28], out_data[28], en_out),
    (tri_data[29], out_data[29], en_out),
    (tri_data[30], out_data[30], en_out),
    (tri_data[31], out_data[31], en_out),
    (tri_data[32], out_data[32], en_out); // parity bit

endmodule // iq (gate level)

// ====================================================================
// gate level

module iq_buffer_g (in_d, in_r, in_a, out_d, out_r, out_a, cancel_r,
                    cancel_a, reset);

`include "parameter"

  input [DATA_WIDTH:0] in_d;
  output [DATA_WIDTH:0] out_d;
  input in_r, out_a, cancel_r, reset;
  output in_a, out_r, cancel_a;

  muller2c x1 (in_r, ~out_r, clear, latch);
  muller2c x2 (complete, ~out_a, clear, out_r);
  regcmpl x3 (latch, clear, regclk, complete);
  or #1 g1 (clear, cancel_r, reset);
  buf #7 g2 (cancel_a, cancel_r);
  wire in_a = complete;

`ifdef WAVE_IQ
  initial
    #0 $gr_addwaves ("in_d", in_d, "out_d", out_d, "in_r", in_r, "in_a", in_a,
                     "out_r", out_r, "out_a", out_a, "can_r", cancel_r,
                     "can_a", cancel_a);
`endif

  dl b0 (in_d[0], regclk, out_d[0]),
     b1 (in_d[1], regclk, out_d[1]),
     b2 (in_d[2], regclk, out_d[2]),
     b3 (in_d[3], regclk, out_d[3]),
     b4 (in_d[4], regclk, out_d[4]),
     b5 (in_d[5], regclk, out_d[5]),
     b6 (in_d[6], regclk, out_d[6]),
     b7 (in_d[7], regclk, out_d[7]),
     b8 (in_d[8], regclk, out_d[8]),
     b9 (in_d[9], regclk, out_d[9]),
     b10 (in_d[10], regclk, out_d[10]),
     b11 (in_d[11], regclk, out_d[11]),
     b12 (in_d[12], regclk, out_d[12]),
     b13 (in_d[13], regclk, out_d[13]),
     b14 (in_d[14], regclk, out_d[14]),
     b15 (in_d[15], regclk, out_d[15]),
     b16 (in_d[16], regclk, out_d[16]),
     b17 (in_d[17], regclk, out_d[17]),
     b18 (in_d[18], regclk, out_d[18]),
     b19 (in_d[19], regclk, out_d[19]),
     b20 (in_d[20], regclk, out_d[20]),
     b21 (in_d[21], regclk, out_d[21]),
     b22 (in_d[22], regclk, out_d[22]),
     b23 (in_d[23], regclk, out_d[23]),
     b24 (in_d[24], regclk, out_d[24]),
     b25 (in_d[25], regclk, out_d[25]),
     b26 (in_d[26], regclk, out_d[26]),
     b27 (in_d[27], regclk, out_d[27]),
     b28 (in_d[28], regclk, out_d[28]),
     b29 (in_d[29], regclk, out_d[29]),
     b30 (in_d[30], regclk, out_d[30]),
     b31 (in_d[31], regclk, out_d[31]),
     b32 (in_d[32], regclk, out_d[32]); // parity bit

endmodule // iq_buffer_g (gate level)

`endif
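
The input_cycle block of iq_buffer_b follows a four-phase request/acknowledge pattern: the receiver waits for the request, captures the data, raises the acknowledge, waits for the request to drop, and then lowers the acknowledge. The fragment below is a reduced sketch of that exchange with one sender and one receiver. The module name handshake_sketch, the task name send, the 8-bit data width, and the test values are assumptions for illustration; the reset, cancel, and valid signals of the real buffer are deliberately left out.

```verilog
// Hypothetical two-party sketch of a four-phase handshake.
module handshake_sketch;
  reg req, ack;
  reg [7:0] data, captured;

  // Receiver: same shape as input_cycle in iq_buffer_b, minus reset/cancel.
  always wait (req)
    begin
      #1;
      captured = data;             // latch data while req is high
      ack = 1;
      wait (~req);                 // sender withdraws the request
      #1;
      ack = 0;                     // return to idle
    end

  // Sender: raise req, wait for ack, drop req, wait for ack to fall.
  task send;
    input [7:0] value;
    begin
      data = value;
      req = 1;
      wait (ack);
      #1;
      req = 0;
      wait (~ack);
      #1;
    end
  endtask

  initial
    begin
      req = 0; ack = 0;
      send (8'h5a);
      $display ("captured=%h", captured);
      send (8'hc3);
      $display ("captured=%h", captured);
      $finish;
    end

endmodule // handshake_sketch
```

After each call to send, captured should hold the value that was just sent, and both req and ack are back at 0 before the next transfer begins. That return to zero is the property that lets stages such as buf1 and buf2 in iq be chained directly.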
log.v
// Instruction Log

module log (log_seq, log_chkbits, a_bus, log_req, log_ack,
            chk_seq, chk_chkid, chk_error, chk_req, chk_ack,
            val_seq, valmem_req, valmem_ack, valreg_req, valreg_ack,
            halt_req, halt_ack, roll_req, roll_ack, reset);

`include "parameter"

  input [SEQ_WIDTH-1:0] log_seq, chk_seq;
  input [CHECKERS-1:0] log_chkbits;
  inout [DATA_WIDTH:0] a_bus;
  input [CHKID_WIDTH-1:0] chk_chkid;
  output [SEQ_WIDTH-1:0] val_seq;
  input chk_error, reset;
  input log_req, chk_req, valmem_ack, valreg_ack, halt_ack, roll_ack;
  output log_ack, chk_ack, valmem_req, valreg_req, halt_req, roll_req;

  reg [SEQ_WIDTH-1:0] val_seq;
  reg chk_ack, valmem_req, valreg_req, halt_req, roll_req;

  reg [SEQ_WIDTH-1:0] clr_seq;
  reg [CHKID_WIDTH-1:0] clr_id;
  reg [DWBS-1:0] buf_dwb;
  reg del_ack, clr_req, arbdel_req, arbdel_ack, arbchk_req, arbchk_ack;

  wire [DWBS-1:0] log_dwb = log_chkbits [CHECKERS-1:CHECKERS-DWBS];
  wire [SEQ_WIDTH-1:0] del_seq;
  wire [CHECKERS-1:0] del_chkbits;
  wire [DWBS-1:0] del_dwb;
  wire del_req, clr_ack;

  logq logq (log_seq, log_chkbits, log_dwb, a_bus, log_req, log_ack,
             del_seq, del_chkbits, del_dwb, del_req, del_ack,
             clr_seq, clr_id, clr_req, clr_ack,
             halt_req, haltq_ack, roll_req, rollq_ack, reset);

  always wait (reset)
    begin
      disable delete_cycle;
      disable checker_cycle;
      disable arb_cycle;
      val_seq = XXX;
      chk_ack = 0;
      valmem_req = 0;
      valreg_req = 0;
      halt_req = 0;
      roll_req = 0;
      del_ack = 0;
      clr_req = 0;
      arbdel_req = 0;
      arbdel_ack = 0;
      arbchk_req = 0;
      arbchk_ack = 0;
      wait (~reset);
    end

  always wait (chk_req & ~reset)
    begin :checker_cycle
      #1;
      clr_seq = chk_seq;
      clr_id = chk_chkid;
      if (chk_error)
        begin // rollback process
          arbchk_req = 1;
          wait (arbchk_ack);
          #1;
          val_seq = chk_seq; // sequence number with error
          halt_req = 1;
          wait (halt_ack & haltq_ack);
          #1;
          roll_req = 1;
          disable delete_cycle; // take care of local business
          arbdel_req = 0;
          wait (roll_ack & rollq_ack);
          #1;
          roll_req = 0;
          wait (~roll_ack & ~rollq_ack);
          #1;
          halt_req = 0;
          arbchk_req = 0;
          wait (~halt_ack & ~haltq_ack & ~arbchk_ack);
        end
      else
        fork
          begin // clear the check bit
            #1;
            clr_req = 1;
            wait (clr_ack);
            #1;
            clr_req = 0;
            wait (~clr_ack);
          end
          begin
            chk_ack = 1; // finish K-bus transaction
            wait (~chk_req);
            #1;
            chk_ack = 0;
          end
        join
    end

  always wait (del_req & ~halt_req & ~reset)
    begin :delete_cycle
      wait (del_chkbits == 0); // until all bits are cleared
      #1;
      arbdel_req = 1;
      wait (arbdel_ack);
      #1;
      val_seq = del_seq;
      buf_dwb = del_dwb;
      del_ack = 1; // delete from log
      fork
        begin
          wait (~del_req);
          #1;
          del_ack = 0;
        end
        case (buf_dwb) // validate appropriate DWB
          2'b01:
            begin
              #1;
              valmem_req = 1;
              wait (valmem_ack);
              #1;
              valmem_req = 0;
              arbdel_req = 0;
              wait (~valmem_ack & ~arbdel_ack);
            end
          2'b10, 2'b11:
            begin
              #1;
              valreg_req = 1;
              wait (valreg_ack);
              #1;
              valreg_req = 0;
              arbdel_req = 0;
              wait (~valreg_ack & ~arbdel_ack);
            end
          default:
            begin
              arbdel_req = 0;
              wait (~arbdel_ack);
            end
        endcase
      join
    end

  always wait ((arbchk_req | arbdel_req) & ~reset)
    begin :arb_cycle
      #1;
      if (arbchk_req) // invalidate has priority over validate
        begin
          arbchk_ack = 1;
          wait (~arbchk_req);
          #1;
          arbchk_ack = 0;
        end
      else
        begin
          arbdel_ack = 1;
          wait (~arbdel_req);
          #1;
          arbdel_ack = 0;
        end
    end

endmodule // log

// ====================================================================
// Log Queue: Note that address is not used in the output.

module logq (in_seq, in_chkbits, in_dwb, a_bus, in_req, in_ack,
             out_seq, out_chkbits, out_dwb, out_req, out_ack,
             clr_seq, clr_id, clr_req, clr_ack,
             halt_req, halt_ack, roll_req, roll_ack, reset);

`include "parameter"

  input [SEQ_WIDTH-1:0] in_seq, clr_seq;
  input [CHECKERS-1:0] in_chkbits;
  input [DWBS-1:0] in_dwb;
  inout [DATA_WIDTH:0] a_bus;
  output [SEQ_WIDTH-1:0] out_seq;
  output [CHECKERS-1:0] out_chkbits;
  output [DWBS-1:0] out_dwb;
  input [CHKID_WIDTH-1:0] clr_id;
  input in_req, out_ack, clr_req, halt_req, roll_req, reset;
  output in_ack, out_req, clr_ack, halt_ack, roll_ack;

  reg clr_ack, halt_ack, roll_ack;

  wire [SEQ_WIDTH-1:0] seq1, seq2, seq3, seq4, seq5, seq6, seq7;
  wire [CHECKERS-1:0] chkbits1, chkbits2, chkbits3, chkbits4, chkbits5,
                      chkbits6, chkbits7;
  wire [DWBS-1:0] dwb1, dwb2, dwb3, dwb4, dwb5, dwb6, dwb7;
  wire [ADDR_WIDTH:0] addr1, addr2, addr3, addr4, addr5, addr6, addr7,
                      not_used, tri_addr;
  wire [ADDR_WIDTH:0] adpt_addr = a_bus; // "addr" adaptor

  assign a_bus = {ZZZ, tri_addr}; // compensate for narrow bus


  log_buf buf1 (in_seq, in_chkbits, in_dwb, adpt_addr, in_req, in_ack,
                seq1, chkbits1, dwb1, addr1, req1, ack1,
                clr_seq, clr_id, clr_req, clr_a1,
                halt_req, halt_a1, roll_req, roll_a1, tri_addr, reset);

  log_buf buf2 (seq1, chkbits1, dwb1, addr1, req1, ack1,
                seq2, chkbits2, dwb2, addr2, req2, ack2,
                clr_seq, clr_id, clr_req, clr_a2,
                halt_req, halt_a2, roll_req, roll_a2, tri_addr, reset);

  log_buf buf3 (seq2, chkbits2, dwb2, addr2, req2, ack2,
                seq3, chkbits3, dwb3, addr3, req3, ack3,
                clr_seq, clr_id, clr_req, clr_a3,
                halt_req, halt_a3, roll_req, roll_a3, tri_addr, reset);

  log_buf buf4 (seq3, chkbits3, dwb3, addr3, req3, ack3,
                seq4, chkbits4, dwb4, addr4, req4, ack4,
                clr_seq, clr_id, clr_req, clr_a4,
                halt_req, halt_a4, roll_req, roll_a4, tri_addr, reset);

  log_buf buf5 (seq4, chkbits4, dwb4, addr4, req4, ack4,
                seq5, chkbits5, dwb5, addr5, req5, ack5,
                clr_seq, clr_id, clr_req, clr_a5,
                halt_req, halt_a5, roll_req, roll_a5, tri_addr, reset);

  log_buf buf6 (seq5, chkbits5, dwb5, addr5, req5, ack5,
                seq6, chkbits6, dwb6, addr6, req6, ack6,
                clr_seq, clr_id, clr_req, clr_a6,
                halt_req, halt_a6, roll_req, roll_a6, tri_addr, reset);

  log_buf buf7 (seq6, chkbits6, dwb6, addr6, req6, ack6,
                seq7, chkbits7, dwb7, addr7, req7, ack7,
                clr_seq, clr_id, clr_req, clr_a7,
                halt_req, halt_a7, roll_req, roll_a7, tri_addr, reset);

  log_buf buf8 (seq7, chkbits7, dwb7, addr7, req7, ack7,
                out_seq, out_chkbits, out_dwb, not_used, out_req, out_ack,
                clr_seq, clr_id, clr_req, clr_a8,
                halt_req, halt_a8, roll_req, roll_a8, tri_addr, reset);


  always wait (reset)
  begin
    clr_ack = 0;
    halt_ack = 0;
    roll_ack = 0;
    wait (~reset);
  end

  always @(clr_a1 or clr_a2 or clr_a3 or clr_a4 or clr_a5 or clr_a6 or
           clr_a7 or clr_a8)
    if (clr_a1 & clr_a2 & clr_a3 & clr_a4 & clr_a5 & clr_a6 & clr_a7 &
        clr_a8)
      clr_ack = 1;
    else if (~clr_a1 & ~clr_a2 & ~clr_a3 & ~clr_a4 & ~clr_a5 & ~clr_a6 &
             ~clr_a7 & ~clr_a8)
      clr_ack = 0;

  always @(halt_a1 or halt_a2 or halt_a3 or halt_a4 or halt_a5 or halt_a6 or
           halt_a7 or halt_a8)
    if (halt_a1 & halt_a2 & halt_a3 & halt_a4 & halt_a5 & halt_a6 &
        halt_a7 & halt_a8)
      halt_ack = 1;
    else if (~halt_a1 & ~halt_a2 & ~halt_a3 & ~halt_a4 & ~halt_a5 &
             ~halt_a6 & ~halt_a7 & ~halt_a8)
      halt_ack = 0;

  always @(roll_a1 or roll_a2 or roll_a3 or roll_a4 or roll_a5 or roll_a6 or
           roll_a7 or roll_a8)
    if (roll_a1 & roll_a2 & roll_a3 & roll_a4 & roll_a5 & roll_a6 &
        roll_a7 & roll_a8)
      roll_ack = 1;
    else if (~roll_a1 & ~roll_a2 & ~roll_a3 & ~roll_a4 & ~roll_a5 &
             ~roll_a6 & ~roll_a7 & ~roll_a8)
      roll_ack = 0;

endmodule // logq

// ====================================================================

module log_buf (in_s, in_k, in_w, in_d, in_r, in_a,
                out_s, out_k, out_w, out_d, out_r, out_a,
                clr_s, clr_i, clr_r, clr_a,
                halt_r, halt_a, roll_r, roll_a, tri_addr, reset);

  `include "parameter"

  input [SEQ_WIDTH-1:0] in_s, clr_s;
  input [CHECKERS-1:0] in_k;
  input [DWBS-1:0] in_w;
  input [ADDR_WIDTH:0] in_d;
  output [SEQ_WIDTH-1:0] out_s;
  output [CHECKERS-1:0] out_k;
  output [DWBS-1:0] out_w;
  output [ADDR_WIDTH:0] out_d, tri_addr;
  input [CHKID_WIDTH-1:0] clr_i;
  input in_r, out_a, clr_r, halt_r, roll_r, reset;
  output in_a, out_r, clr_a, halt_a, roll_a;

  reg [SEQ_WIDTH-1:0] out_s;
  reg [CHECKERS-1:0] out_k;
  reg [DWBS-1:0] out_w;
  reg [ADDR_WIDTH:0] out_d, tri_addr;
  reg in_a, out_r, clr_a, halt_a, roll_a;

  reg [SEQ_WIDTH-1:0] diff;
  reg valid;

  always wait (reset)
  begin
    disable clear_cycle;
    disable rollback_cycle;
    disable input_cycle;
    disable output_cycle;
    out_s = XXX;
    out_k = XXX;
    out_w = XXX;
    out_d = XXX;
    tri_addr = ZZZ;
    in_a = 0;
    out_r = 0;
    clr_a = 0;
    halt_a = 0;
    roll_a = 0;
    valid = 0;
    wait (~reset);
  end

  always wait (clr_r & ~reset)  // clear check bit
  begin :clear_cycle
    #1;
    clr_a = 1;
    wait (~clr_r);
    #1;
    if (out_s == clr_s)
      out_k [clr_i] = 0;
    #1;
    clr_a = 0;
  end

  always wait (halt_r & ~reset)
  begin :rollback_cycle
    #1;
    halt_a = 1;
    wait (roll_r);
    #1;
    disable input_cycle;
    disable output_cycle;
    in_a = 0;
    out_r = 0;
    if (valid & (out_s == clr_s))
      tri_addr = out_d;  // address to be rolled back

    #DLY_SEQ_COMP;
    diff = out_s - clr_s;  // compare sequence numbers
    if (~diff [SEQ_WIDTH-1])
    begin
      valid = 0;  // invalidate entry
      out_s = XXX;
      out_k = XXX;
      out_w = XXX;
      out_d = XXX;
    end

    roll_a = 1;
    fork
      begin
        wait (~roll_r);
        #1;
        roll_a = 0;
      end
      begin
        wait (~halt_r);
        #1;
        tri_addr = ZZZ;  // put back tri-state
        halt_a = 0;
      end
    join
  end

  always wait (in_r & ~valid & ~clr_r & ~halt_r & ~reset)
  begin :input_cycle
    #1;
    wait (~clr_r & ~clr_a & ~halt_r);
    out_s = in_s;
    out_k = in_k;
    out_w = in_w;
    out_d = in_d;
    in_a = 1;
    valid = 1;
    wait (~in_r);
    #1;
    wait (~clr_r & ~clr_a & ~halt_r);
    in_a = 0;
  end

  always wait (valid & ~clr_r & ~halt_r & ~reset)
  begin :output_cycle
    #1;
    wait (~clr_r & ~clr_a & ~halt_r);
    out_r = 1;
    wait (out_a);
    #1;
    valid = 0;
    wait (~clr_r & ~clr_a & ~halt_r);
    out_r = 0;
    wait (~out_a);
  end

endmodule // log_buf
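The rollback_cycle blocks in this appendix decide whether to discard an entry by subtracting the rollback sequence number from the entry's sequence number and testing the most significant bit of the difference. The sketch below isolates that test so the wraparound cases can be inspected. It is illustrative only and not part of the AMPIRE source: the module name seq_cmp_demo, the task check, and the value SEQ_WIDTH = 4 are assumptions (the real width comes from the "parameter" file, which is not reproduced here).

```verilog
// Illustrative sketch; not part of the AMPIRE source.
// SEQ_WIDTH = 4 is an assumed value chosen to keep the example small.
module seq_cmp_demo;

  parameter SEQ_WIDTH = 4;

  reg [SEQ_WIDTH-1:0] diff;

  // Report whether an entry tagged entry_seq would be invalidated by a
  // rollback to roll_seq, using the same sign-bit test as rollback_cycle.
  task check;
    input [SEQ_WIDTH-1:0] entry_seq;
    input [SEQ_WIDTH-1:0] roll_seq;
    begin
      diff = entry_seq - roll_seq;  // wraps modulo 2**SEQ_WIDTH
      if (~diff [SEQ_WIDTH-1])
        $display ("entry %0d, rollback %0d: diff = %b, invalidate",
                  entry_seq, roll_seq, diff);
      else
        $display ("entry %0d, rollback %0d: diff = %b, keep",
                  entry_seq, roll_seq, diff);
    end
  endtask

  initial
  begin
    check (5, 3);   // diff = 0010, sign bit clear
    check (3, 5);   // diff = 1110, sign bit set
    check (1, 14);  // diff = 0011 after wraparound, sign bit clear
    check (7, 7);   // diff = 0000, sign bit clear
  end

endmodule
```

With this test an entry is invalidated whenever its sequence number is at, or less than half the sequence space ahead of, the rollback point, so the comparison stays meaningful across counter wraparound only while the live sequence numbers span less than half of the 2**SEQ_WIDTH range.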
memdwb.v
// Memory Delayed Write Buffer: Note initial wait bit = 1

module memdwb (in_seq, in_reg, in_addr, in_data, in_rw_mode, in_retry,
               in_req, in_ack,
               chk_req, chk_ack, val_seq, val_req, val_ack,
               out_addr, tri_data, rw_mode, out_retry, out_req, out_ack,
               out_seq, out_reg, tri_dq_req, dq_ack,
               halt_req, halt_ack, roll_req, roll_ack, reset);

  `include "parameter"

  input [SEQ_WIDTH-1:0] in_seq, val_seq;
  input [REG_WIDTH-1:0] in_reg;
  input [DATA_WIDTH:0] in_addr, in_data;
  output [ADDR_WIDTH:0] out_addr;
  output [DATA_WIDTH:0] tri_data;
  output [SEQ_WIDTH-1:0] out_seq;
  output [REG_WIDTH-1:0] out_reg;
  input in_rw_mode, in_retry, reset;
  output rw_mode, out_retry;
  input in_req, chk_ack, val_req, out_ack, dq_ack, halt_req, roll_req;
  output in_ack, chk_req, val_ack, out_req, tri_dq_req, halt_ack, roll_ack;

  reg [ADDR_WIDTH:0] out_addr;
  reg [SEQ_WIDTH-1:0] out_seq;
  reg [REG_WIDTH-1:0] out_reg;
  reg rw_mode, out_retry;
  reg in_ack, chk_req, out_req, tri_dq_req, halt_ack, roll_ack;

  reg [ADDR_WIDTH:0] rd_addr;
  reg [DATA_WIDTH:0] out_data;
  reg [SEQ_WIDTH-1:0] diff;
  reg go_read, arbrd_req, arbrd_ack, arbwr_req, arbwr_ack, wrq_req,
      com_ack, rdq_req, haltc_ack, rollc_ack;

  wire [ADDR_WIDTH:0] adpt_addr = in_addr;  // "addr" adaptor
  wire [ADDR_WIDTH:0] com_addr;
  wire [DATA_WIDTH:0] com_data, tri_qout;

  memdwbq memdwbq (in_seq, 1'b1, adpt_addr, in_data, wrq_req, wrq_ack,
                   com_wait, com_addr, com_data, com_req, com_ack,
                   val_seq, val_req, val_ack,
                   rd_addr, rdq_req, rdq_ack, match, tri_qout,
                   halt_req, haltq_ack, roll_req, rollq_ack, reset);

  assign tri_data = rw_mode ? ZZZ : out_data;

  always wait (reset)
  begin
    disable rollback_cycle;
    disable input_cycle;
    disable read_cycle;
    disable commit_cycle;
    disable arb_cycle;
    out_addr = XXX;
    out_seq = XXX;
    out_reg = XXX;
    rw_mode = XXX;
    out_retry = 0;
    in_ack = 0;
    chk_req = 0;
    out_req = 0;
    tri_dq_req = ZZZ;
    halt_ack = 0;
    roll_ack = 0;
    out_data = XXX;
    go_read = 0;
    arbrd_req = 0;
    arbrd_ack = 0;
    arbwr_req = 0;
    arbwr_ack = 0;
    wrq_req = 0;
    com_ack = 0;
    rdq_req = 0;
    haltc_ack = 0;
    rollc_ack = 0;
    wait (~reset);
  end

  always @(haltq_ack or haltc_ack)
    if (haltq_ack & haltc_ack)
      halt_ack = 1;
    else if (~haltq_ack & ~haltc_ack)
      halt_ack = 0;

  always @(rollq_ack or rollc_ack)
    if (rollq_ack & rollc_ack)
      roll_ack = 1;
    else if (~rollq_ack & ~rollc_ack)
      roll_ack = 0;

  always wait (halt_req & ~reset)
  begin :rollback_cycle
    #1;
    haltc_ack = 1;
    wait (roll_req);
    #1;
    disable input_cycle;
    disable read_cycle;
    disable commit_cycle;
    disable arb_cycle;
    out_addr = XXX;
    rw_mode = XXX;
    out_retry = 0;
    in_ack = 0;
    chk_req = 0;
    out_req = 0;
    tri_dq_req = ZZZ;
    out_data = XXX;
    arbrd_req = 0;
    arbrd_ack = 0;
    arbwr_req = 0;
    arbwr_ack = 0;
    wrq_req = 0;
    com_ack = 0;
    rdq_req = 0;

    #DLY_SEQ_COMP;
    diff = out_seq - val_seq;  // compare sequence numbers
    if (~diff [SEQ_WIDTH-1])
    begin
      go_read = 0;
      out_seq = XXX;
      out_reg = XXX;
    end

    rollc_ack = 1;
    fork
      begin
        wait (~roll_req);
        #1;
        rollc_ack = 0;
      end
      begin
        wait (~halt_req);
        #1;
        haltc_ack = 0;
      end
    join
  end

  always wait (in_req & ~halt_req & ~reset)
  begin :input_cycle
    #1;
    if (in_rw_mode)
    begin
      wait (~go_read & ~halt_req);
      out_seq = in_seq;
      out_reg = in_reg;
      rd_addr = in_addr;
      out_retry = in_retry;
      #1;
      wait (~halt_req);
      go_read = 1;  // start read cycle
      chk_req = 1;  // notify checker
      wait (chk_ack);
      #1;
      fork  // finish up transactions
        begin
          wait (~halt_req);
          chk_req = 0;
          wait (~chk_ack);
        end
        begin
          wait (~halt_req);
          in_ack = 1;
          wait (~in_req);
          #1;
          wait (~halt_req);
          in_ack = 0;
        end
      join
    end
    else
    begin
      wrq_req = 1;  // write to queue
      wait (wrq_ack);  // wait until actually accepted
      #1;
      chk_req = 1;  // notify checker
      wait (chk_ack);
      #1;
      fork  // finish up transactions
        begin
          wrq_req = 0;
          chk_req = 0;
          wait (~wrq_ack & ~chk_ack);
        end
        begin
          in_ack = 1;
          wait (~in_req);
          #1;
          in_ack = 0;
        end
      join
    end
  end

  always wait (go_read & ~halt_req & ~reset)
  begin :read_cycle
    #1;
    rdq_req = 1;  // search queue first
    wait (rdq_ack);
    #1;
    arbrd_req = 1;  // request memory cycle
    wait (arbrd_ack);
    #1;
    if (match)
    begin  // data still in queue
      out_data = tri_qout;
      rdq_req = 0;  // release queue
      rw_mode = 0;  // write mode to access bus
      #1;
      wait (~halt_req);
      tri_dq_req = 1;  // write DQ directly
      wait (dq_ack);
      #1;
      go_read = 0;
      wait (~halt_req);
      tri_dq_req = ZZZ;  // tri-state it
      arbrd_req = 0;
      wait (~dq_ack & ~arbrd_ack & ~rdq_ack);
    end
    else
    begin
      rdq_req = 0;  // release queue
      out_addr = rd_addr;
      rw_mode = 1;  // read mode
      #1;
      wait (~halt_req);
      out_req = 1;
      wait (out_ack);
      #1;
      go_read = 0;
      wait (~halt_req);
      out_req = 0;
      arbrd_req = 0;
      wait (~out_ack & ~arbrd_ack & ~rdq_ack);
    end
  end

  always wait (com_req & ~halt_req & ~reset)
  begin :commit_cycle
    wait (~com_wait);  // wait until validated
    #1;
    arbwr_req = 1;  // request memory cycle
    wait (arbwr_ack);
    #1;
    out_addr = com_addr;
    out_data = com_data;
    rw_mode = 0;  // write mode
    #1;
    wait (~halt_req);
    out_req = 1;  // write to data memory
    wait (out_ack);
    #1;
    wait (~halt_req);
    com_ack = 1;  // delete from queue
    out_req = 0;
    arbwr_req = 0;
    wait (~com_req);
    #1;
    wait (~halt_req);
    com_ack = 0;
    wait (~out_ack & ~arbwr_ack);
  end

  always wait ((arbrd_req | arbwr_req) & ~halt_req & ~reset)
  begin :arb_cycle
    #1;
    if (arbrd_req)  // read has priority over write
    begin
      arbrd_ack = 1;
      wait (~arbrd_req);
      #1;
      arbrd_ack = 0;
    end
    else
    begin
      arbwr_ack = 1;
      wait (~arbwr_req);
      #1;
      arbwr_ack = 0;
    end
  end

endmodule // memdwb

// ====================================================================
// MEM_DWB Queue: Note that sequence number is not used in the output.

module memdwbq (in_seq, in_wait, in_addr, in_data, in_req, in_ack,
                out_wait, out_addr, out_data, out_req, out_ack,
                val_seq, val_req, val_ack,
                read_addr, read_req, read_ack, match, tri_out,
                halt_req, halt_ack, roll_req, roll_ack, reset);

  `include "parameter"

  input [SEQ_WIDTH-1:0] in_seq, val_seq;
  input [ADDR_WIDTH:0] in_addr, read_addr;
  input [DATA_WIDTH:0] in_data;
  output [ADDR_WIDTH:0] out_addr;
  output [DATA_WIDTH:0] out_data, tri_out;
  input in_wait, reset;
  output out_wait, match;
  input in_req, out_ack, val_req, read_req, halt_req, roll_req;
  output in_ack, out_req, val_ack, read_ack, halt_ack, roll_ack;

  reg val_ack, read_ack, halt_ack, roll_ack;

  reg read_ready, read_en1, read_en2, read_en3, read_en4, haltr_ack, rollr_ack;

  wire [SEQ_WIDTH-1:0] seq1, seq2, seq3, not_used;
  wire [ADDR_WIDTH:0] addr1, addr2, addr3;
  wire [DATA_WIDTH:0] data1, data2, data3;
  wire match1, match2, match3, match4;

  assign match = match1 | match2 | match3 | match4;


  memdwb_buf buf1 (in_seq, in_wait, in_addr, in_data, in_req, in_ack,
                   seq1, wait1, addr1, data1, req1, ack1,
                   val_seq, val_req, val_ack1,
                   read_addr, read_req, read_ack1, match1, read_en1, tri_out,
                   halt_req, halt_a1, roll_req, roll_a1, reset);

  memdwb_buf buf2 (seq1, wait1, addr1, data1, req1, ack1,
                   seq2, wait2, addr2, data2, req2, ack2,
                   val_seq, val_req, val_ack2,
                   read_addr, read_req, read_ack2, match2, read_en2, tri_out,
                   halt_req, halt_a2, roll_req, roll_a2, reset);

  memdwb_buf buf3 (seq2, wait2, addr2, data2, req2, ack2,
                   seq3, wait3, addr3, data3, req3, ack3,
                   val_seq, val_req, val_ack3,
                   read_addr, read_req, read_ack3, match3, read_en3, tri_out,
                   halt_req, halt_a3, roll_req, roll_a3, reset);

  memdwb_buf buf4 (seq3, wait3, addr3, data3, req3, ack3,
                   not_used, out_wait, out_addr, out_data, out_req, out_ack,
                   val_seq, val_req, val_ack4,
                   read_addr, read_req, read_ack4, match4, read_en4, tri_out,
                   halt_req, halt_a4, roll_req, roll_a4, reset);


  always wait (reset)
  begin
    disable rollback_cycle;
    disable read_cycle;
    val_ack = 0;
    read_ack = 0;
    halt_ack = 0;
    roll_ack = 0;
    read_ready = 0;
    read_en1 = 0;
    read_en2 = 0;
    read_en3 = 0;
    read_en4 = 0;
    haltr_ack = 0;
    rollr_ack = 0;
    wait (~reset);
  end

  always @(val_ack1 or val_ack2 or val_ack3 or val_ack4)
    if (val_ack1 & val_ack2 & val_ack3 & val_ack4)
      val_ack = 1;
    else if (~val_ack1 & ~val_ack2 & ~val_ack3 & ~val_ack4)
      val_ack = 0;

  always @(read_ack1 or read_ack2 or read_ack3 or read_ack4)
    if (read_ack1 & read_ack2 & read_ack3 & read_ack4)
      read_ready = 1;
    else if (~read_ack1 & ~read_ack2 & ~read_ack3 & ~read_ack4)
      read_ready = 0;

  always @(halt_a1 or halt_a2 or halt_a3 or halt_a4 or haltr_ack)
    if (halt_a1 & halt_a2 & halt_a3 & halt_a4 & haltr_ack)
      halt_ack = 1;
    else if (~halt_a1 & ~halt_a2 & ~halt_a3 & ~halt_a4 & ~haltr_ack)
      halt_ack = 0;

  always @(roll_a1 or roll_a2 or roll_a3 or roll_a4 or rollr_ack)
    if (roll_a1 & roll_a2 & roll_a3 & roll_a4 & rollr_ack)
      roll_ack = 1;
    else if (~roll_a1 & ~roll_a2 & ~roll_a3 & ~roll_a4 & ~rollr_ack)
      roll_ack = 0;

  always wait (halt_req & ~reset)
  begin :rollback_cycle
    #1;
    haltr_ack = 1;
    wait (roll_req);
    #1;
    disable read_cycle;
    read_ready = 0;
    read_ack = 0;
    read_en1 = 0;
    read_en2 = 0;
    read_en3 = 0;
    read_en4 = 0;

    rollr_ack = 1;
    fork
      begin
        wait (~roll_req);
        #1;
        rollr_ack = 0;
      end
      begin
        wait (~halt_req);
        #1;
        haltr_ack = 0;
      end
    join
  end

  always wait (read_ready & ~halt_req & ~reset)
  begin :read_cycle
    #1;
    casex ({match1, match2, match3, match4})  // priority decoder
      4'b1???: read_en1 = 1;
      4'b01??: read_en2 = 1;
      4'b001?: read_en3 = 1;
      4'b0001: read_en4 = 1;
    endcase
    #1;
    read_ack = 1;
    wait (~read_ready);
    #1;
    read_en1 = 0;
    read_en2 = 0;
    read_en3 = 0;
    read_en4 = 0;
    read_ack = 0;
  end

endmodule // memdwbq

// ====================================================================

module memdwb_buf (in_seq, in_wait, in_addr, in_data, in_req, in_ack,
                   out_seq, out_wait, out_addr, out_data, out_req, out_ack,
                   val_seq, val_req, val_ack,
                   read_addr, read_req, read_ack, match, read_en, tri_out,
                   halt_req, halt_ack, roll_req, roll_ack, reset);

  `include "parameter"

  input [SEQ_WIDTH-1:0] in_seq, val_seq;
  input [ADDR_WIDTH:0] in_addr, read_addr;
  input [DATA_WIDTH:0] in_data;
  output [SEQ_WIDTH-1:0] out_seq;
  output [ADDR_WIDTH:0] out_addr;
  output [DATA_WIDTH:0] out_data, tri_out;
  input in_wait, read_en, reset;
  output out_wait, match;
  input in_req, out_ack, val_req, read_req, halt_req, roll_req;
  output in_ack, out_req, val_ack, read_ack, halt_ack, roll_ack;

  reg [SEQ_WIDTH-1:0] out_seq;
  reg [ADDR_WIDTH:0] out_addr;
  reg [DATA_WIDTH:0] out_data;
  reg out_wait, match;
  reg in_ack, out_req, val_ack, read_ack, halt_ack, roll_ack;

  reg [SEQ_WIDTH-1:0] diff;
  reg valid;

  assign tri_out = read_en ? out_data : ZZZ;

  always wait (reset)
  begin
    disable validate_cycle;
    disable rollback_cycle;
    disable find_cycle;
    disable input_cycle;
    disable output_cycle;
    out_seq = XXX;
    out_addr = XXX;
    out_data = XXX;
    out_wait = XXX;
    match = 0;
    in_ack = 0;
    out_req = 0;
    val_ack = 0;
    read_ack = 0;
    halt_ack = 0;
    roll_ack = 0;
    valid = 0;
    wait (~reset);
  end

  always wait (val_req & ~reset)  // not same time as rollback
  begin :validate_cycle  // clear "wait" bit
    #1;
    val_ack = 1;
    wait (~val_req);
    #1;
    if (out_seq == val_seq)
      out_wait = 0;
    #1;
    val_ack = 0;
  end

  always wait (halt_req & ~reset)
  begin :rollback_cycle
    #1;
    halt_ack = 1;
    wait (roll_req);
    #1;
    disable find_cycle;
    disable input_cycle;
    disable output_cycle;
    match = 0;
    in_ack = 0;
    out_req = 0;
    read_ack = 0;

    #DLY_SEQ_COMP;
    diff = out_seq - val_seq;  // compare sequence numbers
    if (~diff [SEQ_WIDTH-1])
    begin
      valid = 0;
      out_seq = XXX;
      out_addr = XXX;
      out_data = XXX;
      out_wait = XXX;
    end

    roll_ack = 1;
    fork
      begin
        wait (~roll_req);
        #1;
        roll_ack = 0;
      end
      begin
        wait (~halt_req);
        #1;
        halt_ack = 0;
      end
    join
  end

  always wait (read_req & ~halt_req & ~reset)  // find uncommitted data
  begin :find_cycle
    #1;
    match = valid & (out_addr == read_addr);
    #1;
    read_ack = 1;
    wait (~read_req);
    #1;
    match = 0;
    read_ack = 0;
  end

  always wait (in_req & ~valid & ~val_req & ~read_req & ~halt_req & ~reset)
  begin :input_cycle
    #1;
    wait (~val_req & ~val_ack & ~read_req & ~halt_req);
    out_seq = in_seq;
    out_wait = in_wait;
    out_addr = in_addr;
    out_data = in_data;
    in_ack = 1;
    valid = 1;
    wait (~in_req);
    #1;
    wait (~val_req & ~val_ack & ~read_req & ~halt_req);
    in_ack = 0;
  end

  always wait (valid & ~val_req & ~read_req & ~halt_req & ~reset)
  begin :output_cycle
    #1;
    wait (~val_req & ~val_ack & ~read_req & ~halt_req);
    out_req = 1;
    wait (out_ack);
    #1;
    valid = 0;
    wait (~val_req & ~val_ack & ~read_req & ~halt_req);
    out_req = 0;
    wait (~out_ack);
  end

endmodule // memdwb_buf
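The queue and buffer modules above combine several acknowledge signals with the recurring pattern "set when all inputs are 1, clear when all inputs are 0, otherwise hold", which is the set/hold/clear behavior of a Muller C-element. The following two-input sketch shows that pattern on its own. It is illustrative only and not part of the AMPIRE source; the module name ack_merge_demo and the signal names ack_a, ack_b, and ack are assumptions made for the example.

```verilog
// Illustrative sketch; not part of the AMPIRE source.
module ack_merge_demo;

  reg ack_a, ack_b, ack;

  // Same structure as the halt_ack / roll_ack merging blocks:
  // no assignment is made when the inputs disagree, so ack holds.
  always @(ack_a or ack_b)
    if (ack_a & ack_b)
      ack = 1;
    else if (~ack_a & ~ack_b)
      ack = 0;

  initial
  begin
    #1 ack_a = 0; ack_b = 0;  // both low: ack is cleared
    #1 $display ("ack_a=%b ack_b=%b ack=%b", ack_a, ack_b, ack);
    #1 ack_a = 1;             // inputs disagree: ack holds its value
    #1 $display ("ack_a=%b ack_b=%b ack=%b", ack_a, ack_b, ack);
    #1 ack_b = 1;             // both high: ack is set
    #1 $display ("ack_a=%b ack_b=%b ack=%b", ack_a, ack_b, ack);
    #1 ack_a = 0;             // inputs disagree: ack holds its value
    #1 $display ("ack_a=%b ack_b=%b ack=%b", ack_a, ack_b, ack);
    #1 ack_b = 0;             // both low: ack is cleared
    #1 $display ("ack_a=%b ack_b=%b ack=%b", ack_a, ack_b, ack);
  end

endmodule
```

The hold state is what lets the merged acknowledge complete a four-phase handshake on behalf of all sub-blocks: it does not rise until the slowest block has acknowledged and does not fall until the slowest block has released.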
pc.v
// Program Counter

module pc (pc_load, in_retry, load_req, load_ack,
           pc_out, out_retry, out_req, out_ack, reset);

  `include "parameter"

  input [ADDR_WIDTH:0] pc_load;
  output [ADDR_WIDTH:0] pc_out;
  input in_retry, load_req, out_ack, reset;
  output out_retry, load_ack, out_req;

  reg [ADDR_WIDTH:0] pc_out;
  reg out_retry, load_ack, out_req;

  reg [5:0] loop;  // ADDR_WIDTH = 32 bits max
  reg parity;

  always wait (reset)
  begin
    disable increment_cycle;
    disable load_cycle;
    out_retry = 0;
    load_ack = 0;
    out_req = 0;
    pc_out = 0;
    wait (~reset);
    out_req = 1;
  end

  always wait (load_req & ~reset)
  begin :load_cycle
    #1;
    disable increment_cycle;
    out_req = 0;
    pc_out = {pc_load[ADDR_WIDTH:ADDR_IGNORE], {ADDR_IGNORE{1'b0}}};
    out_retry = in_retry;  // rollback flag
    load_ack = 1;
    wait (~load_req);
    #1;
    load_ack = 0;
    out_req = 1;
  end

  always wait (out_ack & ~load_req & ~reset)
  begin :increment_cycle
    #1;
    out_req = 0;
    #DLY_PC_INC;
    pc_out = pc_out + ADDR_INC;
    out_retry = 0;  // reset rollback flag

    parity = 0;
    for (loop=0; loop<ADDR_WIDTH; loop=loop+1)  // calculate parity
      parity = parity ^ pc_out [loop];
    pc_out [ADDR_WIDTH] = parity;

    wait (~out_ack);
    #1;
    out_req = 1;
  end

endmodule // pc
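The increment cycle of the program counter recomputes the parity bit stored in pc_out [ADDR_WIDTH] by XOR-ing the low ADDR_WIDTH bits one at a time, which gives even parity over the full word. The sketch below repeats that loop on a fixed value and prints it next to Verilog's reduction XOR operator for comparison. It is illustrative only and not part of the AMPIRE source: the module name parity_demo, the value ADDR_WIDTH = 8, and the test address 8'h16 are assumptions made for the example.

```verilog
// Illustrative sketch; not part of the AMPIRE source.
// ADDR_WIDTH = 8 is an assumed value chosen to keep the example small.
module parity_demo;

  parameter ADDR_WIDTH = 8;

  reg [ADDR_WIDTH:0] pc_out;  // bit ADDR_WIDTH holds the parity bit
  reg [5:0] loop;
  reg parity;

  initial
  begin
    pc_out = 0;
    pc_out [ADDR_WIDTH-1:0] = 8'h16;  // 0001_0110, an odd number of 1 bits

    // Bit-serial parity, as in increment_cycle.
    parity = 0;
    for (loop=0; loop<ADDR_WIDTH; loop=loop+1)
      parity = parity ^ pc_out [loop];
    pc_out [ADDR_WIDTH] = parity;

    $display ("loop parity = %b, reduction XOR = %b",
              parity, ^pc_out [ADDR_WIDTH-1:0]);
    $display ("pc_out with parity bit = %b", pc_out);
  end

endmodule
```

Both forms compute the same bit; the explicit loop in the listing keeps the behavioral model independent of how many address bits are configured.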
regdwb.v
// Delayed Write Buffer for Register File: Note initial wait bit = 1

module regdwb (in_seq, in_reg, in_data, in_req, in_ack,
               out_seq, out_reg, out_data, chk_req, chk_ack, clr_req, clr_ack,
               val_seq, val_req, val_ack,
               rd_reg1, rd_reg2, sim_f, tri_data1, tri_data2, rd_req, rd_ack,
               rf_data1, rf_data2, rdf_req, rdf_ack,
               wrf_reg, wrf_data, wrf_req, wrf_ack,
               halt_req, halt_ack, roll_req, roll_ack, reset);

  `include "parameter"

  input [SEQ_WIDTH-1:0] in_seq, val_seq;
  input [REG_WIDTH-1:0] in_reg, rd_reg1, rd_reg2;
  input [DATA_WIDTH:0] in_data, rf_data1, rf_data2;
  output [SEQ_WIDTH-1:0] out_seq;
  output [REG_WIDTH-1:0] out_reg, wrf_reg;
  output [DATA_WIDTH:0] out_data, tri_data1, tri_data2, wrf_data;
  input sim_f, reset;
  input in_req, chk_ack, clr_ack, val_req, rd_req, rdf_ack, wrf_ack,
        halt_req, roll_req;
  output in_ack, chk_req, clr_req, val_ack, rd_ack, rdf_req, wrf_req,
         halt_ack, roll_ack;

  reg [SEQ_WIDTH-1:0] out_seq;
  reg [REG_WIDTH-1:0] out_reg, wrf_reg;
  reg [DATA_WIDTH:0] out_data, wrf_data;
  reg in_ack, chk_req, clr_req, rd_ack, wrf_req, halt_ack, roll_ack;

  reg [DATA_WIDTH:0] out_data1, out_data2;
  reg [SEQ_WIDTH-1:0] diff;
  reg go_check, go_clear, inq_req, com_ack, haltc_ack, rollc_ack;

  tri [DATA_WIDTH:0] tri_out1, tri_out2;
  wire [REG_WIDTH-1:0] com_reg;
  wire [DATA_WIDTH:0] com_data;
  wire rdf_req = rd_req;  // route read request

  regdwbq regdwbq (in_seq, 1'b1, in_reg, in_data, inq_req, inq_ack,
                   com_wait, com_reg, com_data, com_req, com_ack,
                   val_seq, val_req, val_ack,
                   rd_reg1, rd_reg2, rd_req, rdq_ack,
                   match1, match2, tri_out1, tri_out2,
                   halt_req, haltq_ack, roll_req, rollq_ack, reset);

  assign tri_data1 = rd_req ? out_data1 : ZZZ;
  assign tri_data2 = rd_req ? out_data2 : ZZZ;

  always wait (reset)
  begin
    disable rollback_cycle;
    disable input_cycle;
    disable check_cycle;
    disable clear_cycle;
    disable commit_cycle;
    disable read_cycle;
    out_seq = XXX;
    out_reg = XXX;
    wrf_reg = XXX;
    out_data = XXX;
    wrf_data = XXX;
    in_ack = 0;
    chk_req = 0;
    clr_req = 0;
    rd_ack = 0;
    wrf_req = 0;
    halt_ack = 0;
    roll_ack = 0;
    out_data1 = XXX;
    out_data2 = XXX;
    go_check = 0;
    go_clear = 0;
    inq_req = 0;
    com_ack = 0;
    haltc_ack = 0;
    rollc_ack = 0;
    wait (~reset);
  end

  always @(haltq_ack or haltc_ack)
    if (haltq_ack & haltc_ack)
      halt_ack = 1;
    else if (~haltq_ack & ~haltc_ack)
      halt_ack = 0;

  always @(rollq_ack or rollc_ack)
    if (rollq_ack & rollc_ack)
      roll_ack = 1;
    else if (~rollq_ack & ~rollc_ack)
      roll_ack = 0;

  always wait (halt_req & ~reset)
  begin :rollback_cycle
    #1;
    haltc_ack = 1;
    wait (roll_req);
    #1;
    disable input_cycle;
    disable check_cycle;
    disable clear_cycle;
    disable commit_cycle;
    disable read_cycle;
    wrf_reg = XXX;
    wrf_data = XXX;
    in_ack = 0;
    chk_req = 0;
    clr_req = 0;
    rd_ack = 0;
    wrf_req = 0;
    out_data1 = XXX;
    out_data2 = XXX;
    inq_req = 0;
    com_ack = 0;

    #DLY_SEQ_COMP;
    diff = out_seq - val_seq;  // compare sequence numbers
    if (~diff [SEQ_WIDTH-1])
    begin
      go_check = 0;
      go_clear = 0;
      out_seq = XXX;
      out_reg = XXX;
      out_data = XXX;
    end

    rollc_ack = 1;
    fork
      begin
        wait (~roll_req);
        #1;
        rollc_ack = 0;
      end
      begin
        wait (~halt_req);
        #1;
        haltc_ack = 0;
      end
    join
  end

  always wait (in_req & ~go_check & ~go_clear & ~halt_req & ~reset)
  begin :input_cycle
    #1;
    wait (~halt_req);
    out_seq = in_seq;
    out_reg = in_reg;
    out_data = in_data;
    inq_req = 1;
    wait (inq_ack);  // wait until actually accepted
    #1;
    in_ack = 1;
    go_check = 1;
    go_clear = 1;
    wait (~in_req);
    #1;
    wait (~halt_req);
    inq_req = 0;
    wait (~inq_ack);
    #1;
    in_ack = 0;
  end

  always wait (go_check & ~halt_req & ~reset)
  begin :check_cycle
    #1;
    wait (~halt_req);
    chk_req = 1;
    wait (chk_ack);
    #1;
    go_check = 0;
    wait (~halt_req);
    chk_req = 0;
    wait (~chk_ack);
  end

  always wait (go_clear & ~halt_req & ~reset)
  begin :clear_cycle
    #1;
    wait (~halt_req);
    clr_req = 1;
    wait (clr_ack);
    #1;
    go_clear = 0;
    wait (~halt_req);
    clr_req = 0;
    wait (~clr_ack);
  end

  always wait (com_req & ~halt_req & ~reset)
  begin :commit_cycle
    wait (~com_wait);  // wait until validated
    #1;
    wrf_reg = com_reg;
    wrf_data = com_data;
    #1;
    wait (~halt_req);
    wrf_req = 1;  // write to register file
    wait (wrf_ack);
    #1;
    wait (~rd_req & ~halt_req);
    com_ack = 1;  // delete from queue
    wrf_req = 0;
    wait (~com_req);
    #1;
    wait (~rd_req & ~halt_req);
    com_ack = 0;
    wait (~wrf_ack);
  end

  always wait (rdq_ack & rdf_ack & ~halt_req & ~reset)
  begin :read_cycle
    #1;
    out_data1 = match1 ? tri_out1 : rf_data1;
    out_data2 = match2 ? tri_out2 : rf_data2;
    if (sim_f)  // simulate fault
    begin
      out_data1 [DATA_WIDTH] = ~out_data1 [DATA_WIDTH];
      out_data2 [DATA_WIDTH] = ~out_data2 [DATA_WIDTH];
    end
    #1;
    rd_ack = 1;
    wait (~rdq_ack & ~rdf_ack);
    #1;
    rd_ack = 0;
  end

endmodule // regdwb

// ====================================================================
// REG_DWB Queue: Note that sequence number is not used in the output.

module regdwbq (in_seq, in_wait, in_reg, in_data, in_req, in_ack,
                out_wait, out_reg, out_data, out_req, out_ack,
                val_seq, val_req, val_ack,
                read_reg1, read_reg2, read_req, read_ack,
                match1, match2, tri_out1, tri_out2,
                halt_req, halt_ack, roll_req, roll_ack, reset);

  `include "parameter"

  input [SEQ_WIDTH-1:0] in_seq, val_seq;
  input [REG_WIDTH-1:0] in_reg, read_reg1, read_reg2;
  input [DATA_WIDTH:0] in_data;
  output [REG_WIDTH-1:0] out_reg;
  output [DATA_WIDTH:0] out_data, tri_out1, tri_out2;
  input in_wait, reset;
  output out_wait, match1, match2;
  input in_req, out_ack, val_req, read_req, halt_req, roll_req;
  output in_ack, out_req, val_ack, read_ack, halt_ack, roll_ack;

  reg val_ack, read_ack, halt_ack, roll_ack;

  reg read_ready, read_en11, read_en21, read_en31, read_en41, read_en12,
      read_en22, read_en32, read_en42, haltr_ack, rollr_ack;

  wire [SEQ_WIDTH-1:0] seq1, seq2, seq3, not_used;
  wire [REG_WIDTH-1:0] reg1, reg2, reg3;
  wire [DATA_WIDTH:0] data1, data2, data3;
  wire match11, match21, match31, match41, match12, match22, match32, match42;

  assign match1 = match11 | match21 | match31 | match41;
  assign match2 = match12 | match22 | match32 | match42;


  regdwb_buf buf1 (in_seq, in_wait, in_reg, in_data, in_req, in_ack,
                   seq1, wait1, reg1, data1, req1, ack1,
                   val_seq, val_req, val_ack1,
                   read_reg1, read_reg2, read_req, read_ack1,
                   match11, match12, read_en11, read_en12, tri_out1, tri_out2,
                   halt_req, halt_a1, roll_req, roll_a1, reset);

  regdwb_buf buf2 (seq1, wait1, reg1, data1, req1, ack1,
                   seq2, wait2, reg2, data2, req2, ack2,
                   val_seq, val_req, val_ack2,
                   read_reg1, read_reg2, read_req, read_ack2,
                   match21, match22, read_en21, read_en22, tri_out1, tri_out2,
                   halt_req, halt_a2, roll_req, roll_a2, reset);

  regdwb_buf buf3 (seq2, wait2, reg2, data2, req2, ack2,
                   seq3, wait3, reg3, data3, req3, ack3,
                   val_seq, val_req, val_ack3,
                   read_reg1, read_reg2, read_req, read_ack3,
                   match31, match32, read_en31, read_en32, tri_out1, tri_out2,
                   halt_req, halt_a3, roll_req, roll_a3, reset);

  regdwb_buf buf4 (seq3, wait3, reg3, data3, req3, ack3,
                   not_used, out_wait, out_reg, out_data, out_req, out_ack,
                   val_seq, val_req, val_ack4,
                   read_reg1, read_reg2, read_req, read_ack4,
                   match41, match42, read_en41, read_en42, tri_out1, tri_out2,
                   halt_req, halt_a4, roll_req, roll_a4, reset);


  always wait (reset)
  begin
    disable read_cycle;
    val_ack = 0;
    read_ack = 0;
    read_ready = 0;
    halt_ack = 0;
    roll_ack = 0;
    read_en11 = 0;
    read_en21 = 0;
    read_en31 = 0;
    read_en41 = 0;
    read_en12 = 0;
    read_en22 = 0;
    read_en32 = 0;
    read_en42 = 0;
    haltr_ack = 0;
    rollr_ack = 0;
    wait (~reset);
  end

  always @(val_ack1 or val_ack2 or val_ack3 or val_ack4)
    if (val_ack1 & val_ack2 & val_ack3 & val_ack4)
      val_ack = 1;
    else if (~val_ack1 & ~val_ack2 & ~val_ack3 & ~val_ack4)
      val_ack = 0;

  always @(read_ack1 or read_ack2 or read_ack3 or read_ack4)
    if (read_ack1 & read_ack2 & read_ack3 & read_ack4)
      read_ready = 1;
    else if (~read_ack1 & ~read_ack2 & ~read_ack3 & ~read_ack4)
      read_ready = 0;

  always @(halt_a1 or halt_a2 or halt_a3 or halt_a4 or haltr_ack)
    if (halt_a1 & halt_a2 & halt_a3 & halt_a4 & haltr_ack)
      halt_ack = 1;
    else if (~halt_a1 & ~halt_a2 & ~halt_a3 & ~halt_a4 & ~haltr_ack)
      halt_ack = 0;

  always @(roll_a1 or roll_a2 or roll_a3 or roll_a4 or rollr_ack)
    if (roll_a1 & roll_a2 & roll_a3 & roll_a4 & rollr_ack)
      roll_ack = 1;
    else if (~roll_a1 & ~roll_a2 & ~roll_a3 & ~roll_a4 & ~rollr_ack)
      roll_ack = 0;

  always wait (halt_req & ~reset)
  begin :rollback_cycle
    #1;
    haltr_ack = 1;
    wait (roll_req);
    #1;
    disable read_cycle;
    read_ready = 0;
    read_ack = 0;
    read_en11 = 0;
    read_en21 = 0;
    read_en31 = 0;
    read_en41 = 0;
    read_en12 = 0;
    read_en22 = 0;
    read_en32 = 0;
    read_en42 = 0;

    rollr_ack = 1;
    fork
      begin
        wait (~roll_req);
        #1;
        rollr_ack = 0;
      end
      begin
        wait (~halt_req);
        #1;
        haltr_ack = 0;
      end
    join
  end

  always wait (read_ready & ~halt_req & ~reset)
  begin :read_cycle
    #1;
    casex ({match11, match21, match31, match41})  // priority decoder
      4'b1???: read_en11 = 1;
      4'b01??: read_en21 = 1;
      4'b001?: read_en31 = 1;
      4'b0001: read_en41 = 1;
    endcase
    casex ({match12, match22, match32, match42})
      4'b1???: read_en12 = 1;
      4'b01??: read_en22 = 1;
      4'b001?: read_en32 = 1;
      4'b0001: read_en42 = 1;
    endcase
    #1;
    read_ack = 1;
    wait (~read_ready);
    #1;
    read_en11 = 0;
    read_en21 = 0;
    read_en31 = 0;
    read_en41 = 0;
    read_en12 = 0;
    read_en22 = 0;
    read_en32 = 0;
    read_en42 = 0;
    read_ack = 0;
  end

endmodule // regdwbq

// ====================================================================

module regdwb_buf (in_seq, in_wait, in_reg, in_data, in_req, in_ack,
                   out_seq, out_wait, out_reg, out_data, out_req, out_ack,
                   val_seq, val_req, val_ack,
                   read_reg1, read_reg2, read_req, read_ack,
                   match1, match2, read_en1, read_en2, tri_out1, tri_out2,
                   halt_req, halt_ack, roll_req, roll_ack, reset);

  `include "parameter"

  input [SEQ_WIDTH-1:0] in_seq, val_seq;
  input [REG_WIDTH-1:0] in_reg, read_reg1, read_reg2;
  input [DATA_WIDTH:0] in_data;
  output [SEQ_WIDTH-1:0] out_seq;
  output [REG_WIDTH-1:0] out_reg;
  output [DATA_WIDTH:0] out_data, tri_out1, tri_out2;
  input in_wait, read_en1, read_en2, reset;
  output out_wait, match1, match2;
  input in_req, out_ack, val_req, read_req, halt_req, roll_req;
  output in_ack, out_req, val_ack, read_ack, halt_ack, roll_ack;

  reg [SEQ_WIDTH-1:0] out_seq;
reg [REG_WIDTH-1:0] out_reg;
reg [DATA_WIDTH:0] out_data;
reg out_wait, match1, match2;
reg in_ack, out_req, val_ack, read_ack, halt_ack, roll_ack;

reg [SEQ_WIDTH-1:0] diff;
reg valid;

assign tri_out1 = read_en1 ? out_data : ZZZ;
assign tri_out2 = read_en2 ? out_data : ZZZ;

always wait (reset)
begin
    disable validate_cycle;
    disable rollback_cycle;
    disable find_cycle;
    disable input_cycle;
    disable output_cycle;
    out_seq = XXX;
    out_reg = XXX;
    out_data = XXX;
    out_wait = XXX;
    match1 = 0;
    match2 = 0;
    in_ack = 0;
    out_req = 0;
    val_ack = 0;
    read_ack = 0;
    halt_ack = 0;
    roll_ack = 0;
    valid = 0;
    wait (~reset);
end

always wait (val_req & ~reset)  // not same time as rollback
begin :validate_cycle  // clear "wait" bit
    #1;
    val_ack = 1;
    wait (~val_req);
    #1;
    if (out_seq == val_seq)
        out_wait = 0;
    #1;
    val_ack = 0;
end

always wait (halt_req & ~reset)
begin :rollback_cycle
    #1;
    halt_ack = 1;
    wait (roll_req);
    #1;
    disable find_cycle;
    disable input_cycle;
    disable output_cycle;
    match1 = 0;
    match2 = 0;
    in_ack = 0;
    out_req = 0;
    read_ack = 0;

    #DLY_SEQ_COMP;
    diff = out_seq - val_seq;  // compare sequence numbers
    if (~diff [SEQ_WIDTH-1])
    begin
        valid = 0;
        out_seq = XXX;
        out_reg = XXX;
        out_data = XXX;
        out_wait = XXX;
    end

    roll_ack = 1;
    fork
        begin
            wait (~roll_req);
            #1;
            roll_ack = 0;
        end
        begin
            wait (~halt_req);
            #1;
            halt_ack = 0;
        end
    join
end

always wait (read_req & ~halt_req & ~reset)  // find uncommitted data
begin :find_cycle
    #1;  // do NOT match R0 !!!
    match1 = valid & (out_reg == read_reg1) & (read_reg1 != 0);
    match2 = valid & (out_reg == read_reg2) & (read_reg2 != 0);
    #1;
    read_ack = 1;
    wait (~read_req);
    #1;
    match1 = 0;
    match2 = 0;
    read_ack = 0;
end

always wait (in_req & ~valid & ~val_req & ~read_req & ~halt_req & ~reset)
begin :input_cycle
    #1;
    wait (~val_req & ~val_ack & ~read_req & ~halt_req);
    out_seq = in_seq;
    out_wait = in_wait;
    out_reg = in_reg;
    out_data = in_data;
    in_ack = 1;
    valid = 1;
    wait (~in_req);
    #1;
    wait (~val_req & ~val_ack & ~read_req & ~halt_req);
    in_ack = 0;
end

always wait (valid & ~val_req & ~read_req & ~halt_req & ~reset)
begin :output_cycle
    #1;
    wait (~val_req & ~val_ack & ~read_req & ~halt_req);
    out_req = 1;
    wait (out_ack);
    #1;
    valid = 0;
    wait (~val_req & ~val_ack & ~read_req & ~halt_req);
    out_req = 0;
    wait (~out_ack);
end

endmodule // regdwb_buf
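
The rollback_cycle blocks in this listing decide whether to discard an entry by subtracting val_seq from the stored sequence number and testing the most significant bit of the difference. The C fragment below is a minimal sketch of that modular comparison, not part of the thesis sources. The function name seq_not_before and the 4-bit SEQ_WIDTH are assumptions made for illustration; the real width comes from the "parameter" file.

```c
/* Sketch only: models "diff = out_seq - val_seq; if (~diff[SEQ_WIDTH-1])". */
#define SEQ_WIDTH 4u                          /* assumed width, for illustration */
#define SEQ_MASK  ((1u << SEQ_WIDTH) - 1u)

/* Hypothetical helper: returns 1 when the MSB of (seq - val_seq), taken
 * modulo 2^SEQ_WIDTH, is 0.  In the Verilog, that is the condition under
 * which rollback_cycle clears the entry. */
int seq_not_before(unsigned seq, unsigned val_seq)
{
    unsigned diff = (seq - val_seq) & SEQ_MASK;   /* wraps like the Verilog reg */
    return ((diff >> (SEQ_WIDTH - 1u)) & 1u) == 0u;
}
```

Because the subtraction wraps, the test still orders sequence numbers correctly across a counter rollover, provided the two values are less than half the sequence space apart. With a 4-bit counter, sequence number 1 compares as not before 14.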
regfile.v
// Register File (3-port: 1 write, 2 read)

module regfile (wr_reg, wr_data, wr_req, wr_ack,
                rd_reg1, rd_reg2, rd_data1, rd_data2, rd_req, rd_ack,
                halt_req, halt_ack, roll_req, roll_ack, reset);

`include "parameter"

input [REG_WIDTH-1:0] wr_reg, rd_reg1, rd_reg2;
input [DATA_WIDTH:0] wr_data;
output [DATA_WIDTH:0] rd_data1, rd_data2;
input wr_req, rd_req, halt_req, roll_req, reset;
output wr_ack, rd_ack, halt_ack, roll_ack;

reg [DATA_WIDTH:0] rd_data1, rd_data2, rfile [0:REG_SIZE-1];
reg wr_ack, rd_ack, halt_ack, roll_ack;
reg [REG_WIDTH:0] loop;  // one extra bit for loop termination

always wait (reset)
begin
    disable rollback_cycle;
    disable write_cycle;
    disable read_cycle;
    rd_data1 = XXX;
    rd_data2 = XXX;
    wr_ack = 0;
    rd_ack = 0;
    halt_ack = 0;
    roll_ack = 0;
    for (loop = 0; loop < REG_SIZE; loop = loop + 1)
        rfile [loop] = XXX;
    wait (~reset);
end

always wait (halt_req & ~reset)
begin :rollback_cycle
    #1;
    halt_ack = 1;
    wait (roll_req);
    #1;
    disable write_cycle;
    disable read_cycle;
    rd_data1 = XXX;
    rd_data2 = XXX;
    wr_ack = 0;
    rd_ack = 0;

    roll_ack = 1;
    fork
        begin
            wait (~roll_req);
            #1;
            roll_ack = 0;
        end
        begin
            wait (~halt_req);
            #1;
            halt_ack = 0;
        end
    join
end

always wait (wr_req & ~halt_req & ~reset)
begin :write_cycle
    #DLY_RF_WR;
    wait (~halt_req);
    rfile [wr_reg] = wr_data;
    wr_ack = 1;
    wait (~wr_req);
    #1;
    wait (~halt_req);
    wr_ack = 0;
end

always wait (rd_req & ~halt_req & ~reset)
begin :read_cycle
    #DLY_RF_RD;
    rd_data1 = rd_reg1 ? rfile [rd_reg1] : 0;
    rd_data2 = rd_reg2 ? rfile [rd_reg2] : 0;
    #1;
    wait (~halt_req);
    rd_ack = 1;
    wait (~rd_req);
    #1;
    wait (~halt_req);
    rd_ack = 0;
end

endmodule // regfile
restable.v
// Reservation Table (for registers)

module restable (seq_num, reg_w, reg_r1, reg_r2, res_req, res_ack,
                 reg_clr, clr_req, clr_ack,
                 val_seq, halt_req, halt_ack, roll_req, roll_ack, reset);

`include "parameter"

input [SEQ_WIDTH-1:0] seq_num, val_seq;
input [REG_WIDTH-1:0] reg_w, reg_r1, reg_r2, reg_clr;
input res_req, clr_req, halt_req, roll_req, reset;
output res_ack, clr_ack, halt_ack, roll_ack;

reg [SEQ_WIDTH-1:0] seq_table [0:REG_SIZE-1];
reg res_table [0:REG_SIZE-1];
reg res_ack, clr_ack, halt_ack, roll_ack;

reg [SEQ_WIDTH-1:0] diff;
reg [REG_WIDTH:0] loop;  // one extra bit for loop termination

always wait (reset)
begin
    disable rollback_cycle;
    disable reserve_cycle;
    disable clear_cycle;
    res_ack = 0;
    clr_ack = 0;
    halt_ack = 0;
    roll_ack = 0;
    for (loop = 0; loop < REG_SIZE; loop = loop + 1)
        res_table [loop] = 0;
    wait (~reset);
end

always wait (halt_req & ~reset)
begin :rollback_cycle
    #1;
    halt_ack = 1;
    wait (roll_req);
    #1;
    disable reserve_cycle;
    disable clear_cycle;
    res_ack = 0;
    clr_ack = 0;

    #DLY_SEQ_COMP;
    for (loop = 1; loop < REG_SIZE; loop = loop + 1)
    begin
        diff = seq_table [loop] - val_seq;
        if (~diff[SEQ_WIDTH-1])
            res_table [loop] = 0;
    end

    roll_ack = 1;
    fork
        begin
            wait (~roll_req);
            #1;
            roll_ack = 0;
        end
        begin
            wait (~halt_req);
            #1;
            halt_ack = 0;
        end
    join
end

always wait (res_req & ~halt_req & ~reset)
begin :reserve_cycle
    #DLY_RT_RES;
    wait (~res_table [reg_w] & ~res_table [reg_r1] &
          ~res_table [reg_r2]);
    #1;
    wait (~halt_req);
    res_table [reg_w] = 1 & (reg_w != 0);  // R0 always available
    seq_table [reg_w] = seq_num;
    res_ack = 1;
    wait (~res_req);
    #1;
    wait (~halt_req);
    res_ack = 0;
end

always wait (clr_req & ~halt_req & ~reset)
begin :clear_cycle
    #DLY_RT_CLR;
    wait (~halt_req);
    res_table [reg_clr] = 0;
    clr_ack = 1;
    wait (~clr_req);
    #1;
    wait (~halt_req);
    clr_ack = 0;
end

endmodule // restable
Appendix B
AMPIRE Assembler
B.1. Assembly Code Format
Each line of assembly code has the following format. Every part is optional, and the
assembler is case-insensitive.
[label:] [*][instruction] [;comment]
The instruction set is shown in Table 3.1. Each register field is specified as R0 to
R31. The immediate/offset field may be a decimal number, a hexadecimal number (with
an 'H' suffix), or a label. A number may also carry the '#' prefix used in [Henn90],
but the assembler simply ignores that character. Adding the '*' prefix to any
instruction generates a bad parity bit for fault simulation.
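
The numeric operand rules above can be sketched in C as follows. This is only an illustration under stated assumptions, not the assembler's own code: parse_number is a hypothetical helper, and it uses strtol where the assembler's get_num routine uses sscanf. As in get_num, a caller would look the token up in the label table first, since a label such as "beh" would otherwise parse as a hexadecimal number.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not in asm.c): parse a decimal number or a
 * hexadecimal number with an 'H'/'h' suffix, ignoring an optional '#'
 * prefix.  Returns 1 and stores the value on success, or 0 when the
 * token is not a number (for example, a label). */
int parse_number(const char *tok, long *value)
{
    char *end;
    size_t len;
    int base = 10;

    if (*tok == '#')                     /* '#' prefix is simply ignored */
        tok++;

    len = strlen(tok);
    if (len > 0 && tolower((unsigned char)tok[len - 1]) == 'h') {
        base = 16;                       /* 'H' suffix selects hexadecimal */
        len--;                           /* the suffix is not a digit */
    }
    if (len == 0)
        return 0;

    *value = strtol(tok, &end, base);
    return end == tok + len;             /* every digit must be consumed */
}
```

With this sketch, the tokens "#1FH" and "1fh" both yield 31, "-10" yields -10, and "loop" is rejected so that it can be treated as a label.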
B.2. Assembler Source Code
The C source code starts on the next page.
asm.c
/* AMPIRE Assembler */

/* INST_WIDTH must be less than or equal to 32 bits. Even parity is used. */

#include <stdio.h>
#include <string.h>

#define OP_WIDTH 6  /* bits for opcode field */
#define REG_WIDTH 5  /* bits for register number */
#define EXTRA_WIDTH 11  /* bits for "extra" field */
#define ADDR_WIDTH 8  /* bits for address (for comment only) */
#define ADDR_INC 4  /* amount of address increment */

#define IMM_WIDTH REG_WIDTH + EXTRA_WIDTH
#define OFFSET_WIDTH 2 * REG_WIDTH + IMM_WIDTH
#define INST_WIDTH OP_WIDTH + OFFSET_WIDTH

#define MAX_OPS 50  /* max number of opcodes */
#define MAX_LINE 100  /* max number of characters per line */
#define MAX_LABELS 50  /* max number of labels */
#define MAX_LEN 20  /* max length of labels */
#define SPACE ' '
#define EOL '\n'

#define F_NONE 0  /* instruction format codes */
#define F_ALU 1
#define F_ALUI 2
#define F_LHI 3
#define F_JREG 4
#define F_BRANCH 5
#define F_OFFSET 6
#define F_LOAD 7
#define F_STORE 8
#define F_DATA 9

#define T_ABS 0  /* absolute address type */
#define T_REL 1  /* PC relative type */

char *opname[MAX_OPS];  /* opcode name database */
int opnum[MAX_OPS];  /* opcode number database */
int extra[MAX_OPS];  /* extra field database */
char format[MAX_OPS];  /* instruction format codes */
int opcount;  /* number of opcodes */

char label[MAX_LABELS][MAX_LEN];  /* label name database */
int addr[MAX_LABELS];  /* label address database */
int label_count = 0;  /* number of labels */

int cur_addr;  /* current address */
int cur_line;  /* current line in input file */
char parity;  /* used to create good/bad parity */
FILE *infile, *outfile;

/* ============================================================== */

main (argc, argv)
int argc;
char *argv[];
{
    extern char format[MAX_OPS];
    extern int label_count, cur_addr, cur_line;
    extern FILE *infile, *outfile;

    char org_line[MAX_LINE], work_line[MAX_LINE];
    int pointer, opindex, rd, rs, rt, imm, offset, data;

    if (argc != 2) {
        printf ("Usage: asm source_file (output: inst.hex)\n");
        exit (1);
    }

    define_opcodes ();  /* define opcode database */

    printf ("first pass ...\n");
    infile = fopen (argv[1], "r");  /* open input file */
    if (infile == NULL) {
        printf ("*** source file error ***\n");
        exit (1);
    }

    cur_line = 0;
    cur_addr = 0;
    while (get_line(org_line) != EOF) {  /* 1st pass -- scan for labels */
        cur_line++;
        strcpy (work_line, org_line);
        filter_line (work_line);
        pointer = 0;
        opindex = get_opindex (work_line, &pointer);

        if (opindex == -2) {
            add_label (work_line);  /* add to label database */
            opindex = get_opindex (work_line, &pointer);
        }

        if (opindex >= 0)
            cur_addr += ADDR_INC;  /* real instruction */
    } /* while */
    fclose (infile);

    printf ("%d out of %d label database slots are used.\n", label_count,
            MAX_LABELS);
    printf ("second pass ...\n");

    infile = fopen (argv[1], "r");  /* open input file */
    if (infile == NULL) {
        printf ("*** source file error ***\n");
        exit (1);
    }

    outfile = fopen ("inst.hex", "w");  /* open output file */
    if (outfile == NULL) {
        printf ("*** output file error ***\n");
        exit (1);
    }

    cur_line = 0;
    cur_addr = 0;
    while (get_line(org_line) != EOF) {  /* 2nd pass -- assemble codes */
        cur_line++;
        strcpy (work_line, org_line);
        filter_line (work_line);
        pointer = 0;
        opindex = get_opindex (work_line, &pointer);

        if (opindex == -2)  /* skip labels */
            opindex = get_opindex (work_line, &pointer);

        if (opindex >= 0) {  /* different instruction formats */
            rs = 0;
            rt = 0;
            rd = 0;
            imm = 0;
            offset = 0;

            switch (format[opindex]) {
                case F_NONE: {
                    break;
                }
                case F_ALU: {
                    rd = get_reg (work_line, &pointer);
                    rs = get_reg (work_line, &pointer);
                    rt = get_reg (work_line, &pointer);
                    break;
                }
                case F_ALUI: {
                    rt = get_reg (work_line, &pointer);
                    rs = get_reg (work_line, &pointer);
                    imm = get_num (work_line, &pointer, IMM_WIDTH,
                                   T_ABS);
                    break;
                }
                case F_LHI: {
                    rt = get_reg (work_line, &pointer);
                    imm = get_num (work_line, &pointer, IMM_WIDTH,
                                   T_ABS);
                    break;
                }
                case F_JREG: {
                    rs = get_reg (work_line, &pointer);
                    break;
                }
                case F_BRANCH: {
                    rs = get_reg (work_line, &pointer);
                    imm = get_num (work_line, &pointer, IMM_WIDTH,
                                   T_REL);
                    break;
                }
                case F_OFFSET: {
                    offset = get_num (work_line, &pointer, OFFSET_WIDTH,
                                      T_REL);
                    break;
                }
                case F_LOAD: {
                    rt = get_reg (work_line, &pointer);
                    imm = get_num (work_line, &pointer, IMM_WIDTH,
                                   T_ABS);
                    rs = get_reg (work_line, &pointer);
                    break;
                }
                case F_STORE: {
                    imm = get_num (work_line, &pointer, IMM_WIDTH,
                                   T_ABS);
                    rs = get_reg (work_line, &pointer);
                    rt = get_reg (work_line, &pointer);
                    break;
                }
                case F_DATA: {
                    data = get_num (work_line, &pointer, INST_WIDTH,
                                    T_ABS);
                    break;
                }
            } /* switch */

            check_line (work_line, pointer);
        } /* if */

        print_line (org_line, opindex, rd, rs, rt, imm, offset, data);
    } /* while */

    fclose (outfile);
    fclose (infile);
    printf ("\n%d Words, or %d Bytes\n", cur_addr/ADDR_INC, cur_addr);
} /* main */

/* ============================================================== */

get_line (line)  /* get one line from file, return length or EOF */
char line[];
{
    extern int cur_line;
    extern FILE *infile;

    int index, letter;

    for (index=0; (letter=getc(infile)) != EOF && letter != EOL; index++)
        line[index] = letter;
    line[index] = NULL;

    if (index >= MAX_LINE) {
        printf ("(%d) Line Too Long:\n%s\n", cur_line, line);
        exit (1);
    }

    if (letter == EOF && index == 0)
        return (EOF);
    else
        return (index);
} /* function get_line */

/* ============================================================== */

/* Convert unneeded characters to spaces, uppercase to lowercase. */

filter_line (line)
char line[];
{
    int index = 0;
    char letter;

    while ((letter=line[index]) != NULL) {
        if (letter=='\t' || letter==',' || letter=='(' || letter==')' ||
            letter=='#')
            line[index] = SPACE;
        if (letter >= 'A' && letter <= 'Z')
            line[index] = letter - 'A' + 'a';
        index++;
    }
} /* function filter_line */

/* ============================================================== */

/* If valid, return opcode index with pointer after the opcode.
 * If comment or blank line, return -1. If label, return -2.
 */

get_opindex (line, index)
char line[];
int *index;
{
    extern char *opname[MAX_OPS];
    extern int opcount, cur_line;
    extern char parity;

    int wpos = 0, opindex;
    char word[MAX_LINE];

    while (line[*index] == SPACE)  /* find first non-space */
        (*index)++;

    if (line[*index] == ';' || line[*index] == NULL)  /* comment/blank */
        return (-1);

    parity = 0;
    if (line[*index] == '*') {  /* create bad parity */
        parity = 1;
        (*index)++;
    }

    while (line[*index] != SPACE && line[*index] != NULL)  /* get 1st word */
        word[wpos++] = line[(*index)++];
    word[wpos] = NULL;

    if (word[wpos-1] == ':')  /* label */
        return (-2);

    for (opindex=0; opindex<opcount; opindex++)  /* find matching opcode */
        if (strcmp(word, opname[opindex]) == 0)
            break;

    if (opindex == opcount) {  /* no match */
        printf ("(%d) Invalid Opcode:\n%s\n", cur_line, line);
        exit (1);
    }

    return (opindex);
} /* function get_opindex */

/* ============================================================== */

add_label (line)  /* add label to database */
char line[];
{
    extern char label[MAX_LABELS][MAX_LEN];
    extern int addr[MAX_LABELS];
    extern int label_count, cur_line;

    int wpos = 0, index, loop;
    char word[MAX_LINE];

    if (label_count >= MAX_LABELS) {
        printf ("Label Database Full, %d Entries\n", label_count);
        exit (1);
    }

    index = 0;
    while (line[index] == SPACE)  /* find first non-space */
        index++;

    while (line[index] != ':')  /* get label */
        word[wpos++] = line[index++];
    word[wpos] = NULL;

    if (wpos >= MAX_LEN) {
        printf ("(%d) Label Too Long:\n%s\n", cur_line, line);
        exit (1);
    }

    for (loop=0; loop<label_count; loop++)  /* see if already defined */
        if (strcmp(word, label[loop]) == 0) {
            printf ("(%d) Label Already Defined:\n%s\n", cur_line, line);
            exit (1);
        }

    strcpy (label[label_count], word);  /* add it */
    addr[label_count++] = cur_addr;
} /* function add_label */

/* ============================================================== */

get_reg (line, index)  /* return register number */
char line[];
int *index;
{
    extern int cur_line;

    int wpos = 0, reg_num;
    char word[MAX_LINE];

    while (line[*index] == SPACE)  /* skip spaces */
        (*index)++;

    while (line[*index] != SPACE && line[*index] != NULL)  /* get one word */
        word[wpos++] = line[(*index)++];
    word[wpos] = NULL;

    if (word[0] != 'r') {
        printf ("(%d) Register must start with 'r':\n%s\n", cur_line, line);
        exit (1);
    }

    if (sscanf (&word[1], "%d", &reg_num) < 1) {
        printf ("(%d) Invalid Register Format:\n%s\n", cur_line, line);
        exit (1);
    }

    if (reg_num < 0 || reg_num > (power2(REG_WIDTH)-1)) {
        printf ("(%d) Register Number Out of Range:\n%s\n", cur_line, line);
        exit (1);
    }

    return (reg_num);
} /* function get_reg */

/* ============================================================== */

get_num (line, index, maxbits, type)  /* return number */
char line[];
int *index, maxbits, type;
{
    extern int cur_line;

    int wpos = 0, num;
    char word[MAX_LINE];

    while (line[*index] == SPACE)  /* skip spaces */
        (*index)++;

    while (line[*index] != SPACE && line[*index] != NULL)  /* get one word */
        word[wpos++] = line[(*index)++];
    word[wpos] = NULL;

    num = find_label (word);  /* find address, -1 if not found */
    if (num >= 0) {
        if (type == T_REL)
            num = num - (cur_addr + ADDR_INC);  /* PC relative */
    }
    else
        if (line[(*index)-1] == 'h') {
            if (sscanf (word, "%x", &num) < 1) {
                printf ("(%d) Invalid Hexadecimal Format:\n%s\n",
                        cur_line, line);
                exit (1);
            }
        }
        else
            if (sscanf (word, "%d", &num) < 1) {
                printf ("(%d) Invalid Decimal Format:\n%s\n",
                        cur_line, line);
                exit (1);
            }

    if (num < -(power2(maxbits-1)) || num > (power2(maxbits-1)-1)) {
        printf ("(%d) %d-bit 2's Complement Number Out of Range:\n%s\n",
                cur_line, maxbits, line);
        exit (1);
    }

    if (num < 0)  /* 2's complement form */
        num = power2 (maxbits) + num;
    return (num);
} /* function get_num */

/* ============================================================== */

find_label (word)  /* return address, or -1 if not found */
char word[];
{
    extern char label[MAX_LABELS][MAX_LEN];
    extern int addr[MAX_LABELS];
    extern int label_count;

    int loop;

    for (loop=0; loop<label_count; loop++)  /* search */
        if (strcmp(word, label[loop]) == 0)
            return (addr[loop]);

    return (-1);  /* not found */
} /* function find_label */

/* ============================================================== */

check_line (line, index)  /* check the rest of the line */
char line[];
int index;
{
    extern int cur_line;

    while (line[index] == SPACE)  /* skip spaces */
        index++;

    if (line[index] != ';' && line[index] != NULL) {
        printf ("(%d) Too Many Operands/Comments without Semicolon:\n%s\n",
                cur_line, line);
        exit (1);
    }
} /* function check_line */

/* ============================================================== */

/* To generate correct even parity, 'parity' should be 0 when this routine
 * is called. The bad parity flag is set in the get_opindex routine.
 */

print_line (org_line, opindex, rd, rs, rt, imm, offset, data)
char org_line[];
int opindex, rd, rs, rt, imm, offset, data;
{
    extern int opnum[MAX_OPS];
    extern int extra[MAX_OPS];
    extern char format[MAX_OPS];
    extern int cur_addr;
    extern char parity;
    extern FILE *outfile;

    unsigned int inst;
    int loop, inst_digits, addr_digits;

    inst_digits = (INST_WIDTH + 1) / 4;  /* number of hex digits */
    if ((INST_WIDTH + 1) % 4 != 0)
        inst_digits++;
    addr_digits = ADDR_WIDTH / 4;
    if (ADDR_WIDTH % 4 != 0)
        addr_digits++;

    if (opindex >= 0) {
        if (format[opindex] == F_DATA)
            inst = data;
        else
            inst = opnum[opindex] * power2 (OFFSET_WIDTH) +
                   rs * power2 (REG_WIDTH + IMM_WIDTH) +
                   rt * power2 (IMM_WIDTH) +
                   rd * power2 (EXTRA_WIDTH) +
                   extra[opindex] + imm + offset;

        for (loop=0; loop<INST_WIDTH; loop++)  /* find parity bit */
            if (inst & power2(loop))
                parity ^= 1;  /* XOR */

        if (INST_WIDTH < 32) {
            inst += parity * power2 (INST_WIDTH);
            fprintf (outfile, "%.*X", inst_digits, inst);
        }
        else {
            fprintf (outfile, "%d", parity);
            fprintf (outfile, "%.*X", inst_digits-1, inst);
        }

        fprintf (outfile, " // ");
        fprintf (outfile, "%.*X", addr_digits, cur_addr);
        cur_addr += ADDR_INC;
    }
    else {  /* no instruction */
        for (loop=0; loop<inst_digits; loop++)
            putc (SPACE, outfile);
        fprintf (outfile, " // ");
        for (loop=0; loop<addr_digits; loop++)
            putc (SPACE, outfile);
    }

    fprintf (outfile, "%s\n", org_line);
} /* function print_line */

/* ============================================================== */

power2 (exp)  /* return power of 2 */
int exp;
{
    int result = 1, loop;

    for (loop=0; loop<exp; loop++)
        result = result * 2;

    return (result);
} /* function power2 */

/* ============================================================== */

/* instruction format code:
 * NONE opcode
 * ALU opcode rd, rs, rt
 * ALUI opcode rt, rs, imm
 * LHI opcode rt, imm
 * JREG opcode rs
 * BRANCH opcode rs, imm
 * OFFSET opcode offset
 * LOAD opcode rt, imm(rs)
 * STORE opcode imm(rs), rt
 * DATA opcode data_word
 */

define_opcodes ()
{
    extern char *opname[MAX_OPS];
    extern int opnum[MAX_OPS];
    extern int extra[MAX_OPS];
    extern char format[MAX_OPS];
    extern int opcount;

    int x = 0;

    opname[x]="j"; opnum[x]=2; extra[x]=0; format[x++]=F_OFFSET;
    opname[x]="jal"; opnum[x]=3; extra[x]=0; format[x++]=F_OFFSET;
    opname[x]="beqz"; opnum[x]=4; extra[x]=0; format[x++]=F_BRANCH;
    opname[x]="bnez"; opnum[x]=5; extra[x]=0; format[x++]=F_BRANCH;
    opname[x]="addui"; opnum[x]=9; extra[x]=0; format[x++]=F_ALUI;
    opname[x]="subui"; opnum[x]=11; extra[x]=0; format[x++]=F_ALUI;
    opname[x]="andi"; opnum[x]=12; extra[x]=0; format[x++]=F_ALUI;
    opname[x]="ori"; opnum[x]=13; extra[x]=0; format[x++]=F_ALUI;
    opname[x]="xori"; opnum[x]=14; extra[x]=0; format[x++]=F_ALUI;
    opname[x]="lhi"; opnum[x]=15; extra[x]=0; format[x++]=F_LHI;
    opname[x]="trap"; opnum[x]=17; extra[x]=0; format[x++]=F_OFFSET;
    opname[x]="jr"; opnum[x]=18; extra[x]=0; format[x++]=F_JREG;
    opname[x]="jalr"; opnum[x]=19; extra[x]=0; format[x++]=F_JREG;
    opname[x]="slli"; opnum[x]=20; extra[x]=0; format[x++]=F_ALUI;
    opname[x]="srli"; opnum[x]=22; extra[x]=0; format[x++]=F_ALUI;
    opname[x]="srai"; opnum[x]=23; extra[x]=0; format[x++]=F_ALUI;
    opname[x]="seqi"; opnum[x]=24; extra[x]=0; format[x++]=F_ALUI;
    opname[x]="snei"; opnum[x]=25; extra[x]=0; format[x++]=F_ALUI;
    opname[x]="slti"; opnum[x]=26; extra[x]=0; format[x++]=F_ALUI;
    opname[x]="sgti"; opnum[x]=27; extra[x]=0; format[x++]=F_ALUI;
    opname[x]="slei"; opnum[x]=28; extra[x]=0; format[x++]=F_ALUI;
    opname[x]="sgei"; opnum[x]=29; extra[x]=0; format[x++]=F_ALUI;
    opname[x]="lw"; opnum[x]=35; extra[x]=0; format[x++]=F_LOAD;
    opname[x]="sw"; opnum[x]=43; extra[x]=0; format[x++]=F_STORE;
    opname[x]="adduif"; opnum[x]=49; extra[x]=0; format[x++]=F_ALUI;
    opname[x]="jrf"; opnum[x]=50; extra[x]=0; format[x++]=F_JREG;
    opname[x]="swf"; opnum[x]=51; extra[x]=0; format[x++]=F_STORE;

    opname[x]="nop"; opnum[x]=0; extra[x]=0; format[x++]=F_NONE;
    opname[x]="sll"; opnum[x]=0; extra[x]=4; format[x++]=F_ALU;
    opname[x]="srl"; opnum[x]=0; extra[x]=6; format[x++]=F_ALU;
    opname[x]="sra"; opnum[x]=0; extra[x]=7; format[x++]=F_ALU;
    opname[x]="adduf"; opnum[x]=0; extra[x]=25; format[x++]=F_ALU;
    opname[x]="addu"; opnum[x]=0; extra[x]=33; format[x++]=F_ALU;
    opname[x]="subu"; opnum[x]=0; extra[x]=35; format[x++]=F_ALU;
    opname[x]="and"; opnum[x]=0; extra[x]=36; format[x++]=F_ALU;
    opname[x]="or"; opnum[x]=0; extra[x]=37; format[x++]=F_ALU;
    opname[x]="xor"; opnum[x]=0; extra[x]=38; format[x++]=F_ALU;
    opname[x]="seq"; opnum[x]=0; extra[x]=40; format[x++]=F_ALU;
    opname[x]="sne"; opnum[x]=0; extra[x]=41; format[x++]=F_ALU;
    opname[x]="slt"; opnum[x]=0; extra[x]=42; format[x++]=F_ALU;
    opname[x]="sgt"; opnum[x]=0; extra[x]=43; format[x++]=F_ALU;
    opname[x]="sle"; opnum[x]=0; extra[x]=44; format[x++]=F_ALU;
    opname[x]="sge"; opnum[x]=0; extra[x]=45; format[x++]=F_ALU;

    opname[x]=".word"; opnum[x]=0; extra[x]=0; format[x++]=F_DATA;

    opcount = x;
    printf ("%d out of %d opcode database slots are used.\n", x, MAX_OPS);
} /* function define_data */
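
As a rough illustration of the encoding that print_line performs, the sketch below packs an ALU-format instruction and computes its parity bit in isolation. It assumes the field widths defined at the top of asm.c (6-bit opcode, 5-bit register fields, 11-bit extra field). The names pack_alu and parity_bit are hypothetical and do not appear in the assembler, and bitwise OR stands in for the additions used in print_line, which is equivalent only while the fields do not overlap.

```c
#include <stdint.h>

/* Bit positions implied by OP_WIDTH=6, REG_WIDTH=5, EXTRA_WIDTH=11. */
enum { OP_SHIFT = 26, RS_SHIFT = 21, RT_SHIFT = 16, RD_SHIFT = 11 };

/* Hypothetical helper: build the 32-bit word for "opcode rd, rs, rt". */
uint32_t pack_alu(unsigned opnum, unsigned rs, unsigned rt,
                  unsigned rd, unsigned extra)
{
    return ((uint32_t)opnum << OP_SHIFT) |
           ((uint32_t)rs << RS_SHIFT) |
           ((uint32_t)rt << RT_SHIFT) |
           ((uint32_t)rd << RD_SHIFT) |
           (uint32_t)extra;
}

/* Hypothetical helper: XOR all instruction bits into the starting flag.
 * bad = 0 gives the correct even-parity bit; bad = 1 (the '*' prefix)
 * gives the inverted, deliberately wrong bit. */
unsigned parity_bit(uint32_t inst, unsigned bad)
{
    unsigned p = bad & 1u;

    while (inst != 0) {
        p ^= inst & 1u;
        inst >>= 1;
    }
    return p;
}
```

For example, addu r1, r2, r3 (opnum 0, extra 33) packs to 00430821 hexadecimal. That word has six set bits, so the even-parity bit is 0, and with the '*' prefix it becomes 1.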
Bibliography
[Bell78] C. G. Bell, A. Kotok, T. N. Hastings, and R. Hill, ‘‘The Evolution of the DECsystem-10,’’ in Computer Engineering, a DEC View of Hardware Systems Design (Bell, Mudge, and McNamara), Digital Press, Bedford, MA (1978).
[Berk91] C. H. v. Berkel, ‘‘Beware the Isochronic Fork,’’ Nat. Lab. Unclassified Report UR 003/91, Philips Research Lab., Eindhoven, The Netherlands (1991).
[Burn87] S. M. Burns and A. J. Martin, ‘‘Syntax-Directed Translation of Concurrent Programs into Self-Timed Circuits,’’ Advanced Research in VLSI: Proceedings of the Fifth MIT Conference, Cambridge, MA, pp. 35-50 (March 1987).
[Cade91] Cadence, Verilog-XL Reference Manual, Cadence Design Systems, Inc., Lowell, MA (1991).
[Cast82] X. Castillo, S. R. McConnel, and D. P. Siewiorek, ‘‘Derivation and Calibration of a Transient Error Reliability Model,’’ IEEE Transactions on Computers C-31(7), pp. 658-671 (July 1982).
[Ciac81] M. L. Ciacelli, ‘‘Fault Handling on the IBM 4341 Processor,’’ 11th Fault Tolerant Computing Symposium, Portland, Maine, pp. 9-12 (June 1981).
[Dall86] W. J. Dally and C. L. Seitz, ‘‘The Torus Routing Chip,’’ Distributed Computing 1(4), pp. 187-196 (October 1986).
[Fran83] E. H. Frank and R. F. Sproull, ‘‘A Self-Timed Static RAM,’’ Third Caltech Conference on VLSI, Pasadena, CA, pp. 275-285 (March 1983).
[Henn90] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann, San Mateo, CA (1990).
[Host91] L. B. Hostetler and B. Mirtich, ‘‘DLXsim - A Simulator for DLX,’’ Documentation in the DLX Simulator Software Package (May 1, 1991).
[Jaco90] G. M. Jacobs and R. W. Brodersen, ‘‘A Fully Asynchronous Digital Signal Processor Using Self-Timed Circuits,’’ IEEE Journal of Solid-State Circuits 25(6), pp. 1526-1537 (December 1990).
[Mart85] A. J. Martin, ‘‘The Design of a Self-Timed Circuit for Distributed Mutual Exclusion,’’ 1985 Chapel Hill Conference on VLSI, Chapel Hill, NC, pp. 245-260 (March 1985).
[Mart86] A. J. Martin, ‘‘Compiling Communicating Processes into Delay-Insensitive VLSI Circuits,’’ Distributed Computing 1(4), pp. 226-234 (October 1986).
[Mart89] A. J. Martin, S. M. Burns, T. K. Lee, D. Borkovic, and P. J. Hazewindus, ‘‘The First Asynchronous Microprocessor: The Test Results,’’ Computer Architecture News 17(4), pp. 95-110 (June 1989).
[Meng89] T. H.-Y. Meng, R. W. Brodersen, and D. G. Messerschmitt, ‘‘Automatic Synthesis of Asynchronous Circuits from High-Level Specifications,’’ IEEE Transactions on Computer-Aided Design 8(11), pp. 1185-1205 (November 1989).
[Moln85] C. E. Molnar, T.-P. Fang, and F. U. Rosenberger, ‘‘Synthesis of Delay-Insensitive Modules,’’ 1985 Chapel Hill Conference on VLSI, Chapel Hill, NC, pp. 67-86 (March 1985).
[Patt83] D. A. Patterson, P. Garrison, M. Hill, D. Lioupis, C. Nyberg, T. Sippel, and K. V. Dyke, ‘‘Architecture of a VLSI Instruction Cache For a RISC,’’ 10th Annual Symposium on Computer Architecture, Stockholm, Sweden, pp. 108-116 (June 1983).
[Seit80] C. L. Seitz, ‘‘System Timing,’’ in Introduction to VLSI Systems (Mead and Conway), Addison-Wesley, Reading, MA (1980).
[Siew92] D. P. Siewiorek, D. Ciplickas, J. Willis, A. Gupta, and J. Quinlan, ‘‘Laboratory Experiences with Verilog Simulation in an Undergraduate Computer Architecture Course,’’ Proceedings of the Annual Open Verilog International User Group Meeting, Santa Clara, CA (March 24-25, 1992).
[Stro85] R. E. Strom and S. Yemini, ‘‘Optimistic Recovery in Distributed Systems,’’ ACM Transactions on Computer Systems 3(3), pp. 204-226 (August 1985).
[Suth89] I. E. Sutherland, ‘‘Micropipelines,’’ Communications of the ACM 32(6), pp. 720-738 (June 1989).
[Tami88] Y. Tamir, M. Tremblay, and D. A. Rennels, ‘‘The Implementation and Application of Micro Rollback in Fault-Tolerant VLSI Systems,’’ 18th Fault-Tolerant Computing Symposium, Tokyo, Japan, pp. 234-239 (June 1988).
[Tami90a] Y. Tamir and M. Tremblay, ‘‘High-Performance Fault-Tolerant VLSI Systems Using Micro Rollback,’’ IEEE Transactions on Computers C-39(4), pp. 548-554 (April 1990).
[Tami90b] Y. Tamir, M. Liang, T. Lai, and M. Tremblay, ‘‘The UCLA Mirror Processor: A Building Block for Self-Checking Self-Repairing Computing Nodes,’’ CS Department Technical Report #CSD-900040, University of California, Los Angeles, CA (November 1990).
[Trem89] M. Tremblay and Y. Tamir, ‘‘Support for Fault Tolerance in VLSI Processors,’’ International Symposium on Circuits and Systems, Portland, OR, pp. 388-393 (May 1989).
[TRW92] TRW, ‘‘RH32 Spaceborne Data Processor,’’ Preliminary Product Announcement, TRW Space Communications Division, Redondo Beach, CA (March 1992).
[Wils92] R. Wilson, ‘‘VLSI Meet Sees CPUs Speed Up,’’ Electronic Engineering Times, p. 1 (June 8, 1992).
[Wint92] K. D. Winters, ‘‘ASIC Design Experience in Undergraduate Logic Design and Computer Architecture Courses,’’ Engineering Research Laboratory Report #92005, Montana State University, Bozeman, MT (April 10, 1992).