
E&CE 327: Digital Systems Engineering
Course Notes

(with Solutions)

2015t1 (Winter)

Instructor: Rodolfo Pellizzoni

Notes by: Mark Aagaard

University of Waterloo
Dept of Electrical and Computer Engineering


Contents

1 Fundamentals of VHDL
1.1 Introduction to VHDL
1.1.1 Levels of Abstraction
1.1.2 VHDL Origins and History
1.1.3 Semantics
1.1.4 Synthesis of a Simulation-Based Language
1.1.5 Solution to Synthesis Sanity
1.1.6 Standard Logic 1164
1.2 Comparison of VHDL to Other Hardware Description Languages
1.2.1 VHDL Disadvantages
1.2.2 VHDL Advantages
1.2.3 VHDL and Other Languages
1.2.3.1 VHDL vs Verilog
1.2.3.2 VHDL vs System Verilog
1.2.3.3 VHDL vs SystemC
1.2.3.4 Summary of VHDL Evaluation
1.3 Overview of Syntax
1.3.1 Syntactic Categories
1.3.2 Library Units
1.3.3 Entities and Architecture
1.3.4 Concurrent Statements
1.3.5 Component Declaration and Instantiations
1.3.6 Processes
1.3.7 Sequential Statements
1.3.8 A Few More Miscellaneous VHDL Features
1.4 Concurrent vs Sequential Statements
1.4.1 Concurrent Assignment vs Process
1.4.2 Conditional Assignment vs If Statements
1.4.3 Selected Assignment vs Case Statement
1.4.4 Coding Style
1.5 Overview of Processes
1.5.1 Combinational Process vs Clocked Process
1.5.2 Latch Inference
1.5.3 Combinational vs Flopped Signals


1.6 VHDL Execution: Delta-Cycle Simulation
1.6.1 Simple Simulation
1.6.2 Temporal Granularities of Simulation
1.6.3 Zero-Delay Simulation
1.6.4 Intuition Behind Delta-Cycle Simulation
1.6.4.1 Introduction to Delta-Cycle Simulation
1.6.4.2 Intuitive Rules for Delta-Cycle Simulation
1.6.4.3 Example of Delta-Cycles: Back-to-Back Buffers
1.6.4.4 Example of Projected Assignment: Back-to-Back Buffers
1.6.4.5 Example of Projected Assignment: Back-to-Back Flip-Flops
1.6.4.6 Example of Projected Assignment with Combinational Loop
1.6.5 VHDL Delta-Cycle Simulation
1.6.5.1 Informal Description of Algorithm
1.6.5.2 Example: VHDL Simulation of Back-to-Back Buffers
1.6.5.3 Definitions and Algorithm
1.6.5.4 Example: Delta-Cycle Simulation of Back-to-Back Flip-Flops
1.6.5.5 Example: VHDL Simulation of Combinational Loop
1.6.5.6 Rules and Observations for Drawing Delta-Cycle Simulations
1.6.6 External Inputs and Flip-Flops
1.7 Register-Transfer-Level Simulation
1.7.1 Overview
1.7.2 Technique for Register-Transfer Level Simulation
1.7.3 Examples of RTL Simulation
1.7.3.1 RTL Simulation Example 1
1.8 Simple RTL Simulation in Software
1.8.1 Introductory Examples
1.8.2 Regs and Comb
1.8.3 Inputs
1.8.4 Pipeline
1.9 Variables in VHDL
1.9.1 Semantics
1.9.2 Usage of Variables
1.10 Delta-Cycle Simulation with Delays
1.10.1 Transport and Inertial Delay
1.10.2 Delayed Assignment Semantics
1.10.3 Simulation Examples
1.10.4 Waveform Expressions
1.11 VHDL and Hardware Building Blocks
1.11.1 Basic Building Blocks
1.11.2 Deprecated Building Blocks for RTL
1.11.2.1 An Aside on Flip-Flops and Latches
1.11.2.2 Deprecated Hardware
1.11.3 Hardware and Code for Flops
1.11.3.1 Flops with Waits and Ifs
1.11.3.2 Flops with Synchronous Reset


1.11.3.3 Flops with Chip-Enable
1.11.3.4 Flop with Chip-Enable and Mux on Input
1.11.3.5 Flops with Chip-Enable, Muxes, and Reset
1.11.4 An Example Sequential Circuit
1.12 Synthesizable vs Non-Synthesizable Code
1.12.1 Initial Values
1.12.2 Wait For
1.12.3 Variables
1.12.4 Bits and Booleans
1.12.5 Assignments before Wait Statement
1.12.6 Different Wait Conditions
1.12.7 Multiple “if rising edge” in Process
1.12.8 “if rising edge” and “wait” in Same Process
1.12.9 “if rising edge” with “else” Clause
1.12.10 Loop with Both Comb and Clocked Paths
1.12.11 “wait” Inside of a “for loop”
1.13 Guidelines for Desirable Hardware
1.13.1 Know Your Hardware
1.13.2 Latches
1.13.3 Asynchronous Reset
1.13.4 Combinational Loops
1.13.5 Using a Data Signal as a Clock
1.13.6 Using a Clock Signal as Data
1.13.7 Tri-State Buffers and Signals
1.13.8 Multiple Drivers

2 Additional Features of VHDL
2.1 Literals
2.1.1 Numeric Literals
2.1.2 Bit-String Literals
2.2 Arrays and Vectors
2.2.1 Declarations
2.2.2 Indexing, Slicing, Concatenation, Aggregates
2.3 Arithmetic
2.3.1 Arithmetic Packages
2.3.2 Arithmetic Types
2.3.3 Overloading of Arithmetic
2.3.4 Widths for Addition and Subtraction
2.3.5 Overloading of Comparisons
2.3.6 Widths for Comparisons
2.3.7 Type Conversion
2.3.8 Shift and Rotate Operations
2.4 Types
2.4.1 Enumerated Types
2.4.2 Defining New Array Types


3 Overview of FPGAs
3.1 Generic FPGA Hardware
3.1.1 Generic FPGA Cell
3.1.2 Lookup Table
3.1.3 Interconnect for Generic FPGA
3.1.4 Blocks of Cells for Generic FPGA
3.1.5 Special Circuitry in FPGAs
3.2 Area Estimation for FPGAs
3.2.1 Area for Circuit with one Target
3.2.2 Algorithm to Allocate Gates to Cells
3.2.3 Area for Arithmetic Circuits

4 Intro to RTL Design with VHDL
4.1 Function Tables
4.1.1 Karnaugh Maps
4.1.2 Generalizations
4.1.3 Multi-Output Tables
4.1.4 Don’t-Cares
4.1.5 Don’t Cares on Inputs
4.1.6 Consistency and ‘Unused’
4.2 Finite State Machines in VHDL
4.2.1 HDL Coding Styles for State Machines
4.2.2 State Encodings
4.2.3 Traditional State-Machine Notation
4.2.4 Our State-Machine Notation
4.2.5 Bounce Example
4.2.6 Registered Assignments
4.2.7 Summary and Analysis of Explicit vs Implicit
4.2.8 More Notation
4.2.9 Semantic and Syntax Rules
4.2.10 VHDL Constructs and Patterns
4.2.11 Translating VHDL to FSM
4.2.12 Reset
4.3 Dataflow Diagrams
4.3.1 Dataflow Diagrams Overview
4.3.2 Dataflow Diagram Execution
4.3.3 Dataflow Diagrams, Hardware, and Behaviour
4.3.4 Performance Estimation
4.3.5 Area Estimation
4.3.6 Design Analysis
4.3.7 Parcels
4.3.8 Bubbles and Throughput
4.4 Hnatyshyn with Registered Outputs
4.4.1 Leftovers
4.4.2 Design Process


4.4.3 Requirements
4.4.4 Data-Dependency Graph
4.4.5 Initial Dataflow Diagram
4.4.6 Area Optimization
4.4.7 Assign Names to Registered Signals
4.4.8 VHDL #1: Big and Obviously Correct
4.4.9 Allocation
4.4.10 VHDL #2: Post-Allocation
4.4.11 Explicit State Machine
4.4.12 VHDL Implementation #3
4.5 Design Example: Hnatyshyn with Combinational Inputs and Outputs
4.5.1 Requirements
4.5.2 Dataflow Diagram
4.5.3 Maximum Throughput Design
4.5.4 Minimum Area Design
4.5.5 Minimum Area Design with ASAP Parcels
4.5.6 Minimum Area Design with Unpredictable Bubbles
4.6 Hnatyshyn with Registered Inputs and Combinational Output
4.6.1 Dataflow Diagram and Behaviour
4.7 Hnatyshyn with Registered Inputs and Outputs
4.7.1 Requirements
4.7.2 Data-Dependency Graph
4.7.3 Initial Dataflow Diagram
4.7.4 Area Optimization
4.7.5 Assign Names to Registered Signals
4.7.6 VHDL #1: Big and Obviously Correct
4.7.7 Tangent: Combinational Outputs
4.7.8 Allocation
4.7.9 VHDL #2: Post-Allocation
4.7.10 Separate Datapath and Control
4.8 Design Example: Hnatyshyn with Bubbles
4.8.1 Control Table - Standard Method
4.8.2 Control Table - Valid Bit Shortcut
4.9 Example: LeBlanc
4.9.1 System Description
4.9.2 Design for ASAP Parcels
4.9.2.1 Implicit State Machine
4.9.2.2 Explicit State Machine
4.9.2.3 Datapath Control
4.9.2.4 Final Implementation
4.9.2.5 Buggy Implementation
4.9.3 Design for Unpredictable Bubbles
4.9.3.1 Implicit State Machine
4.9.3.2 Explicit State Machine
4.9.3.3 Datapath Control


4.9.3.4 Final Implementation
4.9.3.5 Buggy Implementation

5 Intermediate RTL Design
5.1 Inter-Parcel Variables: Hnatyshyn with Internal State
5.1.1 Requirements and Goals
5.1.2 High-Level Model
5.1.3 Dataflow Diagrams and Waveforms
5.1.4 Implicit State Machine
5.1.5 Adding Reset
5.1.6 Control Tables
5.1.7 VHDL Implementation
5.1.8 Summary of Bubbles and Inter-Parcel Variables
5.2 Hnatyshyn for a Finite Sequence
5.2.1 Introduction to Hnatyshyn-Finite
5.2.2 Requirements, Goals, and Constraints
5.2.3 Pseudocode
5.2.4 High-Level Model
5.2.5 Transient-State Machine
5.2.6 Add Support for Bubbles
5.2.7 Linearize Control Flow
5.3 Memory Arrays and RTL Design
5.3.1 Memory Operations
5.3.2 Memory Arrays in VHDL
5.3.3 Data Dependencies
5.4 Design Example: Massey
5.5 Design Example: Vanier
5.5.1 Requirements
5.5.2 Algorithm
5.5.3 Initial Dataflow Diagram
5.5.4 Reschedule to Meet Requirements
5.5.5 Optimization: Reduce Inputs
5.5.6 Assign Names to Registered Values
5.5.7 VHDL Implementation #1
5.5.8 Tangent: Combinational Outputs
5.5.9 Allocation
5.5.10 VHDL Implementation #2
5.5.11 Separate Datapath and Control
5.5.12 “Don’t-Care” Instantiations
5.5.13 VHDL Implementation #3
5.5.14 VHDL Implementation #4
5.5.15 Notes and Observations


6 Advanced RTL Design: Optimization
6.1 Pipelining
6.1.1 Introduction to Pipelining
6.1.2 Partially Pipelined
6.1.3 Terminology
6.1.4 Design Example: Pipelined Massey
6.1.5 Overlapping Pipeline Stages
6.2 Staggering
6.3 Retiming
6.4 General Optimizations
6.4.1 Strength Reduction
6.4.1.1 Arithmetic Strength Reduction
6.4.1.2 Boolean Strength Reduction
6.4.2 Replication and Sharing
6.4.2.1 Mux-Pushing
6.4.2.2 Common Subexpression Elimination
6.4.2.3 Computation Replication
6.4.3 Arithmetic
6.5 Customized State Encodings

7 Performance Analysis
7.1 Introduction
7.2 Defining Performance
7.3 Benchmarks
7.4 Comparing Performance
7.4.1 General Equations
7.4.2 Example: Performance of Printers
7.5 Clock Speed, CPI, Program Length, and Performance
7.5.1 Mathematics
7.5.2 Example: CISC vs RISC and CPI
7.5.3 Effect of Instruction Set on Performance
7.6 Effect of Time to Market on Relative Performance


8 Timing Analysis
8.1 Delays and Definitions
8.1.1 Background Definitions
8.1.2 Clock-Related Timing Definitions
8.1.2.1 Clock Latency
8.1.2.2 Clock Skew
8.1.2.3 Clock Jitter
8.1.3 Storage-Related Timing Definitions
8.1.3.1 Flops and Latches
8.1.4 Propagation Delays
8.1.5 Timing Constraints
8.1.6 Review: Timing Parameters
8.2 Timing Analysis of Simple Latches
8.2.1 Structure and Behaviour of Multiplexer Latch
8.2.2 Strategy for Timing Analysis of Storage Devices
8.2.3 Clock-to-Q Time of a Latch
8.2.4 From Load Mode to Store Mode
8.2.5 Setup Time Analysis
8.2.6 Hold Time of a Multiplexer Latch
8.2.7 Example of a Bad Latch
8.2.8 Summary
8.3 Advanced Timing Analysis of Storage Elements
8.4 Critical Path
8.4.1 Introduction to Critical and False Paths
8.4.1.1 Example of Critical Path in Full Adder
8.4.1.2 Longest Path and Critical Path
8.4.1.3 Criteria for Critical Path Algorithms
8.4.2 Longest Path
8.4.2.1 Algorithm to Find Longest Path
8.4.2.2 Longest Path Example
8.4.3 Monotone Speedup
8.5 False Paths
8.6 Analog Timing Model
8.6.1 Defining Delay
8.6.2 Modeling Circuits for Timing
8.6.3 Example: Two Buffers
8.6.4 Ex: Two Bufs with Both Caps
8.7 Elmore Delay Model
8.7.1 Elmore Delay as an Approximation
8.7.2 A More Complicated Example
8.8 Practical Usage of Timing Analysis


9 Power Analysis and Power-Aware Design
9.1 Overview
9.1.1 Importance of Power and Energy
9.1.2 Power vs Energy
9.1.3 Batteries, Power and Energy
9.1.3.1 Do Batteries Store Energy or Power?
9.1.3.2 Battery Life and Efficiency
9.1.3.3 Battery Life and Power
9.2 Power Equations
9.2.1 Switching Power
9.2.2 Short-Circuited Power
9.2.3 Leakage Power
9.2.4 Glossary
9.2.5 Note on Power Equations
9.3 Overview of Power Reduction Techniques
9.4 Voltage Reduction for Power Reduction
9.5 Data Encoding for Power Reduction
9.5.1 How Data Encoding Can Reduce Power
9.5.2 Example Problem: Sixteen Pulser
9.5.2.1 Problem Statement
9.5.2.2 Additional Information
9.5.2.3 Answer
9.6 Clock Gating
9.6.1 Introduction to Clock Gating
9.6.2 Implementing Clock Gating
9.6.3 Design Process
9.6.4 Effectiveness of Clock Gating
9.6.5 Example: Reduced Activity Factor with Clock Gating
9.6.6 Calculating PctBusy
9.6.6.1 Valid Bits and Busy
9.6.6.2 Calculating LenBusy
9.6.6.3 From LenBusy to PctBusy
9.6.7 Example: Pipelined Circuit with Clock-Gating
9.6.8 Clock Gating in ASICs
9.6.9 Alternatives to Clock Gating
9.6.9.1 Use Chip Enables
9.6.9.2 Operand Gating


10 Review
10.1 Overview of the Term
10.2 VHDL
10.2.1 VHDL Topics
10.2.2 VHDL Example Problems
10.3 RTL Design Techniques
10.3.1 Design Topics
10.3.2 Design Example Problems
10.4 Performance Analysis and Optimization
10.4.1 Performance Topics
10.4.2 Performance Example Problems
10.5 Timing Analysis
10.5.1 Timing Topics
10.5.2 Timing Example Problems
10.6 Power
10.6.1 Power Topics
10.6.2 Power Example Problems
10.7 Formulas to be Given on Final Exam

Chapter 1

Fundamentals of VHDL

1.1 Introduction to VHDL

1.1.1 Levels of Abstraction

There are many different levels of abstraction for working with hardware:

• Quantum: Schrödinger’s equations describe the movement of electrons and holes through material.

• Energy band: 2-dimensional diagrams that capture essential features of Schrödinger’s equations. Energy-band diagrams are commonly used in nano-scale engineering.

• Transistor: Signal values and time are continuous (analog). Each transistor is modeled by a resistor-capacitor network. Overall behaviour is defined by differential equations in terms of the resistors and capacitors. Spice is a typical simulation tool.

• Switch: Time is continuous, but voltage may be either continuous or discrete. Linear equations are used, rather than differential equations. A rising edge may be modeled as a linear rise over some range of time, or the time between a definite low value and a definite high value may be modeled as having an undefined or rising value.

• Gate: Transistors are grouped together into gates (e.g. AND, OR, NOT). Voltages are discrete values such as pure Boolean (0 or 1) or IEEE Standard Logic 1164, which has representations for different types of unknown or undefined values. Time may be continuous or may be discrete. If discrete, a common unit is the delay through a single inverter (e.g. a NOT gate has a delay of 1 and an AND gate has a delay of 2).


• Register transfer level: The essential characteristic of the register transfer level is that the behaviour of hardware is modeled as assignments to registers and combinational signals. Equations are written where a register signal is a function of other signals (e.g. c = a and b;). The assignments may be either combinational or registered. Combinational assignments happen instantaneously and registered assignments take exactly one clock cycle. There are variations on the pure register-transfer level. For example, time may be measured in clock phases rather than clock cycles, so as to allow assignments on either the rising or falling edge of a clock. Another variation is to have multiple clocks that run at different speeds; for example, a clock on a bus might run at half the speed of the primary clock for the chip.

• Transaction level: The basic unit of computation is a transaction, such as executing an instruction on a microprocessor, transferring data across a bus, or accessing memory. Time is usually measured as an estimate (e.g. a memory write requires 15 clock cycles, or a bus transfer requires 250 ns). The building blocks of the transaction level are processors, controllers, memory arrays, busses, and intellectual property (IP) blocks (e.g. UARTs). The behaviour of the building blocks is described with software-like models, often written in behavioural VHDL, SystemC, or SystemVerilog. The transaction level has many similarities to a software model of a distributed system.

• Electronic-system level: Looks at an entire electronic system, with both hardware and software.

In this course, we will focus on the register-transfer level. In the second half of the course, we will look at how analog phenomena, such as timing and power, affect the register-transfer level. In these chapters we will occasionally dip down into the transistor, switch, and gate levels.

1.1.2 VHDL Origins and History

VHDL = VHSIC Hardware Description Language
VHSIC = Very High Speed Integrated Circuit

The VHSIC Hardware Description Language (VHDL) is a formal notation intended for use in all phases of the creation of electronic systems. Because it is both machine readable and human readable, it supports the development, verification, synthesis and testing of hardware designs, the communication of hardware design data, and the maintenance, modification, and procurement of hardware.

Language Reference Manual (IEEE Design Automation Standards Committee, 1993a)


• development, verification, synthesis, and testing of hardware designs
• communication
• maintenance
• modification
• procurement

VHDL is a lot more than synthesis of digital hardware

VHDL History

• Developed by the United States Department of Defense as part of the very high speed integrated circuit (VHSIC) program in the early 1980s.
• The Department of Defense intended VHDL to be used for the documentation, simulation and verification of electronic systems.
• Goals:
  – improve design process over schematic entry
  – standardize design descriptions amongst multiple vendors
  – portable and extensible
• Inspired by the Ada programming language
  – large: 97 keywords, 94 syntactic rules
  – verbose (designed by committee)
  – static type checking, overloading
  – complicated syntax: parentheses are used for both expression grouping and array indexing

    Example:
    a <= b * (3 + c); -- integer
    a <= (3 + c);     -- 1-element array of integers

• Standardized by IEEE in 1987 (IEEE 1076-1987), revised in 1993, 2000.
• In 1993 the IEEE standard VHDL package for model interoperability, STD_LOGIC_1164 (IEEE Standard 1164-1993), was developed.
  – std_logic_1164 defines 9 different values for signals
• In 1997 the IEEE standard packages for arithmetic over std_logic and bit signals were defined (IEEE Standard 1076.3-1997).
  – numeric_std defines arithmetic over std_logic vectors and integers.

    Note: This is the package that you should use for arithmetic. Don’t use std_logic_arith: it has less uniform support for mixed integer/signal arithmetic and has a greater tendency for differences between tools.

  – numeric_bit defines arithmetic over bit vectors and integers. We won’t use bit signals in this course, so you don’t need to worry about this package.


1.1.3 Semantics

The original goal of VHDL was to simulate circuits. The semantics of the language define circuit behaviour.

[Figure: simulation of c <= a AND b; produces waveforms for a, b, and c]

But now, VHDL is used in simulation and synthesis. Synthesis is concerned with the structure of the circuit.

Synthesis: converts one type of description (behavioural) into another, lower level, description (usually a netlist).

[Figure: synthesis of c <= a AND b; produces a gate with inputs a and b and output c]

Synthesis is a computer-aided design (CAD) technique that transforms a designer’s concise, high-level description of a circuit into a structural description of a circuit.

CAD Tools

CAD Tools allow designers to automate lower-level design processes in implementing the desired functionality of a system.

NOTE: EDA = Electronic Design Automation. In digital hardware design EDA = CAD.

Synthesis vs Simulation

For synthesis, we want the code we write to define the structure of the hardware that is generated.

The VHDL semantics define the behaviour of the hardware that is generated, not the structure of the hardware. The scenario below complies with the semantics of VHDL, because the two synthesized circuits produce the same behaviour. If the two synthesized circuits had different behaviour, then the scenario would not comply with the VHDL Standard.


[Figure: c <= a AND b; is synthesized into two circuits with different structure. Simulating the VHDL code and simulating each synthesized circuit all give the same behaviour.]
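The scenario above can be sketched in VHDL itself. The entity name and2 and the architecture names direct and nand_inv are made up for illustration; they are not from the course code. Both architectures settle to the same value of c for every combination of inputs, yet they describe different gate structures, and a synthesis tool would be within the VHDL semantics if it produced either circuit from either description.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity used only for this illustration
entity and2 is
  port (
    a, b : in  std_logic;
    c    : out std_logic
  );
end entity;

-- Structure 1: a single AND gate
architecture direct of and2 is
begin
  c <= a AND b;
end architecture;

-- Structure 2: a NAND gate followed by an inverter.
-- Different structure, same settled value on c.
architecture nand_inv of and2 is
  signal n : std_logic;
begin
  n <= a NAND b;
  c <= NOT n;
end architecture;
```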

1.1.4 Synthesis of a Simulation-Based Language

• Not all of VHDL is synthesizable
  – c <= a AND b; (synthesizable)
  – c <= a AND b AFTER 2 ns; (NOT synthesizable)
    ∗ how do you build a circuit with exactly 2 ns of delay through an AND gate?
    ∗ more examples of non-synthesizable code are in section 1.12
  – See section 1.12 for more details
• Different synthesis tools support different subsets of VHDL
• Some tools generate erroneous hardware for some code
  – behaviour of hardware differs from VHDL semantics
• Some tools generate unpredictable hardware (hardware that has the correct behaviour, but undesirable or weird structure).
• There is an IEEE standard (1076.6) for a synthesizable subset of VHDL, but tool vendors do not yet conform to it. (Most vendors still do not have full support for the 1993 extensions to VHDL!) For more info, see http://www.vhdl.org/siwg/.
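As a rough illustration of where an AFTER clause still makes sense, the simulation-only sketch below compares a zero-delay assignment with a delayed one. The entity name delay_tb, the signal names, and the chosen times are assumptions made for this example. The delayed assignment is meaningful to a simulator, but it should not appear in code that is meant to be synthesized.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench: it has no ports and is never synthesized
entity delay_tb is
end entity;

architecture sim of delay_tb is
  signal a, b      : std_logic := '0';
  signal c_rtl     : std_logic;
  signal c_delayed : std_logic := '0';
begin
  c_rtl     <= a AND b;              -- synthesizable
  c_delayed <= a AND b after 2 ns;   -- simulation only

  stimulus : process
  begin
    a <= '1';
    b <= '1';
    wait for 1 ns;
    -- the zero-delay signal has already changed, the delayed one has not
    assert c_rtl = '1'     report "c_rtl should already be 1"  severity error;
    assert c_delayed = '0' report "c_delayed should still be 0" severity error;
    wait for 2 ns;
    assert c_delayed = '1' report "c_delayed should be 1 after the delay" severity error;
    wait;  -- stop this process forever
  end process;
end architecture;
```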


1.1.5 Solution to Synthesis Sanity

• Pick a high-quality synthesis tool and study its documentation thoroughly
• Learn the idioms of the tool
• Different VHDL code with same behaviour can result in very different circuits
• Be careful if you have to port VHDL code from one tool to another
• KISS: Keep It Simple Stupid
  – VHDL examples will illustrate reliable coding techniques for the synthesis tools from Synopsys, Mentor Graphics, Altera, Xilinx, and most other companies as well.
  – Follow the coding guidelines and examples from lecture
  – As you write VHDL, think about the hardware you expect to get.

Note: If you can’t predict the hardware, then the hardware probably won’t be very good (small, fast, correct, etc)

1.1.6 Standard Logic 1164

At the core of VHDL is a package named STANDARD that defines a type named bit with values of '0' and '1'. For simulation, it is helpful to have additional values, such as “undefined” and “high impedance”. Many companies created their own (incompatible) definitions of signal types for simulation. To regain compatibility amongst packages from different companies, the IEEE defined std_logic_1164 to be the standard type for signal values in VHDL simulation.

'U'  uninitialized
'X'  strong unknown
'0'  strong 0
'1'  strong 1
'Z'  high impedance
'W'  weak unknown
'L'  weak 0
'H'  weak 1
'-'  don’t care

The most common values are: 'U', 'X', '0', '1'.

If you see 'X' in a simulation, it usually means that there is a mistake in your code.

Every VHDL file that you write should begin with:

  library ieee;
  use ieee.std_logic_1164.all;

Note: std_logic vs boolean. The std_logic values '1' and '0' are not the same as the boolean values true and false. For example, you must write if a = '1' then .... The code if a then ... will not typecheck if a is of type std_logic.


From a VLSI perspective, a weak value will come from a smaller gate. One aspect of VHDL that we don’t touch on in ECE 327 is “resolution”, which describes how to determine the value of a signal if the signal is driven by more than one process. (In ECE 327, we restrict ourselves to having each signal be driven by (be the target of) exactly one process.) The std_logic_1164 package provides a resolution function to deal with situations where different processes drive the same signal with different values. In this situation, a strong value (e.g. '1') will overpower a weak value (e.g. 'L'). If two processes drive the signal with different strong values (e.g. '1' and '0') the signal resolves to a strong unknown ('X'). If a signal is driven with two different weak values (e.g. 'H' and 'L'), the signal resolves to a weak unknown ('W').
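Although the course restricts each signal to a single driver, the three resolution cases described above can be observed in a small simulation-only sketch. The entity name resolve_demo and the signal names are invented for this example. Each signal is deliberately given two drivers, which is exactly what the coding guidelines forbid in synthesizable code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench: simulation only
entity resolve_demo is
end entity;

architecture sim of resolve_demo is
  signal strong_vs_weak   : std_logic;
  signal strong_vs_strong : std_logic;
  signal weak_vs_weak     : std_logic;
begin
  -- Each pair of concurrent assignments creates two drivers for one signal.
  strong_vs_weak   <= '1';
  strong_vs_weak   <= 'L';

  strong_vs_strong <= '1';
  strong_vs_strong <= '0';

  weak_vs_weak     <= 'H';
  weak_vs_weak     <= 'L';

  check : process
  begin
    wait for 1 ns;  -- let the drivers settle
    assert strong_vs_weak = '1'
      report "strong 1 should overpower weak 0" severity error;
    assert strong_vs_strong = 'X'
      report "conflicting strong values should give X" severity error;
    assert weak_vs_weak = 'W'
      report "conflicting weak values should give W" severity error;
    wait;
  end process;
end architecture;
```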

1.2 Comparison of VHDL to Other Hardware Description Languages

1.2.1 VHDL Disadvantages

• Some VHDL programs cannot be synthesized
• Different tools support different subsets of VHDL.
• Different tools generate different circuits for same code
• VHDL is verbose
  – Many characters to say something simple
• VHDL is complicated and confusing
  – Many different ways of saying the same thing
  – Constructs that have similar purpose have very different syntax (case vs. select)
  – Constructs that have similar syntax have very different semantics (variables vs signals)
• Hardware that is synthesized is not always obvious (when is a signal a flip-flop vs latch vs combinational)
  – The infamous latch inference problem (See section 1.5.2 for more information)

1.2.2 VHDL Advantages

• VHDL supports unsynthesizable constructs that are useful in writing high-level models, testbenches and other non-hardware or non-synthesizable artifacts that we need in hardware design. VHDL can be used throughout a large portion of the design process in different capacities, from specification to implementation to verification.
• VHDL has static typechecking: many errors can be caught before synthesis and/or simulation. (In this respect, it is more similar to Java than to C.)
• VHDL has a rich collection of datatypes
• VHDL is a full-featured language with a good module system (libraries and packages).
• VHDL has a well-defined standard.


1.2.3 VHDL and Other Languages

1.2.3.1 VHDL vs Verilog

• Verilog is a “simpler” language: smaller language, simple circuits are easier to write
• VHDL has more features than Verilog
  – richer set of data types and strong type checking
  – VHDL offers more flexibility and expressivity for constructing large systems.
• The VHDL Standard is more standard than the Verilog Standard
  – VHDL and Verilog have simulation-based semantics
  – Simulation vendors generally conform to VHDL standard
  – Some Verilog constructs give different behaviours in simulation and synthesis
• VHDL is used more than Verilog in Europe and Japan
• Verilog is used more than VHDL in North America
• VHDL is used more in FPGAs than in ASICs
• South-East Asia, India, South America: ?????

1.2.3.2 VHDL vs System Verilog

• System Verilog is a superset of Verilog. It extends Verilog to make it a full object-oriented hardware modelling language
• Syntax is based on Verilog and C++.
• As of 2007, System Verilog is used almost exclusively for test benches and simulation. Very few people are trying to use it to do hardware design.
• System Verilog grew out of Superlog, a proposed language that was based on Verilog and C. The basic core came from Verilog. C-like extensions were included to make the language more expressive and powerful. It was originally developed by the company Co-Design Automation and then standardized by Accellera, an organization aimed at standardizing EDA languages. Co-Design was purchased by Synopsys and now Synopsys is the leading proponent of System Verilog.

1.2.3.3 VHDL vs SystemC

• SystemC looks like C: familiar syntax
• C is often used in algorithmic descriptions of circuits, so why not try to use it for synthesizable code as well?
• If you think VHDL is hard to synthesize, try C....
• SystemC simulation is slower than advertised


1.2.3.4 Summary of VHDL Evaluation

• VHDL is far from perfect and has lots of annoying characteristics
• VHDL is a better language for education than Verilog because the static typechecking enforces good software engineering practices
• The richness of VHDL will be useful in creating concise high-level models and powerful testbenches

1.3 Overview of Syntax

This section is just a brief overview of the syntax of VHDL, focusing on the constructs that are most commonly used. For more information, read a book on VHDL and use online resources. (Look for “VHDL” under the “Documentation” tab in the E&CE 327 web pages.)

1.3.1 Syntactic Categories

There are five major categories of syntactic constructs. (There are many, many minor categories and subcategories of constructs.)

• Library units (section 1.3.2)
  – Top-level constructs (packages, entities, architectures)
• Concurrent statements (section 1.3.4)
  – Statements executed at the same time (in parallel)
• Sequential statements (section 1.3.7)
  – Statements executed in series (one after the other)
• Expressions
  – Arithmetic (section 2.3), Boolean, Vectors, etc
• Declarations
  – Components, signals, variables, types, functions, ....


1.3.2 Library Units

Library units are the top-level syntactic constructs in VHDL. They are used to define and include libraries, declare and implement interfaces, define packages of declarations and otherwise bind together VHDL code.

• Package body
  – define the contents of a library
• Packages
  – determine which parts of the library are externally visible
• Use clause
  – use a library in an entity/architecture or another package
  – technically, use clauses are part of entities and packages, but they precede the entity/package keyword, so we list them as top-level constructs
• Entity (section 1.3.3)
  – define interface to circuit
• Architecture (section 1.3.3)
  – define internal signals and gates of circuit

1.3.3 Entities and Architecture

Each hardware module is described with an Entity/Architecture pair

Figure 1.1: Entity and Architecture

• Entity: interface
  – names, modes (in / out), types of externally visible signals of circuit
• Architecture: internals
  – structure and behaviour of module


library ieee;
use ieee.std_logic_1164.all;

entity and_or is
  port (
    a, b, c : in  std_logic;
    z       : out std_logic
  );
end entity;

Figure 1.2: Example of an entity


The syntax of VHDL is defined using a variation on Backus-Naur forms (BNF).

[ { use_clause } ]
entity ENTITYID is
  [ port (
      { SIGNALID : (in | out) TYPEID [ := expr ] ; }
    );
  ]
  [ { declaration } ]
[ begin
  { concurrent_statement } ]
end [ entity ] ENTITYID ;

Figure 1.3: Simplified grammar of entity

architecture main of and_or is
  signal x : std_logic;
begin
  x <= a AND b;
  z <= x OR (a AND c);
end architecture;

Figure 1.4: Example of architecture

[ { use_clause } ]
architecture ARCHID of ENTITYID is
  [ { declaration } ]
begin
  [ { concurrent_statement } ]
end [ architecture ] ARCHID ;

Figure 1.5: Simplified grammar of architecture


1.3.4 Concurrent Statements

• An architecture contains concurrent statements
• Concurrent statements execute in parallel
  – Concurrent statements make VHDL fundamentally different from most software languages.
  – Hardware (gates) naturally execute in parallel; VHDL mimics the behaviour of real hardware.
  – At each infinitesimally small moment of time, each gate:
    1. samples its inputs
    2. computes the value of its output
    3. drives the output

architecture main of bowser is
begin
  x1 <= a AND b;
  x2 <= NOT x1;
  z  <= NOT x2;
end main;

architecture main of bowser is
begin
  z  <= NOT x2;
  x2 <= NOT x1;
  x1 <= a AND b;
end main;

[Figure: circuit with inputs a and b, internal signals x1 and x2, and output z]

Figure 1.6: The order of concurrent statements doesn’t matter


conditional assignment
  Syntax:
    . . . <= . . . when . . . else . . .;
  • normal assignment (. . . <= . . .)
  • if-then-else style (uses when)
  Example:
    c <= a+b when sel='1' else a+c when sel='0' else "0000";

selected assignment
  Syntax:
    with . . . select
      . . . <= . . . when . . . | . . .,
               . . . when . . . | . . .,
               . . .
               . . . when . . . | . . .;
  • case/switch style assignment
  Example:
    with color select d <= "00" when red , "01" when . . .;

component instantiation
  Syntax:
    . . .: . . . port map ( . . . => . . ., . . . );
  • use an existing circuit
  • section 1.3.5
  Example:
    add1 : adder port map( a => f, b => g, s => h, co => i);

for-generate
  Syntax:
    . . .: for . . . in . . . generate
      . . .
    end generate;
  • replicate some hardware
  Example:
    bgen: for i in 1 to 7 generate b(i)<=a(7-i); end generate;

if-generate
  Syntax:
    . . .: if . . . generate
      . . .
    end generate;
  • conditionally create some hardware
  Example:
    okgen : if optgoal /= fast generate
      result <= ((a and b) or (d and not e)) or g;
    end generate;
    fastgen : if optgoal = fast generate
      result <= '1';
    end generate;

process
  Syntax:
    process . . . begin
      . . .
    end process;
  • the body of a process is executed sequentially
  • sections 1.3.6, 1.6

Figure 1.7: The most commonly used concurrent statements


1.3.5 Component Declaration and Instantiations

There are two different syntaxes for component declaration and instantiation. The VHDL-93 syntax is much more concise than the VHDL-87 syntax.

Not all tools support the VHDL-93 syntax. For E&CE 327, some of the tools that we use do not support the VHDL-93 syntax, so we are stuck with the VHDL-87 syntax.
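The sketch below contrasts the two styles, on the assumption that the concise VHDL-93 form mentioned above is direct entity instantiation. The and_or entity and its architecture are repeated from Figures 1.2 and 1.4 so that the example compiles on its own; the wrapper entity and_or_wrap and its port names are invented for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Repeated from Figures 1.2 and 1.4
entity and_or is
  port (
    a, b, c : in  std_logic;
    z       : out std_logic
  );
end entity;

architecture main of and_or is
  signal x : std_logic;
begin
  x <= a AND b;
  z <= x OR (a AND c);
end architecture;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper that instantiates and_or twice
entity and_or_wrap is
  port (
    p, q, r  : in  std_logic;
    y87, y93 : out std_logic
  );
end entity;

architecture main of and_or_wrap is
  -- VHDL-87 style: the component must be declared before it is used
  component and_or
    port (
      a, b, c : in  std_logic;
      z       : out std_logic
    );
  end component;
begin
  -- VHDL-87 style instantiation refers to the component declaration
  u87 : and_or
    port map (a => p, b => q, c => r, z => y87);

  -- VHDL-93 style: direct entity instantiation, no component declaration
  u93 : entity work.and_or(main)
    port map (a => p, b => q, c => r, z => y93);
end architecture;
```

In the VHDL-87 style, the port list of and_or is written twice (once in the entity, once in the component declaration), which is the main source of the extra verbosity.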

1.3.6 Processes

• Processes are used to describe complex and potentially unsynthesizable behaviour
• A process is a concurrent statement (section 1.3.4).
• The body of a process contains sequential statements (section 1.3.7)
• Processes are the most complex and difficult to understand part of VHDL (sections 1.5 and 1.6)

process (a, b, c)
begin
  y <= a AND b;
  if (a = '1') then
    z1 <= b AND c;
    z2 <= NOT c;
  else
    z1 <= b OR c;
    z2 <= c;
  end if;
end process;

process
begin
  y <= a AND b;
  z <= '0';
  wait until rising_edge(clk);
  if (a = '1') then
    z <= '1';
    y <= '0';
    wait until rising_edge(clk);
  else
    y <= a OR b;
  end if;
end process;

Figure 1.8: Examples of processes

• Processes must have either a sensitivity list or at least one wait statement on each execution path through the process.
• Processes cannot have both a sensitivity list and a wait statement.


Sensitivity List

The sensitivity list contains the signals that are read in the process.

A process is executed when a signal in its sensitivity list changes value.

An important coding guideline to ensure consistent synthesis and simulation results is to include all signals that are read in the sensitivity list. If you forget some signals, you will either end up with unpredictable hardware and simulation results (different results from different programs) or undesirable hardware (latches where you expected purely combinational hardware). For more on this topic, see sections 1.5.2 and 1.6.

There is one exception to this rule: for a process that implements a flip-flop with an if rising_edge statement, it is acceptable to include only the clock signal in the sensitivity list. Other signals may be included, but are not needed.
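A small sketch of the guideline and its exception follows. The entity name sens_demo and all signal and label names are made up for this example. In the process labelled bad, b is read but not listed, so a simulator does not re-run the process when only b changes; a synthesis tool may well ignore the list and build a plain AND gate, in which case simulation and hardware disagree.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity for illustration only
entity sens_demo is
  port (
    clk, a, b : in  std_logic;
    z_bad     : out std_logic;
    z_good    : out std_logic;
    q         : out std_logic
  );
end entity;

architecture main of sens_demo is
begin
  -- Violates the guideline: b is read but missing from the list
  bad : process (a)
  begin
    z_bad <= a AND b;
  end process;

  -- Follows the guideline: every signal that is read is listed
  good : process (a, b)
  begin
    z_good <= a AND b;
  end process;

  -- The exception: a flip-flop needs only the clock in the list,
  -- even though a is read inside the process
  flop : process (clk)
  begin
    if rising_edge(clk) then
      q <= a;
    end if;
  end process;
end architecture;
```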

[ PROCLAB : ] process ( sensitivity_list )
  [ { declaration } ]
begin
  { sequential_statement }
end process [ PROCLAB ] ;

Figure 1.9: Simplified grammar of process


1.3.7 Sequential Statements

Used inside processes and functions.

wait               wait until . . .;
signal assignment  . . . <= . . .;
if-then-else       if . . . then . . . elsif . . . end if;
case               case . . . is
                     when . . . | . . . => . . .;
                     when . . . => . . .;
                   end case;
loop               loop . . . end loop;
while loop         while . . . loop . . . end loop;
for loop           for . . . in . . . loop . . . end loop;
next               next . . .;

Figure 1.10: The most commonly used sequential statements


1.3.8 A Few More Miscellaneous VHDL Features

Some constructs that are useful and will be described in later chapters and sections:

• report: print a message on stderr while simulating
• assert: assertions about behaviour of signals, very useful with report statements.
• generics: parameters to an entity that are defined at elaboration time.
• attributes: predefined functions for different datatypes. For example: high and low indices of a vector.
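As a preview, here is a minimal sketch that touches each of these features once. The entity name misc_demo, the generic width, and the port names are all assumptions made for the example; the later chapters describe the constructs properly.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity for illustration only
entity misc_demo is
  generic (
    width : natural := 8   -- generic: value is fixed at elaboration time
  );
  port (
    d   : in  std_logic_vector(width - 1 downto 0);
    msb : out std_logic;
    lsb : out std_logic
  );
end entity;

architecture main of misc_demo is
begin
  -- attributes: 'high and 'low give the index bounds of the vector
  msb <= d(d'high);
  lsb <= d(d'low);

  -- assert with report: the message is printed when the condition is false
  assert width >= 2
    report "misc_demo: width should be at least 2"
    severity warning;
end architecture;
```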

1.4 Concurrent vs Sequential Statements

All concurrent assignments can be translated into sequential statements. But, not all sequential statements can be translated into concurrent statements.

1.4.1 Concurrent Assignment vs Process

The two code fragments below have identical behaviour:

architecture main of tiny is
begin
  b <= a;
end main;

architecture main of tiny is
begin
  process (a) begin
    b <= a;
  end process;
end main;

1.4.2 Conditional Assignment vs If Statements

The two code fragments below have identical behaviour:

Concurrent Statements

t <= <val1> when <cond>
     else <val2>;

Sequential Statements

if <cond> then
  t <= <val1>;
else
  t <= <val2>;
end if;


1.4.3 Selected Assignment vs Case Statement

The two code fragments below have identical behaviour:

Concurrent Statements

with <expr> select
  t <= <val1> when <choices1>,
       <val2> when <choices2>,
       <val3> when <choices3>;

Sequential Statements

case <expr> is
  when <choices1> =>
    t <= <val1>;
  when <choices2> =>
    t <= <val2>;
  when <choices3> =>
    t <= <val3>;
end case;
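A concrete instance of the two templates, written as a 4-to-1 multiplexer, might look as follows. The entity name mux4 and the port names are invented for this sketch. The outputs t_conc and t_seq always carry the same value.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity for illustration only
entity mux4 is
  port (
    sel            : in  std_logic_vector(1 downto 0);
    d0, d1, d2, d3 : in  std_logic;
    t_conc, t_seq  : out std_logic
  );
end entity;

architecture main of mux4 is
begin
  -- Concurrent version: selected assignment
  with sel select
    t_conc <= d0 when "00",
              d1 when "01",
              d2 when "10",
              d3 when others;

  -- Sequential version: case statement inside a process
  process (sel, d0, d1, d2, d3)
  begin
    case sel is
      when "00"   => t_seq <= d0;
      when "01"   => t_seq <= d1;
      when "10"   => t_seq <= d2;
      when others => t_seq <= d3;
    end case;
  end process;
end architecture;
```

The others choice is needed in both versions because sel is a std_logic_vector, so values such as "UX" must also be covered.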

1.4.4 Coding Style

Code that’s easy to write with sequential statements, but difficult with concurrent:

Sequential Statements

case <expr> is
  when <choice1> =>
    if <cond> then
      o <= <expr1>;
    else
      o <= <expr2>;
    end if;
  when <choice2> =>
    . . .
end case;

Concurrent Statements

Overall structure:

with <expr> select
  t <= ... when <choice1>,
       ... when <choice2>;

Failed attempt:

with <expr> select
  t <= -- want to write:
       --   <val1> when <cond>
       --   else <val2>
       -- but conditional assignment
       -- is illegal here
       when c1,
       . . .
       when c2;

Concurrent statement with correct behaviour, but messy:

t <= <expr1> when (<expr> = <choice1> AND <cond>)
     else <expr2> when (<expr> = <choice1> AND NOT <cond>)
     else . . .
     ;


1.5 Overview of Processes

Processes are the most difficult VHDL construct to understand. This section gives an overview of processes. Section 1.6 gives the details of the semantics of processes.

• Within a process, statements are executed almost sequentially
• Among processes, execution is done in parallel
• Remember: a process is a concurrent statement!

entity ENTITYID is
  interface declarations
end ENTITYID;

architecture ARCHID of ENTITYID is
begin
  concurrent statements ⇐=
  process begin
    sequential statements ⇐=
  end process;
  concurrent statements ⇐=
end ARCHID;

Figure 1.11: Sequential statements in a process

Key concepts in VHDL semantics for processes:

• VHDL mimics hardware
• Hardware (gates) execute in parallel
• Processes execute in parallel with each other
• All possible orders of executing processes must produce the same simulation results (waveforms)
• If a signal is not assigned a value, then it holds its previous value

All orders of executing concurrent statements must produce the same waveforms

It doesn’t matter whether you are running on a single-threaded operating system, on a multi-threaded operating system, on a massively parallel supercomputer, or on a special hardware emulator with one FPGA chip per VHDL process: all simulations must be the same.

These concepts are the motivation for the semantics of executing processes in VHDL (section 1.6) and lead to the phenomenon of latch-inference (section 1.5.2).


architecture
  procA: process
    stmtA1;
    stmtA2;
    stmtA3;
  end process;
  procB: process
    stmtB1;
    stmtB2;
  end process;

[Figure: three execution sequences of A1, A2, A3, B1, and B2. Single threaded: procA before procB. Single threaded: procB before procA. Multithreaded: procA and procB in parallel.]

Figure 1.12: Different process execution sequences

Figure 1.13: All execution orders must have same behaviour

Sections 1.5.1–1.5.3 discuss the hardware generated by processes.

Sections 1.6–?? discuss the behaviour and execution of processes.


1.5.1 Combinational Process vs Clocked Process

Each well-written synthesizable process is either combinational or clocked. Some synthesizable processes that do not conform to our coding guidelines are both combinational and clocked. For example, in a flip-flop with an asynchronous reset, the output is a combinational function of the reset signal and a clocked function of the data input signal. We will deal only with processes that follow our coding conventions, and so we will continue to say that each process is either combinational xor clocked.

Combinational process:

• Executing the process takes part of one clock cycle
• Target signals are outputs of combinational circuitry
• A combinational process must have a sensitivity list
• A combinational process must not have any wait statements
• A combinational process must not have any rising_edges, or falling_edges
• The hardware for a combinational process is just combinational circuitry

Clocked process:

• Executing the process takes one (or more) clock cycles
• Target signals are outputs of flops
• Process contains one or more wait or if rising_edge statements
• Hardware contains combinational circuitry and flip-flops

Note: Clocked processes are sometimes called “sequential processes”, but this can be easily confused with “sequential statements”, so in E&CE 327 we’ll refer to synthesizable processes as either “combinational” or “clocked”.


Example Processes

Combinational Process

process (a,b,c)
begin
  p1 <= a;
  if (b = c) then
    p2 <= b;
  else
    p2 <= a;
  end if;
end process;

Clocked Processes

process
begin
  wait until rising_edge(clk);
  b <= a;
end process;

process (clk)
begin
  if rising_edge(clk) then
    b <= a;
  end if;
end process;

1.5.2 Latch Inference

The semantics of VHDL require that if a signal is assigned a value on some passes through a process and not on other passes, then on a pass through the process when the signal is not assigned a value, it must maintain its value from the previous pass.

process (a, b, c)
begin
  if (a = '1') then
    z1 <= b;
    z2 <= b;
  else
    z1 <= c;
  end if;
end process;

[Figure: circuit with inputs a, b, and c and outputs z1 and z2]

Figure 1.14: Example of latch inference


When a signal’s value must be stored, VHDL infers a latch or a flip-flop in the hardware to store the value.

If you want a latch or a flip-flop for the signal, then latch inference is good.

If you want combinational circuitry, then latch inference is bad.
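One common way to keep the process from Figure 1.14 purely combinational is to assign every target signal on every pass, for example by giving z2 a default value at the top of the process. The sketch below assumes a default of '0', which is only a placeholder; the right default depends on what the design needs. The entity name no_latch is made up.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity wrapping a latch-free variant of Figure 1.14
entity no_latch is
  port (
    a, b, c : in  std_logic;
    z1, z2  : out std_logic
  );
end entity;

architecture main of no_latch is
begin
  process (a, b, c)
  begin
    z2 <= '0';          -- default assignment (assumed value): z2 is now
                        -- assigned on every pass, so nothing must be stored
    if (a = '1') then
      z1 <= b;
      z2 <= b;          -- overrides the default on this path
    else
      z1 <= c;
    end if;
  end process;
end architecture;
```

Within a process, the last assignment to a signal on a given pass wins, so the default only takes effect on the else path.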

Loop, Latch, Flop

[Figure: three circuits, each with inputs a and b and output z: a combinational loop, a latch (with an EN input), and a flip-flop (with D and Q pins)]

Question: Write VHDL code for each of the above circuits

Answer:

combinational loop

if a = '1' then
  z <= b;
else
  z <= z;
end if;

latch

if a = '1' then
  z <= b;
end if;

flop

if rising_edge(a) then
  z <= b;
end if;


Causes of Latch Inference

Usually, “latch inference” refers to the unintentional creation of latches.

The most common cause of unintended latch inference is missing assignments to signals in if-then-else and case statements.

Latch inference happens during elaboration. When using the Synopsys tools, look for:

  Inferred memory devices

in the output or log files.

1.5.3 Combinational vs Flopped Signals

Signals assigned to in combinational processes are combinational.

Signals assigned to in clocked processes are outputs of flip-flops.

1.6 VHDL Execution: Delta-Cycle Simulation

In this section we go through the detailed semantics of how processes execute. These semantics form the foundation for the simulation and synthesis of VHDL. The semantics define the simulation behaviour, and the duty of synthesis is to produce hardware that has the same behaviour as the simulation of the original VHDL code.

1.6.1 Simple Simulation

Throughout the discussion of simulation, we must keep in mind the fundamental observation about the behaviour of hardware:

Hardware runs in parallel: At each infinitesimally small moment of time, each gate:
1. samples its inputs
2. computes the value of its output
3. drives the output

Before diving into the details of processes, we briefly review gate-level simulation with a simple example, which we will then explore in excruciating detail through the semantics of VHDL.

With knowledge of just basic gate-level behaviour, we simulate the circuit below with waveforms for a and b and calculate the behaviour for c, d, and e.


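[Figure: circuit with inputs a and b, internal signals c and d, and output e, together with waveforms for a, b, c, d, and e; the time axis is marked at 0 ns, 10 ns, 12 ns, and 15 ns]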

Different Programs, Same Behaviour . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

There are many different VHDL programs that will synthesize to this circuit. Three examples are:

Example 1:

  process (a,b) begin
    c <= a and b;
  end process;
  process (b,c,d) begin
    d <= not c;
    e <= b and d;
  end process;

Example 2:

  process (a,b,c,d) begin
    c <= a and b;
    d <= not c;
    e <= b and d;
  end process;

Example 3:

  process (a,b) begin
    c <= a and b;
  end process;
  process (c) begin
    d <= not c;
  end process;
  process (b,d) begin
    e <= b and d;
  end process;

The goal of the VHDL semantics is that all of these programs will have the same behaviour. The two main challenges to make this happen are: a value change on a signal must propagate instantaneously, and all gates must operate in parallel. We will return to these points in section 1.6.4.
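As a rough check of this claim, the three programs can be placed side by side in one testbench and compared after the signals have settled. This is only a sketch: the testbench name same_behaviour_tb, the suffixed signal names, the stimulus sequence, and the 10 ns settling time are assumptions, not taken from the waveform above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity same_behaviour_tb is  -- hypothetical testbench name
end entity;

architecture sketch of same_behaviour_tb is
  signal a, b       : std_logic := '0';
  signal c1, d1, e1 : std_logic;  -- first program  (two processes)
  signal c2, d2, e2 : std_logic;  -- second program (one process)
  signal c3, d3, e3 : std_logic;  -- third program  (three processes)
begin
  -- First program
  process (a, b) begin
    c1 <= a and b;
  end process;
  process (b, c1, d1) begin
    d1 <= not c1;
    e1 <= b and d1;
  end process;

  -- Second program
  process (a, b, c2, d2) begin
    c2 <= a and b;
    d2 <= not c2;
    e2 <= b and d2;
  end process;

  -- Third program
  process (a, b) begin
    c3 <= a and b;
  end process;
  process (c3) begin
    d3 <= not c3;
  end process;
  process (b, d3) begin
    e3 <= b and d3;
  end process;

  -- Apply every input combination, let the delta cycles settle,
  -- then compare the three versions with each other.
  stim : process
    type slv_pairs is array (0 to 3) of std_logic_vector(1 downto 0);
    constant inputs : slv_pairs := ("00", "01", "11", "10");
  begin
    for i in inputs'range loop
      a <= inputs(i)(1);
      b <= inputs(i)(0);
      wait for 10 ns;
      assert c1 = c2 and c2 = c3 and d1 = d2 and d2 = d3
         and e1 = e2 and e2 = e3
        report "the three programs disagree" severity error;
      assert e1 = (b and not (a and b))
        report "unexpected value on e" severity error;
    end loop;
    wait;
  end process;
end architecture;
```

The comparison is made only after time has advanced, so it checks the settled values and not the intermediate values in individual delta cycles.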

1.6.2 Temporal Granularities of Simulation

There are several different granularities of time at which to analyze VHDL behaviour. In this course, we will discuss three major granularities: clock cycles, timing simulation, and “delta cycles”.

register-transfer-level
• smallest unit of time is a clock cycle
• combinational logic has zero delay
• flip-flops have a delay of one clock cycle
• used for simulation early in the design cycle
• fastest simulation run times

timing simulation
• smallest unit of time is a nanosecond, picosecond, or femtosecond
• combinational logic and wires have delay as computed by timing analysis tools
• flip-flops have setup, hold, and clock-to-Q timing parameters
• used for simulation when fine-tuning the design and confirming that timing constraints are satisfied
• slow simulation times for large circuits

delta cycles
• units of time are artifacts of VHDL semantics and simulation software
• simulation cycles, delta cycles, and simulation steps are infinitesimally small amounts of time
• VHDL semantics are defined in terms of these concepts

For the remainder of section 1.6, we will look at only the delta-cycle view of the world.

1.6.3 Zero-Delay Simulation

Register-transfer-level and delta-cycle simulation are both examples of zero-delay simulation.

In zero-delay simulation, a sequence of dependent events must appear to happen instantaneously (in zero time). In particular, the effect of an event must propagate instantaneously through combinational circuitry.

Zero-delay simulation might appear to be simpler than simulation with delays through gates (timing simulation), but in reality, zero-delay simulation algorithms are more complicated than algorithms for timing simulation. The reason is that in zero-delay simulation, a sequence of dependent events must appear to happen instantaneously (in zero time).

There are two fundamental rules for zero-delay simulation:
1. Events appear to propagate through combinational circuitry instantaneously.
2. All of the gates appear to operate in parallel.

The rules for zero-delay simulation say “appear to operate in parallel”, rather than “operate in parallel”, because software executes sequentially, or in serial, as opposed to in parallel (the next paragraph discusses concurrent software languages). A simulator cannot simulate multiple gates in parallel. Instead, the simulator must simulate the gates one at a time, but make the waveforms appear as if all of the gates were simulated in parallel.

The characterization of software as purely sequential is, of course, just our simple-minded hardware-oriented perspective on software. In reality, many software languages support multithreading and other forms of parallel execution. However, even moderately sized circuits have more gates than would make for a reasonable number of concurrent processes on even a massively parallel supercomputer. So, all reasonable semantics for simulation must provide some mechanism for sequential execution to appear to be parallel execution.

There are many different ways to implement zero-delay simulation. We will study two examples: VHDL’s delta-cycle simulation, and a register-transfer-level simulation algorithm.


1.6.4 Intuition Behind Delta-Cycle Simulation

1.6.4.1 Introduction to Delta-Cycle Simulation

• To make it appear that events propagate instantaneously through combinational circuitry: VHDL introduces the delta cycle
  – Infinitesimally small artificial unit of time
  – In each delta cycle, every gate in the circuit
    1. samples its input signals
    2. computes its result value
    3. drives the result value on its output signal
• To make it appear that gates operate in parallel: VHDL introduces the projected assignment
  – the effect of simulating a gate remains invisible until the beginning of the next delta cycle

In each delta cycle, the simulator will simulate each gate whose input changed, and thus the output of the gate must be recomputed to reflect the new input value. The change on the output will cause the next combinational gate to be simulated in the next delta cycle. Events appear to propagate instantaneously through combinational logic, because all of the delta cycles needed for an event to propagate through the combinational logic are collapsed into a single moment in time.

Recall that at each infinitesimally small moment of time, each gate samples its inputs, computes the value of its output, and then drives the output. This sequence of sample, compute, and drive occurs within a delta cycle and is done in parallel for all of the gates. Within a delta cycle, gates operate independently. Signals propagate from one gate to another in a sequence of delta cycles.

The simulator must create the appearance that the gates operate in parallel, or independently, within a delta cycle, even though the simulator will in fact simulate one gate at a time. One way to define that the sequential execution of a set of processes preserves the appearance that the processes are executed independently is that the order in which the processes are executed does not matter. In other words, the simulator gives the same results regardless of the order in which the processes are executed. Because the simulator always gives the same results, regardless of the order of processes, all orders of execution give the same results as the order in which all processes execute simultaneously (i.e., truly concurrently).

To preserve the illusion that the gates ran in parallel within a delta cycle, with the projected assignment in VHDL, the effect of simulating a gate remains invisible until the beginning of the next delta cycle. Thus, there are no dependencies between gates within a delta cycle.


1.6.4.2 Intuitive Rules for Delta-Cycle Simulation

1. Simulate a gate if any of its inputs changed. (If no input changed, then the current value of the output is correct and the output can stay at the same value.)
2. Each gate is simulated at most once per delta cycle.
3. When a gate is executed, the projected (i.e., new) value of the output remains invisible until the beginning of the next delta cycle.
4. Increment time when there is no need for another delta cycle (no gate input changed value in the current delta cycle).


1.6.4.3 Example of Delta-Cycles: Back-to-Back Buffers

Back-to-back buffers illustrate how VHDL simulation uses delta-cycles to achieve the illusion that events propagate instantaneously through combinational circuitry. Without going into the details of how delta-cycle simulation works, in this example, it takes three delta cycles for the rising edge on a to propagate through the circuit. Because a delta-cycle is an infinitesimally small amount of time, in “real” simulation time (the lower waveform), the rising edges on a, b, and c all appear to happen at exactly 1 ns.

  process (a) begin
    b <= a;
  end process;

  process (b) begin
    c <= b;
  end process;

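[Figure: two back-to-back buffers (a to b to c) with two waveforms for a, b, and c between 1 ns and 2 ns. Upper waveform, “Delta-cycle simulation”: the rising edges on a, b, and c occur in three successive δ-cycles at 1 ns. Lower waveform, “Simple simulation”: all three rising edges occur at 1 ns.]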
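The delta cycles can be observed in a simulator by stepping a testbench process one delta cycle at a time with wait for 0 ns, which suspends the process until the next simulation cycle without advancing time. The sketch below is an illustration under assumptions: the testbench name buffers_tb, the initial values of '0', and the stim process are not part of the original example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity buffers_tb is  -- hypothetical testbench name
end entity;

architecture sketch of buffers_tb is
  -- Initial values of '0' are an assumption, to avoid 'U' in the checks.
  signal a, b, c : std_logic := '0';
begin
  process (a) begin
    b <= a;
  end process;

  process (b) begin
    c <= b;
  end process;

  stim : process begin
    wait for 1 ns;
    a <= '1';        -- projected assignment; not yet visible

    wait for 0 ns;   -- first delta cycle at 1 ns: a is updated
    assert a = '1' and b = '0' and c = '0'
      report "expected only a to have changed" severity error;

    wait for 0 ns;   -- second delta cycle: b is updated
    assert b = '1' and c = '0'
      report "expected b to have changed, but not c" severity error;

    wait for 0 ns;   -- third delta cycle: c is updated
    assert c = '1'
      report "expected c to have changed" severity error;

    -- All three edges happened without any advance in simulation time.
    assert now = 1 ns
      report "time should still be 1 ns" severity error;
    wait;
  end process;
end architecture;
```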


1.6.4.4 Example of Projected Assignment: Back-to-Back Buffers

We now extend the back-to-back buffers example to include projected assignments.

[Figure: waveforms for a, b, and c between 1 ns and 2 ns. Upper waveform, “Delta-cycle simulation with projected values”: four δ-cycles at 1 ns, with sample (S), compute (C), and drive (D) steps marked. Lower waveform, “Simple simulation”.]

Legend for the figure:
• Two copies of each signal:
  – projected value (not visible)
  – current value (visible)
• To update a signal with a new value:
  – S: sample current value of inputs
  – C: compute new projected value
  – D: drive new value (make it visible) by copying from projected value to current value

In VHDL, the current value of a signal is updated in the delta-cycle after the projected value is computed. This requires one more delta cycle than in the previous, simplified, example of delta-cycle simulation with back-to-back buffers, which did not include projected assignment.

1.6.4.5 Example of Projected Assignment: Back-to-Back Flip-Flops

Back-to-back flip-flops illustrate how VHDL uses projected assignments to create the illusion that gates operate in parallel. Both processes (p_b and p_c) are sensitive to the clock signal, so both processes will run in the delta-cycle after the clock changes value. It is this delta cycle that we focus on. When we execute p_b, the process will see a='1' and will compute a new value of '1' for b. Because p_b and p_c must appear to execute in parallel within a delta cycle, the new value of b='1' must not be visible to p_c in this delta cycle.

When p_b runs, the new value for b remains invisible until the beginning of the next delta cycle. Hence, p_c will see the old value of b, which is '0'. The value of '1' will propagate from b to c on the next rising edge of the clock, which is not shown.

  p_b: process (clk) begin
    if rising_edge(clk) then
      b <= a;
    end if;
  end process;

  p_c: process (clk) begin
    if rising_edge(clk) then
      c <= b;
    end if;
  end process;

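[Figure: two back-to-back flip-flops (a to b to c, both clocked by clk) with waveforms for clk, a, b, and c at 9 ns, 10 ns, and 11 ns, shown both as a delta-cycle simulation and as a simple simulation. Annotations: p_b and p_c appear to execute in parallel; both p_b and p_c run in the same delta cycle; p_c must see the old value of b.]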

This example illustrates gates appearing to operate in parallel, because the two flip-flops appear to execute at the same time, triggered by the rising edge of the clock. Recall the definition of “appearing to execute independently”: the order in which we run the processes within a delta cycle does not affect the values on the signals at the end of the delta cycle. Using this definition, we can see that regardless of the order in which we run the processes, p_c sees the old value of b.
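The behaviour described above can be checked with a small self-checking testbench. This is a sketch under assumptions: the testbench name flops_tb, the initial values of '0', and the stim process (which drives a and clk by hand, with a rising at 9 ns and clk at 10 ns) are chosen for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity flops_tb is  -- hypothetical testbench name
end entity;

architecture sketch of flops_tb is
  -- Initial values of '0' are an assumption.
  signal clk, a, b, c : std_logic := '0';
begin
  p_b : process (clk) begin
    if rising_edge(clk) then
      b <= a;
    end if;
  end process;

  p_c : process (clk) begin
    if rising_edge(clk) then
      c <= b;
    end if;
  end process;

  stim : process begin
    wait for 9 ns;
    a <= '1';
    wait for 1 ns;   -- time is 10 ns
    clk <= '1';      -- first rising edge
    wait for 1 ns;   -- time is 11 ns
    assert b = '1'
      report "b should have captured the new value of a" severity error;
    assert c = '0'
      report "c should have captured the OLD value of b" severity error;
    clk <= '0';
    wait for 9 ns;   -- time is 20 ns
    clk <= '1';      -- second rising edge
    wait for 1 ns;
    assert c = '1'
      report "c should now hold the value that b had" severity error;
    wait;
  end process;
end architecture;
```

The check on c after the first rising edge holds no matter which of p_b and p_c the simulator happens to execute first.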


1.6.4.6 Example of Projected Assignment with Combinational Loop

This circuit demonstrates how projected assignment simulates combinational loops correctly.

We begin with a truly parallel simulation of the circuit. This figure uses a thick arrow to denote when a value is being computed. All three of the gates that are simulated (b, c, and d) sample their inputs at the same time, then compute their new value in parallel, and drive their result at the same time. Because the computation is done in parallel, this figure uses a different notation (the thick arrows) to denote that the computation is done.

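[Figure: combinational-loop circuit with input a and gates b, c, and d, with waveforms over the δ-cycles at 1 ns. Final values: a=1, b=0, c=1, d=1. Annotation: simulation of gates b, c, and d done truly in parallel, therefore no need for projected assignment.]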

Recall the following about the simulation done in a delta cycle:
• The simulation of the gates must be done in parallel.
• The execution of each gate must be independent of the other gates.
• The choice of the order in which to simulate the gates must not affect the result.

We now simulate the circuit using projected assignment with two different orders of gates and then without projected assignment with two different orders. When we use projected assignment, both orders give the same result, but without projected assignment, different orders give different results.

The key is that, with projected assignment, values are visible at the beginning of the next delta cycle, while with the incorrect semantics, the assignments are visible as soon as they are executed. By making the assignments invisible until the next delta cycle, the order in which the assignments are done does not matter.


Combinational Loop: Correct Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

[Figure: combinational-loop circuit with input a and gates b, c, and d, simulated with projected assignment at 1 ns.
 Execution order b, c, d: final values a=1, b=0, c=1, d=1.
 Execution order b, d, c: final values a=1, b=0, c=1, d=1.]


Combinational Loop: Incorrect Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

[Figure: the same circuit simulated without projected assignment at 1 ns, each case shown with its waveform and the circuit annotated with final values.
 Execution order b, c, d: final values a=1, b=0, c=0, d=0.
 Execution order b, d, c: final values a=1, b=0, c=1, d=1.]

Interestingly, even without projected assignment, the final values in each simulation are consistent with the static functionality of the gates in the circuit (e.g., the output of the AND gate d is '1' exactly when both inputs are '1'). This demonstrates that simply checking the final values of a simulation is not a sufficient technique to determine if the simulation was done correctly.

This circuit, and other combinational loops, such as set-reset latches constructed from cross-coupled NOR gates, also demonstrate some of the difficulties and counter-intuitive behaviour that arise in comparing simulation results that use different notions of time. By assigning different delay values to the gates, the final values on the gates might be either the same as or different from the zero-delay simulation.

1.6.5 VHDL Delta-Cycle Simulation

We have already covered the two most important concepts in the delta-cycle simulation algorithm: delta cycles as an infinitesimally small amount of time and projected assignments where the effect of an assignment to a signal becomes visible at the beginning of the next delta cycle. This section fleshes out these concepts by connecting them to the syntax of VHDL and filling in some details.

The algorithm presented here is a simplification of the actual algorithm in Section 12.6 of the VHDL Standard. The two most significant simplifications are that this algorithm does not support delayed assignments or resolution.

To support delayed assignments, each signal’s projected value is generalized to a projected waveform (more precisely, VHDL’s term is “projected output waveform”), which is a list containing the values and times for multiple projected assignments in the future.

Resolution allows multiple processes to write to the same signal. This is usually a mistake, but it is useful for tri-state busses, where all but one of the processes write a value of 'Z', and the one process that has been granted permission to write to the bus writes a '1' or a '0'. To support resolution, each signal’s projected value would become a set of values, where each value represents the value written by one process. At the end of the simulation cycle, the set of values is resolved into a single final value. The values of '1' and 'Z' resolve to '1'. Similarly, '0' and 'Z' resolve to '0'. However, '1' and '0' resolve to 'X', indicating that the processes are driving conflicting values.
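The three resolution cases can be observed with two drivers on one std_logic signal. The sketch below is an illustration only: the testbench name resolution_tb, the signal tri_bus, and the enable signals en1 and en2 are hypothetical names, and each concurrent assignment stands in for one of the processes that write to the bus.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity resolution_tb is  -- hypothetical testbench name
end entity;

architecture sketch of resolution_tb is
  signal tri_bus  : std_logic;         -- resolved type: several drivers allowed
  signal en1, en2 : std_logic := '0';  -- hypothetical bus-grant signals
begin
  -- Driver 1 writes '1' when granted the bus, 'Z' otherwise.
  drv1 : tri_bus <= '1' when en1 = '1' else 'Z';
  -- Driver 2 writes '0' when granted the bus, 'Z' otherwise.
  drv2 : tri_bus <= '0' when en2 = '1' else 'Z';

  check : process begin
    en1 <= '1'; en2 <= '0';
    wait for 1 ns;
    assert tri_bus = '1'
      report "'1' and 'Z' should resolve to '1'" severity error;

    en1 <= '0'; en2 <= '1';
    wait for 1 ns;
    assert tri_bus = '0'
      report "'Z' and '0' should resolve to '0'" severity error;

    en1 <= '1'; en2 <= '1';
    wait for 1 ns;
    assert tri_bus = 'X'
      report "'1' and '0' should resolve to 'X'" severity error;
    wait;
  end process;
end architecture;
```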

In our presentation, we begin with an informal description of the delta-cycle simulation algorithm and illustrate the algorithm with the back-to-back buffer example. We then give the definitions and a somewhat more formal presentation of the delta-cycle simulation algorithm and do the back-to-back flip-flops and combinational loop examples.

1.6.5.1 Informal Description of Algorithm

• Processes have three modes:
  – Resumed: The process has work to do and is waiting its turn to execute.
  – Executing: The process is running.
  – Suspended: The process is idle and has no work to do.
• A simulation run is initialization followed by a sequence of simulation rounds.
• Initialization:
  – Each process starts off resumed. This gets the simulation started by giving each process a chance to execute.
  – Each signal starts off with its default value. The default value of a std_logic signal is 'U', unless the signal is given an initial value in its declaration (e.g., signal s : std_logic := '0';).
• In each simulation round:
  – Increment time at the beginning of the simulation round. Time then remains constant until the next round.
  – Resume all processes that are waiting for the current time.
  – A simulation round is a sequence of simulation cycles.
• In each simulation cycle:
  – Copy projected value of signals to current value.
  – Resume processes based on sensitivity lists and wait conditions.
  – Execute each resumed process.
  – If no projected assignment changed the value of a signal, then increment time and start the next simulation round.

At the beginning of the simulation of a normal circuit, processes that drive the clock and other external inputs will assign values to their signals, then the changes to these external inputs will propagate through the circuit.


1.6.5.2 Example: VHDL Simulation of Back-to-Back Buffers

We repeat the back-to-back buffer example from section 1.6.4.4, but now add the time scales (simulation rounds, simulation cycles, and simulation steps) and process modes, to make it a fully detailed delta-cycle simulation.

  proc_a : process begin
    a <= '0';
    wait for 1 ns;
    a <= '1';
    wait;
  end process;

  proc_b : process (a) begin
    b <= a;
  end process;

  proc_c : process (b) begin
    c <= b;
  end process;

[Figure: back-to-back buffers (a to b to c) and the fully detailed delta-cycle simulation waveform, showing signals a, b, and c and processes proc_a, proc_b, and proc_c, with rows for time (0 ns, 1 ns, 2 ns), simulation rounds, and simulation cycles]

Notes on the figure:
• Time is measured in nanoseconds, simulation rounds, and simulation cycles.
• Process modes: R=resumed, E=executing, S=suspended.
• Signal values are drawn either as graphical edge symbols from an old value to a new value (e.g., 'U' to '0', 'U' to 'U', '0' to '1') or as text (0, U, 1).
• Initial values: processes=R, signals='U'.
• The first simulation cycle in each simulation round is not a delta cycle.
• Each column is a simulation step: a process mode change or a signal change.
• A simulation cycle ends when all processes are suspended.
• A simulation round ends (and time increments) when a simulation cycle has no assignments to projected values.


1.6.5.3 Definitions and Algorithm

Notes on Simulation Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

• At a wait statement, the process will suspend even if the condition is true in the current simulation cycle. The process will resume the next time that a signal in the condition changes and the condition is true.
• If we execute multiple assignments to the same signal in the same process in the same simulation cycle, only the last assignment actually takes effect; all but the last assignment are ignored.
• In a simulation round, the first simulation cycle is not a delta cycle.
• The mode of a process is determined implicitly by keeping track of the set of processes that are resumed (the resume set) and the process(es) that is(are) executing. All other processes are suspended.
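The first two notes can be demonstrated with a short testbench. This is a sketch, not a definitive test: the testbench name notes_tb, the signals s, events, and resumed, and the timing values are all assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity notes_tb is  -- hypothetical testbench name
end entity;

architecture sketch of notes_tb is
  signal s       : std_logic;         -- starts at 'U'
  signal events  : natural := 0;      -- number of value changes seen on s
  signal resumed : boolean := false;  -- set if the waiter ever resumes
begin
  -- Counts each change in the value of s.
  count : process begin
    wait on s;
    events <= events + 1;
  end process;

  -- At 1 ns, s is already '1'. The wait statement still suspends the
  -- process, because it needs a *change* on s to resume.
  waiter : process begin
    wait for 1 ns;
    wait until s = '1';
    resumed <= true;
    wait;
  end process;

  main : process begin
    s <= '0';   -- ignored: overwritten in the same simulation cycle
    s <= '1';   -- only this last assignment takes effect
    wait for 1 ns;
    assert s = '1'
      report "s should hold the last value assigned" severity error;
    assert events = 1
      report "s should have changed exactly once ('U' to '1')" severity error;
    wait for 4 ns;
    assert not resumed
      report "waiter should still be suspended" severity error;
    wait;
  end process;
end architecture;
```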

VHDL Simulation Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Definition simulation step: Executing one sequential assignment or process mode change.

Definition simulation cycle: The operations that occur in one iteration of the simulation algorithm.

Definition delta cycle: A simulation cycle that does not advance simulation time. Equivalently: A simulation cycle with zero-delay assignments where the assignment causes a process to resume.

Definition simulation round: A sequence of simulation cycles that all have the same simulation time.

Note: Official and unofficial terminology. Simulation cycle and delta cycle are official definitions in the VHDL Standard. Simulation step and simulation round are not standard definitions. We use them because we need words to associate with the concepts that they describe.


More Formal Description of Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

(* initialization *)
set all signals to default value;
add to resume set all processes;
set time to 0 ns;

(* simulation loop *)
while time < ∞ {
  (* begin simulation round *)
  add to resume set all processes that are waiting for current time;
  while time does not change {
    (* begin simulation cycle *)
    copy projected values of signals to current values;
    add to resume set any process that:
      is sensitive to a signal that changed value
      or whose wait-condition became true;
    execute all processes in resume set;
      (* assign to projected values of signals *)
      (* execute until suspend on a wait statement or sensitivity list *)
    clear resume set; (* resume set = ∅ *)
    if none of the executing processes performed a signal assignment then {
      increment time to the minimum of the wait times for processes;
    }
  }
}

1.6.5.4 Example: Delta-Cycle Simulation of Back-to-Back Flip-Flops

We now do a full delta-cycle simulation of back-to-back flip-flops. This expands on the simpler simulation we did in section 1.6.4.5, where we did delta-cycles and projected assignments informally, but did not include the time scales or process modes.

  proc_a : process begin
    a <= '0';
    wait for 9 ns;
    a <= '1';
    wait;
  end process;

  proc_clk : process begin
    clk <= '0';
    wait for 10 ns;
    clk <= '1';
    wait for 10 ns;
  end process;

  proc_flops : process begin
    wait until rising_edge(clk);
    b <= a;
    c <= b;
  end process;

In the figure, “re” denotes a rising edge.

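[Figure: back-to-back flip-flop circuit (a to b to c, both flip-flops clocked by clk) and its delta-cycle simulation waveform, showing signals a, clk, b, and c and processes proc_a, proc_clk, and proc_flops, with rows for time (0 ns, 9 ns, 10 ns, 20 ns, 30 ns), simulation rounds, and simulation cycles]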

Back-to-Back Flip-Flops with If-Rising Edge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

We now repeat the back-to-back flip-flops example, but compare “if-rising-edge” based code to “wait until rising-edge” based code. The values on the flip-flops are the same. But, because the “if-rising-edge” process has the clock in its sensitivity list, this process executes whenever there is a change on the clock signal. In comparison, the “wait until rising-edge” process executes only when there is a rising edge on the clock. When the “if-rising-edge” process executes after a falling edge of the clock, the process suspends without executing any signal assignments, because the assignments are within the then clause of the “if-rising-edge”.

  proc_flops1 : process begin
    wait until rising_edge(clk);
    b1 <= a;
    c1 <= b1;
  end process;

  proc_flops2 : process (clk) begin
    if rising_edge(clk) then
      b2 <= a;
      c2 <= b2;
    end if;
  end process;

[Figure: delta-cycle simulation waveform showing signals a, clk, b1, c1, b2, and c2 and processes proc_a, proc_clk, proc_flops1, and proc_flops2, with rows for time (10 ns, 20 ns, 30 ns), simulation rounds, and simulation cycles]
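That the two coding styles produce the same flip-flop values can be checked by running both processes from the same clock and comparing their outputs. In this sketch the testbench name flop_styles_tb, the hand-driven clock, and the toggling stimulus on a are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity flop_styles_tb is  -- hypothetical testbench name
end entity;

architecture sketch of flop_styles_tb is
  signal clk, a         : std_logic := '0';
  signal b1, c1, b2, c2 : std_logic;
begin
  -- "wait until rising-edge" style
  proc_flops1 : process begin
    wait until rising_edge(clk);
    b1 <= a;
    c1 <= b1;
  end process;

  -- "if-rising-edge" style
  proc_flops2 : process (clk) begin
    if rising_edge(clk) then
      b2 <= a;
      c2 <= b2;
    end if;
  end process;

  -- Four clock periods of 20 ns; a toggles at each falling edge.
  stim : process begin
    for i in 1 to 4 loop
      wait for 10 ns;
      clk <= '1';
      wait for 10 ns;
      clk <= '0';
      a   <= not a;
      -- Compare the values captured at the most recent rising edge.
      assert b1 = b2 and c1 = c2
        report "the two coding styles disagree" severity error;
    end loop;
    wait;
  end process;
end architecture;
```

The testbench compares only signal values. The difference between the two styles is in how often the process executes, which does not show up in the values.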


1.6.5.5 Example: VHDL Simulation of Combinational Loop

We now do a full VHDL delta-cycle simulation of the combinational loop example that we did informally in section 1.6.4.6. Notice that the last simulation cycles at 0 ns and 1 ns do not contain any signal assignments or process mode changes. These simulation cycles are needed, because the VHDL simulation semantics require a simulation cycle to follow any simulation cycle in which a projected assignment occurs, even if the projected assignment did not change the projected value of the signal.

[Figure: combinational-loop circuit with input a and gates b, c, and d]

  proc_a : process begin
    a <= '0';
    wait for 1 ns;
    a <= '1';
    wait;
  end process;

  proc_b : process (a) begin
    b <= not( a );
  end process;

  proc_c : process (a,b,d) begin
    c <= not( a ) or b or d;
  end process;

  proc_d : process (a,c) begin
    d <= a and c;
  end process;


[Figure: delta-cycle simulation waveform showing signals a, b, c, and d and processes proc_a, proc_b, proc_c, and proc_d, with rows for time (0 ns, 1 ns), simulation rounds, and simulation cycles. Annotation on the last simulation cycle: “These assignments cause this simulation cycle.”]
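The settled values of this example can be confirmed in a simulator by wrapping the four processes in a testbench. In this sketch the testbench name comb_loop_tb, the check process, and the sampling times of 500 ps and 1.5 ns are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity comb_loop_tb is  -- hypothetical testbench name
end entity;

architecture sketch of comb_loop_tb is
  signal a, b, c, d : std_logic;  -- all start at 'U'
begin
  proc_a : process begin
    a <= '0';
    wait for 1 ns;
    a <= '1';
    wait;
  end process;

  proc_b : process (a) begin
    b <= not a;
  end process;

  proc_c : process (a, b, d) begin
    c <= not a or b or d;
  end process;

  proc_d : process (a, c) begin
    d <= a and c;
  end process;

  check : process begin
    wait for 500 ps;  -- after the simulation round at 0 ns has settled
    assert a = '0' and b = '1' and c = '1' and d = '0'
      report "unexpected values after the round at 0 ns" severity error;
    wait for 1 ns;    -- time is 1.5 ns: the round at 1 ns has settled
    assert a = '1' and b = '0' and c = '1' and d = '1'
      report "unexpected final values after the round at 1 ns" severity error;
    wait;
  end process;
end architecture;
```

As noted earlier, checking final values alone does not show that each delta cycle was simulated correctly; it only confirms the end result of each simulation round.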

1.6.5.6 Rules and Observations for Drawing Delta-Cycle Simulations

The VHDL Language Reference Manual gives only a textual description of the VHDL semantics. The conventions for drawing the waveforms are just our own.
• Each column is a simulation step.
• In a simulation step, either exactly one process changes mode or exactly one signal changes value, except in the first two simulation steps of each simulation cycle, when multiple current values may be updated and multiple processes may resume.
• If a projected assignment assigns the same value as the signal’s current projected value, the assignment must still be shown, because this assignment will force another simulation cycle in the current simulation round.
• If a signal’s current value is updated with the same value as it currently has, this assignment is not shown, because it will not trigger any sensitivity lists.
• Assignments to signals may be denoted by either the number/letter of the new value or one of the edge symbols:

[Table of edge symbols, with rows indexed by old value (U, 0, 1) and columns indexed by new value (U, 0, 1)]


Some observations about delta-cycle simulation waveforms that can be helpful in checking that a simulation is correct:
• In the first simulation step of the first simulation cycle of a simulation round (i.e., the first simulation step of a simulation round), at least one process will resume. This is in contrast to the first simulation step of all other simulation cycles, where current values of signals are updated with projected values.
• At the end of a simulation cycle, all processes are suspended.
• In the last simulation cycle of a simulation round, either no signals change value, or any signal that changes value is not in the sensitivity list of any process.

1.6.6 External Inputs and Flip-Flops

In our work so far with delta-cycle simulation, we have worked through the mechanics of simulation. This example applies knowledge of delta-cycle simulation at a conceptual level. We could answer the question by thinking about the semantics of delta-cycle simulation or by mechanically doing the simulation.

Question: Do the signals b1 and b2 have the same behaviour from 10–20 ns?

  architecture mathilde of sauve is
    signal clk, a1, a2, b1, b2 : std_logic;
  begin
    proc_clk : process begin
      clk <= '0';
      wait for 10 ns;
      clk <= '1';
      wait for 10 ns;
    end process;

    proc_a1 : process begin
      wait for 10 ns;
      a1 <= '1';
    end process;

    proc_a2 : process begin
      wait until rising_edge(clk);
      a2 <= '1';
    end process;

    proc_b : process begin
      wait until rising_edge(clk);
      b1 <= a1;
      b2 <= a2;
    end process;
  end architecture;


Answer:

The signals b1 and b2 will have the same behaviour if a1 and a2 have the same behaviour. The difference in the code between a1 and a2 is that a1 is waiting for 10 ns and a2 is waiting until a rising edge of the clock. There is a rising edge of the clock at 10 ns, so we might be tempted to conclude (incorrectly) that both a1 and a2 transition from 'U' to '1' at exactly 10 ns and therefore have exactly the same behaviour.

The difference between the behaviour of a1 and a2 is that in the first simulation cycle for 10 ns, the process for a1 resumes, while the process for a2 resumes only after the rising edge of the clock.

The signal a1 is waiting for 10 ns, so in the first simulation cycle for 10 ns, the process for a1 resumes and executes. Also in the first simulation cycle for 10 ns, the clock toggles from '0' to '1'. This rising edge causes the processes for a2, b1, and b2 to resume and execute in the second simulation cycle.

In the second simulation cycle for 10 ns:
• a2 changes from 'U' to '1'.
• b1 sees the value of '1' for a1, because a1 became '1' in the first simulation cycle.
• b2 sees the old value of 'U' for a2, because the process for a2 did not execute in the first simulation cycle.


[Figure: delta-cycle simulation waveform showing signals clk, a1, a2, b1, and b2 and processes proc_clk, proc_a1, proc_a2, and proc_b, with rows for time (10 ns to 20 ns), simulation rounds, and simulation cycles]
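The answer can be confirmed mechanically with a self-checking version of the example. This is a sketch under assumptions: the testbench name sauve_tb, the process labels (taken from the waveform figure), the split into one process per signal group, and the check process are illustrative and not part of the original code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sauve_tb is  -- hypothetical testbench name
end entity;

architecture sketch of sauve_tb is
  signal clk, a1, a2, b1, b2 : std_logic;  -- all start at 'U'
begin
  proc_clk : process begin
    clk <= '0';
    wait for 10 ns;
    clk <= '1';
    wait for 10 ns;
  end process;

  proc_a1 : process begin
    wait for 10 ns;
    a1 <= '1';
  end process;

  proc_a2 : process begin
    wait until rising_edge(clk);
    a2 <= '1';
  end process;

  proc_b : process begin
    wait until rising_edge(clk);
    b1 <= a1;
    b2 <= a2;
  end process;

  check : process begin
    wait for 15 ns;  -- between the rising edges at 10 ns and 30 ns
    assert a1 = '1' and a2 = '1'
      report "a1 and a2 should both be '1' by 15 ns" severity error;
    assert b1 = '1'
      report "b1 should have seen the new value of a1" severity error;
    assert b2 = 'U'
      report "b2 should have seen the old value of a2" severity error;
    wait for 20 ns;  -- time is 35 ns, after the rising edge at 30 ns
    assert b2 = '1'
      report "b2 should catch up on the second rising edge" severity error;
    wait;
  end process;
end architecture;
```

Because proc_clk never stops, the simulation must be run for a bounded time (for example, 40 ns).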


1.7 Register-Transfer-Level Simulation

Delta-cycle simulation is very tedious for both humans and computers. For many circuits, the complexity of delta-cycle simulation is not needed and register-transfer-level simulation, which is much simpler, can be used instead.

The major complexities of delta-cycle simulation come from running a process multiple times within a single simulation round and keeping track of the modes of the processes. Register-transfer-level simulation avoids both of these complexities. By evaluating each signal only once per simulation round, an entire simulation round can be reduced to a single column in a timing diagram. The disadvantage of register-transfer-level simulation is that it does not work for all VHDL programs; in particular, it does not support combinational loops.

1.7.1 Overview

In delta-cycle simulations, we often simulated the same process multiple times within the samesimulation round. In looking at the circuit though, we mentally can calculate the output valueby evaluating each gate only once per simulation round. For both humans and computers (or thehumans waiting for results from computers), it is desirable to avoid the wasted work of simulatinga gate when the output will remain at ’U’ or will change again later in the same simulation round.

In register-transfer-level simulation, we evaluate each gate only once per simulation round. Register-transfer-level simulation is simpler and faster than delta-cycle simuation, because it avoids deltacycles and provisional assignments.

In delta-cycle simulation, we evaluate a gate multiple times in a single simulation round if theprocess that drives the gate is active in multiple simulation cycles, which happens when the processis triggered in multiple simulation cycles. To avoid this, we must evaluate a signal only after all ofthe signals that it depends on have stable values, that is, the signals will not change value later inthe simulation round.

A combinational loop is a circuit that contains a cyclic path through the circuit that includes onlycombinational gates. Combinational loops can cause signals to oscillate, which in delta-cyclesimulation with zero-delay assignments, corresponds to an infinite sequence of delta cycles. Weimmediately see that when doing zero-delay simulation of a combinational loop such asa <= not(a);, the change on a will trigger the process to re-run and re-evaluate a an infinitenumber of times. Hence, register-transfer-level simulation does not support combinational loops.

To make register-transfer simulation work, we preprocess the VHDL program and transform it sothat each process is dependent upon only those processes that appear before it. This dependencyordering is called topological ordering. If a circuit has combinational loops, we cannot sort theprocesses into a topological order.

The register-transfer level is a coarser level of temporal abstraction than the delta-cycle level.In delta-cycle simulation, many delta-cycles can elapse without an increment in real time (e.g.


nanoseconds). In register-transfer-level simulation, all of the events that take place in the samemoment of real time take place at same moment in the simulation. In other words, all of the eventsthat take place at the same time are drawn in the same column of the waveform diagram.

Register-transfer-level simulation can be done for legal VHDL code, either synthesizable or unsyn-thesizable, so long as the code does not contain combinational loops. For any piece of VHDL codewithout combinational loops, the register-transfer-level simulation and the delta-cycle simulationwill have same value for each signal at the end of each simulation round.

By sorting the processes in topological order, when we execute a process, all of the signals that the process depends on will have already been evaluated, and so we know that we are reading the final, stable values that each signal will have for that moment in time. This is good, because for most processes, we want to read the most recent values of signals. The exceptions are timed processes that are dependent upon other timed processes running at the same moment in time and clocked processes that are dependent upon other clocked processes.

    process begin
      a <= '0';
      wait for 10 ns;
      a <= '1';
      ...
    end process;

    process begin
      b <= '0';
      wait for 10 ns;
      b <= a;
      ...
    end process;

Question: In this code, what value should b have at 10 ns: does it read the new value of a or the old value?

Answer: Both processes will execute in the same simulation cycle at 10 ns. The statement b <= a will see the value of a from the previous simulation cycle, which is before a <= '1'; is evaluated. The signal b will be '0' at 10 ns.

As the above example illustrates, if a timed process reads the values of signals from processes that resume at the same time, it must read the previous value of those signals. Similarly, if a clocked process reads the values of signals from processes that are sensitive to the same clock, those processes will all resume in the same simulation cycle: the cycle immediately after the rising edge of the clock (assuming that the processes use if rising_edge or wait until rising_edge statements). Because the processes run in the same simulation cycle, they all read the previous values of the signals that they depend on. If this were not the case, then the VHDL code for a pair of back-to-back flip-flops would not operate correctly, because the output of the first flip-flop would appear immediately at the output of the second flip-flop.

Simulation rounds begin with incrementing time, which triggers timed processes. Therefore, the first processes in the topological order are the timed processes. Timed processes may be run in any order, and they read the previous values of signals that they depend on. This gives the same effect as in delta-cycle simulation, where the timed processes would run in the same simulation cycle and read the values that signals had before the simulation cycle began.

We then sort the clocked and combinational processes based on their dependencies, so that each process appears (is run) after all of the processes on which it depends.

Although a clocked process may read many signals, we say that a clocked process is dependent upon only its clock signal. It is the change in the clock signal that causes the process to resume. So, as long as the process is run after the clock signal is stable, we can be sure that it will not need to be run again at this time step. Clocked processes may be run in any order. They read the current value of their clock signal and the previous value of the other signals that they depend on. As with timed processes, this gives the same effect as in delta-cycle simulation, where the clock edge would trigger the clocked processes to run in the same simulation cycle and the processes would read the values that signals had before the simulation cycle began.

1.7.2 Technique for Register-Transfer Level Simulation

1. Pre-processing

(a) Separate processes into timed, clocked, and combinational

(b) Decompose each combinational process into separate processes with one target signal per process

(c) Sort combinational processes into topological order based on dependencies

2. For each moment of real time:

(a) Run timed processes in any order, reading old values of signals.

(b) Run clocked processes in any order, reading new values of timed signals and old values of registered signals.

(c) Run combinational processes in topological order, reading new values of signals.
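Step 2 of the technique can be sketched in Python for a small hypothetical circuit that is not taken from the notes: a timed input din, two back-to-back flip-flops q1 and q2, and a combinational inverter y driven by q2. The registers are assumed to start at 0, and each loop iteration stands for one clock edge.

```python
def simulate(din_values):
    """RTL simulation of din -> flop q1 -> flop q2 -> inverter y."""
    q1, q2 = 0, 0                      # registered signals (assumed initial value 0)
    trace = []
    for din in din_values:             # (a) timed process supplies din for this moment
        # (b) clocked processes: new timed values, old registered values
        old_q1 = q1
        q1 = din                       # first flop reads the new value of din
        q2 = old_q1                    # second flop reads the OLD value of q1
        # (c) combinational processes, in topological order, read new values
        y = 1 - q2
        trace.append((din, q1, q2, y))
    return trace

for row in simulate([1, 0, 1, 1]):
    print(row)
```

Because q2 is computed from old_q1, a value on din takes two clock edges to reach q2, which is the behaviour expected of back-to-back flip-flops.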


Combinational Process Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Original code:

    process (a, b, c) begin
      if a = '1' then
        d <= b;
        e <= c;
      else
        d <= not b;
        e <= b and c;
      end if;
    end process;

After decomposition:

    process (a, b, c) begin
      if a = '1' then
        d <= b;
      else
        d <= not b;
      end if;
    end process;

    process (a, b, c) begin
      if a = '1' then
        e <= c;
      else
        e <= b and c;
      end if;
    end process;

1.7.3 Examples of RTL Simulation

1.7.3.1 RTL Simulation Example 1

We revisit an earlier example from delta-cycle simulation, but change the code slightly and do register-transfer-level simulation.

1. Original code:

    proc1: process (a, b, c) begin
      d <= NOT c;
      c <= a AND b;
    end process;

    proc2: process (b, d) begin
      e <= b AND d;
    end process;

    proc3: process begin
      a <= '1';
      b <= '0';
      wait for 3 ns;
      b <= '1';
      wait for 99 ns;
    end process;

2. Decompose combinational processes into single-target processes:

Decomposed:

    proc1d: process (c) begin
      d <= NOT c;
    end process;

    proc1c: process (a, b) begin
      c <= a AND b;
    end process;

Sorted:

    proc1c: process (a, b) begin
      c <= a AND b;
    end process;

    proc1d: process (c) begin
      d <= NOT c;
    end process;

3. To sort combinational processes into topological order, move proc1d after proc1c, because d depends on c.

4. Run timed process (proc3) until suspend at wait for 3 ns;.
   • The signal a gets '1' from 0 to 3 ns.
   • The signal b gets '0' from 0 to 3 ns.

5. Run proc1c
   • The signal c gets a AND b ('1' AND '0' = '0') from 0 to 3 ns.

6. Run proc1d
   • The signal d gets NOT c (NOT '0' = '1') from 0 to 3 ns.

7. Run proc2
   • The signal e gets b AND d ('0' AND '1' = '0') from 0 to 3 ns.

8. Run the timed process until suspend at wait for 99 ns;, which takes us from 3 ns to 102 ns.

9. Run combinational processes in topological order to calculate values on c, d, e from 3 ns to 102 ns.
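The steps above can be checked with a few lines of Python. This is a sketch only: the function name rtl_round is hypothetical, and the std_logic values '0' and '1' are modelled as the integers 0 and 1.

```python
def rtl_round(a, b):
    """Evaluate the combinational processes once, in topological order."""
    c = a & b        # proc1c: c <= a AND b
    d = 1 - c        # proc1d: d <= NOT c   (runs after proc1c)
    e = b & d        # proc2:  e <= b AND d (runs after proc1d)
    return c, d, e

# proc3 drives (a, b) = (1, 0) from 0 ns and (1, 1) from 3 ns.
for start_ns, a, b in [(0, 1, 0), (3, 1, 1)]:
    c, d, e = rtl_round(a, b)
    print(start_ns, a, b, c, d, e)
```

Each process is evaluated exactly once per moment of real time, which is the point of sorting them first.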

Waveforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Signal   Initial   0 ns to 3 ns   3 ns to 102 ns
    a        U         1              1
    b        U         0              1
    c        U         0              1
    d        U         1              0
    e        U         0              0


Example: Procs with Multiple Waits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    huey: process
    begin
      clk <= '1';
      wait for 10 ns;
      clk <= '0';
      wait for 10 ns;
    end process;

    dewey: process
    begin
      a <= to_unsigned(0,4);
      wait until re(clk);
      while (a < 4) loop
        a <= a + 1;
        wait until re(clk);
      end loop;
    end process;

    louie: process
    begin
      wait until re(clk);
      d <= '1';
      if (a >= 2) then
        d <= '0';
        wait until re(clk);
      end if;
    end process;

[Waveform diagram: clk, a, and d plotted against time from 0 ns to 120 ns.]


A Related Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Small changes to the code can cause significant changes to the behaviour.

    riri: process
    begin
      clk <= '1';
      wait for 10 ns;
      clk <= '0';
      wait for 10 ns;
    end process;

    fifi: process
    begin
      a <= to_unsigned(0,4);
      wait until re(clk);
      while (a < 4) loop
        a <= a + 1;
        wait until re(clk);
      end loop;
    end process;

    loulou: process
    begin
      wait until re(clk);
      d <= '1';
      if (a < 2) then
        d <= '0';
        wait until re(clk);
      end if;
    end process;

[Waveform diagram: clk, a, and d plotted against time from 0 ns to 120 ns.]

1.8 Simple RTL Simulation in Software

This section describes how we can use a software programming language to write cycle-accurate models of hardware systems. We use Python as our example language, but any imperative language may be used.

1.8.1 Introductory Examples


Two Registers, Two Adders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

[Circuit diagram: registers a and b, two adders, and the constant 1.]

    #-----------------------
    a = 1
    b = 2
    #-----------------------
    for t in range( 20 ) :
        #-----------------
        print( "%s %s %s" % (t, a, b) )
        #-----------------
        a = a + b
        b = b + 1
    #-----------------------

Cyclic Dependencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

We add a cyclic dependency between the registers a and b. Remember that registers must execute in parallel and that VHDL achieves the illusion of parallel execution through projected assignments. We mimic projected assignments in software by introducing a next copy of each variable.

[Circuit diagram: registers a and b with a cyclic dependency.]

    #-----------------------
    a = 1
    b = 2
    #-----------------------
    for t in range( 20 ) :
        #-----------------
        print( "%s %s %s" % (t, a, b) )
        #-----------------
        # projected assignments to next
        a_next = a + b
        b_next = a - b
        #-----------------
        # update current from next
        a = a_next
        b = b_next
        #-----------------
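To see why the next copies matter, the sketch below compares the projected-assignment model with a naive version that assigns directly to a and b. The function names with_next and naive are hypothetical; the register equations are the ones used in this example.

```python
def with_next(a, b, cycles):
    """Registers updated in parallel through next copies."""
    for _ in range(cycles):
        a_next = a + b
        b_next = a - b
        a, b = a_next, b_next
    return a, b

def naive(a, b, cycles):
    """Incorrect model: b reads the already-updated value of a."""
    for _ in range(cycles):
        a = a + b
        b = a - b
    return a, b

print(with_next(1, 2, 1))   # both registers read the old values 1 and 2
print(naive(1, 2, 1))       # b is computed from the new a
```

After a single cycle the two models already disagree on b, because the naive version breaks the illusion of parallel execution.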

1.8.2 Regs and Comb

We explore several alternative coding styles to support circuits with both combinational and registered hardware with cyclic dependencies among the registers.


Regs and Comb (use next) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

In our first approach, we follow the projected assignment coding style.

The disadvantage of this approach is that it requires us to initialize combinational variables.

[Circuit diagram: registers a and b, combinational signals c and d, and the constants 2 and 1.]

    #-----------------------
    a = 1
    b = 2
    c = 2 * b
    d = c + 1
    #-----------------------
    for t in range( 20 ) :
        #-----------------
        # execute registers
        a_next = a + d
        b_next = a - c
        #-----------------
        # drive registers
        a = a_next
        b = b_next
        #-----------------
        # execute comb
        c = 2 * b
        d = c + 1

Regs and Comb (best style) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

We improve upon the previous version of the code by eliminating the need to initialize the combinational variables. We recognize that in the steady state of the simulation run, the instructions in the simulation loop just execute one after the other, and it does not matter which instruction is at the top of the loop. We eliminate the combinational initialization code by rotating the instructions in the loop such that the combinational datapath instructions are at the top of the loop. We can then delete the combinational initialization without affecting the behaviour of the system.


[Circuit diagram: registers a and b, combinational signals c and d, and the constants 2 and 1.]

    #-----------------------
    a = 1
    b = 2
    #-----------------------
    for t in range( 20 ) :
        #-----------------
        # execute comb
        c = 2 * b
        d = c + 1
        #-----------------
        # execute registers
        a_next = a + d
        b_next = a - c
        #-----------------
        # drive registers
        a = a_next
        b = b_next
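The claim that rotating the loop does not change the behaviour can be checked directly. The sketch below wraps the two coding styles in hypothetical functions use_next_style and best_style and records the values of a, b, c, and d seen in each clock cycle.

```python
def use_next_style(cycles):
    """Combinational variables initialized before the loop, executed last."""
    a, b = 1, 2
    c = 2 * b
    d = c + 1
    trace = []
    for _ in range(cycles):
        trace.append((a, b, c, d))
        a_next = a + d
        b_next = a - c
        a, b = a_next, b_next
        c = 2 * b
        d = c + 1
    return trace

def best_style(cycles):
    """Combinational variables executed at the top of the loop."""
    a, b = 1, 2
    trace = []
    for _ in range(cycles):
        c = 2 * b
        d = c + 1
        trace.append((a, b, c, d))
        a_next = a + d
        b_next = a - c
        a, b = a_next, b_next
    return trace

print(use_next_style(3))
print(best_style(3) == use_next_style(3))
```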

Regs and Comb (RTL sim style) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

As an alternative to what we have seen so far, here we follow the style of our RTL simulation algorithm, where registered variables read the old values of other registered variables. We add variables that keep track of the old, or previous, values of registered variables. When we read a variable, we read its old value and then update the current values with the old values at the end of the clock cycle. This approach has some disadvantages. We show it to illustrate the duality between next and prev values of variables: two mechanisms to achieve the same behaviour.

This approach has two disadvantages:
• We must initialize combinational variables.
• When reading variables, we must distinguish between registered and combinational variables.

It is generally preferable to distinguish between registered and combinational variables when writing to a variable rather than when reading it. Two main reasons are:
• For realistic-size circuits, we generally read a variable more times than we assign to it, so it requires less typing and is less susceptible to mistakes.
• In hardware, the distinction between combinational and registered signals is made in the circuitry that drives the signal, not the other gates that read the signal.


[Circuit diagram: registers a and b, combinational signals c and d, and the constants 2 and 1.]

    #-----------------------
    # initialize registers
    a = 1
    b = 2
    #-----------------------
    # initialize comb
    c = 2 * b
    d = c + 1
    #-----------------------
    for t in range( 20 ) :
        #-----------------
        # hold old values
        a_prev = a
        b_prev = b
        #-----------------
        # execute registers
        a = a_prev + d
        b = a_prev - c
        #-----------------
        # execute comb
        c = 2 * b
        d = c + 1

1.8.3 Inputs

We model inputs by loading values into an array, then reading the values one-by-one from the array.

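[Circuit diagram: registers a and b with the input i_c.]

    #-----------------------
    # load the input values (assumed to be one integer per line)
    i_c = []
    f = open( "input.txt", 'r' )
    for v in f.readlines() :
        i_c.append( int( v.strip() ) )
    f.close()
    #-----------------------
    a = 1
    b = 2
    #-----------------------
    for t in range( 20 ) :
        c = i_c[t]
        #-----------------
        print( "%s %s %s %s" % (t, a, b, c) )
        #-----------------
        a = a + b
        b = b + c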


1.8.4 Pipeline

Pipelines are a special style of system that allows us to dispense with the next version of variables. By assigning values to registers in reverse topological order (from the end of the pipeline back to the front), we can assign values directly to the current-value variables.

We first show a model for the pipeline written using next-variables. The assignments are done in reverse order, from back to front. This is possible because the use of the next-variables ensures that there are not any dependencies between these assignments, and so all orders of execution will produce the same results.

With this order of assignments to the next-variables, there is no dependency between the assignment to a next-variable in execution and the driving of the current variable. For example, we read from c in the line before we write to c_next, and c is not read until we drive it from c_next. Thus, we can remove c_next and the executing assignment can drive c directly.

We can see that this technique is correct also by remembering the rules for register-transfer-level simulation: registered signals read the old values of registered signals and combinational signals read the new values of all signals. By doing the registered assignments in reverse order, each variable sees the old values of the other registers.

[Pipeline diagram: input a feeds F, whose output is registered in b; b feeds G, registered in c; c feeds H, registered in d; d feeds the combinational block I, which produces e.]

    #-----------------------
    # initialization
    b = 0
    c = 0
    d = 0
    #-----------------------
    for t in range( 20 ) :
        #-----------------
        print( "%s" % ... )
        #-----------------
        # execute regs
        d_next = H( c )
        c_next = G( b )
        b_next = F( a[t] )
        #-----------------
        # drive regs
        b = b_next
        c = c_next
        d = d_next
        #-----------------
        # execute comb
        e = I( d )

With next variables

    #-----------------------
    # initialization
    b = 0
    c = 0
    d = 0
    #-----------------------
    for t in range( 20 ) :
        #-----------------
        print( "%s" % ... )
        #-----------------
        # execute and drive (registers in reverse order, then comb)
        d = H( c )
        c = G( b )
        b = F( a[t] )
        e = I( d )

Without next variables
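A runnable comparison makes the ordering argument concrete. The stage functions F, G, H, and I below are arbitrary placeholders chosen for illustration, and the three model functions are hypothetical names. The reverse-order model should match the next-variable model, while a front-to-back order lets a value race through every stage in one cycle.

```python
# Placeholder stage functions, chosen only for illustration.
def F(x): return x + 1
def G(x): return 2 * x
def H(x): return x - 3
def I(x): return -x

def with_next(a):
    b = c = d = 0
    trace = []
    for a_t in a:
        d_next, c_next, b_next = H(c), G(b), F(a_t)
        b, c, d = b_next, c_next, d_next
        e = I(d)
        trace.append((b, c, d, e))
    return trace

def reverse_order(a):
    b = c = d = 0
    trace = []
    for a_t in a:
        d = H(c)        # reads old c
        c = G(b)        # reads old b
        b = F(a_t)
        e = I(d)        # combinational: reads new d
        trace.append((b, c, d, e))
    return trace

def forward_order(a):
    """Incorrect: each register reads the NEW value of the previous stage."""
    b = c = d = 0
    trace = []
    for a_t in a:
        b = F(a_t)
        c = G(b)
        d = H(c)
        e = I(d)
        trace.append((b, c, d, e))
    return trace

inputs = [0, 1, 2, 3]
print(with_next(inputs) == reverse_order(inputs))
print(with_next(inputs) == forward_order(inputs))
```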

Pipeline with Comb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

When we introduce combinational variables (e.g., c below) into the design, the order of assignments becomes a bit more complicated, but is still systematic. We do the registers in reverse order, then within each stage (between adjacent registers), we do the combinational signals in topological order.

We need to initialize the registers, but do not need to initialize the combinational variables, because the combinational variables are executed before they are read.


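[Pipeline diagram: input a feeds F, registered in b; b feeds G, producing the combinational signal c; c feeds H, registered in d, and I, registered in e; d and e feed J, producing the combinational signal f.]

    #-----------------------
    # initialize registers
    b = 0
    d = 0
    e = 0
    #-----------------------
    for t in range( 20 ) :
        f = J( d, e )
        c = G( b )
        d = H( c )
        e = I( c )
        b = F( a )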

Pipeline with Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

When we introduce feedback into a pipeline (e.g., from c to a below), we no longer can rely solely on performing the registered assignments in reverse topological order.

The variable that is fed back to an earlier stage must use a projected assignment to break the feedback loop. All other variables are unaffected.

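[Pipeline diagram: registers a, b, c, and d connected through F, G, H, and I, with feedback from c through F to a.]

    #-----------------------
    for t in range( 20 ) :
        #-----------------
        print( "%s" % (t, ...) )
        #-----------------
        d = I( c )
        c_next = H( b )
        b = G( a )
        a = F( c )
        #-----------------
        c = c_next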
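The need for c_next can be demonstrated with a small experiment. In the sketch below, the stage functions F, G, H, and I are arbitrary placeholders, all registers are assumed to start at 0 (the notes do not give initial values), and the three function names are hypothetical.

```python
# Placeholder stage functions, chosen only for illustration.
def F(x): return x + 1
def G(x): return 2 * x
def H(x): return x + 3
def I(x): return -x

def all_next(cycles):
    """Reference model: every register uses a projected assignment."""
    a = b = c = d = 0
    trace = []
    for _ in range(cycles):
        a_next, b_next, c_next, d_next = F(c), G(a), H(b), I(c)
        a, b, c, d = a_next, b_next, c_next, d_next
        trace.append((a, b, c, d))
    return trace

def one_next(cycles):
    """Reverse order, with a projected assignment only for c."""
    a = b = c = d = 0
    trace = []
    for _ in range(cycles):
        d = I(c)
        c_next = H(b)
        b = G(a)
        a = F(c)        # still reads the old c
        c = c_next
        trace.append((a, b, c, d))
    return trace

def no_next(cycles):
    """Incorrect: a reads the value of c written earlier in the same cycle."""
    a = b = c = d = 0
    trace = []
    for _ in range(cycles):
        d = I(c)
        c = H(b)
        b = G(a)
        a = F(c)
        trace.append((a, b, c, d))
    return trace

print(all_next(6) == one_next(6))
print(all_next(6) == no_next(6))
```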

1.9 Variables in VHDL

This is an advanced section. It is not covered in the course and will not be tested.

Variables in VHDL have the same semantics as variables in a software language. Variables may be declared inside processes, functions, and procedures. Variables should not be declared inside architectures. For a variable to be declared in an architecture, it must be a shared variable, and shared variables are not synthesizable.

1.9.1 Semantics

• Variables are updated immediately. More precisely, in contrast to signals, variables have only a current value, not a separate projected value and current value. The value of a variable is visible (driven) in the same simulation cycle, immediately after the variable assignment statement is executed. This behaviour causes variables to act like combinational hardware.
• Variables hold their value until they are assigned a new value. In this respect, variables act like registers or latches.

1.9.2 Usage of Variables

The inconsistent behaviour in variables acting like both combinational hardware and registers/latches makes variables potentially risky to use in code that is intended to be synthesized.
• Difficult to predict what hardware will be synthesized.
• May get quite different hardware from different tools.
• Easy to write code that is synthesizable by some tools and not by others.

Any behaviour and circuit that can be modeled using variables can be modeled using only signals. Variables are never necessary; they are only a convenience to be exploited when using signals would be cumbersome.

Recommendation: use variables only when you need combinational hardware inside a clocked process.

The example below illustrates the acceptable use of a variable, and an equivalent circuit using only signals.

    process
      variable v : std_logic;
    begin
      wait until rising_edge(clk);
      r1 <= a;
      r2 <= b;
      v := r1 xor r2;
      r3 <= not v;
    end process;

Intermediate variable

    s <= r1 xor r2;

    process
    begin
      wait until rising_edge(clk);
      r1 <= a;
      r2 <= b;
      r3 <= not s;
    end process;

Intermediate signal

The dual combinational/registered nature of variables can be seen in the program below, where the variable v is synthesized into two separate pieces of hardware, one combinational and one registered.


    process
      variable v : std_logic;
    begin
      wait until rising_edge(clk);
      if a = '1' then
        v := b;
      else
        v := c;
        wait until rising_edge(clk);
      end if;
      z <= v;
    end process;

1.10 Delta-Cycle Simulation with Delays

This is an advanced section. It is not covered in the course and will not be tested.

Assignments with delays (e.g., b <= a after 2 ns;) are used to model delays through gates and wires in circuits. Simulation with delays is often called timing simulation, because the simulation captures both values and the timing of the circuit.

1.10.1 Transport and Inertial Delay

Transport delay models the time it takes for an edge or value to propagate along the gates and wires between the signals that are read and the target signal.

Inertial delay models the phenomenon that physical devices have “inertia” and cannot switch instantaneously from one value to another. Glitches or pulses that are shorter in duration than the inertial delay are deleted.

[Figure: Assign a a value of '1' with transport and inertial delays. The timeline marks the current time when the assignment is executed, the transport delay, the inertial delay, the rejection window, and the point where a is assigned the value '1'. Existing values of a before the rejection window are unaffected by the assignment.]

The rejection window is the period of time before the new value arrives where old values will be deleted if they would result in a glitch or pulse. The difference between the rejection window and the inertial delay is that the inertial delay is a delay value that is relative to the transport delay, while the rejection window is an absolute window of time (start time and stop time):

$T_p$ = transport delay

$T_i$ = inertial delay

$T_r$ = rejection window

$T_r.\mathrm{begin} = T_p - T_i$

$T_r.\mathrm{end} = T_p$

A sample assignment with the value '1' showing a transport delay of 10 ns and an inertial delay of 3 ns:

    b <= reject 3 ns inertial '1' after 10 ns;

The default value for inertial delay is 0 ns. The two statements below are equivalent:

    b <= '1' after 10 ns;
    b <= reject 0 ns inertial '1' after 10 ns;

The keyword reject may be omitted if the inertial delay is equal to the transport delay. The two statements below are equivalent:

    b <= inertial '1' after 10 ns;
    b <= reject 10 ns inertial '1' after 10 ns;

1.10.2 Delayed Assignment Semantics

The use of delayed assignments requires us to extend the notion of projected values (section 1.6) to projected waveforms. The VHDL Language Reference Manual uses the phrase “projected output waveform”, but for simplicity we use just “projected waveform”. Each signal has a projected waveform, which describes the delayed assignments that are projected to happen in the future. More precisely, a waveform is a sequence of transactions, and a transaction is a (value, time) pair.

When a signal assignment is executed, some of the target’s existing transactions may be deleted, according to the rules below:


                                       Time relative to rejection window
                                       Before       During    After
    Existing projected transactions    Preserved    *         Deleted

* Existing transactions that occur during the rejection window are preserved if they:
• have the same value as the first new transaction
• and are not followed by existing transactions with different values.
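The rules in the table can be sketched as a small Python function. This is an illustration of the rules as stated in these notes, not a full implementation of the language reference manual. The function name assign and its parameters are hypothetical, times are integers in nanoseconds, the rejection limit is always passed explicitly, and only single-transaction assignments are handled.

```python
def assign(projected, value, now, delay, reject=0):
    """Return the projected waveform after one delayed assignment.

    projected: existing transactions as (value, time) pairs, sorted by time.
    value, delay: the new transaction is (value, now + delay).
    reject: length of the rejection window that ends at now + delay.
    """
    t_new = now + delay
    window_begin = t_new - reject
    # Before the rejection window: preserved.
    before = [tr for tr in projected if tr[1] < window_begin]
    # During the window: keep only the trailing run of transactions whose
    # value equals the new value (not followed by a different value).
    during = [tr for tr in projected if window_begin <= tr[1] < t_new]
    kept = []
    for tr in reversed(during):
        if tr[0] != value:
            break
        kept.append(tr)
    kept.reverse()
    # At or after the new transaction: deleted.
    return before + kept + [(value, t_new)]

# Existing transaction lies before the rejection window: preserved.
print(assign([(1, 15)], 0, now=14, delay=5, reject=3))
# Existing transaction lies inside the window with a different value: deleted.
print(assign([(1, 15)], 0, now=11, delay=5, reject=3))
# Existing transaction occurs after the new one: deleted.
print(assign([(0, 9)], 1, now=5, delay=3))
```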

Simple Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

The figure below shows some simple examples of how the target signal’s projected waveform (the lhs) is updated by the expression on the right hand side of the assignment.

[Figure: grid of twelve examples. Rows: existing projected waveform of target signal (lhs). Columns: right-hand side of assignment (rhs). Each cell shows lhs, rhs, and the resulting projected waveform (res).]

Complex Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Most of the complexities in understanding the rules for projected waveforms stem from issues of which transactions are deleted from the target’s existing projected waveform.

[Figure: ten further examples, each showing lhs, rhs, and res.]

Sampling Signals in Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Signals in the right-hand-side expression are evaluated at the time that the assignment is executed.


    c <= a after 2 ns, b after 4 ns;

[Figure: waveforms of a, b, and c. The current values of a and b are taken at the current time t, when the assignment is executed. The projected waveform of c has transactions at t+2 ns and t+4 ns.]

1.10.3 Simulation Examples

In delta-cycle simulation, when we increment time, we need to check both for processes that need to resume (as before), and signals that need to update their current value.

As we execute delayed assignments, the projected waveforms of the target signals evolve with the addition and deletion of transactions.

Transport Delay 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

This example illustrates simulation with a transport delay. The signals a and b do not have delayed assignments, so we simulate these signals exactly as we have done before. For the projected value of c, we need to keep track of a sequence of projected values that will occur in the future, and so we need to keep track of multiple (value, time) pairs.

    process begin
      a <= '0';
      b <= '0';
      wait for 10 ns;
      a <= '1';
      wait for 4 ns;
      b <= '1';
      wait;
    end process;

    process (a,b) begin
      c <= a xor b after 5 ns;
    end process;

[Simulation table: signals a, b, and c and processes proc_1 and proc_2, across simulation rounds and cycles at 0 ns, 5 ns, 10 ns, 14 ns, 15 ns, and 19 ns. Transactions shown for c: (U,5), (0,5), (1,15), (0,19).]

0 ns+1δ: This is the second assignment of a value to c at 5 ns, and so this assignment overwrites the previous one.

5 ns: The projected value of '0' for c is copied to the visible value. This is an unusual simulation round, because no processes executed.

14 ns+1δ: We have two transactions in our projected waveform for c: a value of '1' at 15 ns and a value of '0' at 19 ns.

15 ns: We update the current value of c from its projected waveform and so delete the transaction that was scheduled for 15 ns.

19 ns: Similar to 15 ns, we update the current value of c and delete the corresponding transaction.

Transport Delay 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    process begin
      a <= '0';
      b <= '1';
      c <= '1';
      wait for 5 ns;
      c <= '0';
      wait;
    end process;

    process (a,b,c) begin
      if c = '1' then
        d <= a after 9 ns;
      else
        d <= b after 3 ns;
      end if;
    end process;


[Simulation table: signals a, b, c, and d and processes proc_1 and proc_2, across simulation rounds and cycles at 0 ns, 3 ns, 5 ns, and 8 ns. Transactions shown for d: (U,3), (0,9), (1,8).]

5 ns+1δ: We execute d <= b after 3 ns. The existing transaction for d is ('0', 9 ns). The new transaction is ('1', 8 ns). Because the existing transaction is projected to occur after the new transaction, the existing transaction is deleted.

Transport and Inertial Delay 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Including an inertial delay does not affect the behaviour, so long as there are no pulses whose width is less than the inertial delay. In this example, c has a '1' pulse that is 4 ns long and an inertial delay of 3 ns.


end process;

    process (a,b) begin
      c <= reject 3 ns inertial a xor b after 5 ns;
    end process;

[Simulation table: signals a, b, and c and processes proc_1 and proc_2, across simulation rounds and cycles at 0 ns, 5 ns, 10 ns, 14 ns, 15 ns, and 19 ns. Transactions shown for c: (U,5), (0,5), (1,15), (0,19).]

Transport and Inertial Delay 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

In this second example of transport and inertial delay, there is only 1 ns between a change on a at 10 ns and a change on b at 11 ns, which would result in a 1 ns pulse on c. But, the inertial delay of 3 ns cancels out this pulse at 11 ns+1δ when the assignment to b becomes visible.


end process;

    process (a,b) begin
      c <= reject 3 ns inertial a xor b after 5 ns;
    end process;

[Simulation table: signals a, b, and c and processes proc_1 and proc_2, across simulation rounds and cycles at 0 ns, 5 ns, 10 ns, 11 ns, and 16 ns. Transactions shown for c: (U,5), (0,5), (1,15), (0,16).]


1.10.4 Waveform Expressions

The right-hand-side expressions may contain multiple transactions:

a <= ’0’ after 2 ns, ’1’ after 5 ns;

[Figure: lhs is the old projected waveform of the target, rhs is the waveform expression, and res is the new projected waveform of the target.]

There is only one inertial delay for the waveform expression:

a <= reject 1 ns inertial ’0’ after 2 ns, ’1’ after 5 ns;

[Figure: lhs, rhs (waveform expression), and res, with time marks at t+2 ns and t+5 ns.]

The delays between transactions in the waveform expression do not need to be consistent with the inertial delay, in that the delays between transactions may be less than the inertial delay:

a <= reject 3 ns inertial ’0’ after 5 ns, ’1’ after 7 ns;

[Figure: lhs, rhs (waveform expression), and res, with time marks at t+2 ns, t+5 ns, and t+7 ns.]


1.11 VHDL and Hardware Building Blocks

This section outlines the building blocks for register transfer level design and how to write VHDL code for the building blocks.

1.11.1 Basic Building Blocks

[Schematic symbols: 2:1 mux (also: n-to-1 muxes); memory blocks with ports labelled WE, A, DI, DO and WE, A0, DI0, DO0, A1, DO1; flip-flop with pins labelled D, Q, CE, S, R.]

    Hardware                              VHDL
    AND, OR, NAND, NOR, XOR, XNOR         and, or, nand, nor, xor, xnor
    multiplexer                           if-then-else, case statement, selected assignment, conditional assignment
    adder, subtracter, negater            +, -, -
    shifter, rotater                      sll, srl, sla, sra, rol, ror
    flip-flop                             wait until, if-then-else, rising_edge
    memory array, register file, queue    2-d array or library component

Figure 1.15: RTL Building Blocks


1.11.2 Deprecated Building Blocks for RTL

Some of the common gates you have encountered in previous courses should be avoided when synthesizing register-transfer-level hardware, particularly if FPGAs are the implementation technology.

1.11.2.1 An Aside on Flip-Flops and Latches

flip-flop  Edge sensitive: output only changes on rising (or falling) edge of clock
latch      Level sensitive: output changes whenever clock is high (or low)

A common implementation of a flip-flop is a pair of latches (Master/Slave flop).

Latches are sometimes called “transparent latches”, because they are transparent (input directly connected to output) when the clock is high.

The clock to a latch is sometimes called the “enable” line.
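The difference between level-sensitive and edge-sensitive storage can be sketched with two small Python models that process one sample of each signal per time step. The function names and the sample sequences are hypothetical, the initial output is assumed to be 0, and the clock is assumed to have been low before the first sample.

```python
def latch(d, en, q=0):
    """Level sensitive: q follows d whenever the enable (clock) is high."""
    out = []
    for d_t, en_t in zip(d, en):
        if en_t == 1:
            q = d_t
        out.append(q)
    return out

def flip_flop(d, clk, q=0):
    """Edge sensitive: q samples d only on a 0-to-1 change of clk."""
    out = []
    prev_clk = 0
    for d_t, clk_t in zip(d, clk):
        if prev_clk == 0 and clk_t == 1:
            q = d_t
        prev_clk = clk_t
        out.append(q)
    return out

clk = [0, 1, 1, 0, 1, 1]
d   = [0, 1, 0, 0, 1, 0]
print(latch(d, clk))       # output changes while clk is high
print(flip_flop(d, clk))   # output changes only at rising edges
```

With these samples, the latch output drops back to 0 while the clock is still high, whereas the flip-flop holds the value it captured at the rising edge.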

There is more information in the course notes on timing analysis for storage devices (section 8.3).

1.11.2.2 Deprecated Hardware

Latches
• Use flops, not latches
• Latch-based designs are susceptible to timing problems
• The transparent phase of a latch can let a signal “leak” through a latch, causing the signal to affect the output one clock cycle too early
• It’s possible for a latch-based circuit to simulate correctly, but not work in real hardware, because the timing delays on the real hardware don’t match those predicted in synthesis

T, JK, SR, etc flip-flops
• Limit yourself to D-type flip-flops
• Some FPGA and ASIC cell libraries include only D-type flip-flops. Others, such as Altera’s APEX FPGAs, can be configured as D, T, JK, or SR flip-flops.

Tri-State Buffers
• Use multiplexers, not tri-state buffers
• Tri-state designs are susceptible to stability and signal integrity problems
• Getting tri-state designs to simulate correctly is difficult; some library components don’t support tri-state signals
• Tri-state designs rely on the code never letting two signals drive the bus at the same time
• It can be difficult to check that bus arbitration will always work correctly
• Manufacturing and environmental variability can make real hardware not work correctly even if it simulates correctly
• Typical industrial practice is to avoid use of tri-state signals on a chip, but allow tri-state signals at the board level

Note: Unfortunately and surprisingly, PalmChip has been awarded a US patent for using uni-directional busses (i.e. multiplexers) for system-on-chip designs. The patent was filed in 2000, so all fourth-year design projects since 2000 that use muxes on FPGAs will need to pay royalties to PalmChip.

1.11.3 Hardware and Code for Flops

1.11.3.1 Flops with Waits and Ifs

The two code fragments below synthesize to identical hardware (flops).

If

    process (clk)
    begin
      if rising_edge(clk) then
        q <= d;
      end if;
    end process;

Wait

    process
    begin
      wait until rising_edge(clk);
      q <= d;
    end process;

1.11.3.2 Flops with Synchronous Reset

The two code fragments below synthesize to identical hardware (flops with synchronous reset). Notice that the synchronous reset is really nothing more than an AND gate on the input.

If

    process (clk)
    begin
      if rising_edge(clk) then
        if (reset = '1') then
          q <= '0';
        else
          q <= d;
        end if;
      end if;
    end process;

Wait

    process
    begin
      wait until rising_edge(clk);
      if reset = '1' then
        q <= '0';
      else
        q <= d;
      end if;
    end process;


1.11.3.3 Flops with Chip-Enable

The two code fragments below synthesize to identical hardware (flops with chip-enable lines).

If

process (clk)
begin
  if rising_edge(clk) then
    if ce = '1' then
      q <= d;
    end if;
  end if;
end process;

Wait

process
begin
  wait until rising_edge(clk);
  if ce = '1' then
    q <= d;
  end if;
end process;

1.11.3.4 Flop with Chip-Enable and Mux on Input

The two code fragments below synthesize to identical hardware (flops with chip-enable lines and muxes on inputs).

If

process (clk)
begin
  if rising_edge(clk) then
    if ce = '1' then
      if sel = '1' then
        q <= d1;
      else
        q <= d0;
      end if;
    end if;
  end if;
end process;

Wait

process
begin
  wait until rising_edge(clk);
  if ce = '1' then
    if sel = '1' then
      q <= d1;
    else
      q <= d0;
    end if;
  end if;
end process;


1.11.3.5 Flops with Chip-Enable, Muxes, and Reset

The two code fragments below synthesize to identical hardware (flops with chip-enable lines, muxes on inputs, and synchronous reset). Notice that the synchronous reset is really nothing more than a mux, or an AND gate on the input.

Note: The specific combination and order of tests is important to guarantee that the circuit synthesizes to a flop with a chip enable, as opposed to a level-sensitive latch testing the chip enable and/or reset followed by a flop.

Note: The chip-enable pin on the flop is connected to both ce and reset. If the chip-enable pin were not connected to reset, then the flop would ignore reset unless chip-enable was asserted.

If

process (clk)
begin
  if rising_edge(clk) then
    if ce = '1' or reset = '1' then
      if reset = '1' then
        q <= '0';
      elsif sel = '1' then
        q <= d1;
      else
        q <= d0;
      end if;
    end if;
  end if;
end process;

Wait

process
begin
  wait until rising_edge(clk);
  if ce = '1' or reset = '1' then
    if reset = '1' then
      q <= '0';
    elsif sel = '1' then
      q <= d1;
    else
      q <= d0;
    end if;
  end if;
end process;

1.11.4 An Example Sequential Circuit

There are many ways to write VHDL code that synthesizes to the schematic in Figure 1.16. The major choices are:
1. Categories of signals
   (a) All signals are outputs of flip-flops or inputs (no combinational signals)
   (b) Signals include both flopped and combinational
2. Number of flopped signals per process
   (a) All flopped signals in a single process
   (b) Some processes with multiple flopped signals
   (c) Each flopped signal in its own process
3. Style of flop code
   (a) Flops use if statements
   (b) Flops use wait statements

Some examples of these different options are shown in Figures 1.17–1.20.

[Schematic not reproduced. Labels in the figure: S, R (twice each), sel, reset, clk, a, c.]

entity and_not_reg is
  port (
    reset, clk, sel : in std_logic;
    c : out std_logic
  );
end;

Schematic and entity for examples of different code organizations in Figures 1.17–1.20

Figure 1.16: Schematic and entity for and_not_reg

One Process, Flops, Wait

architecture one_proc of and_not_reg is
  signal a : std_logic;
begin
  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      a <= '0';
    elsif sel = '1' then
      a <= not a;
    else
      a <= a;
    end if;
    c <= not a;
  end process;
end one_proc;

Figure 1.17: Implementation of Figure 1.16: all signals are flops, all flops in one process, flops use waits


Two Processes, Flops, Wait

architecture two_proc_wait of and_not_reg is
  signal a : std_logic;
begin
  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      a <= '0';
    elsif sel = '1' then
      a <= not a;
    else
      a <= a;
    end if;
  end process;
  process begin
    wait until rising_edge(clk);
    c <= not a;
  end process;
end two_proc_wait;

Figure 1.18: Implementation of Figure 1.16: all signals are flops, one flop per process, flops use waits


Two Processes with If-Then-Else

architecture two_proc_if of and_not_reg is
  signal a : std_logic;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        a <= '0';
      elsif sel = '1' then
        a <= not a;
      else
        a <= a;
      end if;
    end if;
  end process;
  process (clk)
  begin
    if rising_edge(clk) then
      c <= not a;
    end if;
  end process;
end two_proc_if;

Figure 1.19: Implementation of Figure 1.16: all signals are flops, one flop per process, flops use if-then-else


Concurrent Statements

architecture comb of and_not_reg is
  signal a, b, d : std_logic;
begin
  process (clk) begin
    if rising_edge(clk) then
      if reset = '1' then
        a <= '0';
      else
        a <= d;
      end if;
    end if;
  end process;
  process (clk) begin
    if rising_edge(clk) then
      c <= not a;
    end if;
  end process;
  d <= b when sel = '1' else a;
  b <= not a;
end comb;

Figure 1.20: Implementation of Figure 1.16: flopped and combinational signals, one flop per process, flops use if-then-else

1.12 Synthesizable vs Non-Synthesizable Code

For us to consider a VHDL program synthesizable, all of the conditions below must be satisfied:
• the program must be theoretically implementable in hardware
• the hardware that is produced must be consistent with the structure of the source code
• the source code must be portable across a wide range of synthesis tools, in that the synthesis tools all produce correct hardware

Synthesis is done by matching VHDL code against templates or patterns. It’s important to use idioms that your synthesis tools recognize. If you aren’t careful, you could write code that has the same behaviour as one of the idioms, but which results in inefficient or incorrect hardware. Section 1.11 described common idioms and the resulting hardware.

Most synthesis tools agree on a large set of idioms, and will reliably generate hardware for these idioms. This section is based on the idioms that Synopsys, Xilinx, Altera, and Mentor Graphics are able to synthesize.


1.12.1 Initial Values

Initial values on signals (UNSYNTHESIZABLE)

signal bad_signal : std_logic := '0';

Reason: In most implementation technologies, when a circuit powers up, the values on signals are completely random. Some FPGAs are an exception to this. For some FPGAs, when a chip is powered up, all flip flops will be '0'. For other FPGAs, the initial values can be programmed.
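A portable way to get a known starting value is to rely on a synchronous reset rather than on an initializer. The following is a minimal sketch; the entity name init_by_reset and its ports are assumptions made for illustration and do not come from these notes.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity, for illustration only
entity init_by_reset is
  port (
    clk, reset, d : in  std_logic;
    q             : out std_logic
  );
end entity;

architecture main of init_by_reset is
begin
  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      q <= '0';   -- known value comes from reset, not from ":= '0'"
    else
      q <= d;
    end if;
  end process;
end architecture;
```

With this style, q is undefined in simulation until the first clock edge with reset asserted, which matches what the hardware does after power-up.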

1.12.2 Wait For

Wait for length of time (UNSYNTHESIZABLE)

wait for 10 ns;

Reason: Delays through circuits are dependent upon both the circuit and its operating environment, particularly supply voltage and temperature.

1.12.3 Variables

process
  variable bad : std_logic;
begin
  wait until rising_edge(clk);
  bad := not a;
  d <= bad and b;
  e <= bad or c;
end process;

• Use signals, do not use variables.
  reason: The intention of the creators of VHDL was for signals to be wires and variables to be just for simulation. Some synthesis tools allow some uses of variables, but when using variables, it is easy to create a design that works in simulation but not in real hardware.
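The variable in the example above can usually be replaced by a combinational signal. The sketch below shows one way to do that; the entity name no_variable and the signal name not_a are assumptions chosen for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity, for illustration only
entity no_variable is
  port (
    clk, a, b, c : in  std_logic;
    d, e         : out std_logic
  );
end entity;

architecture main of no_variable is
  signal not_a : std_logic;  -- combinational signal that replaces the variable
begin
  not_a <= not a;

  process begin
    wait until rising_edge(clk);
    d <= not_a and b;
    e <= not_a or c;
  end process;
end architecture;
```

Because not_a is a wire driven by a concurrent assignment, d and e are the only flops, which is the same hardware the variable version is meant to describe.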

1.12.4 Bits and Booleans

signal bad1 : bit;
signal bad2 : boolean;

• Use std_logic signals, do not use bit or Boolean signals.
  reason: std_logic is the most commonly used signal type across synthesis tools and simulation tools.


1.12.5 Assignments before Wait Statement

If a synthesizable clocked process has a wait statement, then the process must begin with a wait statement.

Unsynthesizable

process
begin
  c <= a;
  wait until rising_edge(clk);
  d <= b;
  wait until rising_edge(clk);
end process;

Synthesizable

process
begin
  wait until rising_edge(clk);
  d <= b;
  wait until rising_edge(clk);
  c <= a;
end process;

Reason: In simulation, any assignments before the first wait statement will be executed in the first delta-cycle. In the synthesized circuit, the signals will be outputs of flip-flops and will first be assigned values after the first rising-edge. To maintain equivalent behaviour between simulation and synthesis, most synthesis tools require that no assignments appear before the first wait statement in a process.

1.12.6 Different Wait Conditions

wait statements with different conditions in a process (UNSYNTHESIZABLE)

-- different clock signals
process
begin
  wait until rising_edge(clk1);
  x <= a;
  wait until rising_edge(clk2);
  x <= a;
end process;

-- different clock edges
process
begin
  wait until rising_edge(clk);
  x <= a;
  wait until falling_edge(clk);
  x <= a;
end process;

Detailed reason: processes with multiple wait statements are turned into finite state machines. The wait statements denote transitions between states. The target signals in the process are outputs of flip flops. Using different wait conditions would require the flip flops to use different clock signals at different times. Multiple clock signals for a single flip flop would be difficult to synthesize, inefficient to build, and fragile to operate.

1.12.7 Multiple “if rising_edge” in Process

Multiple if rising_edge statements in a process (UNSYNTHESIZABLE)

process (clk)
begin
  if rising_edge(clk) then
    q0 <= d0;
  end if;
  if rising_edge(clk) then
    q1 <= d1;
  end if;
end process;

Reason: The idioms for synthesis tools generally expect just a single if rising_edge statement in each process. The simpler the VHDL code is, the easier it is to synthesize hardware. Programmers of synthesis tools make idiomatic restrictions to make their jobs simpler.


1.12.8 “if rising_edge” and “wait” in Same Process

An if rising_edge statement and a wait statement in the same process (UNSYNTHESIZABLE)

process
begin
  if rising_edge(clk) then
    q0 <= d0;
  end if;
  wait until rising_edge(clk);
  q0 <= d1;
end process;

Reason: The idioms for synthesis tools generally expect just a single type of flop-generating statement in each process.

1.12.9 “if rising_edge” with “else” Clause

The if statement has a rising_edge condition and an else clause (UNSYNTHESIZABLE).

process (clk)
begin
  if rising_edge(clk) then
    q0 <= d0;
  else
    q0 <= d1;
  end if;
end process;

Reason: The idioms for the synthesis tools expect a signal to be either registered or combinational, not both.

1.12.10 Loop with Both Comb and Clocked Paths

loops where some paths are clocked and some are not (UNSYNTHESIZABLE)


process begin
  while c /= '1' loop
    if b = '1' then
      wait until rising_edge(clk);
      e <= d;
    else
      e <= not d;
    end if;
  end loop;
  e <= b;
end process;

Reason: If the loop condition is true and the if-then-else condition is false, then the combinational path is taken and the process will get stuck in an infinite loop going through the combinational path.


1.12.11 “wait” Inside of a “for loop”

wait statements in a for loop (UNSYNTHESIZABLE)

process
begin
  for i in 0 to 7 loop
    wait until rising_edge(clk);
    x <= to_unsigned(i,4);
  end loop;
end process;

Reason: Idiom of synthesis tools; while-loops with the same behaviour are synthesizable.

Synthesizable Alternative to Wait-Inside-For

while loop (synthesizable)

This is the synthesizable alternative to the wait statement in a for loop above.

process
begin
  -- output values from 0 to 4 on i
  -- sending one value out each clock cycle
  i <= to_unsigned(0,4);
  wait until rising_edge(clk);
  while (4 > i) loop
    i <= i + 1;
    wait until rising_edge(clk);
  end loop;
end process;

1.13 Guidelines for Desirable Hardware

It is possible to write code that is synthesizable, but undesirable. This section describes our guidelines for writing synthesizable code that will result in “desirable” hardware. Our coding guidelines are designed for creating circuits that will work well for a wide range of implementation technologies, from low-end FPGAs to high-speed ASICs.

Remember, there is a world of difference between getting a design to work in simulation and getting it to work on a real FPGA. And there is also a huge difference between getting a design to work in an FPGA for a few minutes of testing and getting thousands of products to work for months at a time in thousands of different environments around the world.

Finally, note that there are exceptions to every rule. You might find yourself in a circumstance where your particular situation (e.g. choice of tool, target technology, etc) would benefit from bending or breaking a guideline here. Within E&CE 327, of course, there won’t be any such circumstances.

Our list of undesirable hardware features is:
• latches
• asynchronous resets
• combinational loops
• using a data signal as a clock
• using a clock signal as data
• tri-state buffers and signals
• multiple drivers for a signal

We limit our definition of bad practice to code that produces undesirable hardware. The guidelines do not address coding styles that lead to inefficient hardware. Inefficient or unoptimized hardware might be useful in the early stages of the design process, when the focus is on functionality and not optimality. As such, inefficient code is not considered bad practice. Poor coding styles that do not affect the hardware, for example, including extraneous signals in a sensitivity list, should certainly be avoided, but fall into the general realm of programming guidelines and will not be discussed.

1.13.1 Know Your Hardware

The most important guideline is: know what you want the synthesis tool to build for you.
• For every signal in your design, know whether it should be a flip-flop or combinational. Check the output of the synthesis tool to see if the flip flops in your circuit match your expectations, and to check that you do not have any latches in your design.
• If you cannot predict what hardware the synthesis tool will generate, then you probably will be unhappy with the result of synthesis.


1.13.2 Latches

Combinational if-then without else

process (a, b)
begin
  if (a = '1') then
    c <= b;
  end if;
end process;

• For a combinational process, every signal that is assigned to must be assigned to in every branch of if-then and case statements.
  reason: If a signal is not assigned a value in a path through a combinational process, then that signal will be a latch.
  note: For a clocked process, if a signal is not assigned a value in a clock cycle, then the flip-flop for that signal will have a chip-enable pin. Chip-enable pins are fine; they are available on flip-flops in essentially every cell library.
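A latch-free version of the process above assigns c on every path. In the sketch below, the entity name no_latch and the choice of '0' as the value in the else branch are assumptions; any value works as long as c is assigned in every branch.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity, for illustration only
entity no_latch is
  port (
    a, b : in  std_logic;
    c    : out std_logic
  );
end entity;

architecture main of no_latch is
begin
  process (a, b)
  begin
    if a = '1' then
      c <= b;
    else
      c <= '0';  -- assumed default; the else branch is what removes the latch
    end if;
  end process;
end architecture;
```

An equivalent style is to assign the default value to c before the if statement and omit the else branch.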

Signals Missing from Sensitivity List

process (a)
begin
  c <= a and b;
end process;

• For a combinational process, the sensitivity list should contain all of the signals that are read in the process.
  reason: Gives consistent results across different tools. Many synthesis tools will implicitly include all signals that a process reads in its sensitivity list. This differs from the VHDL Standard. A synthesis tool that adheres to the standard will either generate an error or will create hardware with latches or flops clocked by data signals if not all signals that are read from are included in the sensitivity list.
  exception: In a clocked process using an if rising_edge, it is acceptable to have only the clock in the sensitivity list.
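The corrected form of the process above lists both signals that are read. The entity name full_sens is an assumption made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity, for illustration only
entity full_sens is
  port (
    a, b : in  std_logic;
    c    : out std_logic
  );
end entity;

architecture main of full_sens is
begin
  -- both a and b are read, so both appear in the sensitivity list
  process (a, b)
  begin
    c <= a and b;
  end process;
end architecture;
```

For a process this small, the concurrent assignment c <= a and b; describes the same hardware and has no sensitivity list to get wrong.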


1.13.3 Asynchronous Reset

In an asynchronous reset, the test for reset occurs outside of the test for the clock edge.

process (reset, clk)
begin
  if (reset = '1') then
    q <= '0';
  elsif rising_edge(clk) then
    q <= d;
  end if;
end process;

• All reset signals should be synchronous.
  reason: If a reset occurs very close to a clock edge, some parts of the circuit might be reset in one clock cycle and some in the subsequent clock cycle. This can lead the circuit to be out of sync as it goes through the reset sequence, potentially causing erroneous internal state and output values.

1.13.4 Combinational Loops

A combinational loop is a cyclic path of dependencies through one or more combinational processes.

process (a, b, c) begin
  if a = '0' then
    d <= b;
  else
    d <= c;
  end if;
end process;

process (d, e) begin
  b <= d and e;
end process;

• If you need a signal to be dependent on itself, you must include a register somewhere in the cyclic path.
  reason: Combinational loops are almost always unstable, in that the value on a signal in the loop is unpredictable and can change over time, even if none of the inputs change.
  note: Registered loops are fine.
  note: Internally, the implementations of flip-flops and other storage devices use combinational loops, but these loops are built and analyzed at the analog level to ensure that they are stable.
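One way to break the loop in the example above is to register b, so that the cyclic path from d back to d passes through a flip-flop. This is only a sketch: the entity name reg_loop, the clk port, and the d_out port are assumptions added to make the fragment self-contained.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity, for illustration only
entity reg_loop is
  port (
    clk, a, c, e : in  std_logic;
    d_out        : out std_logic
  );
end entity;

architecture main of reg_loop is
  signal b, d : std_logic;
begin
  -- combinational: d depends on b
  process (a, b, c) begin
    if a = '0' then
      d <= b;
    else
      d <= c;
    end if;
  end process;

  -- registered: b depends on d, but only through a flip-flop
  process begin
    wait until rising_edge(clk);
    b <= d and e;
  end process;

  d_out <= d;
end architecture;
```

Registering b delays it by one clock cycle relative to the original code, so the surrounding design has to account for that change in timing.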


1.13.5 Using a Data Signal as a Clock

process begin
  wait until rising_edge(clk);
  count <= count + 1;
end process;

process begin
  wait until rising_edge( count(5) );
  b <= a;
end process;

• Data signals should be used only as data.
  reason: All data assignments should be synchronized to a clock. This ensures that the timing analysis tool can determine the maximum clock speed accurately. Using a data signal as a clock can lead to unpredictable delays between different assignments, which makes it infeasible to do an accurate timing analysis.
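A common alternative is to keep clk as the only clock and turn the rising edge of count(5) into a chip-enable. The sketch below assumes a 6-bit counter; the entity name ce_from_count, the reset port, and the signal prev5 are all assumptions introduced for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity, for illustration only
entity ce_from_count is
  port (
    clk, reset, a : in  std_logic;
    b             : out std_logic
  );
end entity;

architecture main of ce_from_count is
  signal count : unsigned(5 downto 0);
  signal prev5 : std_logic;  -- value of count(5) in the previous clock cycle
begin
  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      count <= (others => '0');
      prev5 <= '0';
    else
      count <= count + 1;
      prev5 <= count(5);
    end if;
  end process;

  -- b is clocked by clk; the 0-to-1 change on count(5) acts as a chip-enable
  process begin
    wait until rising_edge(clk);
    if count(5) = '1' and prev5 = '0' then
      b <= a;
    end if;
  end process;
end architecture;
```

In this version b is updated on the clock edge after count(5) becomes '1', so it is one clk cycle later than in the original code, but every flop shares the same clock.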

1.13.6 Using a Clock Signal as Data

process begin
  wait until rising_edge(clk);
  count <= count + 1;
end process;

b <= a and clk;

• Clock signals should be used only as clocks.
  reason: Clock signals have two defined values in a clock cycle and transition in the middle of the clock cycle. At the register-transfer level, each signal has exactly one value in a clock cycle and signals transition between values only at the boundary between clock cycles.


1.13.7 Tri-State Buffers and Signals

'Z' as a Signal Value

process (sel, a0)
begin
  b <= a0 when sel = '0'
       else 'Z';
end process;

process (sel, a1)
begin
  b <= a1 when sel = '1'
       else 'Z';
end process;

• Use multiplexers, not tri-state buffers.
  reason: Multiplexers are more robust than tri-state buffers, because tri-state buffers rely on analog effects such as drive-strength and voltages that are between '0' and '1'. Multiplexers require more area than tri-state buffers, but for the size of most busses, the advantage in a more robust design is worth the cost in extra area.
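The two tri-state drivers above can be replaced by a single multiplexer that drives b. This is a minimal sketch; the entity name mux_bus is an assumption made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity, for illustration only
entity mux_bus is
  port (
    sel, a0, a1 : in  std_logic;
    b           : out std_logic
  );
end entity;

architecture main of mux_bus is
begin
  -- exactly one driver for b; sel picks the source and 'Z' is never used
  b <= a0 when sel = '0' else a1;
end architecture;
```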

Inout and Buffer Port Modes

entity bad is
  port (
    io_bad : inout std_logic;
    buf_bad : buffer std_logic
  );
end entity;

• Use in or out, do not use inout or buffer.
  reason: inout and buffer signals are tri-state.
  note: If you have an output signal that you also want to read from, you might be tempted to declare the mode of the signal to be inout. A better solution is to create a new, internal, signal that you both read from and write to. Then, your output signal can just read from the internal signal.
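The internal-signal approach described in the note might look like the sketch below, which toggles an output on each clock cycle. The entity name read_output, its ports, and the signal q_int are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity, for illustration only
entity read_output is
  port (
    clk, reset : in  std_logic;
    q          : out std_logic
  );
end entity;

architecture main of read_output is
  signal q_int : std_logic;  -- internal copy that is both read and written
begin
  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      q_int <= '0';
    else
      q_int <= not q_int;  -- reads the internal signal, not the port
    end if;
  end process;

  q <= q_int;  -- the output port is driven from the internal signal
end architecture;
```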


1.13.8 Multiple Drivers

process begin
  wait until rising_edge(clk);
  if reset = '1' then
    y <= '0';
    z <= '0';
  end if;
end process;

process begin
  wait until rising_edge(clk);
  if reset = '0' then
    if a = '1' then
      z <= b and c;
    else
      z <= d;
    end if;
  end if;
end process;

process begin
  wait until rising_edge(clk);
  if reset = '0' then
    if b = '1' then
      y <= c;
    end if;
  end if;
end process;

• Each signal should be assigned to in only one process. This is often called the “single assignment rule”.
  reason: Multiple processes driving the same signal is the same as having multiple gates driving the same wire. This can cause contention, tri-state values, and other bad things.
  exception: Multiple drivers are acceptable for tri-state busses or if your implementation technology has wired-ANDs or wired-ORs. FPGAs do not have wired-ANDs or wired-ORs, and many ASIC designers consider them to be risky and bad practice.
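To follow the single assignment rule, the reset test can be folded into the process that owns each signal. The sketch below assumes that reset takes priority over the data assignments; the entity name single_driver and the port list are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity, for illustration only
entity single_driver is
  port (
    clk, reset, a, b, c, d : in  std_logic;
    y, z                   : out std_logic
  );
end entity;

architecture main of single_driver is
begin
  -- z is assigned in this process only
  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      z <= '0';
    elsif a = '1' then
      z <= b and c;
    else
      z <= d;
    end if;
  end process;

  -- y is assigned in this process only
  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      y <= '0';
    elsif b = '1' then
      y <= c;
    end if;
  end process;
end architecture;
```

Each of y and z now has exactly one driver, and y keeps its old value when neither reset nor b is asserted, which synthesizes to a flop with a chip-enable.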