ECE 327: Digital Systems Engineering Lecture Slides 2018t1 (Winter) Mark Aagaard University of Waterloo Department of Electrical and Computer Engineering


ECE 327: Digital Systems Engineering

Lecture Slides

2018t1 (Winter)

Mark Aagaard

University of Waterloo

Department of Electrical and Computer Engineering


Contents

1 Fundamentals of VHDL
1.1 Introduction to VHDL
1.1.1 Levels of Abstraction
1.1.2 VHDL Origins and History
1.1.3 Semantics
1.1.4 Synthesis of a Simulation-Based Language
1.1.5 Solution to Synthesis Sanity
1.1.6 Standard Logic 1164
1.2 Comparison of VHDL to Other Hardware Description Languages
1.3 Overview of Syntax
1.3.1 Syntactic Categories


1.3.2 Library Units
1.3.3 Entities and Architecture
1.3.4 Concurrent Statements
1.3.5 Component Declaration and Instantiations
1.3.6 Processes
1.3.7 Generate Statements
1.3.8 Sequential Statements
1.3.9 A Few More Miscellaneous VHDL Features
1.4 Concurrent vs Sequential Statements
1.4.1 Concurrent Assignment vs Process
1.4.2 Conditional Assignment vs If Statements
1.4.3 Selected Assignment vs Case Statement
1.4.4 Coding Style
1.5 Overview of Processes
1.5.1 Combinational Process vs Clocked Process
1.5.2 Latch Inference
1.6 VHDL Execution: Delta-Cycle Simulation
1.6.1 Simple Simulation
1.6.2 Temporal Granularities of Simulation
1.6.3 Zero-Delay Simulation


1.6.4 Intuition Behind Delta-Cycle Simulation
1.6.4.1 Introduction to Delta-Cycle Simulation
1.6.4.2 Intuitive Rules for Delta-Cycle Simulation
1.6.4.3 Example of Delta-Cycles: Back-to-Back Buffers
1.6.4.4 Example of Proj Asn: Buffers
1.6.4.5 Example of Proj Asn: Flip-Flops
1.6.4.6 Example of Proj Asn: Comb Loop
1.6.5 VHDL Delta-Cycle Simulation
1.6.5.1 Informal Description of Algorithm
1.6.5.2 Example: VHDL Sim for Buffers
1.6.5.3 Definitions and Algorithm
1.6.5.4 Example: Delta-Cycle for Flip-Flops
1.6.5.5 Ex: VHDL Sim of Comb Loop
1.6.5.6 Rules and Observations for Drawing Delta-Cycle Simulations
1.6.6 External Inputs and Flip-Flops
1.7 Register-Transfer-Level Simulation
1.7.1 Overview
1.7.2 Technique for Register-Transfer Level Simulation
1.7.3 Examples of RTL Simulation


1.7.3.1 RTL Simulation Example 1
1.8 Simple RTL Simulation in Software
1.9 Variables in VHDL
1.10 Delta-Cycle Simulation with Delays
1.11 VHDL and Hardware Building Blocks
1.11.1 Basic Building Blocks
1.11.2 Deprecated Building Blocks for RTL
1.11.3 Hardware and Code for Flops
1.11.3.1 Flops with Waits and Ifs
1.11.3.2 Flops with Synchronous Reset
1.11.3.3 Flop with Chip-Enable and Mux on Input
1.11.3.4 Flops with Chip-Enable, Muxes, and Reset
1.11.4 Example Coding Styles
1.12 Synthesizable vs Non-Synthesizable Code
1.12.1 Wait For
1.12.2 Initial Values
1.12.3 Assignments before Wait Statement
1.12.4 Multiple “if rising_edge” in Process
1.12.5 “if rising_edge” and “wait” in Same Process
1.12.6 “if rising_edge” with “else” Clause


1.12.7 While Loop with Dynamic Condition and Combinational Body
1.13 Guidelines for Desirable Hardware
1.13.1 Latches
1.13.2 Combinational Loops
1.13.3 Multiple Drivers
1.13.4 Asynchronous Reset
1.13.5 Using a Data Signal as a Clock
1.13.6 Using a Clock Signal as Data
1.14 Bad VHDL Coding
1.14.1 Tri-State Buffers and Signals
1.14.2 Variables in Processes
1.14.3 Bits and Booleans as Signals


2 Additional Features of VHDL
2.1 Literals
2.1.1 Numeric Literals
2.1.2 Bit-String Literals
2.2 Arrays and Vectors
2.2.1 Declarations
2.2.2 Indexing, Slicing, Concatenation, Aggregates
2.3 Arithmetic
2.3.1 Arithmetic Packages
2.3.2 Arithmetic Types
2.3.3 Overloading of Arithmetic
2.3.4 Widths for Addition and Subtraction
2.3.5 Overloading of Comparisons
2.3.6 Widths for Comparisons
2.3.7 Type Conversion
2.3.8 Shift and Rotate Operations
2.3.9 Arithmetic Optimizations
2.4 Types
2.4.1 Enumerated Types
2.4.2 Defining New Array Types


3 Overview of FPGAs
3.1 Generic FPGA Hardware
3.1.1 Generic FPGA Cell
3.1.2 Lookup Table
3.1.3 Interconnect for Generic FPGA
3.1.4 Blocks of Cells for Generic FPGA
3.1.5 Special Circuitry in FPGAs
3.2 Area Estimation for FPGAs
3.2.1 Area for Circuit with one Target
3.2.2 Algorithm to Allocate Gates to Cells
3.2.3 Area for Arithmetic Circuits


4 State Machines
4.1 Notations
4.2 Finite State Machines in VHDL
4.2.1 HDL Coding Styles for State Machines
4.2.2 State Encodings
4.2.3 Traditional State-Machine Notation
4.2.4 Our State-Machine Notation
4.2.5 Bounce Example
4.2.6 Registered Assignments
4.2.7 More Notation
4.2.7.1 Extension: Transient States
4.2.7.2 Assignments within States
4.2.7.3 Conditional Expressions
4.2.7.4 Default Values
4.2.8 Semantic and Syntax Rules
4.2.9 Reset
4.3 LeBlanc FSM Design Example
4.3.1 State Machine and VHDL
4.3.2 State Encodings
4.4 Parcels


4.4.1 Bubbles and Throughput
4.4.2 Parcel Schedule
4.4.3 Valid Bits
4.5 Pseudocode
4.6 LeBlanc with Bubbles
4.7 Interparcel Variables and Loops
4.7.1 Introduction to Looping Le Blanc
4.7.2 Pseudo-Code
4.7.3 State Machine
4.7.4 VHDL Code for Loop and Bubbles
4.8 Memory Arrays and RTL Design
4.8.1 Memory Operations
4.8.2 Memory Arrays in VHDL
4.8.3 Using Memory
4.8.3.1 Writing from Multiple Vars
4.8.3.2 Reading from Memory to Multiple Variables
4.8.3.3 Example: Maximum Value Seen so Far
4.8.4 Build Larger Memory from Slices
4.8.5 Memory Arrays in High-Level Models


5 Dataflow Diagrams
5.1 Dataflow Diagrams
5.1.1 Dataflow Diagrams Overview
5.1.2 Dataflow Diagram Execution
5.1.3 Dataflow Diagrams, Hardware, and Behaviour
5.1.4 Performance Estimation
5.1.5 Area Estimation
5.1.6 Design Analysis
5.2 Design Example: Hnatyshyn DFD
5.2.1 Requirements
5.2.2 Data-Dependency Graph
5.2.3 Initial Dataflow Diagram
5.2.4 Area Optimization
5.2.5 Assign Names to Registered Signals
5.2.6 Allocation
5.2.7 State Machine
5.2.8 VHDL Implementation
5.3 Design Example: Hnatyshyn with Bubbles
5.3.1 Adding Support for Bubbles
5.3.2 Control Table with Valid Bits


5.3.3 VHDL
5.4 Inter-Parcel Variables: Hnatyshyn with Internal State
5.4.1 Requirements and Goals
5.4.2 Dataflow Diagrams and Waveforms
5.4.3 Control Tables
5.4.4 VHDL Implementation
5.4.5 Summary of Bubbles and Inter-Parcel Variables
5.5 Design Example: Vanier
5.5.1 Requirements
5.5.2 Algorithm
5.5.3 Initial Dataflow Diagram
5.5.4 Reschedule to Meet Requirements
5.5.5 Optimization: Reduce Inputs
5.5.6 Allocation
5.5.7 Explicit State Machine
5.5.8 VHDL #1: Explicit
5.5.9 VHDL #2
5.5.10 Notes and Observations
5.6 Memory Operations in Dataflow Diagrams
5.7 Data Dependencies
5.8 Example of DFD and Memory


6 Optimizations
6.1 Pipelining
6.1.1 Introduction to Pipelining
6.1.2 Partially Pipelined
6.1.3 Terminology
6.1.4 Overlapping Pipeline Stages
6.2 Staggering
6.3 Retiming
6.4 General Optimizations
6.4.1 Strength Reduction
6.4.1.1 Arithmetic Strength Reduction
6.4.1.2 Boolean Strength Reduction
6.4.2 Replication and Sharing
6.4.2.1 Mux-Pushing
6.4.2.2 Common Subexpression Elimination
6.4.2.3 Computation Replication
6.4.3 Arithmetic
6.5 Customized State Encodings


7 Performance Analysis
7.1 Introduction
7.2 Defining Performance
7.3 Benchmarks
7.4 Comparing Performance
7.4.1 General Equations
7.4.2 Example: Performance of Printers
7.5 Clock Speed, CPI, Program Length, and Performance
7.5.1 Mathematics
7.5.2 Example: CISC vs RISC and CPI
7.5.3 Effect of Instruction Set on Performance
7.6 Effect of Time to Market on Relative Performance


8 Timing Analysis
8.1 Delays and Definitions
8.1.1 Background Definitions
8.1.2 Clock-Related Timing Definitions
8.1.2.1 Clock Latency
8.1.2.2 Clock Skew
8.1.2.3 Clock Jitter
8.1.3 Storage-Related Timing Definitions
8.1.3.1 Flops and Latches
8.1.3.2 Timing Parameters
8.1.3.3 Timing Parameters for a Flop
8.1.4 Propagation Delays
8.1.5 Timing Constraints
8.2 Timing Analysis of Simple Latches
8.2.1 Structure and Behaviour of Multiplexer Latch
8.2.2 Strategy for Timing Analysis of Storage Devices
8.2.3 Clock-to-Q Time of a Latch
8.2.4 From Load Mode to Store Mode
8.2.5 Setup Time Analysis
8.2.6 Hold Time of a Multiplexer Latch


8.2.7 Example of a Bad Latch
8.2.8 Summary
8.3 Advanced Timing Analysis of Storage Elements
8.4 Critical Path
8.4.1 Introduction to Critical and False Paths
8.4.1.1 Example of Critical Path in Full Adder
8.4.1.2 Longest Path and Critical Path
8.4.1.3 Criteria for Critical Path Algorithms
8.4.2 Longest Path
8.4.2.1 Algorithm to Find Longest Path
8.4.2.2 Longest Path Example
8.4.3 Monotone Speedup
8.5 False Paths
8.6 Analog Timing Model
8.6.1 Defining Delay
8.6.2 Modeling Circuits for Timing
8.6.2.1 Example: Two Buffers with Complex Wiring
8.6.2.2 Example: Two Buffers with Simple Wiring
8.6.3 Calculate Delay
8.6.4 Ex: Two Bufs with Both Caps


8.7 Elmore Delay Model
8.7.1 Elmore Delay as an Approximation
8.7.2 A More Complicated Example
8.8 Practical Usage of Timing Analysis

9 Power
9.1 Overview
9.1.1 Importance of Power and Energy
9.1.2 Power vs Energy
9.1.3 Batteries, Power and Energy
9.1.3.1 Do Batteries Store Energy or Power?
9.1.3.2 Battery Life and Efficiency
9.1.3.3 Battery Life and Power
9.2 Power Equations
9.2.1 Switching Power
9.2.2 Short-Circuited Power
9.2.3 Leakage Power
9.2.4 Glossary
9.2.5 Note on Power Equations
9.3 Overview of Power Reduction Techniques


9.4 Voltage Reduction for Power Reduction
9.5 Data Encoding for Power Reduction
9.5.1 How Data Encoding Can Reduce Power
9.5.2 Example Problem: Sixteen Pulser
9.5.2.1 Problem Statement
9.5.2.2 Additional Information
9.5.2.3 Answer
9.6 Clock Gating
9.6.1 Introduction to Clock Gating
9.6.2 Implementing Clock Gating
9.6.3 Design Process
9.6.4 Effectiveness of Clock Gating
9.6.5 Example: Reduced Activity Factor with Clock Gating
9.6.6 Calculating PctBusy
9.6.6.1 Valid Bits and Busy
9.6.6.2 Calculating LenBusy
9.6.6.3 From LenBusy to PctBusy
9.6.7 Example: Pipelined Circuit with Clock-Gating
9.6.8 Clock Gating in ASICs
9.6.9 Alternatives to Clock Gating
9.6.9.1 Use Chip Enables
9.6.9.2 Operand Gating


10 Review
10.1 Overview of the Term
10.2 VHDL
10.2.1 VHDL Topics
10.2.2 VHDL Example Problems
10.3 RTL Design Techniques
10.3.1 Design Topics
10.3.2 Design Example Problems
10.4 Performance Analysis and Optimization
10.4.1 Performance Topics
10.4.2 Performance Example Problems
10.5 Timing Analysis
10.5.1 Timing Topics
10.5.2 Timing Example Problems
10.6 Power
10.6.1 Power Topics
10.6.2 Power Example Problems
10.7 Formulas to be Given on Final Exam


Chapter 1

Fundamentals of VHDL


1.1 Introduction to VHDL

1.1.1 Levels of Abstraction

Transistor: Signal values and time are continuous (analog). Each transistor is modeled by a resistor-capacitor network.

Switch: Time is continuous, but voltage may be either continuous or discrete. Linear equations are used.

Gate: Transistors are grouped together into gates. Voltages are discrete values such as 0 and 1.

Register transfer level: Hardware is modeled as assignments to registers and combinational signals. The basic unit of time is one clock cycle.

Transaction level: A transaction is an operation such as transferring data across a bus. Building blocks are processors, controllers, etc. Described in VHDL, SystemC, or SystemVerilog.

Electronic-system level: Looks at an entire electronic system, with both hardware and software.


1.1.2 VHDL Origins and History

VHDL = VHSIC Hardware Description Language
VHSIC = Very High Speed Integrated Circuit

The VHSIC Hardware Description Language (VHDL) is a formal notation intended for use in all phases of the creation of electronic systems. Because it is both machine readable and human readable, it supports the development, verification, synthesis and testing of hardware designs, the communication of hardware design data, and the maintenance, modification, and procurement of hardware.

Language Reference Manual (IEEE Design Automation Standards Committee, 1993a)

VHDL is a lot more than synthesis of digital hardware


1.1.3 Semantics

The original goal of VHDL was to simulate circuits. The semantics of the language define circuit behaviour.

Figure: simulation of c <= a AND b; showing signals a, b, and c.

But now, VHDL is used in simulation and synthesis. Synthesis is concerned with the structure of the circuit.

Synthesis: converts one type of description (behavioural) into another, lower level, description (usually a netlist).

Figure: synthesis of c <= a AND b; into a circuit with inputs a and b and output c.
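To make the one-line assignment concrete, here is a minimal sketch of a complete design unit around it. The entity name `and_gate`, the architecture name `main`, and the choice of `std_logic` ports are assumptions for illustration; the slide shows only the assignment itself.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper: only the assignment "c <= a AND b;" comes from the slide.
entity and_gate is
  port (
    a, b : in  std_logic;
    c    : out std_logic
  );
end entity and_gate;

architecture main of and_gate is
begin
  -- Simulation view: c is re-evaluated whenever a or b changes.
  -- Synthesis view: a tool would typically map this line to a two-input AND gate.
  c <= a AND b;
end architecture main;
```

The same text serves both purposes: a simulator reads it as behaviour, while a synthesis tool reads it as a request for hardware with that behaviour.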


Synthesis vs Simulation

For synthesis, we want the code we write to define the structure of the hardware that is generated.

The VHDL semantics define the behaviour of the hardware that is generated, not the structure of the hardware.

Figure: c <= a AND b; can be synthesized into circuits with different structure; simulating the code and simulating the synthesized circuits gives the same behaviour.
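As a sketch of "same behaviour, different structure", the two architectures below describe the same function in two ways. The entity name `and_two_ways` and the architecture names are hypothetical, and the De Morgan rewrite is an illustration chosen here, not a circuit taken from the slide.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and_two_ways is
  port (
    a, b : in  std_logic;
    c    : out std_logic
  );
end entity and_two_ways;

-- Written as a single AND operation.
architecture direct of and_two_ways is
begin
  c <= a AND b;
end architecture direct;

-- Written with inverters and a NOR (De Morgan's law):
-- NOT ((NOT a) OR (NOT b)) has the same truth table as a AND b for '0'/'1' inputs.
architecture demorgan of and_two_ways is
begin
  c <= (NOT a) NOR (NOT b);
end architecture demorgan;
```

A simulator cannot tell these apart for '0' and '1' inputs. A synthesis tool is free to optimize both to the same gate, or to build them differently; the language semantics fix only the behaviour.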


1.1.4 Synthesis of a Simulation-Based Language

This section reserved for your reading pleasure


1.1.5 Solution to Synthesis Sanity

• Pick a high-quality synthesis tool and study its documentation thoroughly

• Learn the idioms of the tool

• Different VHDL code with same behaviour can result in very different circuits

• Be careful if you have to port VHDL code from one tool to another

• KISS: Keep It Simple Stupid

– VHDL examples will illustrate reliable coding techniques for the synthesis tools from Synopsys, Mentor Graphics, Altera, Xilinx, and most other companies as well.

– Follow the coding guidelines and examples from lecture

– As you write VHDL, think about the hardware you expect to get.
Note: If you can't predict the hardware, then the hardware probably won't be very good (small, fast, correct, etc)


1.1.6 Standard Logic 1164

std_logic_1164: IEEE standard for signal values in VHDL.

'U' uninitialized
'X' strong unknown
'0' strong 0
'1' strong 1
'Z' high impedance
'W' weak unknown
'L' weak 0
'H' weak 1
'-' don't care

The most common values are: 'U', 'X', '0', '1'.

If you see 'X' in a simulation, it usually means that there is a mistake in your code.
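As a rough illustration of where 'U' and 'X' come from, the sketch below is a small self-checking testbench. The entity and signal names (std_logic_demo, never_driven, contested) are hypothetical and not from the lecture; the example only assumes the standard ieee.std_logic_1164 package.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity std_logic_demo is
end entity;

architecture sim of std_logic_demo is
  -- No driver and no initial value: the signal keeps its default, 'U'.
  signal never_driven : std_logic;
  -- Two drivers that disagree: std_logic resolution yields 'X'.
  signal contested    : std_logic;
begin
  contested <= '0';
  contested <= '1';   -- second driver; this is the kind of mistake 'X' reveals

  check : process
  begin
    wait for 1 ns;
    assert never_driven = 'U'
      report "expected 'U' on an undriven signal" severity error;
    assert contested = 'X'
      report "expected 'X' from conflicting drivers" severity error;
    wait;
  end process;
end architecture;
```

Running this in a simulator should produce no assertion failures; removing one of the two assignments to contested makes the second assertion fail, because the signal then takes a clean '0' or '1'.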


1.2 Comparison of VHDL to Other Hardware Description Languages

This section reserved for your reading pleasure

1.3 Overview of Syntax

1.3.1 Syntactic Categories

This section reserved for your reading pleasure

1.3.2 Library Units

This section reserved for your reading pleasure


1.3.3 Entities and Architecture

Each hardware module is described with an Entity/Architecture pair

[Figure: Entity and Architecture]


Entity

library ieee;

use ieee.std_logic_1164.all;

entity and_or is

port (

a, b, c : in std_logic ;

z : out std_logic

);

end entity;

Example of an entity


Architecture

architecture main of and_or is

signal x : std_logic;

begin

x <= a AND b;

z <= x OR (a AND c);

end architecture;

Example of architecture


1.3.4 Concurrent Statements

• An architecture contains concurrent statements

• Concurrent statements execute in parallel

– Concurrent statements make VHDL fundamentally different from most software languages.

– Hardware (gates) naturally execute in parallel; VHDL mimics the behaviour of real hardware.

– At each infinitesimally small moment of time, each gate:

1. samples its inputs

2. computes the value of its output

3. drives the output


Concurrent Statements

architecture main of bowser is
begin
  x1 <= a AND b;
  x2 <= NOT x1;
  z <= NOT x2;
end main;

architecture main of bowser is
begin
  z <= NOT x2;
  x2 <= NOT x1;
  x1 <= a AND b;
end main;

[Figure: circuit with inputs a and b, internal signals x1 and x2, and output z]

The order of concurrent statements doesn’t matter
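One way to check this claim in a simulator is sketched below. The slide fragments omit the entity and signal declarations, so the port list, the architecture names (forward and backward, needed because two architectures of one entity cannot both be called main), and the testbench bowser_tb are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity bowser is
  port (
    a, b : in  std_logic;
    z    : out std_logic
  );
end entity;

-- Statements written in the order the data flows.
architecture forward of bowser is
  signal x1, x2 : std_logic;
begin
  x1 <= a AND b;
  x2 <= NOT x1;
  z  <= NOT x2;
end architecture;

-- The same statements in reverse order.
architecture backward of bowser is
  signal x1, x2 : std_logic;
begin
  z  <= NOT x2;
  x2 <= NOT x1;
  x1 <= a AND b;
end architecture;

library ieee;
use ieee.std_logic_1164.all;

entity bowser_tb is
end entity;

architecture sim of bowser_tb is
  signal a, b, z_fwd, z_bwd : std_logic;
begin
  u_fwd : entity work.bowser(forward)  port map (a => a, b => b, z => z_fwd);
  u_bwd : entity work.bowser(backward) port map (a => a, b => b, z => z_bwd);

  stim : process
    type pair_array is array (0 to 3) of std_logic_vector(1 downto 0);
    constant inputs : pair_array := ("00", "01", "10", "11");
  begin
    for i in inputs'range loop
      a <= inputs(i)(1);
      b <= inputs(i)(0);
      wait for 1 ns;  -- let all delta cycles settle
      assert z_fwd = z_bwd
        report "architectures disagree" severity error;
      assert z_fwd = (a AND b)
        report "unexpected output" severity error;
    end loop;
    wait;
  end process;
end architecture;
```

Both instances should agree for every input combination, since each concurrent statement is re-evaluated whenever one of the signals it reads changes, regardless of where it appears in the architecture body.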


Types of Concurrent Statements

conditional assignment: similar to conventional if-then-else
  c <= a+b when sel='1' else a+c when sel='0' else "0000";

selected assignment: similar to conventional case/switch
  with color select d <= "00" when red , "01" when . . .;

component instantiation: use a hardware module/component
  add1 : adder port map( a => f, b => g, s => h, co => i);

for-generate: create multiple pieces of hardware
  bgen: for i in 1 to 7 generate b(i)<=a(7-i); end generate;

if-generate: conditionally create some hardware
  okgen : if optgoal /= fast generate
    result <= ((a and b) or (d and not e)) or g;
  end generate;
  fastgen : if optgoal = fast generate
    result <= '1';
  end generate;

process: description of complex behaviour (section 1.3.6)


1.3.5 Component Declaration and Instantiations

This section reserved for your reading pleasure

1.3.6 Processes

• Processes are used to describe complex and potentially unsynthesizable behaviour

• A process is a concurrent statement (section 1.3.4).

• The body of a process contains sequential statements (section 1.3.8)

• Processes are the most complex and difficult to understand part of VHDL (sections 1.5 and 1.6)


Example Process with Sensitivity List

process (a, b, c)
begin
  y <= a AND b;
  if (a = '1') then
    z1 <= b AND c;
    z2 <= NOT c;
  else
    z1 <= b OR c;
    z2 <= c;
  end if;
end process;


Example Process with Wait Statements

process
begin
  y <= a AND b;
  z <= '0';
  wait until rising_edge(clk);
  if (a = '1') then
    z <= '1';
    y <= '0';
    wait until rising_edge(clk);
  else
    y <= a OR b;
  end if;
end process;


Sensitivity Lists and Wait Statements

• Processes must have either a sensitivity list or at least one wait statement on each execution path through the process.

• Processes cannot have both a sensitivity list and a wait statement.


Sensitivity List

The sensitivity list contains the signals that are read in the process.

A process is executed when a signal in its sensitivity list changes value.

An important coding guideline to ensure consistent synthesis and simulation results is to include all signals that are read in the sensitivity list.

There is one exception to this rule: for a process that implements a flip-flop with an if rising_edge statement, it is acceptable to include only the clock signal in the sensitivity list; other signals may be included, but are not needed.
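The guideline and its exception can be sketched side by side. The entity sens_demo and its port names are hypothetical, chosen only to illustrate the two kinds of sensitivity list.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sens_demo is
  port (
    clk, a, b, sel : in  std_logic;
    y_comb, y_reg  : out std_logic
  );
end entity;

architecture main of sens_demo is
begin
  -- Combinational process: every signal that is read (sel, a, b) is listed.
  comb : process (sel, a, b)
  begin
    if sel = '1' then
      y_comb <= a;
    else
      y_comb <= b;
    end if;
  end process;

  -- Flip-flop exception: only clk is listed, even though a is read.
  reg : process (clk)
  begin
    if rising_edge(clk) then
      y_reg <= a;
    end if;
  end process;
end architecture;
```

If b were left out of the first sensitivity list, a simulator would not re-run the process when b changes, so y_comb would hold a stale value in simulation while a synthesized multiplexer would still follow b.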


1.3.7 Generate Statements

• Two categories of generate statements:

– if-generate: conditionally generate some hardware

– for-generate: generate multiple copies of some hardware

• Generate statements are executed during elaboration (at compile time)

• The conditions and loop ranges must be static

– Must be able to be evaluated at elaboration

– Must not depend upon the value of any signal

• A generate statement must be preceded by a label
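A minimal sketch of both categories follows. The entity gen_demo and its generics (width, reverse) are assumptions for illustration; the point is that the condition and the loop range depend only on generics, so they are static and can be evaluated at elaboration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity gen_demo is
  generic (
    width   : positive := 8;
    reverse : boolean  := true
  );
  port (
    a : in  std_logic_vector(width-1 downto 0);
    b : out std_logic_vector(width-1 downto 0)
  );
end entity;

architecture main of gen_demo is
begin
  -- if-generate: this hardware exists only when reverse is true.
  rev_gen : if reverse generate
    -- for-generate: one wire per bit, created at elaboration.
    bit_gen : for i in 0 to width-1 generate
      b(i) <= a(width-1-i);
    end generate;
  end generate;

  -- Complementary condition, so b always has exactly one driver.
  pass_gen : if not reverse generate
    b <= a;
  end generate;
end architecture;
```

Each generate statement carries a label (rev_gen, bit_gen, pass_gen), as the last bullet requires.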


1.3.8 Sequential Statements

Used inside processes, functions, and procedures.

wait                wait until . . .;
signal assignment   . . . <= . . .;
if-then-else        if . . . then . . . elsif . . . end if;
case                case . . . is
                      when . . . | . . . => . . .;
                      when . . . => . . .;
                    end case;
loop                loop . . . end loop;
while loop          while . . . loop . . . end loop;
for loop            for . . . in . . . loop . . . end loop;
next                next . . .;

The most commonly used sequential statements


1.3.9 A Few More Miscellaneous VHDL Features

This section reserved for your reading pleasure

1.4 Concurrent vs Sequential Statements

All concurrent assignments can be translated into sequential statements. But, not all sequential statements can be translated into concurrent statements.


1.4.1 Concurrent Assignment vs Process

The two code fragments below have identical behaviour:

architecture main of tiny is
begin
  b <= a;
end main;

architecture main of tiny is
begin
  process (a) begin
    b <= a;
  end process;
end main;


1.4.2 Conditional Assignment vs If Statements

The two code fragments below have identical behaviour:

Concurrent Statements

t <= <val1> when <cond>
  else <val2>;

Sequential Statements

if <cond> then
  t <= <val1>;
else
  t <= <val2>;
end if;


1.4.3 Selected Assignment vs Case Statement

The two code fragments below have identical behaviour:

Concurrent Statements

with <expr> select
  t <= <val1> when <choices1>,
       <val2> when <choices2>,
       <val3> when <choices3>;

Sequential Statements

case <expr> is
  when <choices1> =>
    t <= <val1>;
  when <choices2> =>
    t <= <val2>;
  when <choices3> =>
    t <= <val3>;
end case;


1.4.4 Coding Style

Code that’s easy to write with sequential statements, but difficult with concurrent:

case <expr> is
  when <choice1> =>
    if <cond> then
      o <= <expr1>;
    else
      o <= <expr2>;
    end if;
  when <choice2> =>
    . . .
end case;
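To make the comparison concrete, here is one possible instance of the template written both ways. The entity style_demo, its ports, and the particular operations are invented for illustration and do not come from the lecture.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity style_demo is
  port (
    op       : in  std_logic_vector(1 downto 0);
    en, a, b : in  std_logic;
    o_seq    : out std_logic;
    o_conc   : out std_logic
  );
end entity;

architecture main of style_demo is
begin
  -- Sequential style: a case statement with an if nested in one branch.
  process (op, en, a, b)
  begin
    case op is
      when "00" =>
        if en = '1' then
          o_seq <= a AND b;
        else
          o_seq <= '0';
        end if;
      when "01" =>
        o_seq <= a OR b;
      when others =>
        o_seq <= a XOR b;
    end case;
  end process;

  -- Concurrent style: the nesting has to be flattened into a single
  -- chain of conditions, repeating the test on op.
  o_conc <= (a AND b) when op = "00" and en = '1' else
            '0'       when op = "00"              else
            (a OR b)  when op = "01"              else
            (a XOR b);
end architecture;
```

Both outputs compute the same function, but the concurrent version repeats the op = "00" test and becomes harder to read as more branches need nested conditions.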


1.5 Overview of Processes

Processes are the most difficult VHDL construct to understand. This section gives an overview of processes. Section 1.6 gives the details of the semantics of processes.

• Within a process, statements are executed almost sequentially

• Among processes, execution is done in parallel

• Remember: a process is a concurrent statement!


Process Semantics

• VHDL mimics hardware

• Hardware (gates) execute in parallel

• Processes execute in parallel with each other

• All possible orders of executing processes must produce the same simulation results (waveforms)

• If a signal is not assigned a value, then it holds its previous value

All orders of executing concurrent statements must produce the same waveforms


Process Semantics

architecture
  procA: process
    stmtA1;
    stmtA2;
    stmtA3;
  end process;
  procB: process
    stmtB1;
    stmtB2;
  end process;

[Figure: three execution sequences of A1, A2, A3, B1, B2: single threaded with procA before procB; single threaded with procB before procA; multithreaded with procA and procB in parallel]


Process Semantics

All execution orders must have same behaviour


1.5.1 Combinational Process vs Clocked Process

Each well-written synthesizable process is either combinational or clocked.

Combinational process:

• Executing the process takes part of one clock cycle

• Target signals are outputs of combinational circuitry

• A combinational process must have a sensitivity list

• A combinational process must not have any wait statements

• A combinational process must not have any rising_edges or falling_edges

• The hardware for a combinational process is just combinational circuitry


Clocked process:

• Executing the process takes one (or more) clock cycles

• Target signals are outputs of flops

• Process contains one or more wait or if rising_edge statements

• Hardware contains combinational circuitry and flip flops

Note: Clocked processes are sometimes called "sequential processes", but this can be easily confused with "sequential statements", so in ECE-327 we'll refer to synthesizable processes as either "combinational" or "clocked".
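A small sketch that pairs one process of each kind is shown below. The toggle entity, its signal names, and the initial value '0' (which matters for simulation and may or may not be honoured by a given synthesis tool) are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity toggle is
  port (
    clk, en : in  std_logic;
    q       : out std_logic
  );
end entity;

architecture main of toggle is
  signal state, next_state : std_logic := '0';
begin
  -- Combinational process: has a sensitivity list, no wait statement,
  -- no rising_edge. Its target (next_state) is an output of gates.
  comb : process (state, en)
  begin
    if en = '1' then
      next_state <= NOT state;
    else
      next_state <= state;
    end if;
  end process;

  -- Clocked process: contains an if rising_edge statement.
  -- Its target (state) is the output of a flip-flop.
  reg : process (clk)
  begin
    if rising_edge(clk) then
      state <= next_state;
    end if;
  end process;

  q <= state;
end architecture;
```

Note that next_state is assigned on every path through the combinational process, so no storage is implied there; all storage lives in the clocked process.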


Combinational or Clocked Process? (1)

process (a,b,c)
begin
  p1 <= a;
  if (b = c) then
    p2 <= b;
  else
    p2 <= a;
  end if;
end process;


Combinational or Clocked Process? (2)

process

begin

wait until rising_edge(clk);

b <= a;

end process;


Combinational or Clocked Process? (3)

process (clk)

begin

if rising_edge(clk) then

b <= a;

end if;

end process;


Combinational or Clocked Process? (4)

process (clk)

begin

a <= clk;

end process;


Combinational or Clocked Process? (5)

process

begin

wait until rising_edge(a);

c <= b;

end process;


1.5.2 Latch Inference

The semantics of VHDL require that if a signal is assigned a value on some passes through a process and not on other passes, then on a pass through the process when the signal is not assigned a value, it must maintain its value from the previous pass.

process (a, b, c)
begin
  if (a = '1') then
    z1 <= b;
    z2 <= b;
  else
    z1 <= c;
  end if;
end process;

[Figure: circuit with inputs a, b, and c and outputs z1 and z2]

Example of latch inference


Latch Inference

When a signal's value must be stored, VHDL infers a latch or a flip-flop in the hardware to store the value.

If you want a latch or a flip-flop for the signal, then latch inference is good.

If you want combinational circuitry, then latch inference is bad.
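One common way to avoid an unwanted latch is to give every target signal a default assignment at the top of the process. The sketch below applies this to the latch-inference example; the entity name no_latch and the choice of '0' as the default for z2 are assumptions, and the default changes the behaviour when a is not '1' (z2 is driven to '0' rather than held).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity no_latch is
  port (
    a, b, c : in  std_logic;
    z1, z2  : out std_logic
  );
end entity;

architecture main of no_latch is
begin
  process (a, b, c)
  begin
    -- Default assignment: z2 now receives a value on every pass,
    -- so nothing has to be remembered from the previous pass.
    z2 <= '0';
    if (a = '1') then
      z1 <= b;
      z2 <= b;   -- overrides the default on this path
    else
      z1 <= c;
    end if;
  end process;
end architecture;
```

With the default in place, both z1 and z2 are assigned on every execution path, so the process describes purely combinational circuitry.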


Loop, Latch, Flop

[Figure: three circuits, each with signals a, b, and z: Combinational loop; Latch (with EN input); Flip-flop (with D and Q)]

Question: Write VHDL code for each of the above circuits


Review: Introduction to VHDL

1. The goal of ece327 is help you think .

2. Hardware runs

Software runs

3. In VHDL, the interface of a circuit is called a(n) .

In VHDL, the body of a circuit is called a(n) .

The body of a circuit contains statements, which execute


A process contains statements, which execute

4. To simulate hardware:

At each , every gate in the circuit:

1

2

3


1.6 VHDL Execution: Delta-Cycle Simulation

1.6.1 Simple Simulation

Hardware runs in parallel: At each infinitesimally small moment of time, each gate:

1. samples its inputs

2. computes the value of its output

3. drives the output

[Figure: circuit with signals a, b, c, d, and e, and waveforms of a, b, c, d, and e at 0ns, 10ns, 12ns, and 15ns]


Different Programs, Same Behaviour

All three programs below synthesize to the circuit on the previous slide.

The goal of VHDL semantics is that all three programs have the same behaviour.

Program 1:

process (a,b)
begin
  c <= a and b;
end process;

process (b,c,d)
begin
  d <= not c;
  e <= b and d;
end process;

Program 2:

process (a,b,c,d)
begin
  c <= a and b;
  d <= not c;
  e <= b and d;
end process;

Program 3:

process (a,b)
begin
  c <= a and b;
end process;

process (c)
begin
  d <= not c;
end process;

process (b,d)
begin
  e <= b and d;
end process;


1.6.2 Temporal Granularities of Simulation

register-transfer-level

• smallest unit of time is a clock cycle

• combinational logic has zero delay

• flip-flops have a delay of one clock cycle

timing simulation

• smallest unit of time is a nanosecond, picosecond, or femtosecond

• combinational logic and wires have delay as computed by timing analysis tools

• flip-flops have setup, hold, and clock-to-Q timing parameters

delta cycles

• units of time are artifacts of VHDL semantics and simulation software

• simulation cycles, delta cycles, and simulation steps are infinitesimally small amounts of time

• VHDL semantics are defined in terms of these concepts


1.6.3 Zero-Delay Simulation

Register-transfer-level and delta-cycle simulation are both examples of zero-delay simulation.

There are two fundamental rules for zero-delay simulation:

1. Events appear to propagate through combinational circuitry instantaneously.

2. All of the gates appear to operate in parallel


1.6.4 Intuition Behind Delta-Cycle Simulation

1.6.4.1 Introduction to Delta-Cycle Simulation

• To make it appear that events propagate instantaneously through combinational circuitry: VHDL introduces the delta cycle

– Infinitesimally small artificial unit of time

– In each delta cycle, every gate in the circuit

1. samples its input signals

2. computes its result value

3. drives the result value on its output signal

• To make it appear that gates operate in parallel: VHDL introduces the projected assignment

– the effect of simulating a gate remains invisible until the beginning of the next delta cycle

1.6.4.2 Intuitive Rules for Delta-Cycle Simulation

1. Simulate a gate if any of its inputs changed.

If no input changed, then the current value of the output is correct and the output can stay at the same value.

2. Each gate is simulated at most once per delta cycle.

3. When a gate is executed, the projected (i.e., new) value of the output remains invisible until the beginning of the next delta cycle.

4. Increment time when there is no need for another delta cycle.

No gate had an input change value in the current delta cycle.


1.6.4.3 Example of Delta-Cycles: Back-to-Back Buffers

process (a) begin
  b <= a;
end process;

process (b) begin
  c <= b;
end process;

[Figure: back-to-back buffers with signals a, b, and c; waveforms of a, b, and c around 1ns and 2ns under delta-cycle simulation (with δ-cycles shown) and under simple simulation]
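The delta cycles in this example can be observed directly in a simulator, since a "wait for 0 ns" suspends a process for exactly one delta cycle. The testbench below is a sketch under that assumption; the entity name delta_demo, the process labels, and the initial values of '0' are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity delta_demo is
end entity;

architecture sim of delta_demo is
  signal a, b, c : std_logic := '0';
begin
  buf_b : process (a) begin
    b <= a;
  end process;

  buf_c : process (b) begin
    c <= b;
  end process;

  stim : process
  begin
    wait for 1 ns;
    a <= '1';          -- projected assignment; a is still '0' here
    wait for 0 ns;     -- one delta cycle later: a has changed
    assert a = '1' and b = '0' and c = '0'
      report "after 1 delta: only a should have changed" severity error;
    wait for 0 ns;     -- second delta cycle: b has changed
    assert b = '1' and c = '0'
      report "after 2 deltas: b should have changed, c not yet" severity error;
    wait for 0 ns;     -- third delta cycle: c has changed
    assert c = '1'
      report "after 3 deltas: c should have changed" severity error;
    assert now = 1 ns
      report "delta cycles should not advance simulation time" severity error;
    wait;
  end process;
end architecture;
```

All three changes happen at a simulation time of 1 ns, which is why a waveform viewer shows a, b, and c switching at the same instant.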


1.6.4.4 Example of Proj Asn: Buffers

[Figure: waveforms of a, b, and c around 1ns and 2ns for delta-cycle simulation with projected values, and for simple simulation]


1.6.4.5 Example of Proj Asn: Flip-Flops

p_b: process (clk) begin
  if rising_edge(clk) then
    b <= a;
  end if;
end process;

p_c: process (clk) begin
  if rising_edge(clk) then
    c <= b;
  end if;
end process;

[Figure: two flip-flops in series with signals a, b, and c, clocked by clk; waveforms of clk, a, b, and c at 9ns, 10ns, and 11ns with δ-cycles shown]
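The effect of projected assignment on the two flip-flops can be checked with a small self-checking testbench. This is a sketch: the entity name flop_chain_tb, the stimulus timing, and the initial values are assumptions and do not reproduce the waveform in the figure.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity flop_chain_tb is
end entity;

architecture sim of flop_chain_tb is
  signal clk     : std_logic := '0';
  signal a, b, c : std_logic := '0';
begin
  p_b : process (clk) begin
    if rising_edge(clk) then
      b <= a;
    end if;
  end process;

  p_c : process (clk) begin
    if rising_edge(clk) then
      c <= b;   -- reads the value b had before this clock edge
    end if;
  end process;

  stim : process
  begin
    a <= '1';
    wait for 5 ns;
    clk <= '1';              -- first rising edge
    wait for 1 ns;
    assert b = '1' and c = '0'
      report "after edge 1: b takes a, c takes the old b" severity error;
    clk <= '0';
    wait for 4 ns;
    clk <= '1';              -- second rising edge
    wait for 1 ns;
    assert b = '1' and c = '1'
      report "after edge 2: c takes the value b had before the edge" severity error;
    wait;
  end process;
end architecture;
```

Because the new value of b stays invisible until the next delta cycle, p_c sees the old b no matter which of the two processes the simulator happens to execute first, so the value of a needs two clock edges to reach c.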


1.6.4.6 Example of Proj Asn: Comb Loop

We begin with a truly parallel simulation of the circuit.

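[Figure: circuit with input a and gates b, c, and d; waveforms of a, b, c, and d at 1ns across δ-cycles, with final values a = 1, b = 0, c = 1, d = 1]

Simulation of gates b, c, and d is done truly in parallel, therefore there is no need for projected assignment.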


Hardware Runs in Parallel

Recall the following about the simulation done in a delta cycle:

• The simulation of the gates must be done in parallel.

• The execution of each gate must be independent of the other gates.

• The choice of in which order to simulate the gates must not affect the result.


Combinational Loop: Correct Simulation

[Figure: circuit with input a and gates b, c, and d; waveforms of a, b, c, and d at 1ns across δ-cycles for execution order b, c, d and for execution order b, d, c. Both orders give the final values a = 1, b = 0, c = 1, d = 1]


Combinational Loop: Incorrect Simulation

[Figure: circuit with input a and gates b, c, and d; waveforms of a, b, c, and d at 1ns across δ-cycles. Execution order b, c, d gives the final values a = 1, b = 0, c = 0, d = 0; execution order b, d, c gives the final values a = 1, b = 0, c = 1, d = 1]


1.6.5 VHDL Delta-Cycle Simulation

The algorithm presented here is a simplification of the actual algorithm in the VHDL Standard.

This algorithm does not support:

• delayed assignments; for example: a <= b after 2 ns;

• resolution, which is where multiple processes write to the same signal (usually a mistake, but useful for tri-state busses)


1.6.5.1 Informal Description of Algorithm

• Processes have three modes:

Resumed: The process has work to do and is waiting its turn to execute.
Executing: The process is running.
Suspended: The process is idle and has no work to do.

• A simulation run is initialization followed by a sequence of simulation rounds

• Initialization:

– Each process starts off resumed.

– Each signal starts off with its default value. ('U' for std_logic)

• In each simulation round:

– Increment time

– Resume all processes that are waiting for the current time

– A simulation round is a sequence of simulation cycles.

• In each simulation cycle:

– Copy projected value of signals to current value.

– Resume processes based on sensitivity lists and wait conditions.

– Execute each resumed process.

– If no projected assignment changed the value of a signal, then increment time and start next simulation round.

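1.6.5.2 Example: VHDL Sim for Buffers

proc_a : process begin
  a <= '0';
  wait for 1 ns;
  a <= '1';
  wait;
end process;

proc_b : process (a) begin
  b <= a;
end process;

proc_c : process (b) begin
  c <= b;
end process;

[Figure: back-to-back buffers with signals a, b, and c; a legend of graphical symbols for value changes (old 'U' to new '0', 'U' to 'U', '0' to '1') and text values 0, U, 1; a blank simulation chart starting at 0ns with rows a, b, c, Time, Sim rounds, Sim cycles, proc_a, proc_b, and proc_c]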


1.6.5.3 Definitions and Algorithm

Notes on Simulation Algorithm

• At a wait statement, the process will suspend even if the condition is true in the current simulation cycle. The process will resume the next time that a signal in the condition changes and the condition is true.

• If we execute multiple assignments to the same signal in the same process in the same simulation cycle, only the last assignment actually takes effect; all but the last assignment are ignored.

• In a simulation round, the first simulation cycle is not a delta cycle.

• The mode of a process is determined implicitly by keeping track of the set of processes that are resumed (the resume set) and the process(es) that is (are) executing. All other processes are suspended.
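The second note, that only the last assignment to a signal takes effect, can be sketched with a short self-checking testbench. The entity name last_wins_tb and the process labels are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity last_wins_tb is
end entity;

architecture sim of last_wins_tb is
  signal a : std_logic := '0';
  signal y : std_logic;
begin
  dut : process (a)
  begin
    y <= '0';      -- overwritten below within the same simulation cycle
    y <= NOT a;    -- only this projected value is copied to y
  end process;

  check : process
  begin
    wait for 1 ns;
    assert y = '1'
      report "with a = '0', y should be NOT a, not the earlier '0'" severity error;
    a <= '1';
    wait for 1 ns;
    assert y = '0'
      report "with a = '1', y should be NOT a" severity error;
    wait;
  end process;
end architecture;
```

This is the same property that makes a default assignment at the top of a process safe: a later assignment on the same pass simply replaces the projected value.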


VHDL Simulation Definitions

Definition simulation step: Executing one sequential assignment or process mode change.

Definition simulation cycle: The operations that occur in one iteration of the simulation algorithm.

Definition delta cycle: A simulation cycle that does not advance simulation time.

Definition simulation round: A sequence of simulation cycles that all have the same simulation time.


More Formal Description of Algorithm

(* initialization *)
set all signals to default value;
add to resume set all processes;
set time to 0 ns;

(* simulation loop *)
while time < ∞ {
  (* begin simulation round *)
  add to resume set all processes that are waiting for current time;
  while time does not change {
    (* begin simulation cycle *)
    copy projected values of signals to current values;
    add to resume set any process that:
      is sensitive to a signal that changed value
      or whose wait-condition became true;
    execute all processes in resume set;
      (* assign to projected values of signals *)
      (* execute until suspend on a wait statement or sensitivity list *)
    clear resume set; (* resume set = ∅ *)
    if none of the executing processes performed a signal assignment then {
      increment time to the minimum of the wait times for processes;
    }
  }
}


1.6.5.4 Example: Delta-Cycle for Flip-Flops

proc_a : process
begin
  a <= '0';
  wait for 9 ns;
  a <= '1';
  wait;
end process;

proc_clk : process
begin
  clk <= '0';
  wait for 10 ns;
  clk <= '1';
  wait for 10 ns;
end process;

proc_flops : process
begin
  wait until re(clk);
  b <= a;
  c <= b;
end process;

"re" = rising edge

[Waveform template: signals a, clk, b, c and processes proc_a, proc_clk, proc_flops, drawn against time (starting at 0 ns), simulation rounds, and simulation cycles.]

Back-to-Back Flip-Flops with If-Rising Edge

proc_flops1 : process
begin
  wait until re(clk);
  b1 <= a;
  c1 <= b1;
end process;

proc_flops2 : process (clk)
begin
  if re(clk) then
    b2 <= a;
    c2 <= b2;
  end if;
end process;

[Waveform template: signals a, clk, b1, c1, b2, c2 (with U entries filled in) and processes proc_a, proc_clk, proc_flops1, proc_flops2, drawn against time (10 ns), simulation rounds, and simulation cycles; process modes are marked R, E, S.]

1.6.5.5 Ex: VHDL Sim of Comb Loop

[Circuit diagram with signals a, b, c, d.]

proc_a : process begin
  a <= '0';
  wait for 1 ns;
  a <= '1';
  wait;
end process;

proc_b : process (a) begin
  b <= not( a );
end process;

proc_c : process (a, b, d) begin
  c <= not( a ) or b or d;
end process;

proc_d : process (a, c) begin
  d <= a and c;
end process;

[Waveform template: signals a, b, c, d and processes proc_a, proc_b, proc_c, proc_d, drawn against time (starting at 0 ns), simulation rounds, and simulation cycles.]

1.6.5.6 Rules and Observations for Drawing Delta-Cycle Simulations

The VHDL Language Reference Manual gives only a textual description of the VHDL semantics. The conventions for drawing the waveforms are just our own.

• Each column is a simulation step.

• In a simulation step, either exactly one process changes mode or exactly one signal changes value, except in the first two simulation steps of each simulation cycle, when multiple current values may be updated and multiple processes may resume.

• If a projected assignment assigns the same value as the signal’s current projected value, the projected assignment must still be shown, because this assignment will force another simulation cycle in the current simulation round.

• If a signal’s visible value is updated with the same value as it currently has, this assignment is not shown, because it will not trigger any sensitivity lists.

• Assignments to signals may be denoted by either the number/letter of the new value or one of the edge symbols:

[Table: edge symbol for each combination of old value (U, 0, 1) and new value (U, 0, 1).]

Some observations about delta-cycle simulation waveforms that can be helpful in checking that a simulation is correct:

• In the first simulation step of the first simulation cycle of a simulation round (i.e., the first simulation step of a simulation round), at least one process will resume. This is in contrast to the first simulation step of all other simulation cycles, where current values of signals are updated with projected values.

• At the end of a simulation cycle all processes are suspended.

• In the last simulation cycle of a simulation round either no signals change value, or any signal that changes value is not in the sensitivity list of any process.


1.6.6 External Inputs and Flip-Flops

Question: Do the signals b1 and b2 have the same behaviour from 10–20 ns?

architecture mathilde of sauve is
  signal clk, a1, a2, b1, b2 : std_logic;
begin
  process begin
    clk <= '0';
    wait for 10 ns;
    clk <= '1';
    wait for 10 ns;
  end process;

  process begin
    wait for 10 ns;
    a1 <= '1';
  end process;

  process begin
    wait until rising_edge(clk);
    a2 <= '1';
  end process;

  process begin
    wait until rising_edge( clk );
    b1 <= a1;
    b2 <= a2;
  end process;
end architecture;


Review: Delta-Cycle Simulation

A delta-cycle is a at the end of which

.

The two illusions of zero-delay simulation:

1. propagate

2. operate

VHDL achieves the illusions by:

1.

2.


1.7 Register-Transfer-Level Simulation

1.7.1 Overview

• Much simpler than delta cycle

• Columns are real time: clock cycles, nanoseconds, etc.

• Can simulate both synthesizable and unsynthesizable code

• Cannot simulate combinational loops

• Same values as delta-cycle at end of simulation round


process begin
  a <= '0';
  wait for 10 ns;
  a <= '1';
  ...
end process;

process begin
  b <= '0';
  wait for 10 ns;
  b <= a;
  ...
end process;

Question: In this code, what value should b have at 10 ns: does it read the new value of a or the old value?


1.7.2 Technique for Register-Transfer-Level Simulation

1. Pre-processing

(a) Separate processes into timed, clocked, and combinational

(b) Decompose each combinational process into separate processes with one target signal per process

(c) Sort combinational processes into topological order based on dependencies

2. For each moment of real time:

(a) Run timed processes in any order, reading old values of signals.

(b) Run clocked processes in any order, reading new values of timed signals and old values of registered signals.

(c) Run combinational processes in topological order, reading new values of signals.
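As a rough illustration of step 1(a), the sketch below labels one or more processes of each kind. The entity, architecture, and signal names are hypothetical and chosen only for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench-style entity with no ports.
entity classify_demo is
end entity;

architecture sketch of classify_demo is
  signal clk, a, q, z : std_logic;
begin
  -- Timed: execution is paced by "wait for"; clk is a timed signal.
  timed_clk : process begin
    clk <= '0'; wait for 10 ns;
    clk <= '1'; wait for 10 ns;
  end process;

  -- Timed: a is a timed signal.
  timed_a : process begin
    a <= '0'; wait for 25 ns;
    a <= '1'; wait;
  end process;

  -- Clocked: q is a registered signal.
  clocked : process begin
    wait until rising_edge(clk);
    q <= a;
  end process;

  -- Combinational: a single target signal, so no decomposition is needed.
  comb : process (a, q) begin
    z <= a and q;
  end process;
end architecture;
```

With only one combinational process, the topological sort in step 1(c) is trivial here; in step 2 the two timed processes run first, then the clocked process, then comb.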


1.7.3 Examples of RTL Simulation

1.7.3.1 RTL Simulation Example 1

We revisit an earlier example from delta-cycle simulation, but change the codeslightly and do register-transfer-level simulation.

proc1: process (a, b, c) begin
  d <= NOT c;
  c <= a AND b;
end process;

proc2: process (b, d) begin
  e <= b AND d;
end process;

proc3: process begin
  a <= '1';
  b <= '0';
  wait for 3 ns;
  b <= '1';
  wait for 99 ns;
end process;


Decompose and sort comb procs

Decomposed

proc1d: process (c) begin
  d <= NOT c;
end process;

proc1c: process (a, b) begin
  c <= a AND b;
end process;

proc2: process (b, d) begin
  e <= b AND d;
end process;

Sorted

proc1c: process (a, b) begin
  c <= a AND b;
end process;

proc1d: process (c) begin
  d <= NOT c;
end process;

proc2: process (b, d) begin
  e <= b AND d;
end process;


Waveforms

[Waveform template: signals a, b, c, d, e, each starting at U, with time columns 0 ns, 1 ns, 2 ns, 3 ns, and 102 ns.]

Review: RTL Simulation

1. Algorithm for RTL simulation:

Preprocessing

(a) Separate processes into two groups: and

(b) the processes so that each process

(c) Sort the processes into order

Running: For each moment in time or clock cycle:

(a) Run the processes in order.

Processes read the value of signals.

(b) Run the processes in order.

Processes read the value of signals.

2. What are the defining characteristics of zero-delay simulation?

(a) operate

(b) propagate

3. Comparing delta-cycle and RTL simulation:

               Illusion #1    Illusion #2
Delta cycle
RTL

1.8 Simple RTL Simulation in Software

This is an advanced section. It is not covered in the course and will not be tested.

1.9 Variables in VHDL

This is an advanced section. It is not covered in the course and will not be tested.

1.10 Delta-Cycle Simulation with Delays

This is an advanced section. It is not covered in the course and will not be tested.

1.11 VHDL and Hardware Building Blocks

1.11.1 Basic Building Blocks

Different classes of building blocks:

• Conditional

• Arithmetic

• Storage

Basic Building Blocks: Boolean

VHDL    Description
and     AND gate
or      OR gate
not     inverter
nand    NAND gate
nor     NOR gate
xor     exclusive-or gate

Basic Building Blocks: Conditional

VHDL                                          Description
if-then-else, when-else, with-select, case    Multiplexer
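To make the correspondence concrete, here is a minimal sketch of the same 2:1 multiplexer written with each of the four constructs. The entity name, the port names, and the choice that sel = '1' selects d1 are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: one output per coding style, all describing the same mux.
entity mux2_styles is
  port (
    sel, d0, d1                  : in  std_logic;
    y_if, y_case, y_when, y_with : out std_logic
  );
end entity;

architecture sketch of mux2_styles is
begin
  -- if-then-else (sequential statement, inside a process)
  process (sel, d0, d1) begin
    if sel = '1' then
      y_if <= d1;
    else
      y_if <= d0;
    end if;
  end process;

  -- case (sequential statement, inside a process)
  process (sel, d0, d1) begin
    case sel is
      when '1'    => y_case <= d1;
      when others => y_case <= d0;
    end case;
  end process;

  -- when-else (concurrent conditional assignment)
  y_when <= d1 when sel = '1' else d0;

  -- with-select (concurrent selected assignment)
  with sel select
    y_with <= d1 when '1',
              d0 when others;
end architecture;
```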


Basic Building Blocks: Arithmetic

+ adder

- subtracter

asl, lsl left shifter

asr, lsr right shifter
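A small sketch of these arithmetic blocks, assuming the ieee.numeric_std package, where shifts are written with the shift_left and shift_right functions. The entity and port names are hypothetical, and the 8-bit width is arbitrary.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity showing one instance of each arithmetic block.
entity arith_demo is
  port (
    a, b        : in  unsigned(7 downto 0);
    s           : in  signed(7 downto 0);
    sum, diff   : out unsigned(7 downto 0);
    shl         : out unsigned(7 downto 0);
    shr_logical : out unsigned(7 downto 0);
    shr_arith   : out signed(7 downto 0)
  );
end entity;

architecture sketch of arith_demo is
begin
  sum         <= a + b;               -- adder (result wraps at 8 bits)
  diff        <= a - b;               -- subtracter
  shl         <= shift_left(a, 1);    -- left shifter
  shr_logical <= shift_right(a, 1);   -- logical right shift (unsigned operand)
  shr_arith   <= shift_right(s, 1);   -- arithmetic right shift (signed operand)
end architecture;
```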


Basic Building Blocks: Storage

Schematic pins               VHDL                Description
CE, S, R, D, Q               clocked process     flip flop
WE, A, DI, DO                memory component    single-port memory
WE, A0, DI0, DO0, A1, DO1    memory component    dual-port memory

1.11.2 Deprecated Building Blocks for RTL

Some of the common gates you have encountered in previous courses should be avoided when synthesizing register-transfer-level hardware, particularly if FPGAs are the implementation technology.

Latches : Use flops, not latches

T, JK, SR, etc flip-flops : Limit yourself to D-type flip-flops

Tri-State Buffers : Use multiplexers, not tri-state buffers

Note: Unfortunately and surprisingly, PalmChip was awarded a US patent for using uni-directional busses (i.e. multiplexers) for system-on-chip designs. The patent was filed in 2000, so all fourth-year design projects using multiplexers need to pay royalties to PalmChip.

What is This?

process (a)

begin

if rising_edge(a) then

c <= b;

end if;

end process;


1.11.3 Hardware and Code for Flops

1.11.3.1 Flops with Waits and Ifs

process (clk)

begin

if rising_edge(clk) then

q <= d;

end if;

end process;


VHDL Code for Flip-Flop: Wait-Style

process

begin

wait until rising_edge(clk);

q <= d;

end process;


1.11.3.2 Flops with Synchronous Reset

process (clk)
begin
  if rising_edge(clk) then
    if (reset = '1') then
      q <= '0';
    else
      q <= d;
    end if;
  end if;
end process;

Flop with Synchronous Reset: Wait-Style

process
begin
  wait until rising_edge(clk);
  if reset = '1' then
    q <= '0';
  else
    q <= d;
  end if;
end process;

Variation on a Floppy Theme

Question: What is this?

process (clk, reset)
begin
  if reset = '1' then
    q <= '0';
  else
    if rising_edge(clk) then
      q <= d;
    end if;
  end if;
end process;

Flop with Chip-Enable

process (clk)
begin
  if rising_edge(clk) then
    if ce = '1' then
      q <= d;
    end if;
  end if;
end process;

Wait-style flop with chip-enable included in course notes

Q: Flop with a Mux on the Input?

[Schematic: a multiplexer with data inputs d0 and d1 and select sel drives the D input of a flip-flop clocked by clk; the flip-flop output is q.]
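One possible way to code a flop with a mux on its input is sketched below. The entity name is hypothetical, and the polarity of sel (here '1' selects d1) is an assumption, since it cannot be read from the text.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: 2:1 mux feeding a single flip-flop.
entity flop_mux_in is
  port (
    clk, sel, d0, d1 : in  std_logic;
    q                : out std_logic
  );
end entity;

architecture sketch of flop_mux_in is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- The if-then-else inside the clocked region becomes the mux
      -- in front of the D input.
      if sel = '1' then
        q <= d1;
      else
        q <= d0;
      end if;
    end if;
  end process;
end architecture;
```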

Q: Flops with a Mux on the Output?

[Schematic: two flip-flops clocked by clk register d0 and d1 as q0 and q1; a multiplexer with select sel chooses between q0 and q1 to produce q.]
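A corresponding sketch for flops with a mux on the output: two registered signals and a combinational selection after them. The entity name and the sel polarity are again assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: two flip-flops followed by a 2:1 mux.
entity flop_mux_out is
  port (
    clk, sel, d0, d1 : in  std_logic;
    q                : out std_logic
  );
end entity;

architecture sketch of flop_mux_out is
  signal q0, q1 : std_logic;   -- outputs of the two flip-flops
begin
  process (clk)
  begin
    if rising_edge(clk) then
      q0 <= d0;
      q1 <= d1;
    end if;
  end process;

  -- Combinational mux after the flops: q follows sel without waiting
  -- for a clock edge.
  q <= q1 when sel = '1' else q0;
end architecture;
```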

Behavioural Comparison

[Schematics: the mux-on-input circuit (inputs d0, d1, sel, clk; output q) and the mux-on-output circuit (inputs d0, d1, sel, clk; internal signals q0, q1; output q).]

Question: For the two circuits above, does q have the same behaviour in both circuits?

[Waveform templates: one labelled "Mux on input" and one labelled "Mux on output", each with rows clk, sel, d0, d1, q.]

1.11.3.3 Flop with Chip-Enable and Mux on Input

Hint: Chip Enable

process (clk)
begin
  if rising_edge(clk) then
    if ce = '1' then
      q <= d;
    end if;
  end if;
end process;
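Building on the hint, one way to combine the chip-enable with a mux on the input is to nest the selection inside the enable test. This is a sketch under assumptions: the entity name is hypothetical and sel = '1' is assumed to select d1.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: flip-flop with chip-enable and a 2:1 mux on D.
entity flop_ce_mux is
  port (
    clk, ce, sel, d0, d1 : in  std_logic;
    q                    : out std_logic
  );
end entity;

architecture sketch of flop_ce_mux is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if ce = '1' then          -- no else branch: q holds its value (chip-enable)
        if sel = '1' then       -- complete if-then-else: mux on the D input
          q <= d1;
        else
          q <= d0;
        end if;
      end if;
    end if;
  end process;
end architecture;
```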

1.11.3.4 Flops with Chip-Enable, Muxes, and Reset

This section reserved for your reading pleasure

1.11.4 Example Coding Styles

This section reserved for your reading pleasure


1.12 Synthesizable vs Non-Synthesizable Code

For us to consider a VHDL program synthesizable, all of the conditions below must be satisfied:

• the program must be theoretically implementable in hardware

• the hardware that is produced must be consistent with the structure of the source code

• the source code must be portable across a wide range of synthesis tools, in that the synthesis tools all produce correct hardware

Synthesis is done by matching VHDL code against templates or patterns.

It’s important to use idioms that your synthesis tools recognize.

Think like hardware: when you write VHDL, you should know what hardware you expect to be produced by the synthesizer.

1.12.1 Wait For

Wait for length of time (UNSYNTHESIZABLE)

wait for 10 ns;

Reason: Delays through circuits are dependent upon both the circuit and its operating environment, particularly supply voltage and temperature. For example, imagine trying to build an AND gate that will have exactly a 2 ns delay in all environments.

1.12.2 Initial Values

Initial values on signals (UNSYNTHESIZABLE)

signal bad_signal : std_logic := '0';

Reason: At powerup, the values on signals are random (except for some FPGAs).

1.12.3 Assignments before Wait Statement

If a synthesizable clocked process has a wait statement, then the process must begin with a wait statement.

Unsynthesizable:

process
begin
  c <= a;
  d <= b;
  wait until rising_edge(clk);
end process;

Synthesizable:

process
begin
  wait until rising_edge(clk);
  c <= a;
  d <= b;
end process;

Reason: Cannot synthesize reasonable hardware that has the correct behavior.

In simulation, any assignments before the first wait statement will be executed in the first delta-cycle. In the synthesized circuit, the signals will be outputs of flip-flops and will first be assigned values after the first rising-edge.

1.12.4 Multiple “if rising edge” in Process

Multiple if rising edge statements in a process (UNSYNTHESIZABLE)

process (clk)

begin

if rising_edge(clk) then

q0 <= d0;

end if;

if rising_edge(clk) then

q1 <= d1;

end if;

end process;

Reason: The idioms for synthesis tools generally expect just a single if rising edge statement in each process.

The simpler the VHDL code is, the easier it is to synthesize hardware. Programmers of synthesis tools make idiomatic (idiotic?) restrictions to make their jobs simpler.
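A sketch of how the two flops can be written so that they match the single-if idiom: both assignments go under one rising-edge test. The entity name and ports are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: two flip-flops described in one clocked process.
entity two_flops is
  port (
    clk, d0, d1 : in  std_logic;
    q0, q1      : out std_logic
  );
end entity;

architecture sketch of two_flops is
begin
  process (clk)
  begin
    if rising_edge(clk) then   -- the only rising-edge test in the process
      q0 <= d0;
      q1 <= d1;
    end if;
  end process;
end architecture;
```

Splitting the two assignments into two separate processes, each with its own single rising-edge test, would be an equally conventional alternative.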

1.12.5 “if rising edge” and “wait” in Same Process

An if rising edge statement and a wait statement in the same process (UNSYNTHESIZABLE)

process

begin

if rising_edge(clk) then

q0 <= d0;

end if;

wait until rising_edge(clk);

q0 <= d1;

end process;

Reason: The idioms for synthesis tools generally expect just a single type of flop-generating statement in each process.

1.12.6 “if rising edge” with “else” Clause

The if statement has a rising edge condition and an else clause (UNSYNTHESIZABLE).

process (clk)

begin

if rising_edge(clk) then

q0 <= d0;

else

q0 <= d1;

end if;

end process;

Reason: The idioms for the synthesis tools expect a signal to be either registered or combinational, not both.

1.12.7 While Loop with Dynamic Condition and Combinational Body

A while loop where the condition is dynamic (depends upon a signal value) and the body is combinational is unsynthesizable. The loop below is unsynthesizable:

process (a, b, c) begin
  while a = '1' loop
    z <= b and c;
  end loop;
end process;

This loop is designed to be very small while still illustrating the problem. The loop itself is nonsensical.

For Loop with Combinational Body

A for-loop with a combinational body is synthesizable, because the loop condition can be evaluated statically (at compile/elaboration time). The loop below is synthesizable:

process ( b, c ) begin
  for i in 0 to 3 loop
    z(i) <= b(i) and c(i);
  end loop;
end process;

An equivalent while loop would require variables, which are an advanced topic (section 1.9).

While loops with dynamic conditions and clocked bodies are synthesizable, but are an example of an implicit state machine and are an advanced topic.

1.13 Guidelines for Desirable Hardware

Code that is synthesizable, but undesirable (i.e., bad coding practices):

• latches

• combinational loops

• multiple drivers for a signal

• asynchronous resets

• using a data signal as a clock

• using a clock signal as data

To prevent undesirable hardware, some synthesis tools will flag some of these problems as “unsynthesizable”.

Know Your Hardware

The most important guideline is: know what you want the synthesis tool to build for you.

• For every signal in your design, know whether it should be a flip-flop or combinational. Check the output of the synthesis tool to see if the flip-flops in your circuit match your expectations, and to check that you do not have any latches in your design.

• If you cannot predict what hardware the synthesis tool will generate, then you probably will be unhappy with the result of synthesis.

1.13.1 Latches

Combinational if-then without else

process (a, b)
begin
  if (a = '1') then
    c <= b;
  end if;
end process;

• For a combinational process, every signal that is assigned to must be assigned to in every branch of if-then and case statements.

reason If a signal is not assigned a value in a path through a combinational process, then that signal will be a latch.

note For a clocked process, if a signal is not assigned a value in a clock cycle, then the flip-flop for that signal will have a chip-enable pin. Chip-enable pins are fine; they are available on flip-flops in essentially every cell library.
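Two common ways to remove the latch are sketched below: add the missing else branch, or assign a default before the if. The value chosen for the a = '0' case ('0' here) is purely an assumption; the right value depends on the intended circuit. The entity and output names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: two latch-free versions of the process above.
entity no_latch is
  port (
    a, b             : in  std_logic;
    c_else, c_deflt  : out std_logic
  );
end entity;

architecture sketch of no_latch is
begin
  -- Option 1: every branch assigns the signal.
  process (a, b)
  begin
    if a = '1' then
      c_else <= b;
    else
      c_else <= '0';   -- assumed value for the previously missing branch
    end if;
  end process;

  -- Option 2: default assignment first; the if may override it.
  process (a, b)
  begin
    c_deflt <= '0';    -- assumed default
    if a = '1' then
      c_deflt <= b;
    end if;
  end process;
end architecture;
```

Both processes describe the same combinational logic (an AND of a and b, given the assumed default).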

Signals Missing from Sensitivity List

process (a)
begin
  c <= a and b;
end process;

• For a combinational process, the sensitivity list should contain all of the signals that are read in the process.

reason Gives consistent results across different tools. Many synthesis tools will implicitly include all signals that a process reads in its sensitivity list. This differs from the VHDL Standard. A synthesis tool that adheres to the standard will either generate an error or will create hardware with latches or flops clocked by data signals if not all signals that are read from are included in the sensitivity list.

exception In a clocked process using an if rising edge, it is acceptable to have only the clock in the sensitivity list
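For reference, a sketch of the same process with a complete sensitivity list. The entity name is hypothetical. Tools that support VHDL-2008 also accept process (all), which has the same effect, but whether a given tool supports it is something to check.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity wrapping the corrected process.
entity and_gate is
  port (
    a, b : in  std_logic;
    c    : out std_logic
  );
end entity;

architecture sketch of and_gate is
begin
  process (a, b)     -- every signal read in the process is listed
  begin
    c <= a and b;
  end process;
end architecture;
```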

1.13.2 Combinational Loops

A combinational loop is a cyclic path of dependencies through one or more combinational processes.

process (a, b, c) begin
  if a = '0' then
    d <= b;
  else
    d <= c;
  end if;
end process;

process (d, e) begin
  b <= d and e;
end process;

[Schematic: a multiplexer selects between b and c to produce d; d and e feed an AND gate that produces b, closing the loop.]

• If you need a signal to be dependent on itself, you must include a register somewhere in the cyclic path.

• Some FPGA synthesis tools consider a combinational loop to be unsynthesizable. We consider it to be synthesizable and bad-hardware, because the hardware is obvious and is obviously bad.
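A sketch of one way to break the loop above by placing a register in the cyclic path, here on b. The entity name, the clk port, and the result port are assumptions added for the example. Note that this changes the behaviour: b now lags d and e by one clock cycle, so it is a different circuit, not a drop-in fix.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: the loop b -> d -> b now passes through a flip-flop.
entity loop_with_reg is
  port (
    clk, a, c, e : in  std_logic;
    result       : out std_logic
  );
end entity;

architecture sketch of loop_with_reg is
  signal b, d : std_logic;
begin
  -- Combinational mux, unchanged from the original.
  process (a, b, c) begin
    if a = '0' then
      d <= b;
    else
      d <= c;
    end if;
  end process;

  -- b is now registered, so the cyclic path contains a flip-flop.
  process (clk) begin
    if rising_edge(clk) then
      b <= d and e;
    end if;
  end process;

  result <= d;
end architecture;
```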

1.13.3 Multiple Drivers

z <= a and b;
z <= c;

[Schematic: the output of an AND gate with inputs a and b, and the signal c, both drive the wire z.]

• Each signal should be assigned to in only one process. This is often called the “single assignment rule”.

reason Multiple processes driving the same signal is the same as having multiple gates driving the same wire. This can cause contention, tri-state values, and other bad things.

Multiple Drivers Example

The example below shows how a “software style” structure that puts the reset code in one process will cause multiple drivers for the signals y and z.

process begin
  wait until rising_edge(clk);
  if reset = '1' then
    y <= '0';
    z <= '0';
  end if;
end process;

process begin
  wait until rising_edge(clk);
  if reset = '0' then
    if a = '1' then
      z <= b and c;
    else
      z <= d;
    end if;
  end if;
end process;

process begin
  wait until rising_edge(clk);
  if reset = '0' then
    if b = '1' then
      y <= c;
    end if;
  end if;
end process;
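One way to restructure the example so that each signal has a single driver is to move the reset for each signal into the process that assigns it. The sketch below assumes the same signal names as above; the entity wrapper is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: y and z each assigned in exactly one process.
entity single_driver is
  port (
    clk, reset, a, b, c, d : in  std_logic;
    y, z                   : out std_logic
  );
end entity;

architecture sketch of single_driver is
begin
  -- Only driver of z: reset and normal behaviour together.
  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      z <= '0';
    elsif a = '1' then
      z <= b and c;
    else
      z <= d;
    end if;
  end process;

  -- Only driver of y.
  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      y <= '0';
    elsif b = '1' then
      y <= c;        -- when b = '0' and reset = '0', y holds (chip-enable)
    end if;
  end process;
end architecture;
```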

1.13.4 Asynchronous Reset

In an asynchronous reset, the test for reset occurs outside of the test for the clock edge.

process (reset, clk)
begin
  if (reset = '1') then
    q <= '0';
  elsif rising_edge(clk) then
    q <= d;
  end if;
end process;

• All reset signals should be synchronous.

reason If a reset occurs very close to a clock edge, some parts of the circuit might be reset in one clock cycle and some in the subsequent clock cycle. This can lead the circuit to be out of sync as it goes through the reset sequence, potentially causing erroneous internal state and output values.

1.13.5 Using a Data Signal as a Clock

process begin
  wait until rising_edge(clk);
  count <= count + 1;
end process;

process begin
  wait until rising_edge( count(5) );
  b <= a;
end process;

[Schematic: the count register, clocked by clk, is incremented by 1 each cycle; bit count(5) drives the clock input of the flip-flop that registers a as b.]

• Data signals should be used only as data.

reason All data assignments should be synchronized to a clock. This ensures that the timing analysis tool can determine the maximum clock speed accurately. Using a data signal as a clock can lead to unpredictable delays between different assignments, which makes it infeasible to do an accurate timing analysis.
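A common alternative is to keep every flop on clk and turn the rising edge of count(5) into a chip-enable. The sketch below is one way to do that under several assumptions: the entity name, the 6-bit width of count, the reset port (added so that count has a defined starting value), and the helper signal prev_bit5 are all hypothetical. The update of b also lands one clk cycle after count(5) rises, so the timing is not identical to the original.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: sample a when count(5) rises, using only clk as a clock.
entity slow_sample is
  port (
    clk, reset, a : in  std_logic;
    b             : out std_logic
  );
end entity;

architecture sketch of slow_sample is
  signal count     : unsigned(5 downto 0);  -- assumed width
  signal prev_bit5 : std_logic;             -- count(5) delayed by one clock cycle
begin
  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      count     <= (others => '0');
      prev_bit5 <= '0';
    else
      count     <= count + 1;
      prev_bit5 <= count(5);
      -- count(5) is '1' now and was '0' one cycle ago: treat this as an
      -- enable for the b flop instead of using count(5) as a clock.
      if count(5) = '1' and prev_bit5 = '0' then
        b <= a;
      end if;
    end if;
  end process;
end architecture;
```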

1.13.6 Using a Clock Signal as Data

process begin
  wait until rising_edge(clk);
  count <= count + 1;
end process;

b <= a and clk;

• Clock signals should be used only as clocks.

reason Clock signals have two defined values in a clock cycle and transition in the middle of the clock cycle. At the register-transfer level, each signal has exactly one value in a clock cycle and signals transition between values only at the boundary between clock cycles.

1.14 Bad VHDL Coding

This section lists some coding practices to avoid in VHDL unless you have a verygood reason.

1.14.1 Tri-State Buffers and Signals

'Z' as a Signal Value

process (sel, a0) begin
  -- conditional assignment inside a process requires VHDL-2008
  b <= a0 when sel = '0'
       else 'Z';
end process;

process (sel, a1) begin
  b <= a1 when sel = '1'
       else 'Z';
end process;

• Use multiplexers, not tri-state buffers.

reason Multiplexers are more robust than tri-state buffers, because tri-state buffers rely on analog effects such as drive-strength and voltages that are between '0' and '1'. Multiplexers require more area than tri-state buffers, but for the size of most busses, the advantage in a more robust design is worth the cost in extra area.
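As a minimal sketch of the multiplexer alternative, the two tri-state drivers of b can be replaced by a single driver that selects between a0 and a1. The entity name mux2 is illustrative; the signal names follow the tri-state example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity name; signals follow the tri-state example.
entity mux2 is
  port ( sel, a0, a1 : in  std_logic;
         b           : out std_logic );
end entity;

architecture main of mux2 is
begin
  -- One driver for b: no 'Z' values and no multiply-driven signal.
  b <= a0 when sel = '0' else a1;
end architecture;
```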


Inout and Buffer Port Modes

entity bad is
  port (
    io_bad  : inout std_logic;
    buf_bad : buffer std_logic
  );
end entity;

• Use in or out; do not use inout or buffer.

reason inout and buffer signals are tri-state.

note If you have an output signal that you also want to read from, you might be tempted to declare the mode of the signal to be inout. A better solution is to create a new, internal, signal that you both read from and write to. Then, your output signal can just read from the internal signal.
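The internal-signal approach from the note can be sketched as follows. The counter, its 8-bit width, and the name count_i are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical counter whose output value is also read internally.
entity counter is
  port ( clk   : in  std_logic;
         count : out unsigned(7 downto 0) );
end entity;

architecture main of counter is
  -- Internal signal that is both read and written.
  signal count_i : unsigned(7 downto 0) := (others => '0');
begin
  process begin
    wait until rising_edge(clk);
    count_i <= count_i + 1;
  end process;

  -- The output port only reads from the internal signal.
  count <= count_i;
end architecture;
```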


1.14.2 Variables in Processes

process
  variable bad : std_logic;
begin
  wait until rising_edge(clk);
  bad := not a;
  d <= bad and b;
  e <= bad or c;
end process;

• In a process, use signals; do not use variables.

reason The intention of the creators of VHDL was for signals to be wires and variables to be just for simulation. Some synthesis tools allow some uses of variables, but when using variables, it is easy to create a design that works in simulation but not in real hardware. (section 1.9)
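One way to remove the variable from the example above is to compute the shared term with a concurrent signal assignment. This is a sketch: the entity name and the signal name not_a are illustrative. Because the variable was assigned before it was read, it described combinational logic, so the registered outputs d and e should behave the same as in the variable version.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity wrapping the slide's example.
entity no_variables is
  port ( clk, a, b, c : in  std_logic;
         d, e         : out std_logic );
end entity;

architecture main of no_variables is
  signal not_a : std_logic;
begin
  -- Combinational signal replaces the variable "bad".
  not_a <= not a;

  process begin
    wait until rising_edge(clk);
    d <= not_a and b;
    e <= not_a or c;
  end process;
end architecture;
```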


1.14.3 Bits and Booleans as Signals

signal bad1 : bit;
signal bad2 : boolean;

• Use std_logic signals; do not use bit or Boolean signals.

reason std_logic is the most commonly used signal type across synthesis tools and simulation tools.


Review: Synthesizable, Good, and Bad VHDL

For each code fragment below, answer whether it is synthesizable. If the code is synthesizable, answer whether it follows good coding practices for synthesizable hardware.

1. process (clk) begin
     if rising_edge(clk) then
       q <= a;
     else
       q <= b;
     end if;
   end process;

   Synth? Yes / No
   Good?  Yes / No

2. process (clk) begin
     if rising_edge(clk) then
       q1 <= d1;
     end if;
     if rising_edge(clk) then
       q2 <= d2;
     end if;
   end process;

   Synth? Yes / No
   Good?  Yes / No

3. process (a,b) begin
     if a = '1' then
       q <= b;
     end if;
   end process;

   Synth? Yes / No
   Good?  Yes / No

4. process (a, b) begin
     if a = '1' then
       q <= b;
     else
       q <= not q;
     end if;
   end process;

   Synth? Yes / No
   Good?  Yes / No

Chapter 2

Additional Features of VHDL


2.1 Literals

2.1.1 Numeric Literals

Description   Type     Example 1     Example 2
Decimal       Integer  17            1023
Decimal       Real     17.0          1023.1
Hexadecimal   Integer  16#FF#        16#2F190#
Hexadecimal   Real     16#FF.F#      16#2F1.90#
Binary        Integer  2#1101#       2#011101#
Binary        Real     2#1101.111#   2#0111.01#
Exponent      Integer  17E+3         2#111#E3
Exponent      Real     17.1E+3       2#11.1#E3
Underscore    Integer  123_45_67     16#FF_3A#


2.1.2 Bit-String Literals

Binary        B"1101010"   B"1101_1010"
Octal         O"3470100"   O"45_23"
Hexadecimal   X"FF2300"    X"Ff3dbF_23"
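The values of several of these literals can be checked in simulation. The following self-checking sketch uses a hypothetical entity literal_demo; each assertion is expected to pass silently. For a based literal, the exponent is a power of the base, so 2#111#E3 is 7 * 2**3.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical simulation-only entity for checking literal values.
entity literal_demo is
end entity;

architecture main of literal_demo is
  constant hex_mask : std_logic_vector(7 downto 0) := X"FF";
  constant bin_mask : std_logic_vector(7 downto 0) := B"1111_1111";
begin
  process begin
    assert 16#FF#    = 255      report "hexadecimal integer" severity failure;
    assert 2#1101#   = 13       report "binary integer"      severity failure;
    assert 17E+3     = 17000    report "decimal exponent"    severity failure;
    assert 2#111#E3  = 56       report "based exponent"      severity failure;
    assert 123_45_67 = 1234567  report "underscores"         severity failure;
    -- Underscores in a bit-string literal do not add bits.
    assert hex_mask  = bin_mask report "bit-string literals" severity failure;
    wait;
  end process;
end architecture;
```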


2.2 Arrays and Vectors

2.2.1 Declarations

VHDL arrays have:

• direction (to or downto)

• upper bound

• lower bound

signal a : std_logic_vector( 3 downto 0 );

signal b : std_logic_vector( 0 to 3 );

signal c : std_logic_vector( 1 to 4 );


Constant Arrays

To define a constant array:

constant a : array( 0 to 3 ) of integer
  := ( 10, 17, -31, 23 );

constant b : array( 0 to 3 ) of integer
  := ( 0 => 10, 1 => 17, 2 => -31, 3 => 23 );

constant c : array( 0 to 3 ) of integer
  := ( 0 => 10, 1 => 17, others => 23 );
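In compilable VHDL, a constant declaration takes a named type, so the array type is normally declared first. The sketch below shows the same three aggregates with a named type; the package name and the type name int_array are assumptions for illustration.

```vhdl
-- Hypothetical package; int_array is an illustrative type name.
package const_array_demo is
  type int_array is array ( 0 to 3 ) of integer;

  -- Positional association
  constant a : int_array := ( 10, 17, -31, 23 );

  -- Named association
  constant b : int_array := ( 0 => 10, 1 => 17, 2 => -31, 3 => 23 );

  -- Named association with "others" for the remaining indices
  constant c : int_array := ( 0 => 10, 1 => 17, others => 23 );
end package;
```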


2.2.2 Indexing, Slicing, Concatenation, Aggregates

Operations

Indexing an array to reference a single element
    a(0)

A slice or “discrete subrange” of an array
    a( 3 downto 2 )

Concatenating an element onto an array, or concatenating two arrays
    '1' & a
    b & a

Explicit arrays or “aggregates”
    ( '0', '0', '1' )
    ( a(0), b(2), a(3) )

Aggregate with positional indices
    ( 0=>'0', 2=>'X', 1=>'U' )

Aggregate with “others” keyword
    ( 0=>'0', 3=>'1', others=>'X' )


Assignments

1. The ranges on both sides of the assignment must be the same.

2. The direction (downto or to) of each slice must match the direction of the signal declaration.

3. The direction of the target and expression may be different.


Assignments (cont'd)

Declarations
  a , b  : std_logic_vector(15 downto 0);
  ax, bx : std_logic_vector(0 to 15);

Legal code
  b (3 downto 0)  <= a(15 downto 12);
  bx(0 to 3)      <= a(15 downto 12);
  ( b(3) , b(4) ) <= a(13 downto 12);
  ( bx(4), b(4) ) <= a(13 downto 12);

Illegal code
  bx(0 to 3) <= a(12 to 15);
  -- slice dirs must be same as decl, fails for a

  c (3 downto 0) <= (a & b)( 3 downto 0);
  -- may not index an expression

  b(3) & b(2) <= a(12 to 13);
  -- & may not be used on lhs


2.3 Arithmetic

VHDL includes all of the common arithmetic operators and relations.

Use the VHDL arithmetic operators and let the synthesis tool choose the best implementation for you.

2.3.1 Arithmetic Packages

To do arithmetic with signals, use the numeric_std package.

numeric_std supersedes earlier arithmetic packages, such as std_logic_arith.

Use only one arithmetic package, otherwise the different definitions will clash and you can get strange error messages.

We will describe arithmetic with the numeric_std package.


2.3.2 Arithmetic Types

Arithmetic may be done on three types of expressions:

integers Numeric values, such as 17

unsigned Unsigned vectors, such as signals defined as type unsigned( 7 downto 0 ).

signed Signed vectors, such as signals defined as type signed( 7 downto 0 ).

The types signed and unsigned are std_logic vectors on which you can do signed or unsigned arithmetic and all of the operations that are supported by std_logic vectors.


2.3.3 Overloading of Arithmetic

The arithmetic operators +, -, and * are overloaded on signed vectors, unsigned vectors, and integers.

Declarations
  u1, u2, u3 : unsigned( 7 downto 0);
  s1, s2, s3 : signed( 7 downto 0);

Target    Src1/2    Src2/1    Example              Result
unsigned  unsigned  unsigned  u3 <= u1 + u2;       OK
unsigned  unsigned  integer   u3 <= u1 + 17;       OK
signed    signed    signed    s3 <= s1 + s2;       OK
signed    signed    integer   s3 <= s1 + (-17);    OK
-         unsigned  signed    u3 <= u1 + s2;       Fail
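The OK rows of the table can be written out as a small compilable design. This is a sketch with a hypothetical entity name and port names; the library clauses show the single arithmetic package (numeric_std) in use. A negative integer literal after a binary operator needs parentheses in VHDL.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity exercising the overloaded "+" operator.
entity overload_demo is
  port ( u1, u2 : in  unsigned(7 downto 0);
         s1, s2 : in  signed(7 downto 0);
         u3, u4 : out unsigned(7 downto 0);
         s3, s4 : out signed(7 downto 0) );
end entity;

architecture main of overload_demo is
begin
  u3 <= u1 + u2;       -- unsigned + unsigned
  u4 <= u1 + 17;       -- unsigned + integer
  s3 <= s1 + s2;       -- signed + signed
  s4 <= s1 + (-17);    -- signed + integer
  -- u3 <= u1 + s2;    -- rejected: no "+" for unsigned and signed
end architecture;
```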


2.3.4 Widths for Addition and Subtraction

• Sources may have different widths

• The target must be the same width as the widest source

Declarations
  w1, w2, w3 : unsigned(7 downto 0)  -- wide
  n1, n2, n3 : unsigned(3 downto 0)  -- narrow

Target  Src1/2  Src2/1  Example          Result
wide    wide    wide    w3 <= w1 + w2;   OK
wide    wide    narrow  w3 <= w1 + n2;   OK
wide    wide    int     w3 <= w1 + 17;   OK
narrow  narrow  narrow  n3 <= n1 + n2;   OK
narrow  narrow  int     n3 <= n1 + 17;   OK
narrow  wide    -       n3 <= w1 + n2;   Fail

These failures are caught at elaboration, which happens after typechecking.
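Because the result is only as wide as the widest source, the carry out of an addition is dropped unless a source is widened first. The sketch below illustrates this with the resize function from numeric_std; the entity and port names are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity showing result widths for addition.
entity add_width_demo is
  port ( w1, w2    : in  unsigned(7 downto 0);
         n1        : in  unsigned(3 downto 0);
         sum_mixed : out unsigned(7 downto 0);
         sum_carry : out unsigned(8 downto 0) );
end entity;

architecture main of add_width_demo is
begin
  -- Result is 8 bits: the width of the widest source.
  sum_mixed <= w1 + n1;

  -- Widen both sources to 9 bits so that the carry is kept.
  sum_carry <= resize(w1, 9) + resize(w2, 9);
end architecture;
```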


Widths for Multiplication

• The sources may be different widths

• The width of the result must be the sum of the widths of the sources

Declarations
  v4a, v4b, v4c : unsigned( 3 downto 0 );
  v8            : unsigned( 7 downto 0 );
  v12           : unsigned( 11 downto 0 );

Target   Src1/2  Src2/1  Example             Result
8-bits   4-bits  4-bits  v8  <= v4a * v4b;   OK
12-bits  4-bits  8-bits  v12 <= v4a * v8;    OK
4-bits   4-bits  4-bits  v4c <= v4a * v4b;   Fail
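If a narrower product is really wanted, the truncation has to be written explicitly. A minimal sketch, assuming hypothetical port names p8, p12, and p4:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity showing result widths for multiplication.
entity mult_width_demo is
  port ( v4a, v4b : in  unsigned(3 downto 0);
         v8       : in  unsigned(7 downto 0);
         p8       : out unsigned(7 downto 0);
         p12      : out unsigned(11 downto 0);
         p4       : out unsigned(3 downto 0) );
end entity;

architecture main of mult_width_demo is
begin
  p8  <= v4a * v4b;             -- 4 + 4 = 8 bits
  p12 <= v4a * v8;              -- 4 + 8 = 12 bits
  -- Explicit truncation: keep only the low 4 bits of the product.
  p4  <= resize(v4a * v4b, 4);
end architecture;
```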


2.3.5 Overloading of Comparisons

• Comparisons are overloaded on arrays and integers.

• If both operands are arrays, both must be of the same type.

Declarations
  u1, u2 : unsigned( 7 downto 0);
  s1, s2 : signed( 7 downto 0);

Src1/2    Src2/1    Example     Result
unsigned  unsigned  u1 >= u2    OK
unsigned  integer   u1 >= 17    OK
signed    signed    s1 >= s2    OK
signed    integer   s1 >= 17    OK
unsigned  signed    u1 >= s1    Fail


2.3.6 Widths for Comparisons

• Sources may have different widths

Declarations
  w1, w2 : unsigned(7 downto 0)  -- wide
  n1, n2 : unsigned(3 downto 0)  -- narrow

Src1/2  Src2/1  Example     Result
wide    -       w1 >= n1    OK
narrow  -       n1 >= w2    OK


2.3.7 Type Conversion

If you convert between two types of the same width, then no additional hardware will be generated.

unsigned ( val : std_logic_vector ) return unsigned;

signed ( val : std_logic_vector ) return signed;

to_unsigned( val : integer; width : natural) return unsigned;

to_signed ( val : integer; width : natural) return signed;

to_integer ( val : signed ) return integer;

to_integer ( val : unsigned ) return integer;


Examples of Conversions

Declarations
  u1, u2, u3    : unsigned( 7 downto 0);
  sn1, sn2, sn3 : signed( 7 downto 0);
  sw1, sw2, sw3 : signed( 8 downto 0);

Examples
  u3  <= to_unsigned( 17, 8 );                       OK
  sn3 <= to_signed( 17, 8 );                         OK
  sw3 <= signed( "0" & u1 );                         OK
  sn3 <= signed( u1 );                               Bad
  sw3 <= signed( "0" & u1) - signed( "0" & u2);      OK
  sw3 <= signed( "0" & (u1 + u2));                   OK
  sw3 <= signed( "0" & (u1 - u2));                   Bad

The Bad examples above will typecheck and elaborate without any errors, but they potentially will produce incorrect results.
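To see why the Bad rows can give wrong values, it helps to try concrete numbers. The self-checking sketch below uses a hypothetical entity and the assumed values 200 and 100: converting 200 directly to an 8-bit signed reinterprets the top bit as a sign, and an unsigned subtraction wraps around before the zero bit is prepended.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical simulation-only entity; 200 and 100 are example values.
entity conversion_demo is
end entity;

architecture main of conversion_demo is
  constant u1 : unsigned(7 downto 0) := to_unsigned(200, 8);
  constant u2 : unsigned(7 downto 0) := to_unsigned(100, 8);
begin
  process begin
    -- Bad: bit 7 of u1 becomes the sign bit.
    assert to_integer(signed(u1)) = -56
      report "direct conversion" severity failure;
    -- OK: prepend a zero bit first, giving a 9-bit signed value.
    assert to_integer(signed("0" & u1)) = 200
      report "zero-extended conversion" severity failure;
    -- Bad: 100 - 200 wraps to 156 in 8 bits before the extension.
    assert to_integer(signed("0" & (u2 - u1))) = 156
      report "subtract then convert" severity failure;
    -- OK: convert first, then subtract in 9-bit signed arithmetic.
    assert to_integer(signed("0" & u2) - signed("0" & u1)) = -100
      report "convert then subtract" severity failure;
    wait;
  end process;
end architecture;
```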


Resizing and Sign Extension

The function resize resizes vectors, performing sign extension if necessary, based upon the type of the argument. It is overloaded for different types of arguments.

resize( v : std_logic_vector; width : natural ) return std_logic_vector;
resize( u : unsigned ; width : natural ) return unsigned;
resize( s : signed ; width : natural ) return signed;

Declarations
  un1, un2 : unsigned( 3 downto 0);
  uw1, uw2 : unsigned( 7 downto 0);
  sn1, sn2 : signed( 3 downto 0);
  sw1, sw2 : signed( 7 downto 0);

Examples
  uw1 <= resize( un1, 8 );   OK
  un1 <= resize( uw1, 4 );   OK
  sw1 <= resize( sn1, 8 );   OK
  sn1 <= resize( sw1, 4 );   OK
  sw1 <= resize( un1, 8 );   Fail
  uw1 <= resize( sn1, 8 );   Fail
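The difference between zero extension and sign extension shows up when the same bit pattern is resized as unsigned and as signed. A self-checking sketch with a hypothetical entity and the assumed 4-bit pattern "1010":

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical simulation-only entity; "1010" is an example pattern.
entity resize_demo is
end entity;

architecture main of resize_demo is
  constant un : unsigned(3 downto 0) := "1010";  -- 10 as unsigned
  constant sn : signed(3 downto 0)   := "1010";  -- -6 as signed
begin
  process begin
    -- Unsigned: widened with zeros, value unchanged.
    assert to_integer(resize(un, 8)) = 10
      report "zero extension" severity failure;
    -- Signed: widened with copies of the sign bit, value unchanged.
    assert to_integer(resize(sn, 8)) = -6
      report "sign extension" severity failure;
    -- Changing signedness needs an explicit type conversion.
    assert to_integer(signed(resize(un, 8))) = 10
      report "resize then convert" severity failure;
    wait;
  end process;
end architecture;
```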


Type Conversion and Array Indices

To use a signal as an index into an array, you must convert the signal into an integer using the function to_integer.

Declarations
  signal u : unsigned( 3 downto 0);
  signal v : std_logic_vector( 3 downto 0);
  signal a : std_logic_vector(15 downto 0);

Examples
  a( to_integer(u) )                OK
  a( to_integer( unsigned(v) ) )    OK
  v(u)                              Fail
  a( unsigned(v) )                  Fail
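The two OK forms can be placed in a small design as follows. This is a sketch; the entity name and the output names bit_u and bit_v are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity that selects one bit of a by index.
entity index_demo is
  port ( u     : in  unsigned(3 downto 0);
         v     : in  std_logic_vector(3 downto 0);
         a     : in  std_logic_vector(15 downto 0);
         bit_u : out std_logic;
         bit_v : out std_logic );
end entity;

architecture main of index_demo is
begin
  -- unsigned index: convert to integer
  bit_u <= a( to_integer(u) );
  -- std_logic_vector index: convert to unsigned, then to integer
  bit_v <= a( to_integer( unsigned(v) ) );
end architecture;
```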


2.3.8 Shift and Rotate Operations

Shift and rotate operations are described with three-character acronyms:

〈 shift 〉 〈 left/right 〉 〈 arithmetic/logical 〉
〈 rotate 〉 〈 left/right 〉

The shift right arithmetic (sra) operation preserves the sign of the operand, by copying the most significant bit into lower bit positions.

The shift left arithmetic (sla) does the analogous operation, except that the least significant bit is copied.

a sra 2 -- arithmetic shift of a by 2 bits
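For signed and unsigned vectors, the numeric_std package also provides the functions shift_left, shift_right, rotate_left, and rotate_right; support for the sra and sla operators on these types varies with the VHDL version. The self-checking sketch below uses a hypothetical entity and the assumed values -8 and 200. shift_right is arithmetic on signed and logical on unsigned.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical simulation-only entity; -8 and 200 are example values.
entity shift_demo is
end entity;

architecture main of shift_demo is
  constant s : signed(7 downto 0)   := to_signed(-8, 8);     -- 11111000
  constant u : unsigned(7 downto 0) := to_unsigned(200, 8);  -- 11001000
begin
  process begin
    -- Arithmetic shift right: the sign bit is copied in.
    assert to_integer(shift_right(s, 2)) = -2
      report "shift_right signed" severity failure;
    -- Logical shift right: zeros are shifted in.
    assert to_integer(shift_right(u, 2)) = 50
      report "shift_right unsigned" severity failure;
    -- Shift left drops the top bit: 400 mod 256 = 144.
    assert to_integer(shift_left(u, 1)) = 144
      report "shift_left unsigned" severity failure;
    -- Rotate left moves the top bit to bit 0: 10010001 = 145.
    assert to_integer(rotate_left(u, 1)) = 145
      report "rotate_left unsigned" severity failure;
    wait;
  end process;
end architecture;
```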


2.3.9 Arithmetic Optimizations

Multiply by a constant power of two    wired shift logical left
Multiply by a power of two             shift logical left
Divide by a constant power of two      wired shift logical right
Divide by a power of two               shift logical right

Question: How would you implement: z <= a * 3?
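One possible answer, offered as a sketch rather than as the intended solution, is to rewrite a * 3 as (a * 2) + a, so that the multiplication becomes a wired shift plus one adder. The entity name and the widths (8-bit a, 10-bit z) are assumptions; a synthesis tool may well perform this optimization on its own.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity; widths are chosen so that 3 * 255 fits in z.
entity times3 is
  port ( a : in  unsigned(7 downto 0);
         z : out unsigned(9 downto 0) );
end entity;

architecture main of times3 is
begin
  -- a * 3 = (a shifted left by 1) + a; the constant shift is wiring only.
  z <= shift_left(resize(a, 10), 1) + resize(a, 10);
end architecture;
```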


2.4 Types

2.4.1 Enumerated Types

VHDL supports enumerated types:

type color is (red, green, blue);


2.4.2 Defining New Array Types

When defining a new array type, the range may be left unconstrained:

type color is (red, green, blue);

type color_vector is array ( natural range <> ) of color;

We may then use the unconstrained array type as the basis for defining a constrained array subtype:

subtype few_colors is color_vector( 0 to 3 );

subtype many_colors is color_vector( 0 to 1023 );

Note the use of subtype above. It is illegal to use type to define a constrained array in terms of an unconstrained array.

We can use type to define a constrained array directly:

type few_colors is array ( 0 to 3 ) of color;


Chapter 3

Overview of FPGAs

3.1 Generic FPGA Hardware

• This section: generic FPGA with 4 inputs per lookup table.

• Many real FPGAs have more (e.g., 6) inputs per lookup table.

• Principles described here are applicable in general, even as details differ.

3.1.1 Generic FPGA Cell

FPGA “Cell” = “Logic Element” (LE) in Altera
            = “Configurable Logic Block” (CLB) in Xilinx

“LUT” = “lookup table”
      = PLA (programmable logic array)

[Figure: generic FPGA cell. A configurable 4:1 lookup table (LUT) and a flip-flop (D, Q, CE, S, R) are joined by configurable multiplexers. Signals: comb_data_in, ctrl_in, carry_in, carry_out, flop_data_in, comb_data_out, flop_data_out.]

Separate Comb and Flop

[Figure: the generic FPGA cell (comb block and flip-flop with CE, S, R, D, Q; signals comb_data_in, ctrl_in, carry_in, carry_out, flop_data_in, comb_data_out, flop_data_out) with the combinational block and the flip-flop used separately.]

Connect Comb and Flop

[Figure: the generic FPGA cell (comb block and flip-flop with CE, S, R, D, Q; signals comb_data_in, ctrl_in, carry_in, carry_out, flop_data_in, comb_data_out, flop_data_out) with the combinational block connected to the flip-flop.]

Flopped and Unflopped Outputs

[Figure: the generic FPGA cell (comb block and flip-flop with CE, S, R, D, Q; signals comb_data_in, ctrl_in, carry_in, carry_out, flop_data_in, comb_data_out, flop_data_out) with both the flopped and the unflopped outputs in use.]

3.1.2 Lookup Table

A 4:1 lookup table is usually implemented as a memory array with 16 1-bit elements.

z = (a AND b) OR
    (b AND NOT c) OR
    (c AND NOT d)

4-bit address   1-bit data
d c b a         z
0 0 0 0         0
0 0 0 1         0
0 0 1 0         1
0 0 1 1         1
0 1 0 0         1
...
1 0 0 1         0
1 0 1 0         1
1 0 1 1         1
1 1 0 0         0
1 1 0 1         0
1 1 1 0         0
1 1 1 1         1

z = NOT a

4-bit address   1-bit data
d c b a         z
0 0 0 0         1
0 0 0 1         0
0 0 1 0         1
0 0 1 1         0
0 1 0 0         1
0 1 0 1         0
0 1 1 0         1
...
1 0 1 1         0
1 1 0 0         1
1 1 0 1         0
1 1 1 0         1
1 1 1 1         0
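The memory-array view of a lookup table can be modelled in VHDL as a 16-entry constant that is indexed by the four inputs. This is a behavioural sketch for illustration, not a vendor primitive; the entity name lut4 and the generic init are assumptions. The default contents encode z = (a AND b) OR (b AND NOT c) OR (c AND NOT d), with the rows elided from the table filled in from that equation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioural model of a 4-input lookup table.
entity lut4 is
  generic (
    -- Entry i holds z for address i = (d, c, b, a), d being the MSB.
    init : std_logic_vector(0 to 15) := "0011111100110001"
  );
  port ( a, b, c, d : in  std_logic;
         z          : out std_logic );
end entity;

architecture main of lut4 is
  signal addr : unsigned(3 downto 0);
begin
  addr <= d & c & b & a;          -- 4-bit address
  z    <= init( to_integer(addr) );  -- 1-bit data
end architecture;
```

Under the same assumptions, the second function, z = NOT a, corresponds to the contents "1010101010101010".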


3.1.3 Interconnect for Generic FPGA

Local Connections

Note: In these pictures, the space between tightly grouped wires sometimes disappears, making a group of wires appear to be a single large wire.

Local Connections (Zoom Out)


General-Purpose Wires and Carry Chains

General purpose interconnect: configurable, slow

Carry chains and cascade chains: vertically adjacent cells, fast

3.1.4 Blocks of Cells for Generic FPGA

[Figures: column of cells in blocks; two rows of blocks; path to connect cells in different rows.]

Connecting Through Cells

Cells that are not used for computation can be used as “wires” to shorten the length of the path between cells.

3.1.5 Special Circuitry in FPGAs

Memory Since the mid 1990s, almost all FPGAs have had special circuits for RAM and ROM. These special circuits are possible because many FPGAs are fabricated on the same processes as SRAM chips. So, the FPGAs simply contain small chunks of SRAM.

Microprocessors In 2001, some high-end FPGAs had one or more hardwired microprocessors on the same chip as programmable hardware. In 2005, the Xilinx Virtex-II Pro had 4 Power PCs and enough programmable hardware to implement the first-generation Intel Pentium microprocessor.

Arithmetic Circuitry In 2001, FPGAs began to have hardwired circuits for multipliers and adders. Using these resources can improve significantly both the area and performance of a design.

Input / Output Some FPGAs include special circuits to increase the bandwidth of communication with the outside world.

3.2 Area Estimation for FPGAs

This section describes three methods to estimate the number of FPGA cells required to implement a circuit:

section 3.2.1 Rough estimate based simply upon the number of flip-flops and primary inputs that are in the fanin of each flip-flop or output.

section 3.2.2 A more accurate, and more complex, technique that uses a greedy algorithm to allocate as many gates as possible into the lookup table of each FPGA cell.

section 3.2.3 A technique to estimate the area for arithmetic circuits with registers.

Each cell:

• LUT for any combinational function with up to four inputs and one output

• Carry-in and carry-out signals used only for arithmetic carries

• Flip-flop can be driven by LUT or separate input

3.2.1 Area for Circuit with one Target

This section gives a technique to estimate the number of FPGA cells required for a purely combinational circuit with one output.

Question: What is the maximum number of inputs for a function that can be implemented with one LUT?

Question: Number of inputs for two LUTs?

Question: Three LUTs?

Question: Four LUTs?

Single Target vs Multiple Targets

For a single target signal, this technique gives a lower bound on the number of LUTs needed.

For multiple target signals, this technique might be an overestimate, because a single LUT can be used in the logic for multiple target cells.

4:1 Mux in Two FPGA Cells

A 4:1 mux has 6 inputs, so it should fit into two FPGA cells.

But, there is no partitioning of the gates into two groups such that each group has at most 4 inputs and 1 output.

[Figure: gate-level 4:1 mux with select inputs sel1, sel0, data inputs d0 to d3, and output z.]

But, with some clever tricks, a 4:1 mux can be implemented in two FPGA cells:

[Figure: 4:1 mux restructured for two FPGA cells, with inputs sel1, sel0, d0 to d3, internal signals i, j, k, l, m, n, o, and output z.]

3.2.2 Algorithm to Allocate Gates to Cells

This section presents an algorithm to allocate gates to FPGA cells for circuits with:

• multiple outputs

• combinational gates

• flip-flops

The algorithm mimics what a synthesis tool does in transforming a netlist of generic gates into an FPGA:

Technology map Map groups of generic combinational gates into LUTs

Placement Assign each LUT and flip-flop to an FPGA cell

In addition to the above, synthesis tools do the step of routing: connecting the signals between FPGA cells.

Because we are working with general-purpose combinational gates, we cannot use the carry-in and carry-out signals with the LUTs.

Overview of Algorithm

For each flip-flop and output: traverse backward through the fanin gathering as much combinational circuitry as possible into the FPGA cell.

Stopping conditions:

• flip-flop

• more than four inputs. However, if a group has more than four signals as input, the circuit may collapse back to four or fewer signals further back in the fanin.

Number of FPGA Cells (1)

Question: Map the circuit below onto generic FPGA cells.

Do not perform any algebraic optimizations. Use NC (no connect) for any unused pins on the cells.

[Figure: circuit with inputs a, b, c, d and output z.]

Number of FPGA Cells (2)

Question: Map the circuit below onto generic FPGA cells.

[Figure, shown twice on the slide (the second is an extra copy): circuit with signals a, b, c, d, e, f, g, h, i, x, y, z.]

Number of FPGA Cells (3)

In this question, the signal i becomes a new output.

Question: Map the circuit below onto generic FPGA cells.

[Figure, shown twice on the slide (the second is an extra copy): circuit with signals a, b, c, d, e, f, g, h, i, x, y, z, where i is now an output.]

3.2.3 Area for Arithmetic Circuits

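For arithmetic circuits, we take into account inputs, outputs, carry-in, and carry-out signals.

1 lookup table can implement one 1-bit full-adder

[Figure: one cell used as a 1-bit full adder with inputs a, b, ci and outputs sum, co; LUT pins d0 to d3, unused pins marked NC.]

n lookup tables can implement one n-bit full-adder

[Figure: 4-bit adder built from a chain of cells with inputs a0 to a3, b0 to b3, ci and outputs sum0 to sum3, co.]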
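For reference, the 1-bit full adder that each cell implements can be written with the usual sum and carry equations. This is a gate-level sketch with a hypothetical entity name; in practice the "+" operator from numeric_std would normally be used so that the synthesis tool can map the adder onto the carry chain.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 1-bit full adder matching the signal names above.
entity full_adder is
  port ( a, b, ci : in  std_logic;
         sum, co  : out std_logic );
end entity;

architecture main of full_adder is
begin
  sum <= a xor b xor ci;
  -- Carry out: generated by (a and b) or propagated from ci.
  co  <= (a and b) or (ci and (a xor b));
end architecture;
```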


Two-Bit Adder

Question: How many lookup tables are needed for a two-bit adder?

[Figure: two-bit adder with inputs a0, b0, a1, b1, ci and outputs sum0, sum1, co.]

Adder with a Multiplexer

Question: How many lookup tables for an adder with a 2:1 mux on one input?

[Figure, shown twice on the slide: adder with a 2:1 mux on one input; signals sel, a, b, c, ci, co, sum.]

Arithmetic VHDL Code

Question: How many cells are needed for each of the code fragments below?

All signals are 8 bits.

z <= a + b;

z <= a + b + c;

process begin
  wait until rising_edge(clk);
  z <= a + b + c;
end process;

Arithmetic VHDL Code (Cont’d)

process begin
  wait until rising_edge(clk);
  a <= i_a;
  b <= i_b;
  c <= i_c;
  z <= a + b + c;
end process;

a <= i_a;
b <= i_b;
c <= i_c;
process begin
  wait until rising_edge(clk);
  z <= a + b + c;
end process;

m <= a when sel='0'
       else b;

process begin
  wait until rising_edge(clk);
  z <= m + c;
end process;

Chapter 4

State Machines


4.1 Notations

We will use a variety of notations to model our hardware:

Pseudocode For algorithms. Used early in the design process for sequential behaviour and high-level optimizations.

Dataflow diagrams Models the structure and behaviour of datapath-intensive circuits.

State machines A variation on the conventional bubble-and-arrow style state machines.

VHDL code For the real implementation.


4.2 Finite State Machines in VHDL

4.2.1 HDL Coding Styles for State Machines

Explicit: VHDL code contains a state signal. At most one wait statement per process.

Explicit-Current: The state signal represents the current state of the machine and the signal is assigned its next value in a clocked process.

Explicit-Current+Next: There is a signal for the current state and another signal for the next state. The next-state signal is assigned its value in a combinational process or concurrent statement and is dependent upon the current state and the inputs. The current-state signal is assigned its value in a clocked process and is just a flopped copy of the next-state signal. (“three-process” style)

Implicit: There is no explicit state signal. At least one process has multiple wait statements. Each wait statement corresponds to a single state. (Advanced topic not covered in this course.)


4.2.2 State Encodings

Explicit state machines require a state signal. Before we can define a state signal, we must define values for the names of the states. For example, we might define S0 to be "000" and S1 to be "001". The value for the name of a state is called the “encoding” of the state. In hardware, each value is a bit-vector. There are a variety of common encodings for states: binary, one-hot, Gray, and thermometer.

We can either define the encoding ourselves, or let the synthesis tool choose the encoding for us. If we define the encoding, then the type for the states is std_logic_vector. To let the synthesis tool choose the encoding, we create an enumerated type for the states, where each state is an element of the type. The synthesis tool then chooses a specific binary value for each state. Usually, the synthesis tool has heuristics to choose either a binary or one-hot encoding.
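The two options described above can be sketched as declarations. This is an illustration only: the package name `state_enc_pkg`, the `_ENC` constant names, and the enumeration name `state_enum_ty` are hypothetical. The constants carry a suffix because a constant and an enumeration literal with the same name cannot be declared in the same region.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package; all names are assumptions for this sketch.
package state_enc_pkg is

  -- Option 1: we define the encoding ourselves (binary here),
  -- so the state type is a std_logic_vector.
  subtype state_slv_ty is std_logic_vector(2 downto 0);
  constant S0_ENC : state_slv_ty := "000";
  constant S1_ENC : state_slv_ty := "001";

  -- Option 2: an enumerated type; the synthesis tool chooses
  -- the bit-level encoding of each state.
  type state_enum_ty is (S0, S1);

end package state_enc_pkg;
```

A design would normally use only one of the two options for a given state machine.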

This section reserved for your reading pleasure


4.2.3 Traditional State-Machine Notation

This section reserved for your reading pleasure


4.2.4 Our State-Machine Notation

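A simple extension to Mealy machines that allows both:
• combinational assignments: z = 0
• registered assignments: z’ = 0

[Figure: state diagram with combinational assignments. States s0, s1, s2, s3; the edges out of s0 are labelled a / z=1 and !a / z=0, and the remaining edges are labelled z=0.]

[Figure: timing diagram for the combinational machine over clock cycles 0 to 6, showing a, state (S0, S1, S3, S0, S2, S3, S0), and z.]

[Figure: timing diagram for the registered machine over clock cycles 0 to 6, showing a, state (S0, S1, S3, S0, S2, S3, S0), and z.]

[Figure: the same state diagram with registered assignments z’=1 and z’=0 in place of the combinational ones.]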


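4.2.5 Bounce Example

Combinational Assignments

[Figure: timing diagram over clock cycles 0 to 4, showing a, state (S0, S1, S0, S2, S0), and z.]

[Figure: state diagram with states s0, s1, s2. Edges: s0 to s1 on a with z=1; s0 to s2 on !a with z=0; s1 to s0 with z=0; s2 to s0 with z=1.]

Registered Assignments

[Figure: timing diagram over clock cycles 0 to 4, showing a, state (S0, S1, S0, S2, S0), and z.]

[Figure: the same state diagram with registered assignments: s0 to s1 on a with z’=1; s0 to s2 on !a with z’=0; s1 to s0 with z’=0; s2 to s0 with z’=1.]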

Explicit-Current Coding Style


Combinational Assignments

[Figure: Bounce state diagram with combinational assignments (states s0, s1, s2; z=1 on a, z=0 on !a, z=0, z=1).]

process (clk) begin
  if rising_edge(clk) then
    case state is
      when S0 =>
        if a = '1' then
          state <= S1;
        else
          state <= S2;
        end if;
      when others =>
        state <= S0;
    end case;
  end if;
end process;

process (state, a) begin
  if (state = S0 and a = '1')
     or (state = S2)
  then
    z <= '1';
  else
    z <= '0';
  end if;
end process;

Registered Assignments

[Figure: Bounce state diagram with registered assignments (states s0, s1, s2; z’=1 on a, z’=0 on !a, z’=0, z’=1).]

process (clk) begin
  if rising_edge(clk) then
    case state is
      when S0 =>
        if a = '1' then
          state <= S1;
        else
          state <= S2;
        end if;
      when others =>
        state <= S0;
    end case;
  end if;
end process;

process begin
  wait until rising_edge(clk);
  if (state = S0 and a = '1')
     or (state = S2)
  then
    z <= '1';
  else
    z <= '0';
  end if;
end process;
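The slide code shows only the processes. As a sketch of the declarations they rely on, the unit below wraps the Explicit-Current version with a combinational `z` in a complete design. The entity name `bounce`, the port list, the type name `state_ty`, and the initial value on `state` (which matters for simulation only) are assumptions, not part of the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; name and ports are assumptions for this sketch.
entity bounce is
  port (
    clk : in  std_logic;
    a   : in  std_logic;
    z   : out std_logic
  );
end entity bounce;

architecture explicit_current of bounce is
  type state_ty is (S0, S1, S2);
  -- The initial value is an assumption so that simulation starts in S0.
  signal state : state_ty := S0;
begin

  -- State register: from S0 go to S1 on a, otherwise to S2;
  -- S1 and S2 both return to S0.
  process (clk) begin
    if rising_edge(clk) then
      case state is
        when S0 =>
          if a = '1' then
            state <= S1;
          else
            state <= S2;
          end if;
        when others =>
          state <= S0;
      end case;
    end if;
  end process;

  -- Combinational output: depends on the current state and on a.
  z <= '1' when (state = S0 and a = '1') or (state = S2)
       else '0';

end architecture explicit_current;
```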


Additional Coding Options

Combinational Assignments

[Figure: Bounce state diagram with combinational assignments (states s0, s1, s2; z=1 on a, z=0 on !a, z=0, z=1).]

process (clk) begin
  if rising_edge(clk) then
    case state is
      when S0 =>
        if a = '1' then
          state <= S1;
        else
          state <= S2;
        end if;
      when others =>
        state <= S0;
    end case;
  end if;
end process;

z <= '1' when (state = S0 and a = '1')
              or state = S2
     else '0';

Registered Assignments

[Figure: Bounce state diagram with registered assignments (states s0, s1, s2; z’=1 on a, z’=0 on !a, z’=0, z’=1).]

process (clk) begin
  if rising_edge(clk) then
    case state is
      when S0 =>
        if a = '1' then
          z <= '1';
          state <= S1;
        else
          z <= '0';
          state <= S2;
        end if;
      when S1 =>
        z <= '0';
        state <= S0;
      when others =>
        z <= '1';
        state <= S0;
    end case;
  end if;
end process;


Explicit-Current+Next

Combinational Assignments

[Figure: Bounce state diagram with combinational assignments (states s0, s1, s2; z=1 on a, z=0 on !a, z=0, z=1).]

process (clk) begin
  if rising_edge(clk) then
    st <= next_st;
  end if;
end process;

next_st <= S1 when st = S0 and a = '1'
      else S2 when st = S0
      else S0;

z <= '1' when (st = S0 and a = '1')
              or (st = S2)
     else '0';

Registered Assignments

[Figure: Bounce state diagram with registered assignments (states s0, s1, s2; z’=1 on a, z’=0 on !a, z’=0, z’=1).]

process (clk) begin
  if rising_edge(clk) then
    st <= next_st;
  end if;
end process;

next_st <= S1 when st = S0 and a = '1'
      else S2 when st = S0
      else S0;

process (clk) begin
  if rising_edge(clk) then
    if (st = S0 and a = '1')
       or (st = S2)
    then
      z <= '1';
    else
      z <= '0';
    end if;
  end if;
end process;
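The Explicit-Current+Next style is often written with three processes instead of concurrent statements. The sketch below shows one way the combinational version of Bounce might look in that form. The entity name `bounce_cn`, the process labels, and the port list are hypothetical; the state names and transitions follow the Bounce code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; name and ports are assumptions for this sketch.
entity bounce_cn is
  port (
    clk : in  std_logic;
    a   : in  std_logic;
    z   : out std_logic
  );
end entity bounce_cn;

architecture three_process of bounce_cn is
  type state_ty is (S0, S1, S2);
  signal st, next_st : state_ty;
begin

  -- Process 1: the current state is a flopped copy of next_st.
  p_reg : process (clk) begin
    if rising_edge(clk) then
      st <= next_st;
    end if;
  end process;

  -- Process 2: combinational next-state logic.
  p_next : process (st, a) begin
    next_st <= S0;              -- S1 and S2 both return to S0
    if st = S0 then
      if a = '1' then
        next_st <= S1;
      else
        next_st <= S2;
      end if;
    end if;
  end process;

  -- Process 3: combinational output logic.
  p_out : process (st, a) begin
    if (st = S0 and a = '1') or (st = S2) then
      z <= '1';
    else
      z <= '0';
    end if;
  end process;

end architecture three_process;
```

As in the slide code, nothing here resets `st`, so the starting state is left unspecified.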


Implicit

Combinational Assignments

[Figure: Bounce state diagram with combinational assignments (states s0, s1, s2; z=1 on a, z=0 on !a, z=0, z=1).]

Note: Implicit state machines do not support combinational assignments, because an implicit state machine is a clocked process and in a clocked process, all assignments are registered.

Note: Implicit state machines are an advanced topic and are not covered in ECE-327.

Registered Assignments

[Figure: Bounce state diagram with registered assignments (states s0, s1, s2; z’=1 on a, z’=0 on !a, z’=0, z’=1).]

process begin
  wait until rising_edge(clk);    -- S0
  if a = '1' then
    z <= '1';
    wait until rising_edge(clk);  -- S1
    z <= '0';
  else
    z <= '0';
    wait until rising_edge(clk);  -- S2
    z <= '1';
  end if;
end process;


4.2.6 Registered Assignments

Combinational assignments: Appear to happen instantaneously.

Registered assignments: Clock-cycle boundary between when inputs are sampled and when target signal is driven.

VHDL and FSMs use different techniques to achieve the same behaviour.

Use a registered assignment based on the state to illustrate.

[Figure: two-state FSM with states S0 and S1 and the registered assignments z’ = 1; and z’ = 0;.]

process begin
  wait until re(clk);
  if state = S0 then
    z <= 1;
  else
    z <= 0;
  end if;
end process;

FSM: Assignment is executed before the clock edge. Delay driving the output until after clock edge.

VHDL: Assignment is executed after the clock edge. Sample the old (visible) value of registered inputs from before the clock edge.


Registered Assignments in State Machines

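[Figure: two-state FSM with states S0 and S1 and registered assignments z’ = 1; and z’ = 0;, shown again with the state updates state’ = S0; and state’ = S1; written explicitly.]

[Figure: timing diagram of clk (edges at 10ns, 30ns, 50ns, 70ns, 90ns), state (S0, S1, S0, S1, S0), and z (1, 0, 1, 0).]

[Figure: detail of the 50ns clock edge, showing clk, state, and z, with the state assignment and the z assignment marked.]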


Registered Assignments in VHDL

p_z1 : process begin
  wait until re(clk);
  if state = S0 then
    z <= 1;
  else
    z <= 0;
  end if;
end process;

p_z2 : process (clk) begin
  if re(clk) then
    if state = S0 then
      z <= 1;
    else
      z <= 0;
    end if;
  end if;
end process;


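Explicit State Machines (p_z1, p_z2)

[Figure: delta-cycle simulation around the 50ns clock edge, showing clk, state, and z at 50ns, +1δ, and +2δ, annotated with proc_state and with p_z1, p_z2.]

[Figure: RTL simulation showing clk (edges at 10ns, 30ns, 50ns, 70ns, 90ns), state (S0, S1, S0, S1, S0), and z (1, 0, 1, 0).]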


4.2.7 More Notation

4.2.7.1 Extension: Transient States

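[Figure: left, S0 with edges a / y = 1; z = 2; to S1 and !a / y = 1; z = 3; to S2. Right, an equivalent machine in which y = 1; is written once on a shared edge through a transient state, followed by a / z = 2; to S1 and !a / z = 3; to S2.]

With a transient state, write y = 1 just once.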


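Transient States with Registered Assignments

• Syntactically, registered assignments may appear before combinational assignments.

• Semantically, the effect of the registered assignments occurs after the combinational assignments.

[Figure: left, S0 with edges a / y’ = 1; z = 2; to S1 and !a / y’ = 1; z = 3; to S2. Right, an equivalent machine in which y’ = 1; is written once on a shared edge through a transient state, followed by a / z = 2; to S1 and !a / z = 3; to S2.]

[Figure: timing diagram showing a, state (S0, S1, S0, S2), y, and z (2, then 3).]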


4.2.7.2 Assignments within States

Assignments may appear within states.

1. If all outgoing edges have the same assignment, then the assignment may be moved into the state.

The three state machines below all have the same behaviour.

[Figure: three equivalent state machines over states s1, s2, s3, built from the assignments w = 0; x = 1; y = 2; y = 5; z’ = 3;. In the first, x = 1; and z’ = 3; are repeated on both outgoing edges of s1 alongside y = 2; and y = 5;. In the other two, x = 1; and z’ = 3; are written once.]


Assignments within States (Cont’d)

2. If all incoming edges have the same registered assignment, then the assignment may be transformed into a combinational assignment and moved into the state.

The three state machines below all have the same behaviour.

[Figure: three equivalent state machines over states s1, s2, s3, built from the assignments w = 0; x = 1; y = 2; y = 5; z’ = 3;. In the first, x = 1; and z’ = 3; are repeated on both incoming edges of s3. In the second they are written once. In the third, z’ = 3; has become the combinational assignment z = 3;.]


Assignments within States (Cont’d)

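As another example to illustrate moving assignments between edges and states, the three machines below have the same behaviour:

[Figure: three equivalent state machines over states s1, s2, s3, s4, s5. The first has the registered assignment x’ = 1; twice; the second has the combinational assignment x = 1; once; the third has x = 1; twice.]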


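4.2.7.3 Conditional Expressions

The FSMs below have the same behaviour:

[Figure: left, S0 to S1 with two edges, a / z = b and !a / z = c. Right, S0 to S1 with a single edge labelled: if a then z = b else z = c.]

The FSMs below have the same behaviour:

[Figure: left, S0 to S1 with two edges, a / z = b and !a with no assignment. Right, S0 to S1 with a single edge labelled: if a then z = b.]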


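4.2.7.4 Default Values

Combinational

[Figure: FSM with default values. default: z=0; states S0, S1, S2; edges out of S0 labelled a and !a, with z=1 written on one edge.]

[Figure: equivalent FSM without default values, over the same states and edges.]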
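In VHDL, a combinational default is commonly written as an assignment at the top of a combinational process, which later assignments in the same process may override. The sketch below is an assumption-laden illustration rather than the slide's own code: it places z=1 on the a edge out of S0, returns from S1 and S2 to S0, and uses the hypothetical entity name `default_demo`.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; name, ports, and transitions are assumptions.
entity default_demo is
  port (
    clk : in  std_logic;
    a   : in  std_logic;
    z   : out std_logic
  );
end entity default_demo;

architecture rtl of default_demo is
  type state_ty is (S0, S1, S2);
  signal state : state_ty := S0;  -- initial value for simulation only
begin

  -- State register (transitions assumed for this sketch).
  process (clk) begin
    if rising_edge(clk) then
      case state is
        when S0 =>
          if a = '1' then
            state <= S1;
          else
            state <= S2;
          end if;
        when others =>
          state <= S0;
      end case;
    end if;
  end process;

  -- Combinational output with a default value.
  process (state, a) begin
    z <= '0';                        -- default: z = 0
    if state = S0 and a = '1' then
      z <= '1';                      -- overrides the default on the a edge
    end if;
  end process;

end architecture rtl;
```

Because the last assignment executed in the process wins, every path through the process assigns `z`, so no latch is inferred.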


Default Values: Registers

The semantics define that if a registered variable is not assigned a value in a clock cycle, then it holds its previous value.

Default expression   Behaviour when not assigned a value
none                 z holds its previous value.
z’ = a               z is assigned a.
z’ = ’-’             z is unconstrained.


Default Value: Registered Assignment

[Figure: FSM with default values. default: z’ = 99; states S0, S1, S2; registered assignments z’=a and z’=b on edges.]

[Figure: equivalent FSM without default values, over the same states with z’=a and z’=b.]

[Figure: timing diagram over clock cycles 0 to 7, showing a, b, state, and z.]


Default Value: Unconstrained Register

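[Figure: FSM with default values. default: z’ = ’-’; states S0, S1, S2; registered assignment z’=a on two edges.]

[Figure: optimized FSM over the same states, with z’=a on two edges.]

[Figure: simplified FSM over states S0, S1, S2.]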


4.2.8 Semantic and Syntax Rules

Inputs, Combinational, Registered

There are three categories of variables in FSMs. Each category has its own rules for how and when the variables are updated.

Inputs: Values are updated every clock cycle.

Combinational: If a variable is not assigned a value in a clock cycle, then its value is unconstrained.

Registered: If a variable is not assigned a value in a clock cycle, then it holds its previous value.

If there is any ambiguity about whether a signal is an input, then it should be declared as an input.


Multiple Assignments to Same Signal

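For a sequence of transitions within the same clock cycle, only the last assignment to each signal is visible.

[Figure: path from S0 to S1 within one clock cycle, carrying y = 1; z’ = 3 followed by y = 2; z’ = 5. The timing diagram over clock cycles 0 and 1 shows state (S0, S1), y = 2, and z = 5.]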


Summary of Semantic Rules

1. Signals take on the value of the last assignment that is executed in a clock cycle.

2. Combinational assignments become visible immediately.

3. Registered assignments become visible in the next clock cycle.

4. If a combinational signal is not assigned to in a given clock cycle, then the value of that signal is unconstrained (in other words, arbitrary, non-deterministic, or don’t-care).


Syntax Rules

1. For a given signal, it must be that either all assignments are combinational or all assignments are registered.
   It is illegal to have both combinational and registered assignments to the same signal. The reason is that this will lead to unsynthesizable code, because a signal cannot be both combinational and registered.

2. Within a clock cycle, a combinational signal must not be written to after it has been read.
   Violating this rule will lead to combinational loops.

3. Completeness of transitions: The conditions on the outgoing edges from a state must cover all possibilities. That is, from a given state, it must always be possible to make a transition. This includes a self-looping transition back to the state itself.

Additional guidelines:

1. Within a clock cycle, a combinational signal should be assigned to before it is read.
   Violating this guideline will lead to non-deterministic behaviour, because the value of a combinational signal is unconstrained in a clock cycle until it has been written to.


Deterministic vs Non-Deterministic

deterministic: Exactly one outgoing transition is enabled (condition is true)

non-deterministic: Multiple outgoing transitions are enabled; machine randomly chooses which transition to take

• Our state machines may be non-deterministic.

• Non-determinism happens when multiple outgoing transitions are enabled at the same time.

• Non-determinism is sometimes useful in specifications and high-level models.

• Real hardware is deterministic

(unless you are building a quantum computer)

• For real hardware, your transitions must be mutually exclusive.


4.2.9 Reset

All circuits should have a reset signal that puts the circuit back into a good initial state. However, not all flip flops within the circuit need to be reset. In a circuit that has a datapath and a state machine, the state machine will probably need to be reset, but the datapath may not need to be reset.

This section reserved for your reading pleasure


Reset with Explicit-Current

process (clk) begin
  if rising_edge(clk) then
    case state is
      when S0 =>
        if a = '1' then
          z <= '1';
          state <= S1;
        else
          z <= '0';
          state <= S2;
        end if;
      when S1 =>
        z <= '0';
        state <= S0;
      when others =>
        z <= '1';
        state <= S0;
    end case;
  end if;
end process;

process (clk) begin
  if rising_edge(clk) then
    if reset = '1' then
      state <= S0;
    else
      case state is
        when S0 =>
          if a = '1' then
            z <= '1';
            state <= S1;
          else
            z <= '0';
            state <= S2;
          end if;
        when S1 =>
          z <= '0';
          state <= S0;
        when others =>
          z <= '1';
          state <= S0;
      end case;
    end if;
  end if;
end process;


Reset with Explicit-Current+Next

Without Reset

process (clk) begin
  if rising_edge(clk) then
    st <= next_st;
  end if;
end process;

next_st <= S1 when st = S0 and a = '1'
      else S2 when st = S0
      else S0;

z <= '1' when (st = S0 and a = '1')
              or (st = S2)
     else '0';

With Reset

process (clk) begin
  if rising_edge(clk) then
    if reset = '1' then
      st <= S0;
    else
      st <= next_st;
    end if;
  end if;
end process;

next_st <= S1 when st = S0 and a = '1'
      else S2 when st = S0
      else S0;

z <= '1' when (st = S0 and a = '1')
              or (st = S2)
     else '0';


Review: Introduction to State Machines

Do the state-machine fragments below have the same behaviour?

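[Figure: two state-machine fragments between S0 and S1. Both have edges labelled a / b = 1 and !a / b = 0. The first also carries the registered assignment c’ = b twice; the second carries the combinational assignment c = b once.]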


4.3 LeBlanc FSM Design Example


4.3.1 State Machine and VHDL

[Figure: LeBlanc state diagram. States S0, S1, S2, S3; edges out of S0 labelled a and !a; registered assignments z’ = b - c and z’ = b + c.]

type state_ty is (S0, S1, S2, S3);
signal state : state_ty;

process begin
  wait until rising_edge(clk);
  if ______
    z <= b - c;
  ______
    z <= b + c;
  end if;
end process;

process begin
  wait until rising_edge(clk);
  if reset = '1' then
    state <= S0;
  else
    case state is
      when S0 =>
        if a = '0' then
          state <= S1;
        else
          state <= S2;
        end if;
      when ______
        state <= S3;
      when S3 =>
        state <= S0;
    end case;
  end if;
end process;
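The slide leaves several parts of this code blank for the lecture. One possible completion is sketched below, under stated assumptions: the datapath conditions are taken to be `state = S1` and `state = S2` (reading the z’ assignments as sitting on the edges that leave S1 and S2), the missing case choice is taken to be `S1 | S2`, and the entity name `leblanc` and the 8-bit `unsigned` ports are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity; name and port types are assumptions.
entity leblanc is
  port (
    clk, reset, a : in  std_logic;
    b, c          : in  unsigned(7 downto 0);
    z             : out unsigned(7 downto 0)
  );
end entity leblanc;

architecture rtl of leblanc is
  type state_ty is (S0, S1, S2, S3);
  signal state : state_ty;
begin

  -- Datapath: registered assignments to z (conditions assumed).
  process begin
    wait until rising_edge(clk);
    if state = S1 then
      z <= b - c;
    elsif state = S2 then
      z <= b + c;
    end if;
  end process;

  -- Control: state register with synchronous reset.
  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      state <= S0;
    else
      case state is
        when S0 =>
          if a = '0' then
            state <= S1;
          else
            state <= S2;
          end if;
        when S1 | S2 =>      -- assumed choice for the blank on the slide
          state <= S3;
        when S3 =>
          state <= S0;
      end case;
    end if;
  end process;

end architecture rtl;
```

Since `z` is not assigned in S0 or S3, it holds its previous value in those states.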


Datapath + Control

[Figure: block diagram of a control block (Ctrl, with state and next-state signals) connected to a Datapath block (dp).]

• Control circuitry
  – Compute next state (sequencing between states)
  – Drive control inputs to datapath

• From datapath to control:
  – Usually 1-bit signals
  – Outputs of comparators
  – External inputs
  – etc.

• From control to datapath:
  – Multiplexer select lines
  – Chip-enables for registers
  – Operations for multifunction datapath components.


Hardware

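[Figure: LeBlanc state diagram (states S0, S1, S2, S3; edges a and !a; z’ = b - c, z’ = b + c) beside its hardware: a control block (Ctrl) with state and next-state signals, inputs a, b, c, and reset, and a datapath (dp) register with D, Q, and CE pins driving z.]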


4.3.2 State Encodings

With 7 states

State   Binary   One-Hot
0       000      0000001
1       001      0000010
2       010      0000100
3       011      0001000
4       100      0010000
5       101      0100000
6       110      1000000


Le Blanc in Binary

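[Figure: LeBlanc state diagram. States S0, S1, S2, S3; edges out of S0 labelled a and !a; registered assignments z’ = b - c and z’ = b + c; default: z’ = ’-’.]

______type state_ty is ______
signal state : state_ty;

S0 : ______
S1 : ______
S2 : ______
S3 : ______

process begin
  wait until rising_edge(clk);
  if ______
    z <= b - c;
  ______
    z <= b + c;
  end if;
end process;

process begin
  wait until rising_edge(clk);
  if reset = '1' then
    state <= ______
  else
    ______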


LeBlanc in Optimized Binary

Define a custom encoding to simplify the circuitry needed to recognize the condition that the system is in either S1 or S2.

[Figure: LeBlanc state diagram. States S0, S1, S2, S3; edges out of S0 labelled a and !a; registered assignments z’ = b - c and z’ = b + c; default: z’ = ’-’.]

signal state : ______
S0 : ______
S1 : ______
S2 : ______
S3 : ______

process begin
  wait until rising_edge(clk);
  if ______
    z <= b - c;
  ______
    z <= b + c;
  end if;
end process;

process begin
  wait until rising_edge(clk);
  if reset = '1' then
    state <= ______
  else
    ______
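As an illustration of the idea, the sketch below picks one possible custom encoding in which a single bit says whether the machine is in S1 or S2. The encoding values ("00", "10", "11", "01"), the entity name `leblanc_opt`, and the 8-bit `unsigned` ports are all assumptions; the slides leave the actual encoding blank, and other choices are equally valid.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity; name and port types are assumptions.
entity leblanc_opt is
  port (
    clk, reset, a : in  std_logic;
    b, c          : in  unsigned(7 downto 0);
    z             : out unsigned(7 downto 0)
  );
end entity leblanc_opt;

architecture rtl of leblanc_opt is
  subtype state_ty is std_logic_vector(1 downto 0);
  -- Assumed encoding: bit 1 is '1' exactly when in S1 or S2,
  -- and bit 0 then selects between S1 and S2.
  constant S0 : state_ty := "00";
  constant S1 : state_ty := "10";
  constant S2 : state_ty := "11";
  constant S3 : state_ty := "01";
  signal state : state_ty;
begin

  -- Datapath: "in S1 or S2" is a test of one state bit.
  process begin
    wait until rising_edge(clk);
    if state(1) = '1' then
      if state(0) = '0' then
        z <= b - c;          -- S1
      else
        z <= b + c;          -- S2
      end if;
    end if;
  end process;

  -- Control: state register with synchronous reset.
  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      state <= S0;
    elsif state = S0 then
      if a = '0' then
        state <= S1;
      else
        state <= S2;
      end if;
    elsif state(1) = '1' then  -- S1 or S2
      state <= S3;
    else                       -- S3
      state <= S0;
    end if;
  end process;

end architecture rtl;
```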


Optimized Binary Le Blanc in Hardware

[Figure: hardware diagram with signals a, b, c, state, z, and reset.]


One-Hot LeBlanc

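[Figure: LeBlanc state diagram. States S0, S1, S2, S3; edges out of S0 labelled a and !a; registered assignments z’ = b - c and z’ = b + c; default: z’ = ’-’.]

signal state : ______
S0 : ______
S1 : ______
S2 : ______
S3 : ______

process begin
  wait until rising_edge(clk);
  if ______
    z <= b - c;
  ______
    z <= b + c;
  end if;
end process;

process begin
  wait until rising_edge(clk);
  if reset = '1' then
    state <= ______
  else
    ______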


One-Hot Le Blanc in Hardware

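[Figure: LeBlanc state diagram (states S0, S1, S2, S3; edges a and !a; z’ = b - c, z’ = b + c; default: z’ = ’-’) beside a hardware diagram with signals a, b, c, reset, and z.]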


4.4 Parcels

• “Parcel” = basic unit of data in a system

• Examples

  System           Parcel
  Microprocessor   Instruction
  Car factory      Car

• A parcel flows through a system

• A parcel may be composed of multiple components

  Parcel        Components
  Instruction   Opcode, operands, result
  Car           Doors, windows, engine, etc.


4.4.1 Bubbles and Throughput

• Between each pair of parcels is a sequence of zero or more bubbles

[Figure: stream of parcels α, β, γ separated by runs of bubbles.]

Bubble: invalid or garbage data that must be ignored

• Each system has a requirement for minimum number of bubbles between parcels

• Throughput: number of parcels per clock cycle

[Figure: parcels α, β, γ, δ, ε with 2 bubbles between each pair.]

throughput = 1 parcel / 3 clock cycles
           = 1/3 parcels per clock cycle

[Figure: parcels α, β, γ, δ with 2, 4, and 3 bubbles between them, spanning 12 clock cycles.]

throughput = 3 parcels / 12 clock cycles
           = 1/4 parcels per clock cycle


Maximum and Actual Throughput

Maximum Throughput: The maximum rate of parcels per cycle (minimum number of bubbles) at which the system will work correctly.

usually: max throughput = 1/(minimum number of bubbles + 1)

Actual Throughput: The actual rate at which the environment sends parcels to the system.

Actual throughput must be less-than-or-equal-to maximum throughput.
Actual number of bubbles must be greater-than-or-equal-to minimum number of bubbles.


Max Tput: Pipelining and Superscalar

Question: Label each of the arrows and dots below with one of: Unpipelined, Pipelined, Fully-pipelined, or Superscalar

[Figure: number line of maximum throughput with marks at 0, 1/latency, and 1.]

As an advanced topic, some systems with both combinational inputs and outputs use an area optimization that reduces the maximum throughput of an unpipelined system to be 1/(latency+1).


FSMs, Latency, and Tput

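Question: What are the latency and maximum throughput of the FSM below?

[Figure: state diagram with states S0, S1, S2, S3; edges out of S0 labelled a and !a; registered assignments p’ = b - c, p’ = b + c, and z’ = p + c.]

Answer:

[Figure: timing diagram over clock cycles 0 to 9 with rows a, b, c, p, and z.]

Latency

Throughput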


Actual Throughput: Constant and Variable

Two categories of actual throughput:

Constant Throughput: Always the same number of bubbles between parcels.

Often the actual number of bubbles is the minimum number of bubbles.

Choose actual throughput = maximum throughput.

[Figure: parcels α, β, γ, δ, ε with 2 bubbles between each pair.]

Variable Throughput: The number of bubbles changes over time.

Usually the number of bubbles is unpredictable.

Actual number of bubbles must be at least as great as minimum required.

[Figure: parcels α, β, γ, δ with 2, 4, and 3 bubbles between them.]


4.4.2 Parcel Schedule

Actual Throughput and Parcel Schedule

To reduce confusion about the meaning of “throughput”, we will use:

• “throughput” means “maximum possible throughput”

• “parcel schedule” means “actual throughput”

• “as soon as possible (ASAP) parcel schedule” means actual throughput is constant and is the maximum possible

• “unpredictable number of bubbles” means actual throughput is variable


Parcel Schedule and FSM Patterns

[Figure: two FSM patterns. ASAP parcels: a "trunk" derived from the computation for one parcel, wrapped in an outer loop derived from the parcel schedule. Unpredictable number of bubbles: the same trunk and outer loop, with a state S0 whose outgoing edges are labelled bubble and parcel.]


4.4.3 Valid Bits

When the parcel schedule has an unpredictable number of bubbles, we need a mechanism to distinguish between a parcel and a bubble.

Most common solution: valid bit protocol.

[Figure: timing diagram of i_data (parcels α, β, γ, δ) with i_valid, and o_data (parcels α, β, γ) with o_valid.]
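A minimal sketch of the valid-bit idea: the valid bit travels through the same number of registers as the data, so `o_valid` marks exactly the cycles in which `o_data` holds a parcel. The entity name `valid_stage`, the one-cycle latency, the 8-bit data width, and the placeholder computation `i_data + 1` are assumptions for illustration; only the signal names `i_valid`, `i_data`, `o_valid`, and `o_data` come from the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical one-stage design; name, widths, and the computation
-- are assumptions for this sketch.
entity valid_stage is
  port (
    clk, reset : in  std_logic;
    i_valid    : in  std_logic;
    i_data     : in  unsigned(7 downto 0);
    o_valid    : out std_logic;
    o_data     : out unsigned(7 downto 0)
  );
end entity valid_stage;

architecture rtl of valid_stage is
begin
  process begin
    wait until rising_edge(clk);

    -- The valid bit is reset so that no parcel is reported
    -- before any input has arrived.
    if reset = '1' then
      o_valid <= '0';
    else
      o_valid <= i_valid;
    end if;

    -- The data register is not reset: during a bubble its
    -- contents are garbage and must be ignored.
    o_data <= i_data + 1;    -- placeholder computation
  end process;
end architecture rtl;
```

Resetting only the valid bit follows the earlier remark that the control state usually needs a reset while the datapath may not.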


State Encodings

ASAP Parcels: One-hot, binary, or custom.

Bubbles: Valid bits

Hardware implementation of one-hot:

[Figure: circuit with a reset input.]

Hardware implementation of valid bits:

[Figure: circuit from i_valid to o_valid with a reset input.]


4.5 Pseudocode

We use pseudocode to describe multi-step computation (e.g., algorithms).

Declarations: We must declare “special” variables.
  Inputs: Value might change in each clock cycle.
  Interpcl (section 4.7): Used to communicate between parcels
  Outputs

If-then-else

While loop

For loop

Repeat-until loop

Assignments

Expressions: Arithmetic, logical, arrays, etc.

Example

input: a, b;
output: z;
p = a + b;
for i in 0 to 3 {
  p = p + b;
}
z = p;


Pseudocode Semantics

•Executed sequentially: the target is updated when the assignment is executed.

•All assignments are instantaneous: there is no distinction between registered and combinational assignments.

•No notion of time or clock cycles.
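These semantics match VHDL variables inside a process, which are also updated immediately and in order. As an illustration, the example pseudocode from section 4.5 could be modelled as below. This is a sketch of a high-level model only: the entity name pseudo_example and the 8-bit width are assumptions, and the model says nothing about how the computation is spread over clock cycles.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity pseudo_example is
  port (
    a, b : in  unsigned(7 downto 0);   -- assumed width
    z    : out unsigned(7 downto 0)
  );
end entity;

architecture model of pseudo_example is
begin
  process (a, b)
    variable p : unsigned(7 downto 0);
  begin
    p := a + b;              -- variable assignment takes effect immediately
    for i in 0 to 3 loop
      p := p + b;            -- each iteration sees the updated value of p
    end loop;
    z <= p;                  -- z = a + 5*b, modulo 2**8
  end process;
end architecture;
```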

Core vs System

•Pseudocode describes the core of a computation. It does not show the parcel schedule.

•But, with a finite sequence of parcels, the pseudocode may show more than one parcel.

•The FSM for the core does not show i_valid and o_valid.

•The FSM for the system (including the parcel schedule) does show i_valid and o_valid.

4.6 LeBlanc with Bubbles

LeBlanc with a parcel schedule that has an unpredictable number of bubbles.

[Figure: state machine that branches on a and !a, with assignments z' = b - c and z' = b + c.]

VHDL Code

[Figure: state machine with states S0, S1, S2, S3; edge labels !i_v and i_v; branch on a and !a with assignments z' = b - c and z' = b + c.]

  signal state :
  S0 :
  S1 :
  S2 :
  S3 :

  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      state <=
    else

  process begin
    wait until rising_edge(clk);
    if
      z <= b - c;

      z <= b + c;
    end if;
  end process;

4.7 Interparcel Variables and Loops

4.7.1 Introduction to Looping Le Blanc

Two new concepts:

• Inter-parcel variables

•Outer loop around “trunk”

Inter-parcel variables are used to communicate data between parcels.

Until now, all of our variables have been intra-parcel, that is, used within a single parcel.

All intra-parcel vars: z = a + b + c

“Total” is an inter-parcel variable: Total = Total + a + b

4.7.2 Pseudo-Code


Declarations for the examples:

  inputs a, b, c;
  outputs z;

Below, “T” stands for “total”.

Simple

  if a then {
    z = b + c;
  } else {
    z = b - c;
  }

Inter-parcel var T

  interpcl T;
  if a then {
    T = T + b + c;
  } else {
    T = T + b - c;
  }

Loop and inter-parcel var

  interpcl T;
  T = 0;
  for i in 0 to 127 {
    if a then {
      T = T + b + c;
    } else {
      T = T + b - c;
    }
  }

4.7.3 State Machine

Design Patterns

[Figure: FSM design patterns for ASAP parcels and for an unpredictable number of bubbles.]

State Machine

ASAP Parcels

[Figure: state machine with states S1, S2, S3, S4. Initialization: Total' = 0, i' = 0. Branch on a and !a with assignments Total' = Total + b + c and Total' = Total + b - c. Then i' = i + 1, followed by a test with edges i < 128 and i ≥ 128.]

Unpredictable number of bubbles

[Figure: state machine with the same branch on a and !a (Total' = Total + b + c, Total' = Total + b - c) and i' = i + 1.]

4.7.4 VHDL Code for Loop and Bubbles

  v(0) <= i_v;

  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      v(1 to 4) <= (others => '0');
    else

    end if;
  end process;

  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      total <= (others => '0');
    elsif v(4) = '1' and i >= 128 then
      total <= (others => '0');
    elsif v(1) = '1' then
      total <= total + b - c;
    elsif v(2) = '1' then
      total <= total + b + c;
    end if;
  end process;

  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      i <= (others => '0');
    elsif v(4) = '1' and i >= 128 then
      i <= (others => '0');
    elsif v(3) = '1' then
      i <= i + 1;
    end if;
  end process;

4.8 Memory Arrays and RTL Design

4.8.1 Memory Operations

Read of Memory

[Figure: Hardware: memory M with ports WE, A, DI, DO, connected to we, a, and do, clocked by clk. Behaviour: waveform of clk, a (address αa), we, and do (αd = M(αa)). FSM fragment.]

Write to Memory

[Figure: Hardware: memory M with ports WE, A, DI, DO, connected to we, a, di, and do, clocked by clk. Behaviour: waveform of clk, a (address αa), we, di (data αd), and do, with M(αa) receiving αd. FSM fragment.]

Dual-Port Memory

[Figure: Hardware: memory M with ports WE, A0, DI0, DO0, A1, DO1, connected to we, a0, di0, do0, a1, and do1, clocked by clk. Behaviour: waveform of clk, a0 (address αa), we, di0 (data αd), a1 (address βa), do0, and do1 (βd = M(βa)). FSM fragment.]

4.8.2 Memory Arrays in VHDL

  library ieee;
  use ieee.std_logic_1164.all;
  use ieee.numeric_std.all;

  entity mem is
    generic (
      data_width : natural := 8;
      addr_width : natural := 7
    );
    port (
      clk    : in  std_logic;
      wr_en  : in  std_logic;                                  -- write enable
      addr   : in  unsigned(addr_width - 1 downto 0);          -- address
      i_data : in  std_logic_vector(data_width - 1 downto 0);  -- input data
      o_data : out std_logic_vector(data_width - 1 downto 0)   -- output data
    );
  end mem;

  architecture main of mem is
    type mem_type is array (2**addr_width - 1 downto 0) of
      std_logic_vector(data_width - 1 downto 0);
    signal mem : mem_type;
  begin
    process (clk)
    begin
      if rising_edge(clk) then
        if wr_en = '1' then
          mem(to_integer(addr)) <= i_data;
        end if;
        o_data <= mem(to_integer(addr));
      end if;
    end process;
  end main;
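The dual-port memory of section 4.8.1 can be described in the same style as the single-port mem entity. The following is a sketch under assumptions: the entity name mem_dp and the port names are hypothetical, port 0 is taken to be read/write and port 1 read-only (matching the WE, A0, DI0, DO0, A1, DO1 ports in the figure), and whether a given synthesis tool maps this onto a block RAM is not guaranteed.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mem_dp is
  generic (
    data_width : natural := 8;
    addr_width : natural := 7
  );
  port (
    clk     : in  std_logic;
    wr_en   : in  std_logic;                                  -- write enable (port 0)
    addr0   : in  unsigned(addr_width - 1 downto 0);          -- port 0 address
    i_data0 : in  std_logic_vector(data_width - 1 downto 0);  -- port 0 input data
    o_data0 : out std_logic_vector(data_width - 1 downto 0);  -- port 0 output data
    addr1   : in  unsigned(addr_width - 1 downto 0);          -- port 1 address
    o_data1 : out std_logic_vector(data_width - 1 downto 0)   -- port 1 output data
  );
end entity;

architecture main of mem_dp is
  type mem_type is array (2**addr_width - 1 downto 0) of
    std_logic_vector(data_width - 1 downto 0);
  signal m : mem_type;
begin
  process (clk) begin
    if rising_edge(clk) then
      if wr_en = '1' then
        m(to_integer(addr0)) <= i_data0;   -- write through port 0 only
      end if;
      o_data0 <= m(to_integer(addr0));     -- registered read, port 0
      o_data1 <= m(to_integer(addr1));     -- registered read, port 1
    end if;
  end process;
end architecture;
```

Both reads are registered, so each address presented in one clock cycle produces its data in the next, as in the single-port case.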


4.8.3 Using Memory

Pseudocode

  M[i] = a;
  p = M[i+1];

FSM

[Figure: state machine with states S1 and S2. Annotation: "Both vars are" (remainder blank).]

VHDL

  u_mem : entity work.mem
    port map (
      clk    => clk,
      wr_en  =>
      addr   =>
      i_data =>
      o_data =>
    );

  mem_wr_en <= '1' when
               else '0';

  mem_addr <= i when
              else i + 1;

Hardware:

[Figure: memory M with ports WE, A, DI, DO; the address is built from i and the constant 1 under ctrl; DO drives p.]

4.8.3.1 Writing from Multiple Vars

FSM

[Figure: state machine. S1: M'[i] = a. S2: M'[i+1] = b.]

Hardware

[Figure: memory M with ports WE, A, DI, DO; the address is built from i and the constant 1 under ctrl; the input data is selected between a and b; DO drives p.]

  u_mem : entity work.mem
    port map (
      clk    => clk,
      wr_en  => mem_wr_en,
      addr   => mem_addr,
      i_data => mem_i_data,
      o_data => p
    );

  mem_wr_en <= '1' when state = S1 or state = S2
               else '0';

  mem_addr <= i when state = S1
              else i + 1;

  mem_i_data <= a when state = S1
                else b;

4.8.3.2 Reading from Memory to Multiple Variables

Pseudocode

  p = M[i]
  q = a
  ...
  p = b
  q = M[i+1]

FSM

[Figure: state machine. S1: p' = M[i], q' = a. S2: q' = M[i+1], p' = b.]

Hardware

[Figure: memory M with ports WE, A, DI, DO; the address is built from i and the constant 1 under ctrl; registers p and q; inputs a and b.]

Question: How should we connect memory to p and q?

Multivar Reading (cont’d)

[Figure: state machine with states S1, S2, S3. Memory reads: mem_o_data' = M[i] and mem_o_data' = M[i+1]. Uses of the read data: p = mem_o_data with q = a, then q = mem_o_data with p = b.]

  u_mem : entity work.mem
    port map (
      clk    => clk,
      wr_en  => mem_wr_en,
      addr   => mem_addr,
      i_data => mem_i_data,
      o_data => mem_o_data
    );

  mem_wr_en <= '0';

  mem_addr <= i when state = S1
              else i + 1;

  p <= mem_o_data when state = S2
       else b;

  q <= a when state = S2
       else mem_o_data;

4.8.3.3 Example: Maximum Value Seen so Far

Design an FSM that iterates through a memory array, replacing each value with the maximum value seen so far.

Example execution:

  Initial value of M:  4 3 2 6 7 3 5

  Final value of M:
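To make the specification concrete, the behaviour can be checked with a small simulation-only model before designing the FSM. This is a sketch and not the FSM asked for above: the entity name max_so_far_model and the use of a 7-element integer array in place of the memory are assumptions chosen to match the example execution, and the expected contents in the assertion are the running maximum of that example computed by hand.

```vhdl
entity max_so_far_model is
end entity;

architecture sim of max_so_far_model is
  type int_array is array (natural range <>) of integer;
begin
  process
    -- Initial value of M from the example execution
    variable m       : int_array(0 to 6) := (4, 3, 2, 6, 7, 3, 5);
    variable max_val : integer;
  begin
    max_val := m(0);
    for i in 1 to m'high loop
      if max_val < m(i) then
        max_val := m(i);     -- new maximum: the stored value is already correct
      else
        m(i) := max_val;     -- overwrite with the maximum seen so far
      end if;
    end loop;

    -- Running maximum of 4 3 2 6 7 3 5
    assert m = int_array'(4, 4, 4, 6, 7, 7, 7)
      report "unexpected final contents of m" severity failure;
    wait;
  end process;
end architecture;
```

The loop runs over indices 1 to 6 so that every access stays inside the array bounds.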


Pseudocode #1

  i = 0
  max = M[i]
  while i < 128 {
    i = i + 1
    b = M[i]
    if max < b {
      max = b
    } else {
      M[i] = max
    }
  }

Pseudocode #2

  i = 0

  while i < 128 {

    if max < b {
      max = b
    } else {
      M[i] = max
    }
  }

FSM #1

[Figure: state machine with states S0, S1, S2. Assignments: i' = 0, max' = M[i], i' = i + 1, b' = M[i], max' = b, M'[i] = max. Edge conditions: i < 128, i ≥ 128, max < b, max ≥ b.]

FSM #2

[Figure: partially blank state machine with states S0 and S2. Assignments: i' = 0, max' = b, M'[i] = max. Edge conditions: i < 128, i ≥ 128, max < b, max ≥ b.]

4.8.4 Build Larger Memory from Slices

This section reserved for your reading pleasure


4.8.5 Memory Arrays in High-Level Models

This section reserved for your reading pleasure


Chapter 5

Dataflow Diagrams


5.1 Dataflow Diagrams

5.1.1 Dataflow Diagrams Overview

•Dataflow diagrams are data-dependency graphs where the computation is divided into clock cycles.

•Purpose:

– Provide a disciplined approach for designing datapath-centric circuits

– Guide the design from algorithm, through high-level models, and finally to register transfer level code for the datapath and control circuitry.

– Estimate area and performance

– Make tradeoffs between different design options

•Background

– Based on techniques from high-level synthesis tools

– Some similarity between high-level synthesis and software compilation

– Each dataflow diagram corresponds to a basic block in software compiler terminology.

Data-Dependency Graphs and Dataflow Diagrams

Models for z = a + b + c + d + e + f

[Figure: a data-dependency graph and a dataflow diagram, each with inputs a, b, c, d, e, f, five adders, and output z.]

[Figure: annotated dataflow diagram for the same computation.]

•Horizontal lines mark clock cycle boundaries

•Unconnected signal tails are inputs

•Signals crossing clock boundaries are flip-flops

•Blocks in clock cycles are datapath components

•Unconnected signal heads are outputs

5.1.2 Dataflow Diagram Execution

[Figure: dataflow diagram for z = a + b + c + d + e + f with five adders, intermediate signals x1 to x5, and clock cycles 0 to 5, beside a waveform of clk, a, x1, x2, x3, x4, x5, and z over clock cycles 0 to 6.]

Latency

Definition Latency: Number of clock cycles from inputs to outputs.

•A combinational circuit has latency of zero.

•A single register has a latency of one.

•A chain of n registers has a latency of n.
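The last case can be written directly in VHDL. The following is a minimal sketch of a chain of n registers, assuming hypothetical names (reg_chain, n, width, d, q) and no reset; a value presented on d appears on q after n rising clock edges.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reg_chain is
  generic (
    n     : positive := 3;   -- number of registers, hence the latency
    width : positive := 8    -- assumed data width
  );
  port (
    clk : in  std_logic;
    d   : in  std_logic_vector(width - 1 downto 0);
    q   : out std_logic_vector(width - 1 downto 0)
  );
end entity;

architecture main of reg_chain is
  type stage_array is array (1 to n) of std_logic_vector(width - 1 downto 0);
  signal stage : stage_array;
begin
  process (clk) begin
    if rising_edge(clk) then
      stage(1) <= d;                   -- first register samples the input
      for k in 2 to n loop
        stage(k) <= stage(k - 1);      -- each later register samples its predecessor
      end loop;
    end if;
  end process;

  q <= stage(n);                       -- combinational connection adds no latency
end architecture;
```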

[Figure: two dataflow diagrams of five adders each, with "Latency =" left blank under each.]

5.1.3 Dataflow Diagrams, Hardware, and Behaviour

Primary Input

[Figure: dataflow diagram, hardware, and behaviour (waveform of clk, i, x) for a primary input i that drives signal x.]

Register Signal

[Figure: dataflow diagram, hardware, and behaviour (waveform of clk, i1, i2, x) for an adder with inputs i1 and i2 whose result x crosses a clock cycle boundary.]

Combinational-Component Output

[Figure: dataflow diagram, hardware, and behaviour (waveform of clk, i1, i2, x) for an adder with inputs i1 and i2 whose result x is used in the same clock cycle.]

Reuse a Component

[Figure: dataflow diagram, hardware, and behaviour (waveform of clk, i1, i2, o1) for one adder used in two clock cycles, with inputs i1 and i2, registers r1 and r2, and output o1.]

5.1.4 Performance Estimation

Performance Equations

$$\text{Performance} \propto \frac{1}{\text{TimeExec}}$$

$$\text{TimeExec} = \text{Latency} \times \text{ClockPeriod}$$

Performance of Dataflow Diagrams

• Latency: count horizontal lines in diagram

•Min clock period (max clock speed) limited by longest path in a clock cycle
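As a worked illustration of these equations, consider two ways of scheduling five additions. The delays are assumed values for the sake of the arithmetic, not figures from the course: a flip-flop delay of 1 ns and an adder delay of 4 ns. Design A places one adder in each of five clock cycles; Design B chains all five adders in a single clock cycle.

```latex
\begin{align*}
\text{Design A:}\quad
  \text{ClockPeriod} &= t_{\text{flop}} + t_{\text{add}} = 1 + 4 = 5\ \text{ns}, &
  \text{TimeExec} &= 5 \times 5\ \text{ns} = 25\ \text{ns} \\
\text{Design B:}\quad
  \text{ClockPeriod} &= t_{\text{flop}} + 5\,t_{\text{add}} = 1 + 20 = 21\ \text{ns}, &
  \text{TimeExec} &= 1 \times 21\ \text{ns} = 21\ \text{ns}
\end{align*}
```

Under these assumptions Design B finishes a single computation slightly sooner, but its clock is more than four times slower, which matters for everything else that shares the clock.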


5.1.5 Area Estimation

•Maximum number of blocks in a clock cycle is total number of that component that are needed

•Maximum number of signals that cross a cycle boundary is total number of registers that are needed

•Maximum number of unconnected signal tails in a clock cycle is total number of inputs that are needed

•Maximum number of unconnected signal heads in a clock cycle is total number of outputs that are needed

•These estimates are just approximations. They do not take into account:

– Area and delay of control circuitry

– Multiplexers on registers and datapath components

– Relative area and delay of different components

– Technology-specific features, constraints, and costs

•These estimates give lower bounds.

•Other constraints or design goals might force you to use more components. Examples:

– Decreasing latency =⇒ larger area

– Constraint on max number of registers =⇒ more datapath components

Area Estimation

Implementation-technology factors, such as the relative size of registers, multiplexers, and datapath components, might force you to make tradeoffs that increase the number of datapath components to decrease the overall area of the circuit.

•With some FPGA chips, a 2:1 multiplexer has the same area as an adder.

•With some FPGA chips, a 2:1 multiplexer can be combined with an adder into one FPGA cell per bit.

•In FPGAs, registers are usually “free”, in that the area consumed by a circuit is limited by the amount of combinational logic, not the number of flip-flops.

5.1.6 Design Analysis

[Figure: dataflow diagram for z = a + b + c + d + e + f with inputs a to f, five adders, and output z.]

Quantities to determine: num inputs, num outputs, num registers, num adders, min clock period, latency.

Design Analysis 2

[Figure: second dataflow diagram for the same computation, with inputs a to f, five adders, and output z.]

Quantities to determine: num inputs, num outputs, num registers, num adders, min clock period, latency.

Design Analysis 2 (Cont’d)

[Figure: the second dataflow diagram with intermediate signals x1 to x5 over clock cycles 0 to 3, beside a waveform of clk, a, x1, x2, x3, x4, x5, and z over clock cycles 0 to 6.]

Design Analysis 3

[Figure: third dataflow diagram for the same computation, with the five adders drawn in rows of two, two, and one.]

Quantities to determine: num inputs, num outputs, num registers, num adders, min clock period, latency.

Review: Dataflow Diagrams

For each of the diagrams below, calculate the latency, minimum clock period, and minimum number of adders required.

[Figure: review diagrams with rows Latency, Clock period, and Adders left blank.]

5.2 Design Example: Hnatyshyn DFD

5.2.1 Requirements

•Functional requirement:

– Compute the following formula: z = a + b + c

•Performance requirements:

– Max clock period: flop plus (1 add)

– Max latency: 2

•Cost requirements

– Maximum of two adders

– Unlimited registers

– Maximum of three inputs and one output

– Maximum of 5000 student-minutes of design effort

•Combinational inputs, registered outputs

•Parcels arrive as-soon-as-possible (ASAP)

5.2.2 Data-Dependency Graph

Requirements and algorithm: z = a + b + c

Create a data-dependency graph for the algorithm.

[Figure: data-dependency graph with inputs a, b, c and output z.]

5.2.3 Initial Dataflow Diagram

Schedule operations into clock cycles

[Figure: dataflow diagram with inputs a, b, c and output z.]

Area and performance analysis: latency, clock period, inputs, outputs, registers, adders (values left blank).

•Best-case analysis for a theoretical design

•No guarantee that we will achieve best-case (optimal) design

•Design process: systematic method to try to come close to optimal design

•Start with sub-optimal, but obviously correct, design

•Series of optimizations to improve area and speed while avoiding bugs

5.2.4 Area Optimization

[Figure: area-optimized dataflow diagram over clock cycles 0 to 2, with a and b at the top and output z.]

Area and performance analysis: latency, clock period, inputs, outputs, registers, adders (values left blank).

5.2.5 Assign Names to Registered Signals

We start with our initial (sub-optimal) design.

Before we can write VHDL code for our dataflow diagram, we must assign a name to each internal registered value.

Optionally, we may assign names to combinational values.

[Figure: dataflow diagram with inputs a, b, c and output z over clock cycles 0 to 2.]

Behaviour and Analysis

[Figure: waveform of a, b, c, x1, x2, and z over clock cycles 0 to 5, beside the dataflow diagram with inputs a, b, c, registered signals x1 and x2, and output z over clock cycles 0 to 2.]

Area and performance analysis: latency, clock period, inputs, outputs, registers, adders (values left blank).

Use ASAP Parcel Schedule

[Figure: state machine with states S1 and S2 and assignments x1' = a + b; x2' = x1 + c; default: z = x2.]

Question: When to start parcel β?

[Figure: schedule over clock cycles 0 to 8 with rows state; a, b, a1; c, x1, a2; and x2, z, filled in for parcel α.]

Question: What is the maximum throughput that this system supports?

5.2.6 Allocation

Allocation is the area optimization of mapping a large number of objects in the current design to a smaller number of objects.

Design Analysis

              Current   Optimum
  Inputs         3         2
  Registers      2         1
  Adders         2         1
  Outputs        1         1

•Example: allocate both xi registers to the same register

•Similar to register allocation in software

•This design is so simple that allocation is trivial. For real designs, finding the best allocation is very difficult. There are many different heuristics for how to do allocation.

•We will allocate inputs, outputs, registers, and datapath components.

•We will work clock-cycle by clock-cycle.

•Annotate dataflow diagram and fill in cells in I/O schedule and control table.

[Figure: empty I/O schedule (columns i1, i2, o1; rows clock cycle 0, 1, 2) and empty control table (columns r1 ce, r1 d, a1 src1, a1 src2, o1; rows clock cycle 0, 1, const).]

Allocate Clock Cycle 0: Inputs and Datapath

[Figure: dataflow diagram (inputs a, b, c; output z) with the empty I/O schedule and control table.]

Allocate Clock Cycle 0: Regs

[Figure: dataflow diagram annotated with i1, i2, and a1 in clock cycle 0; the I/O schedule shows a and b in clock cycle 0.]

Allocate Clock-Cycle 1: Inputs and Datapath

[Figure: dataflow diagram annotated with i1, i2, a1, and r1; the I/O schedule shows a and b; the control table row for clock cycle 0 shows a1, 1, i1, i2.]

Allocate Clock-Cycle 1: Regs

[Figure: dataflow diagram annotated with i1, i2, a1, r1, and x1, with c on i2 in clock cycle 1; the control table adds the entries r1 and i2 for clock cycle 1.]

Allocate Output

With registered outputs, each output port must be connected directly to a register.

[Figure: dataflow diagram annotated with i1, i2, a1, r1, x1, and x2; the I/O schedule shows a, b, and c; the control table rows show a1, 1, i1, i2 and a1, 1, r1, i2.]

Behaviour post Allocation

[Figure: waveform of i1, i2, a1, r1, and o1 for parcel α over clock cycles 0 to 2, beside the allocated dataflow diagram (inputs a, b, c on i1 and i2; x1 and x2 in r1; adder a1; output z on o1).]

5.2.7 State Machine

• Done with datapath design and optimization

• Now build the control circuitry

Control-circuit optimizations:

•Choose state encoding

•Design state machine

•Design control circuitry that drives datapath

– Multiplexer select lines

– Chip enables

– Operation selection


Parcel Schedule

Question: What parcel schedule is this?

[Figure: FSM pattern in which the "trunk" is derived from the computation for one parcel and the outer loop is derived from the parcel schedule.]

From Clock Cycles to States

To build a state machine, we need to move from our clock-cycle based view of behaviour to a state based view.

•Hardware knows only the values of signals: there is no magic “cycle count”.

•Represent cycle count with state signal.

[Figure: waveform of state, i1, i2, a1, r1, and o1 for parcels α, β, γ over clock cycles 0 to 8, with the α cycle, β cycle, and γ cycle marked, beside the allocated dataflow diagram (i1, i2, a1, r1, o1, x1, x2) over clock cycles 0 to 2.]

Control Table for Explicit State Machine

Transform control table:

• Label rows by state

•Add next-state column

• Identify “don’t-care” values

[Figure: the control table labeled by clock cycle (rows 0, 1, const; columns r1 ce, r1 d, a1 src1, a1 src2, o1) and the same table labeled by state, with state and next state columns to be filled in.]

Find Constants

If all of the cells in a column have the same value, then that column can be reduced to a constant.

[Figure: control table with rows S0, S1, const and columns state, next state, r1 ce, r1 d, a1 src1, a1 src2, o1.]

Control Table, State Machine, Hardware

[Figure: control table for entire system (rows S0, S1, const), state machine for entire system (states S0 and S1), and hardware for entire system (control block "Ctrl" with state and next state; datapath "dp" with inputs i1 and i2, register r1, and output o1).]

5.2.8 VHDL Implementation

  architecture main of hnatyshyn is
    signal r1, a1, a1_src1 : unsigned(7 downto 0);
    type state_ty is (S0, S1);
    signal state : state_ty;
  begin
    ------------------------------------------ control
    process (clk) begin
      if rising_edge(clk) then
        if reset = '1' then
          state <= S0;
        else
          case state is
            when S0 => state <= S1;
            when S1 => state <= S0;
          end case;
        end if;
      end if;
    end process;

    a1_src1 <= i1 when state = S0
               else r1;
    ------------------------------------------ registers
    process (clk) begin
      if rising_edge(clk) then
        r1 <= a1;
      end if;
    end process;
    ------------------------------------------ datapath
    a1 <= a1_src1 + i2;
    o1 <= r1;
    ------------------------------------------
  end architecture;

VHDL Implementation #2

•One-hot encoding for state

•Define constants for S0, S1

•Replace state = S0 with state(0) = '1'.

[Figure: control table with rows S0, S1, const and columns state, next state, r1 ce, r1 d, a1 src1, a1 src2, o1.]

  architecture main of hnatyshyn is
    signal r1, a1, a1_src1 : unsigned(7 downto 0);
    subtype state_ty is std_logic_vector(1 downto 0);
    constant S0 : state_ty := "01";
    constant S1 : state_ty := "10";
    signal state : state_ty;
  begin
    ------------------------------------------ control
    process (clk) begin
      if rising_edge(clk) then
        if reset = '1' then
          state <= S0;
        else
          -- rotate the one-hot bit (rol on std_logic_vector needs VHDL-2008)
          state <= state rol 1;
        end if;
      end if;
    end process;

    a1_src1 <= i1 when state(0) = '1'
               else r1;
    ------------------------------------------ registers
    process (clk) begin
      if rising_edge(clk) then
        r1 <= a1;
      end if;
    end process;
    ------------------------------------------ datapath
    a1 <= a1_src1 + i2;
    o1 <= r1;
    ------------------------------------------
  end architecture;

5.3 Design Example: Hnatyshyn with Bubbles

• Section 5.2: Hnatyshyn with ASAP parcels

• This section: Hnatyshyn with unpredictable number of bubbles

• Key feature: valid bits for control circuitry


5.3.1 Adding Support for Bubbles

• No change to dataflow diagram (dataflow diagrams are independent of parcel schedule)

• Add i_valid and o_valid to denote whether input or output is parcel or bubble

• Add idle state to state machine for when there is not a parcel in the system


Add Valid Bits

[Figure: waveform over clock cycles 0 to 12 for i_valid, i1, i2, a1, r1, o1, and o_valid, showing parcels α, β, and γ separated by bubbles.]


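Use Valid Bits as Control

[Figure: block diagram of the datapath (dp), with inputs i1 and i2, register r1, and output o1, and the control circuitry (ctrl), with inputs reset and i_valid and output o_valid.]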


Behaviour

[Figure: state machine with states S0, S1, S2 and valid bits v1, v2, and a waveform over clock cycles 0 to 9 for i_valid, i1, i2, a1, r1, o1, o_valid, state, and valid bits v0 v1 v2, with parcels α, β, and γ.]


5.3.2 Control Table with Valid Bits

Initial Table

• Label the rows of the control table by valid bits, instead of by states.

• Do not include a row for the last valid bit.

– We have registered outputs

– Therefore, no control decisions are made in the last clock cycle

– Therefore, the last valid bit does not affect the datapath

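[Figure: dataflow diagram for z = a + b + c over clock cycles 0 to 2 (inputs i1, i2, adder a1, register r1, output o1), beside an empty control table with columns r1 (ce, d), a1 (src1, src2), and o1.]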


Constants

[Figure: the same dataflow diagram and control table, with rows labelled by the valid bits v(0) and v(1) and the constant entries marked.]


5.3.3 VHDL

The only difference between the VHDL code for Hnatyshyn with bubbles and Hnatyshyn with ASAP parcels is the control circuitry. The datapath is exactly the same for both designs.

entity hnatyshyn_bubble is
  port (
    clk     : in  std_logic;
    reset   : in  std_logic;
    i_valid : in  std_logic;
    i1, i2  : in  unsigned(7 downto 0);
    o_valid : out std_logic;
    o1      : out unsigned(7 downto 0)
  );
end entity;

architecture main of hnatyshyn_bubble is
  signal r1, a1, a1_src1 : unsigned(7 downto 0);
  signal v : std_logic_vector(0 to 2);
begin


  -------------------------------------------- control
  v(0) <= i_valid;
  process (clk) begin
    if rising_edge(clk) then
      if reset = '1' then
        v(1 to 2) <= (others => '0');
      else
        v(1 to 2) <= v(0 to 1);
      end if;
    end if;
  end process;
  a1_src1 <= i1 when v(0) = '1'
             else r1;


  -------------------------------------------- registers
  process (clk) begin
    if rising_edge(clk) then
      r1 <= a1;
    end if;
  end process;
  -------------------------------------------- datapath
  a1 <= a1_src1 + i2;
  o_valid <= v(2);
  o1 <= r1;
  ------------------------------------------
end architecture;


5.4 Inter-Parcel Variables: Hnatyshyn with Internal State

Inter-parcel variables are used to communicate data between parcels.

Previous systems: z = a + b + c

“Sum” is an inter-parcel variable: Sum = Sum + a + b

intra-parcel variables: The type of variables and signals that we have used until now

• Also called “temporary values”

• Store intermediate data from clock cycle to clock cycle

• Each value is read only by the same parcel that wrote the value

inter-parcel variables: The new type of variables and signals

• Also called “programmer-visible”, “internal-state”, or “visible-state” variables

• Store data that is used to communicate between parcels

• Each value is written by one parcel and then read by other parcels


5.4.1 Requirements and Goals

• Functional requirements: compute the following formula: Sum = Sum + a + b

•Performance requirement:

– Max clock period: flop plus (1 add)

– Max latency: 3

•Cost requirements

– Maximum of two adders

– Unlimited registers

– Maximum of three inputs and one output

– Maximum of 5000 student-minutes of design effort

•Combinational inputs

•Registered outputs

•Parcel schedule is “Unpredictable number of bubbles”


5.4.2 Dataflow Diagrams and Waveforms

[Figure: dataflow diagram for Sum = Sum + a + b over clock cycles 0 to 2 (inputs i1, i2, adder a1, registers r1 and r2), and a waveform over clock cycles 0 to 6 for i1, i2, a1, r1, and r2 with parcels α, β, and γ.]


Bad DFD

Question: What is wrong with the dataflow diagram below?

[Figure: dataflow diagram over clock cycles 0 to 2 with inputs a and b on i1 and i2, adder a1, a single register r1, and Sum.]


States and Bubbles

Question: Label the states on the DFD and execution. Complete the FSM.

[Figure: DFD (inputs i1, i2, adder a1, registers r1 and r2, Sum), FSM with states S0, S1, S2, and an execution waveform over clock cycles 0 to 10 for i1, i2, a1, r1, r2, and state, with parcels α, β, γ, and δ.]


Reset

[Figure: execution waveform over clock cycles 0 to 10 for reset, i1, i2, a1, r1, r2, and state, with parcels α, β, γ, δ, and ε.]


5.4.3 Control Tables

Initial Control Table

[Figure: FSM with states S0, S1, S2 and the execution waveform over clock cycles 0 to 10 for reset, i1, i2, a1, r1, and r2, annotated with the state sequence and the valid bits v0 v1 v2.]


VHDL Code for Control Circuitry

The VHDL code for just the control circuitry is below. In section 5.4.4, we show the complete code.

a1_src1 <= i1 when v(0) = '1'
           else r1;

a1_src2 <= i2 when v(0) = '1'
           else r2;

r1_ce <= v(0) or v(1);


5.4.4 VHDL Implementation

-- valid bits
v(0) <= i_valid;
process begin
  wait until rising_edge(clk);
  if reset = '1' then
    v(1 to 2) <= (others => '0');
  else
    v(1 to 2) <= v(0 to 1);
  end if;
end process;

-- a1
a1_src1 <= i1 when v(0) = '1'
           else r1;
a1_src2 <= i2 when v(0) = '1'
           else r2;
a1 <= a1_src1 + a1_src2;


-- r1
process begin
  wait until rising_edge(clk);
  if reset = '1' then
    r1 <= (others => '0');
  elsif v(0) = '1' or v(1) = '1' then
    r1 <= a1;
  end if;
end process;

-- r2
process begin
  wait until rising_edge(clk);
  r2 <= r1;
end process;

-- outputs
o_valid <= v(2);
o1 <= r1;
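The fragments above omit the entity and signal declarations. As a rough sketch only, they could be assembled into one design unit as follows. The entity name hnatyshyn_sum, the port list, and the 8-bit data width are assumptions carried over from the earlier Hnatyshyn examples; they are not stated in this section.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Assumed wrapper: entity name, ports, and 8-bit width are illustrative.
entity hnatyshyn_sum is
  port (
    clk, reset : in  std_logic;
    i_valid    : in  std_logic;
    i1, i2     : in  unsigned(7 downto 0);
    o_valid    : out std_logic;
    o1         : out unsigned(7 downto 0)
  );
end entity;

architecture main of hnatyshyn_sum is
  signal r1, r2, a1, a1_src1, a1_src2 : unsigned(7 downto 0);
  signal v : std_logic_vector(0 to 2);
begin
  -- valid bits: v(0) is combinational, v(1) and v(2) are registered
  v(0) <= i_valid;
  process (clk) begin
    if rising_edge(clk) then
      if reset = '1' then
        v(1 to 2) <= "00";
      else
        v(1 to 2) <= v(0 to 1);
      end if;
    end if;
  end process;

  -- a1: adds the inputs in the first cycle, then r1 + r2 in the second
  a1_src1 <= i1 when v(0) = '1' else r1;
  a1_src2 <= i2 when v(0) = '1' else r2;
  a1 <= a1_src1 + a1_src2;

  -- r1: holds Sum once a parcel has finished; cleared by reset
  process (clk) begin
    if rising_edge(clk) then
      if reset = '1' then
        r1 <= (others => '0');
      elsif v(0) = '1' or v(1) = '1' then
        r1 <= a1;
      end if;
    end if;
  end process;

  -- r2: copy of r1 delayed by one clock cycle
  process (clk) begin
    if rising_edge(clk) then
      r2 <= r1;
    end if;
  end process;

  -- outputs
  o_valid <= v(2);
  o1 <= r1;
end architecture;
```

This sketch relies on the parcel schedule leaving at least one bubble between consecutive parcels, so that v(0) and v(1) are never '1' in the same clock cycle.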


5.4.5 Summary of Bubbles and Inter-Parcel Variables

Options for state encoding:

                                             System has inter-parcel vars
Parcel schedule   Property                   No          Yes
ASAP              State encoding
                  FSM has idle state
                  Ctrl table has idle row
Bubbles           State encoding
                  FSM has idle state
                  Ctrl table has idle row


5.5 Design Example: Vanier

Design Process

1. Requirements
2. Algorithm
3. Data-dependency graph
4. Schedule
5. Allocate I/O ports, datapath components, registers
6. Separate datapath and control
7. Connect datapath, add muxes
8. Block-diagram of datapath
9. Control-table for state machine
10. Don't-care assignments
11. VHDL code #1
12. State encoding
13. VHDL code #2


5.5.1 Requirements

• Functional requirements: compute the following formula: z = (a × d) + c + (d × b) + b for sixteen-bit unsigned data.

•Performance requirement:

– Max clock period: flop plus (2 adds or 1 multiply)

– Max latency: 4

•Cost requirements

– Maximum of two adders

– Maximum of two multipliers

– Unlimited registers

– Maximum of three inputs and one output

– Maximum of 5000 student-minutes of design effort

•Combinational inputs

•Registered outputs

•ASAP parcel schedule


5.5.2 Algorithm

z = (a × d) + c + (d × b) + b

Create a data-dependency graph for the algorithm.

[Figure: data-dependency graph with inputs a, d, b, c, three additions, and output z.]


5.5.3 Initial Dataflow Diagram

Schedule operations into clock cycles.

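[Figure: data-dependency graph with inputs a, d, b, c, three additions, and output z, to be scheduled into clock cycles.]

Area and performance analysis

latency:
clock period:
inputs:
outputs:
registers:
adders:
multipliers: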


5.5.4 Reschedule to Meet Requirements

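[Figure: the initial dataflow diagram (inputs a, d, b, c; output z) beside a blank diagram with inputs d, b, c, a for the rescheduled version.]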


Fix Clock Period Violation

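[Figure: two dataflow diagrams with inputs d, b, c, a, three additions, and output z, before and after fixing the clock period violation.]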


5.5.5 Optimization: Reduce Inputs

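Assume that inputs are much more expensive than other resources.

[Figure: dataflow diagram with inputs a, d, b, c and output z, beside a blank diagram with inputs d, b, c, a for the version with fewer input ports.]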


Analysis

[Figure: dataflow diagram over clock cycles 0 to 3 with inputs a, d, b, c, three additions, and output z.]

latency:
clock period:
inputs:
outputs:
registers:
adders:
multipliers:

Question: Should we move the second addition from clock-cycle 2 up to 1?


5.5.6 Allocation

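[Figure: dataflow diagram over clock cycles 0 to 3 with inputs on i1 and i2, beside an allocation table with columns r1 (ce, d), r2 (ce, d), r3 (ce, d), a1 (sc1, sc2), a2 (sc1, sc2), m1 (sc1, sc2), and o1, and rows for "needs mux", "needs ce", and "const".]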


Alternative Allocation

[Figure: dataflow diagram annotated with the alternative allocation (inputs i1, i2; registers r1, r2, r3; multiplier m1; adder a1), beside the allocation table with columns r1, r2, r3 (ce, d), a1, a2, m1 (sc1, sc2), and o1, rows for clock cycles 0 to 3, and summary rows "needs mux" (5/9) and "needs ce" (0/3).]


5.5.7 Explicit State Machine

From Clock Cycles to States

• ASAP parcel schedule

• Latency is 3, therefore 3 states (S0, S1, S2)

• State machine iterates through states, with S2 looping back to S0.


5.5.8 VHDL #1: Explicit

architecture main of vanier is
  signal r1, r2, r3,
         a1, a1_src1, a1_src2, a2,
         m1, m1_src2
         : unsigned(15 downto 0);
  type state_ty is (S0, S1, S2);
  signal state : state_ty;
begin
  ------------------------ control
  process (clk) begin
    if rising_edge(clk) then
      if reset = '1' then
        state <= S0;
      else
        case state is
          when S0 => state <= S1;
          when S1 => state <= S2;
          when S2 => state <= S0;
        end case;
      end if;
    end if;
  end process;
  ------------------------ datapath
  m1_src2 <= i2 when state = S0
             else r1;
  m1 <= i1(7 downto 0) * m1_src2(7 downto 0);
  a1_src1 <= r3 when state = S1
             else r2;
  a1_src2 <= i1 when state = S1
             else a2;
  a1 <= a1_src1 + a1_src2;
  a2 <= r1 + r3;


  ------------------------ registers
  process (clk) begin
    if rising_edge(clk) then
      if state = S0 then
        r1 <= i1;
      else
        r1 <= r2;
      end if;
    end if;
  end process;
  process (clk) begin
    if rising_edge(clk) then
      r2 <= m1;
    end if;
  end process;
  process (clk) begin
    if rising_edge(clk) then
      if state = S0 then
        r3 <= i2;
      else
        r3 <= a1;
      end if;
    end if;
  end process;
  ----------------------
  o1 <= r3;
end architecture;


State Encoding

Use a one-hot state encoding.


Don't Care: Encoding-Based Instantiations

For this simple example, the encoding-based instantiations are trivial.

[Figure: dataflow diagram annotated with the allocation (i1, i2, r1, r2, r3, m1, a1, a2, o1), beside the control table with columns r1, r2, r3 (ce, d), a1, a2, m1 (sc1, sc2), and o1, rows S0, S1, S2, summary rows "needs mux" (5/9) and "needs ce" (0/3), and the don't-care entries filled in.]


5.5.9 VHDL #2

architecture main of vanier is
  signal r1, r2, r3,
         a1, a1_src1, a1_src2, a2,
         m1, m1_src2
         : unsigned(15 downto 0);
  subtype state_ty is std_logic_vector(2 downto 0);
  constant s0 : state_ty := "001";
  constant s1 : state_ty := "010";
  constant s2 : state_ty := "100";
  signal state : state_ty;
begin
  ------------------------ control
  process (clk) begin
    if rising_edge(clk) then
      if reset = '1' then
        state <= S0;
      else
        -- rotate 1-bit to left
        state <= state(1 downto 0) & state(2);
      end if;
    end if;
  end process;
  ------------------------ datapath
  m1_src2 <= i2 when state = S0
             else r1;
  m1 <= i1(7 downto 0) * m1_src2(7 downto 0);
  a1_src1 <= r3 when state = S1
             else r2;
  a1_src2 <= i1 when state = S1
             else a2;
  a1 <= a1_src1 + a1_src2;
  a2 <= r1 + r3;


  ------------------------ registers
  process (clk) begin
    if rising_edge(clk) then
      if state(0) = '1' then
        r1 <= i1;
      else
        r1 <= r2;
      end if;
    end if;
  end process;
  process (clk) begin
    if rising_edge(clk) then
      r2 <= m1;
    end if;
  end process;
  process (clk) begin
    if rising_edge(clk) then
      if state(0) = '1' then
        r3 <= i2;
      else
        r3 <= a1;
      end if;
    end if;
  end process;
  ----------------------
  o1 <= r3;
end architecture;


5.5.10 Notes and Observations

Our functional requirement was written as:

z = (a × d) + (d × b) + b + c

If we had been given the functional requirement:

z = (a × d) + b + (d × b) + c

we could have used the same design, because the two equations are equivalent.
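The equivalence follows from the associativity and commutativity of addition. The short derivation below is a sketch that assumes the sixteen-bit unsigned arithmetic of section 5.5.1 wraps modulo 2^16 (the notes do not state the overflow behaviour); both properties still hold under that assumption.

```latex
\begin{align*}
(a \times d) + (d \times b) + b + c
  &= (a \times d) + \bigl((d \times b) + b\bigr) + c \\
  &= (a \times d) + \bigl(b + (d \times b)\bigr) + c \\
  &= (a \times d) + b + (d \times b) + c \pmod{2^{16}}
\end{align*}
```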


Data Dependency Graphs: Clean vs Ugly

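The naive data dependency graph for the second formulation is much messier than the data dependency graph for the original formulation:

Original: (a × d) + (d × b) + b + c

[Figure: data dependency graph with inputs a, d, b, c, three additions, and output z.]

Alternative: (a × d) + c + (d × b) + b

[Figure: data dependency graph with inputs a, b, c, d, three additions, and output z.]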


5.6 Memory Operations in Dataflow Diagrams

[Figure: table comparing Read and Write memory operations by Inputs, Output, Operation, and Location.]


Memory Read

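[Figure: memory read. Hardware: memory M with ports WE, A, DI, DO driven by we, a, and clk, producing do. Behaviour: waveform of clk, we, a (address αa), M(αa), and do (data αd). Also shown: dataflow diagram and FSM.]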


Memory Write

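[Figure: memory write. Hardware: memory M with ports WE, A, DI, DO driven by we, a, di, and clk, producing do. Behaviour: waveform of clk, we, a (address αa), di (data αd), M(αa), and do. Also shown: dataflow diagram and FSM.]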


Dual-Port Memory

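[Figure: dual-port memory. Hardware: memory M with ports WE, A0, DI0, DO0, A1, DO1 driven by we, a0, di0, a1, and clk, producing do0 and do1. Behaviour: waveform of clk, we, a0 (address αa), di0 (data αd), M(αa), do0, a1 (address βa), M(βa), and do1 (data βd). Also shown: dataflow diagram and FSM.]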


Sequence of Memory Operations

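[Figure: sequence of memory operations on the dual-port memory M (ports WE, A0, DI0, DO0, A1, DO1). Behaviour: waveform of clk, we, a0, di0, do0, a1, and do1 for a sequence of operations with addresses αa, βa, γa, θa and data αd, βd, γd1, γd2, θd. Also shown: dataflow diagram and FSM.]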


5.7 Data Dependencies

Definition of Three Types of Dependencies

Read after Write (True dependency):   M[i] := ...   followed by   ... := M[i]

Write after Write (Load dependency):  M[i] := ...   followed by   M[i] := ...

Write after Read (Anti dependency):   ... := M[i]   followed by   M[i] := ...

Instructions in a program can be reordered, so long as the data dependencies are preserved.


Purpose of Dependencies

W0:             R3 := ......
W1 (producer):  R3 := ......
R1 (consumer):  ... := ... R3 ...
W2:             R3 := ......

• RAW ordering prevents R1 from happening before W1

• WAW ordering prevents W0 from happening after W1

• WAR ordering prevents W2 from happening before R1

Each of the three types of memory dependencies (RAW, WAW, and WAR) serves a specific purpose in ensuring that producer-consumer relationships are preserved.


Ordering of Memory Operations

Data Dependencies

Initial memory contents: M[3] = 30, M[2] = 20, M[1] = 10, M[0] = 0

Initial Program

M[2] := 21
M[3] := 31
A := M[2]
B := M[0]
M[3] := 32
M[0] := 01
C := M[3]


Data Dependencies (Cont’d)

Initial Program

M[2] := 21
M[3] := 31
A := M[2]
B := M[0]
M[3] := 32
M[0] := 01
C := M[3]

Valid Modification

M[2] := 21
M[3] := 31
A := M[2]
B := M[0]
M[3] := 32
M[0] := 01
C := M[3]


Data Dependencies (Cont’d)

Initial Program

M[2] := 21
M[3] := 31
A := M[2]
B := M[0]
M[3] := 32
M[0] := 01
C := M[3]

Valid (or Bad?) Modification

M[2] := 21
M[3] := 31
A := M[2]
B := M[0]
M[3] := 32
M[0] := 01
C := M[3]


5.8 Example of DFD and Memory

This section examines the implementation of the pseudocode specification:

M[a+1] = b;

M[a] = M[a+1];

M[c] = M[c] - M[a];

z = M[c]


NOTES:

1. Inputs shall be combinational
2. Outputs shall be registered
3. The system shall support an unpredictable number of bubbles
4. Memory has combinational inputs and registered outputs (same as in class)
5. The memory may be either dual-ported or single-ported.
6. Optimization goals in order of decreasing importance:
   (a) minimize latency to z
   (b) minimize clock period
   (c) minimize area
       i. input ports
       ii. adders and subtracters
       iii. registers
       iv. output ports
       v. use single-ported memory instead of dual-ported memory
7. Input values may be read in any clock cycle, but each input value shall be read exactly once.
8. Optimizations to the pseudocode are allowed, as long as the final values of z and M are correct.
9. You do not need to do allocation.


Pseudocode Optimization

Original

M[a+1] = b;

M[a] = M[a+1];

M[c] = M[c] - M[a];

z = M[c]

Optimization


Dataflow Diagram


Memory Ports

How many ports does your memory have?

Briefly justify that your choice of number of memory ports produces the best design.


Chapter 6

Optimizations


6.1 Pipelining

• Exploit “hardware runs in parallel”

•Performance optimization at cost of increased area

•Overlap the execution of multiple parcels

•Divide design into stages

•Maximum of one parcel executing per stage

•No sharing of hardware between stages


6.1.1 Introduction to Pipelining

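[Figure: unpipelined design. Dataflow diagram for z = a + b + c + d + e + f over clock cycles 0 to 5, reusing one adder a1 and one register r1 (inputs i1, i2; output o1), with a waveform over clock cycles 0 to 13 for clk, a, r1, and z showing parcel α.]

[Figure: pipelined design. Dataflow diagram for the same computation with one adder per stage and registers r1 to r5 (stage1 to stage5), with a waveform over clock cycles 0 to 13 for clk, a, r1 to r5, and z showing parcel α.]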


[Figure: unpipelined dataflow diagram (one adder a1 and one register r1 reused over clock cycles 0 to 5; inputs i1, i2; output o1) beside the pipelined dataflow diagram (adders a1 to a5, registers r1 to r5, inputs i1 to i6, output o1, stages 1 to 5).]

               Unpipelined   Pipelined
Latency
Bubbles
Throughput
Clock period
Registers
Adders


Sequential (Unpipelined) Hardware

[Figure: state machine with reset and states State(0) to State(4), and a datapath with adder a1, register r1, inputs i1 and i2, and output o1.]

Pipelined Hardware and VHDL Code


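[Figure: pipelined hardware with five stages. Stage 1: adder a1 and register r1 with inputs i1 and i2. Stages 2 to 5: adders a2 to a5 and registers r2 to r5 with inputs i3 to i6. Output o1 is taken from r5.]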

-- stage 1
process begin
  wait until rising_edge(clk);
  r1 <= i1 + i2;
end process;
-- stage 2
process begin
  wait until rising_edge(clk);
  r2 <= r1 + i3;
end process;
-- stage 3
process begin
  wait until rising_edge(clk);
  r3 <= r2 + i4;
end process;
-- stage 4
process begin
  wait until rising_edge(clk);
  r4 <= r3 + i5;
end process;
-- stage 5
process begin
  wait until rising_edge(clk);
  r5 <= r4 + i6;
end process;
-- output
o1 <= r5;


6.1.2 Partially Pipelined

• Fully pipelined: throughput is one parcel per clock cycle

• Partially pipelined: throughput is less than one parcel per clock cycle.

• Superscalar: throughput is more than one parcel per clock cycle.

[Figure: dataflow diagram for z = a + b + c + d + e + f over clock cycles 0 to 5.]

Question: How do we execute α followed by β?

[Figure: blank waveform over clock cycles 0 to 13 for clk, a, r1 (stage1), r2 (stage2), r3 (stage3), and z.]

Latency:
Bubbles:
Throughput:
Clock period:
Registers:
Adders:


Hardware for Partially Pipelined

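[Figure: state machine with reset and states State(0) and State(1), and a three-stage datapath. Stage 1: adder a1 and register r1 with inputs i1 and i2. Stage 2: adder a2 and register r2 with input i3. Stage 3: adder a3 and register r3 with input i4. Output o1.]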

Question: How do we determine the number of states?


6.1.3 Terminology

Definition Depth: The depth of a pipeline is the number of stages on the longest path through the pipeline.

Definition Latency: The latency of a pipeline is measured the same as for an unpipelined circuit: the number of clock cycles from inputs to outputs.

Definition Throughput: The number of parcels consumed or produced per clock cycle.

Definition Upstream/downstream: Because parcels flow through the pipeline analogously to water in a stream, the terms upstream and downstream are used respectively to refer to earlier and later stages in the pipeline. For example, stage1 is upstream from stage2.

Definition Bubble: When a pipe stage is empty (contains invalid data), it is said to contain a “bubble”.
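
As a rough illustration of how latency and throughput interact, the sketch below estimates how many clock cycles it takes to push a batch of parcels through a pipeline. The function name `total_cycles` and the example numbers are assumptions made for illustration, not part of the course notes, and the formula assumes parcels are issued back to back with no stalls.

```python
from fractions import Fraction

def total_cycles(num_parcels, latency, throughput):
    """Clock cycles until the last of num_parcels leaves the pipeline.

    latency    -- clock cycles from input to output for one parcel
    throughput -- parcels accepted per clock cycle (1 = fully pipelined)
    Assumes parcels are issued back to back with no stalls.
    """
    issue_interval = 1 / Fraction(throughput)   # cycles between parcel starts
    return latency + (num_parcels - 1) * issue_interval

# Hypothetical pipelines with the same latency but different throughput.
fully = total_cycles(100, latency=5, throughput=1)
partial = total_cycles(100, latency=5, throughput=Fraction(1, 2))
print(fully, partial)
```

For long batches the total is dominated by throughput rather than latency, which is why the two are tracked as separate quantities.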


6.1.4 Overlapping Pipeline Stages

• A single parcel may be in multiple stages at the same time.

  Example: a store instruction in a microprocessor uses separate stages for address and data.

• Transferring a parcel between stages may require multiple clock cycles.

  Example: a 16×16 macroblock of pixels in video processing.

We illustrate overlapping pipe stages with a simple example.


Unpipelined implementation

[Figure: dataflow diagram with inputs a to e, an initial value of 0, four F operations, four G operations, and output z, scheduled over clock cycles 0 to 6 using registers r1 and r2, inputs i1 and i2, and output o1]

Externally visible behaviour:

[Figure: waveform of input, output, and system activity for parcels α and β over clock cycles 0 to 13]

System
Inputs       2
Registers    2
F            1
G            1
Total area   4
Latency      6
Throughput   1/6

Internal behaviour:

[Figure: waveform of input, r1, r2, and output for parcels α and β over clock cycles 0 to 13]

Design Space Exploration

Fully pipelined

[Figure: F/G dataflow diagram (inputs a to e, output z) with clock cycles 0 to 6]

[Blank table to fill in: #regs, #F, #G, #r+F+G, tput]

Throughput = 1/2

[Figure: F/G dataflow diagram (inputs a to e, output z) with clock cycles 0 to 6]

[Blank table to fill in: #regs, #F, #G, #r+F+G, tput]

Design Space Exploration (Cont’d)

Goal: Maximum throughput using just 1 F and 1 G

[Figure: two blank copies of the F/G dataflow diagram (inputs a to e, output z), each with a blank table to fill in: #regs, #F, #G, #r+F+G, tput]

Design Comparison

[Figure: first design. F/G dataflow diagram scheduled over clock cycles 1 to 6 in two stages, using registers r1, r2, and r3, with a waveform of i1, r1, r3, and o1 for parcels α and β over clock cycles 0 to 13]

[Figure: second design. The same dataflow diagram using registers r1 and r2, with a waveform of i1, r1, r2, and o1 for parcels α and β over clock cycles 0 to 13]

Implementation of Overlapping Stages

[Figure: F/G dataflow diagram scheduled over clock cycles 1 to 6 in two stages, using registers r1, r2, and r3, inputs i1 and i2, and output o1]

-------------------------------- valid bits
v(0) <= i_valid;
process begin
  wait until rising_edge(clk);
  v(6 downto 1) <= v(5 downto 0);
end process;

-------------------------------- stage 1
f1_src2 <= i2 when ... -- condition left blank on the slide
           else r1;
process begin
  wait until rising_edge(clk);
  r1 <= f( i1, f1_src2 );
  r2 <= r1;
end process;

Implementation of Overlapping (Cont’d)

[Figure: the same F/G dataflow diagram as on the previous slide]

-------------------------------- stage 2
g1_src1 <= (others => '0') when ... -- condition left blank on the slide
           else r1;
g1_src2 <= r2 when ... -- condition left blank on the slide
           else r3;
process begin
  wait until rising_edge(clk);
  r3 <= g( g1_src1, g1_src2);
end process;

o1 <= r3;

Review: Pipelining

Analyze the dataflow diagram below.

[Figure: dataflow diagram with inputs a, b, c, d, a mix of F and G operations, and output z]

Quantities to determine: stages, latency, clock period, throughput, inputs, registers, F, G.

6.2 Staggering

This is an advanced section. It is not covered in the course and will not be tested.

6.3 Retiming

Goal: decrease clock period without changing input-to-output behaviour of the system.

Technique: move registers “through” gates to balance delay between registers.

[Figures: retime to balance delays; push flop through AND; push flop through wire fork]
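
The idea of pushing a flop through an AND gate can be checked with a small cycle-by-cycle model. This is only a sketch: the helper names `flop` and `and_gate` and the input sequences are made up for illustration, and the model assumes both circuits start with consistent reset values.

```python
def flop(inputs, init=0):
    """Model a rising-edge flip-flop: the output is the input delayed by one cycle."""
    return [init] + inputs[:-1]

def and_gate(xs, ys):
    """Combinational AND applied cycle by cycle."""
    return [x & y for x, y in zip(xs, ys)]

# Hypothetical input sequences, one value per clock cycle.
a = [1, 1, 0, 1, 0, 1, 1, 0]
b = [1, 0, 1, 1, 0, 1, 0, 1]

z_before = flop(and_gate(a, b))          # one flop on the output of the AND gate
z_after = and_gate(flop(a), flop(b))     # flop pushed back onto both AND inputs
print(z_before == z_after)
```

The retimed version uses two flops instead of one, so retiming can trade area for a shorter critical path.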


Example

Question: Do the two circuits below have the same behaviour?

[Figure: two circuits with signals a, b, c, d, e and output z; the first contains registers r1 and r2, the second contains registers r1, r2, and r3]

[Figure: extra copy of the circuit, without registers, for scratch work]

Example with State Machine

[Figure: circuit with state register, inputs a, b, c, select signal sel, and signals x, y, z; the critical path is 10ns, with other delays of 2ns and 7ns]

[Figure: waveform over states S0, S1, S2, S3, S0, S1, S2, S3 showing a = α, b = β, c = γ, sel = 1, x = α, y = α+γ, and z = α+γ]

process begin
  wait until rising_edge(clk);
  if state = S1 then
    z <= a + c;
  else
    z <= b + c;
  end if;
end process;

Retimed Circuit and Waveform

[Figure: retimed circuit with state register, inputs a, b, c, signals sel and sel_d, and signals x, y, z; delays of 10ns, 2ns, and 7ns]

[Figure: waveform over states S0, S1, S2, S3, S0, S1, S2, S3 showing a = α, b = β, c = γ, sel, sel_d, x, y, and z = α+γ]

Original behaviour

process (state) begin
  if state = S1 then
    sel <= '1';
  else
    sel <= '0';
  end if;
end process;

process begin
  wait until rising_edge(clk);
  if sel = '1' then
    ... -- code for z
  end if;
end process;

Retimed

process begin
  wait until rising_edge(clk);
  if state = ... then -- state left blank on the slide
    sel <= '1';
  else
    sel <= '0';
  end if;
end process;

process begin
  wait until rising_edge(clk);
  if sel = '1' then
    ... -- code for z
  end if;
end process;

Review: Retiming

For each of the example circuits below, answer whether it is correct with respect to the specification circuit.

Specification circuit

[Figure: specification circuit with signals a to f and output z]

Example circuit 1

[Figure: example circuit 1 with signals a to e, e2, and output z]

Example circuit 2

[Figure: example circuit 2 with signals a to d, a2, a3, b2, c2, and output z]

6.4 General Optimizations

6.4.1 Strength Reduction

Strength reduction replaces one operation with another that is simpler.

6.4.1.1 Arithmetic Strength Reduction

Operation                               Replacement
Multiply by a constant power of two     wired shift logical left
Multiply by a power of two              shift logical left
Divide by a constant power of two       wired shift logical right
Divide by a power of two                shift logical right
Multiply by 3                           wired shift and addition
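
The identities behind these reductions can be checked numerically. The sketch below models a 16-bit unsigned signal; the width and the function names are assumptions chosen for illustration.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1   # model a 16-bit unsigned signal

def mul_by_8(x):
    return (x << 3) & MASK          # multiply by 2**3: shift left by 3

def div_by_4(x):
    return x >> 2                   # unsigned divide by 2**2: shift right by 2

def mul_by_3(x):
    return ((x << 1) + x) & MASK    # 3*x = 2*x + x: one wired shift and one adder

# Compare against ordinary arithmetic for a sample of input values.
for x in range(0, 1 << WIDTH, 257):
    assert mul_by_8(x) == (x * 8) & MASK
    assert div_by_4(x) == x // 4
    assert mul_by_3(x) == (x * 3) & MASK
```

In hardware, a shift by a constant amount costs only wiring, which is why the table calls it a "wired" shift.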


6.4.1.2 Boolean Strength Reduction

Boolean tests that can be implemented as wires:
• is odd, is even
• is neg, is pos

By choosing your encodings carefully, you can sometimes reduce a vector comparison to a wire.

For example, if your state uses a one-hot encoding, then the comparison state = S3 reduces to state(3) = '1'. You might expect a reasonable logic-synthesis tool to do this reduction automatically, but most tools do not do this reduction.

When using encodings other than one-hot, Karnaugh maps can be useful tools for optimizing vector comparisons. By carefully choosing our state assignments, when we use a full binary encoding for 8 states, the comparison:

(state = S0 or state = S3 or state = S4) = '1'

can be reduced from looking at 3 bits, to looking at just 2 bits. If we have a condition that is true for four states, then we can find an encoding that looks at just 1 bit.
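
A small model makes the one-hot and four-state cases concrete. The encodings below are hypothetical choices made for this sketch, not encodings prescribed by the notes.

```python
# Hypothetical one-hot encoding for four states: exactly one bit set per state.
ONE_HOT = {"S0": 0b0001, "S1": 0b0010, "S2": 0b0100, "S3": 0b1000}

def is_s3_vector(state):
    return state == ONE_HOT["S3"]        # compares all four bits

def is_s3_wire(state):
    return ((state >> 3) & 1) == 1       # looks only at state(3)

# Hypothetical binary encoding for eight states, chosen so that the four
# states for which the condition holds (here S4 to S7) share bit 2 = '1'.
BINARY = {f"S{i}": i for i in range(8)}

def cond_vector(state):
    return state in (BINARY["S4"], BINARY["S5"], BINARY["S6"], BINARY["S7"])

def cond_wire(state):
    return ((state >> 2) & 1) == 1       # a single bit decides the condition

assert all(is_s3_vector(s) == is_s3_wire(s) for s in ONE_HOT.values())
assert all(cond_vector(s) == cond_wire(s) for s in BINARY.values())
```

The wire versions are only equivalent for legal state values, so they rely on the state register never holding an unused code.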


6.4.2 Replication and Sharing

6.4.2.1 Mux-Pushing

Pushing multiplexors into the fanin of a signal can reduce area.

Before

z <= a + b when (w = '1')
     else a + c;

After

tmp <= b when (w = '1')
       else c;
z <= a + tmp;

The first circuit will have two adders, while the second will have one adder. Some synthesis tools will perform this optimization automatically, particularly if all of the signals are combinational.
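
The two forms can be checked for equivalence exhaustively on a narrow datapath. This is a behavioural sketch only; the 4-bit width and the function names `before` and `after` are assumptions for illustration.

```python
import itertools

WIDTH = 4
MASK = (1 << WIDTH) - 1   # model 4-bit unsigned signals

def before(a, b, c, w):
    # Two adders feed a multiplexer.
    return ((a + b) & MASK) if w == 1 else ((a + c) & MASK)

def after(a, b, c, w):
    # The multiplexer selects an operand, then a single adder is used.
    tmp = b if w == 1 else c
    return (a + tmp) & MASK

values = range(1 << WIDTH)
equivalent = all(
    before(a, b, c, w) == after(a, b, c, w)
    for a, b, c in itertools.product(values, repeat=3)
    for w in (0, 1)
)
print(equivalent)
```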


6.4.2.2 Common Subexpression Elimination

Introduce new signals to capture subexpressions that occur multiple places in the code.

Before

y <= a + b + c when (w = '1')
     else d;
z <= a + c + d when (w = '1')
     else e;

After

tmp <= a + c;
y <= b + tmp when (w = '1')
     else d;
z <= d + tmp when (w = '1')
     else e;

Subexpression Elimination

Note: Clocked subexpressions. Care must be taken when doing common subexpression elimination in a clocked process. Putting the “temporary” signal in the clocked process will add a clock cycle to the latency of the computation, because the tmp signal will be a flip-flop. The tmp signal must be combinational to preserve the behaviour of the circuit.
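
The extra cycle of latency can be seen in a simple cycle-by-cycle model of a clocked process computing y <= a + b + c with tmp = a + c. The function `simulate`, the reset values of 0, and the constant inputs are assumptions made for this sketch.

```python
def simulate(a_seq, b_seq, c_seq, tmp_is_registered):
    """Cycle-by-cycle model of  y <= a + b + c  inside a clocked process."""
    tmp_reg, y_reg, trace = 0, 0, []
    for a, b, c in zip(a_seq, b_seq, c_seq):
        trace.append(y_reg)                    # value of y visible during this cycle
        tmp_comb = a + c                       # combinational tmp
        tmp = tmp_reg if tmp_is_registered else tmp_comb
        y_reg, tmp_reg = b + tmp, tmp_comb     # both registers update on the clock edge
    return trace

a, b, c = [1] * 4, [2] * 4, [3] * 4            # hypothetical constant inputs
print(simulate(a, b, c, tmp_is_registered=False))   # combinational tmp
print(simulate(a, b, c, tmp_is_registered=True))    # tmp inside the clocked process
```

With a combinational tmp the sum 6 appears one cycle after the inputs; with a registered tmp it appears one cycle later still, after an intermediate value that mixes a stale tmp with the current b.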


6.4.2.3 Computation Replication

• To improve performance
  – If the same result is needed at two very distant locations and wire delays are significant, it might improve performance (increase clock speed) to replicate the hardware.

• To reduce area
  – If the same result is needed at two different times that are widely separated, it might be cheaper to reuse the hardware component to repeat the computation than to store the result in a register.

Note: Muxes are not free. Each time a component is reused, multiplexors are added to inputs and/or outputs. Too much sharing of a component can cost more area in additional multiplexors than would be spent in replicating the component.

6.4.3 Arithmetic

Perform arithmetic on the minimum number of bits needed. If you only need the lower 12 bits of a result, but your input signals are 16 bits wide, trim your inputs to 12 bits. This results in a smaller and faster design than computing all 16 bits of the result and trimming the result to 12 bits.
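
For addition, trimming the inputs first gives the same lower 12 bits as trimming the result, because carries only propagate upward. The sketch below checks this on random 16-bit values; the helper `low12` and the random test values are assumptions for illustration. The same holds for subtraction and multiplication, but not for operations such as division or right shift, where upper bits influence lower bits.

```python
import random

def low12(x):
    return x & 0xFFF                      # keep the lower 12 bits

random.seed(0)
for _ in range(1000):
    a = random.getrandbits(16)
    b = random.getrandbits(16)
    wide_then_trim = low12((a + b) & 0xFFFF)        # 16-bit adder, result trimmed
    trim_then_narrow = low12(low12(a) + low12(b))   # inputs trimmed, 12-bit adder
    assert wide_then_trim == trim_then_narrow
```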


6.5 Customized State Encodings

This is an advanced section. It is not covered in the course and will not be tested.

Chapter 7

Performance Analysis


7.1 Introduction

Hennessy and Patterson’s Computer Architecture: A Quantitative Approach (textbook for E&CE 429) has good information on performance. We will use some of the same definitions and formulas as Hennessy and Patterson, but we will move away from generic definitions of performance for computer systems and focus on performance for digital circuits.

7.2 Defining Performance

\[ \mathrm{Performance} = \frac{\mathrm{Work}}{\mathrm{Time}} \]

You can double your performance by:

doing twice the work in the same amount of time

OR doing the same amount of work in half the time

Benchmarking

\[ \mathrm{Performance} = \frac{\mathrm{Work}}{\mathrm{Time}} \]

Measuring time is easy, but how do we accurately measure work?

The game of benchmarketing is finding a definition of work that makes your system appear to get the most work done in the least amount of time.

Measure of Work      Measure of Performance
clock cycle          MHz
instruction          MIPS
synthetic program    Whetstone, Dhrystone, D-MIPS (Dhrystone MIPS)
real program         SPEC (PCs), EEMBC (Embedded)
travel 1/4 mile      drag race

Throughput vs Latency

Two common measures of performance:

Latency Response time

Throughput Bandwidth

Often there is a tradeoff between latency and bandwidth

• For general-purpose systems, throughput is usually most important.

• For real-time systems, latency is often most important.


7.3 Benchmarks

Historical Benchmarks

MIPS: Millions of instructions per second (My NOP instruction is faster than yours)

Whetstone

• First general-purpose benchmark for computer performance.

• Synthetic: an artificial program designed to be quick and easy to run while still reflecting the performance of a real program.

• H. J. Curnow and B. A. Wichmann. A synthetic benchmark. The Computer Journal, 19(1):43–49, Feb. 1976.

• Based on the Algol-60 compiler developed by the Atomic Power Division of the English Electric Company, Whetstone, Leicester, England, for the KDF9 Computer.

Dhrystone: pun on Whetstone

D-MIPS: MIPS using the Dhrystone mix of instructions

• Synthetic benchmarks worked well for computers in the 1970s and 1980s.

• As caches became larger, an entire synthetic program could fit in the first-level cache, which resulted in unrealistic performance.

SPEC Benchmarks

The SPEC benchmarks are among the most respected and accurate predictions of real-world performance for desktop PCs and servers.

Definition SPEC: Standard Performance Evaluation Corporation. MISSION: “To establish, maintain, and endorse a standardized set of relevant benchmarks and metrics for performance evaluation of modern computer systems” (http://www.spec.org).

The SPEC organization has different benchmarks for integer software, floating-point software, web-serving software, etc.

EEMBC Benchmarks

Embedded Microprocessor Benchmark Consortium

A variety of benchmarks (Android, web browsing, multicore, etc.) to evaluate microprocessors used in smartphones, tablets, and firewall appliances.

7.4 Comparing Performance

7.4.1 General Equations


Speedup

Example sentences:
• A new system has n times the performance of the old system.
• This optimization provides an n× speedup.

\[ \mathrm{Speedup}(New, Old) = \frac{\mathrm{Perf}_{New}}{\mathrm{Perf}_{Old}} \]

Using speedup to calculate performance:

\[ \mathrm{Perf}_{New} = \mathrm{Speedup}(New, Old) \times \mathrm{Perf}_{Old} \]

Using Perf_High and Perf_Low:

\[ \mathrm{Speedup} = \frac{\mathrm{Perf}_{High}}{\mathrm{Perf}_{Low}} \]

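Performance vs Time

Performance is inversely proportional to time:

\[ \mathrm{Perf} = \frac{1}{\mathrm{Time}} \]

Using time to measure performance, the equation for speedup is:

\[ \mathrm{Speedup}(New, Old) = \frac{\mathrm{Perf}_{New}}{\mathrm{Perf}_{Old}} = \frac{1/\mathrm{Time}_{New}}{1/\mathrm{Time}_{Old}} = \frac{\mathrm{Time}_{Old}}{\mathrm{Time}_{New}} \]

Using Time_Slow and Time_Fast:

\[ \mathrm{Speedup} = \frac{\mathrm{Time}_{Slow}}{\mathrm{Time}_{Fast}} \]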
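
A short numeric check shows that the performance-based and time-based forms of speedup agree. The function names and the example times are hypothetical values chosen for this sketch.

```python
def speedup_from_perf(perf_new, perf_old):
    return perf_new / perf_old

def speedup_from_times(time_old, time_new):
    return time_old / time_new

# Hypothetical example: a task that took 12 s now takes 8 s.
time_old, time_new = 12.0, 8.0
s_time = speedup_from_times(time_old, time_new)
s_perf = speedup_from_perf(1 / time_new, 1 / time_old)   # Perf = 1 / Time
print(s_time, s_perf)
```

Note that the old time goes in the numerator when working with times, while the new performance goes in the numerator when working with performance.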


Bigger Than and Smaller Than

Equation for “New is n% bigger than Old”:

\[ \mathrm{PctBigger} = \frac{New - Old}{Old} \]

New is n% smaller than Old:

\[ \mathrm{PctSmaller} = \frac{Old - New}{Old} \]

Derive n% bigger from speedup:

\[ \mathrm{PctBigger} = \mathrm{Speedup} - 1 = \frac{New}{Old} - 1 = \frac{New}{Old} - \frac{Old}{Old} = \frac{New - Old}{Old} \]

Bigger Than (Cont’d)

The performance of New is n% bigger than the performance of Old :

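\[ \mathrm{PctBigger} = \frac{\mathrm{Perf}_{New} - \mathrm{Perf}_{Old}}{\mathrm{Perf}_{Old}} \]

Use percentage-bigger to write an equation for Perf_New in terms of Perf_Old:

\[ \mathrm{Perf}_{New} = (\mathrm{PctBigger} + 100\%) \times \mathrm{Perf}_{Old} \]

OR, equivalently:

\[ \mathrm{Perf}_{New} = \mathrm{PctBigger} \times \mathrm{Perf}_{Old} + \mathrm{Perf}_{Old} \]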
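
The percentage formulas are not symmetric, which a quick numeric check makes visible. The function names and the values 125 and 100 are made up for illustration.

```python
def pct_bigger(new, old):
    return (new - old) / old

def pct_smaller(new, old):
    return (old - new) / old

# Hypothetical values: A = 125, B = 100.
a, b = 125.0, 100.0
a_bigger_than_b = pct_bigger(new=a, old=b)     # A relative to B
b_smaller_than_a = pct_smaller(new=b, old=a)   # B relative to A
print(a_bigger_than_b, b_smaller_than_a)
```

Here A is 25% bigger than B, but B is only 20% smaller than A, because the two percentages use different reference values in the denominator.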


Converting between Bigger and Smaller Than

Question: If A is n% bigger than B, how much smaller is B than A?

Average Performance of Multiple Tasks

Another useful formula is the average time to do one of k different tasks, each of which happens %_i of the time and takes an amount of time T_i each time it is done.

\[ T_{Avg} = \sum_{i=1}^{k} (\%_i)(T_i) \]
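
The weighted average can be sketched as follows. The function name `average_time` and the task mix are assumptions chosen for illustration.

```python
def average_time(tasks):
    """tasks: list of (fraction_of_occurrences, time_per_occurrence) pairs."""
    total_fraction = sum(f for f, _ in tasks)
    assert abs(total_fraction - 1.0) < 1e-9, "fractions must sum to 100%"
    return sum(f * t for f, t in tasks)

# Hypothetical mix: 50% of tasks take 2 units, 25% take 4, 25% take 8.
t_avg = average_time([(0.5, 2.0), (0.25, 4.0), (0.25, 8.0)])
print(t_avg)
```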


7.4.2 Example: Performance of Printers

This section reserved for your reading pleasure


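7.5 Clock Speed, CPI, Program Length, and Performance

7.5.1 Mathematics

CPI           Cycles per instruction
NumInsts      Number of instructions
ClockSpeed    Clock speed
ClockPeriod   Clock period

\[ \mathrm{Time} = \mathrm{NumInsts} \times \mathrm{CPI} \times \mathrm{ClockPeriod} \]

\[ \mathrm{Time} = \frac{\mathrm{NumInsts} \times \mathrm{CPI}}{\mathrm{ClockSpeed}} \]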
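
Both forms of the equation give the same execution time, since ClockPeriod = 1 / ClockSpeed. The sketch below uses hypothetical numbers (one million instructions, a CPI of 1.5, a 500 MHz clock); the function names are assumptions for illustration.

```python
def exec_time_from_period(num_insts, cpi, clock_period_s):
    return num_insts * cpi * clock_period_s

def exec_time_from_speed(num_insts, cpi, clock_speed_hz):
    return num_insts * cpi / clock_speed_hz

t_period = exec_time_from_period(1_000_000, 1.5, 2e-9)    # 2 ns clock period
t_speed = exec_time_from_speed(1_000_000, 1.5, 500e6)     # 500 MHz clock speed
print(t_period, t_speed)
```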


7.5.2 Example: CISC vs RISC and CPI

                  Clock Speed   SPECint
AMD Athlon        1.1 GHz       409
Fujitsu SPARC64   675 MHz       443

The AMD Athlon is a CISC microprocessor (it uses the IA-32 instruction set). The Fujitsu SPARC64 is a RISC microprocessor (it uses Sun’s SPARC instruction set). Assume that it requires 20% more instructions to write a program in the SPARC instruction set than the same program requires in IA-32.

SPECint and Performance

                  Clock Speed   SPECint
AMD Athlon        1.1 GHz       409
Fujitsu SPARC64   675 MHz       443

Question: Which of the two processors has higher performance?

Relative CPI

Question: What is the ratio between the CPIs of the two microprocessors?
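
One possible way to set up this calculation is sketched below. It rearranges Time = NumInsts × CPI / ClockSpeed to solve for CPI and assumes, as a modelling simplification not stated in the notes, that the SPECint score is proportional to performance (1 / Time) on the same workload. The variable names are hypothetical.

```python
# Figures from the table above; the instruction-count ratio is the stated assumption
# that SPARC needs 20% more instructions than IA-32 for the same program.
athlon = {"clock_hz": 1.1e9, "specint": 409.0, "rel_insts": 1.0}
sparc64 = {"clock_hz": 675e6, "specint": 443.0, "rel_insts": 1.2}

def relative_cpi(cpu):
    """CPI up to a common constant: CPI = ClockSpeed * Time / NumInsts, Time ~ 1/SPECint."""
    relative_time = 1.0 / cpu["specint"]
    return cpu["clock_hz"] * relative_time / cpu["rel_insts"]

cpi_ratio = relative_cpi(sparc64) / relative_cpi(athlon)
print(round(cpi_ratio, 3))
```

Under these assumptions the SPARC64 needs a little under half as many cycles per instruction as the Athlon. Only the ratio is meaningful here, because the unknown constant relating SPECint to time cancels out.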


Absolute CPI

Question: Can you determine the absolute (actual) CPI of either microprocessor?

7.5.3 Effect of Instruction Set on Performance

In this section we examine how changing the instructions that a processor performs can affect its performance.

Your group designs a microprocessor and you are considering adding a fused multiply-accumulate to the instruction set.

• A fused multiply-accumulate instruction does a multiply and an add.

  Instruction format: op src1 src2 dst

  MAC R1, R2, R4  =  MUL R1, R2, R3
                     ADD R3, R4, R4

• Often used in digital signal processing. See the multiply-accumulate pattern in the finite-impulse-response filter below:

[Figure: finite-impulse-response filter with coefficient multipliers C1 to C4, input a, and output z]

• First added to RISC instruction sets by IBM with its POWER processor family: “Performance Optimization With Enhanced RISC”.

Using MAC Instruction

Original program

MUL R1, R2, R3
ADD R4, R3, R4
SUB R5, R7, R9
MUL R1, R2, R3
ADD R5, R3, R5
SUB R1, R2, R3
MUL R2, R3, R5
ADD R5, R2, R5

Using MAC

Problem Statement

Your studies have shown that, on average, half of the multiply operations are followed by an add instruction that could be done with a fused multiply-add.

Additionally, you know:

         CPI    %
ADD      0.8    15%
MUL      1.2     5%
Other    1.0    80%

Options

You have three options:

option 1 : no change

option 2 : add the MAC instruction, increase the clock period by 20%, and MAC has the same CPI as MUL.

option 3 : add the MAC instruction, keep the clock period the same, and the CPI of a MAC is 50% greater than that of a multiply.

Question: Which option will result in the highest overall performance?
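
One way to set up the comparison is sketched below, using Time = NumInsts × CPI × ClockPeriod on a batch of 100 original instructions. The sketch assumes that each MAC replaces exactly one MUL and one ADD, and measures time in units of the original clock period; the names `BASE_MIX`, `cycles`, and `with_mac` are hypothetical.

```python
# (CPI, count per 100 original instructions), from the table above.
BASE_MIX = {"ADD": (0.8, 15.0), "MUL": (1.2, 5.0), "OTHER": (1.0, 80.0)}

def cycles(mix):
    return sum(cpi * count for cpi, count in mix.values())

def with_mac(mac_cpi):
    fused = BASE_MIX["MUL"][1] / 2                  # half of the multiplies become MACs
    return {
        "ADD": (0.8, BASE_MIX["ADD"][1] - fused),   # each MAC absorbs one ADD
        "MUL": (1.2, BASE_MIX["MUL"][1] - fused),
        "MAC": (mac_cpi, fused),
        "OTHER": BASE_MIX["OTHER"],
    }

time_1 = cycles(BASE_MIX) * 1.0               # option 1: no change
time_2 = cycles(with_mac(1.2)) * 1.2          # option 2: MAC CPI = MUL CPI, clock period +20%
time_3 = cycles(with_mac(1.2 * 1.5)) * 1.0    # option 3: MAC CPI 50% above MUL, same period
print(time_1, time_2, time_3)
```

Under these assumptions the times come out to roughly 98, 115.2, and 97.5 original clock periods, so option 3 is marginally faster than doing nothing and option 2 is clearly slower.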


Review: Performance of Programs

Which option is better:

1. 90% performance improvement to 10% of instructions

2. 10% performance improvement to 90% of instructions
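
A quick way to compare the two options is to compute the relative execution time after each change. The sketch below assumes that the share of instructions equals the share of execution time (all instructions take equally long) and that an "x% performance improvement" means those instructions run (1 + x) times faster; both are simplifying assumptions, and `new_time` is a hypothetical helper.

```python
def new_time(fraction_improved, improvement):
    """Relative execution time after speeding up part of the work.

    fraction_improved -- share of the original execution time affected
    improvement       -- e.g. 0.10 for a 10% performance improvement
    """
    return (1 - fraction_improved) + fraction_improved / (1 + improvement)

option_1 = new_time(0.10, 0.90)   # 90% improvement to 10% of instructions
option_2 = new_time(0.90, 0.10)   # 10% improvement to 90% of instructions
print(round(option_1, 4), round(option_2, 4))
```

Under these assumptions the second option gives the shorter execution time, which reflects the general principle that small gains on the common case can outweigh large gains on the rare case.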


7.6 Effect of Time to Market on Relative Performance

The performance of digital-hardware-based systems has historically grown at an exponential rate.

To illustrate this concept, imagine companies A and B release competing products (A1, B1, A2, B2, A3, B3, . . .) over a series of years where the performance of the average product in this category doubles every year.

[Figure: performance (vertical axis, 1 to 17) versus year (2010 to 2014) for products A1 to A4 and B1 to B3, with a curve for the performance of the average system, which doubles every year]

Equation for exponential growth, where P increases by a factor of n every k units of time:

\[ P(t_1) = P(t_0) \times n^{(t_1 - t_0)/k} \]

Example Problem

Assume that performance of the average product in your market segment doubles every 18 months.

You are considering an optimization that will improve the performance of your product by 7%.

Question: If you add the optimization, how much can you allow your schedule to slip before the delay hurts your relative performance compared to not doing the optimization and launching the product according to your current schedule?
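
One way to approach this is to find the delay over which the market average grows by the same factor as the optimization provides, using the exponential growth equation P(t1) = P(t0) × n^((t1 − t0)/k). The sketch below is a possible setup; the function names are assumptions for illustration.

```python
import math

def growth(p0, n, k, elapsed):
    """P(t1) = P(t0) * n ** ((t1 - t0) / k)."""
    return p0 * n ** (elapsed / k)

def break_even_delay(improvement, n, k):
    """Delay after which the market average has grown by the factor (1 + improvement)."""
    return k * math.log(1 + improvement) / math.log(n)

# 7% improvement, market performance doubling (n = 2) every k = 18 months.
slip_months = break_even_delay(improvement=0.07, n=2, k=18)
print(round(slip_months, 2))
```

Under this model the break-even slip is a little under two months; any longer delay means the market average has grown by more than the 7% gained.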


Chapter 8

Timing Analysis


8.1 Delays and Definitions

In this section we will look at the different timing parameters of circuits. Our focus will be on those parameters that limit the maximum clock speed at which a circuit will work correctly.

8.1.1 Background Definitions

This section reserved for your reading pleasure


8.1.2 Clock-Related Timing Definitions

At the register transfer level, we think of the system as having a single, global clock.

On the chip, the single clk signal in our source code is implemented as a clock tree containing many buffers and individual wires. On the physical chip, each flip-flop has its own clock signal.

process begin
  wait until rising_edge(clk);
  c <= a + b;
  e <= c - d;
end process;

[Figure: clock tree rooted at clk0, branching to clk1 and clk2, then to clk1.0, clk1.1, clk1.2 and clk2.0, clk2.1, clk2.2, with leaves clk1.0.1, clk1.1.1, clk1.2.1 and clk2.0.1, clk2.1.1, clk2.2.1]

8.1.2.1 Clock Latency

clk0

clk1

clk2

clk1.0

clk1.1

clk1.2

clk1.0.1

clk1.1.1

clk1.2.1

clk2.0

clk2.1

clk2.2

clk2.0.1

clk2.1.1

clk2.2.1

latency from clk0 to clk1.2.1

clk0

clk1

clk1.2

clk1.2.1

Definition Clock Latency: The delay from the source (oscillator) to a point in the clock tree.

Note: Clock latency does not affect the limit on the minimum clock period.


8.1.2.2 Clock Skew

[Figure: clock tree from clk0 to the leaf clocks, with waveforms of clk0, clk1, clk1.0, clk1.0.1, clk2, clk2.1, and clk2.1.1 showing the skew between clk1.0.1 and clk2.1.1]

Definition Clock Skew: The difference in arrival times for the same clock edge at different flip-flops.

Clock skew is caused by the difference in interconnect delays to different points on the chip.

Skew(clk1.0.1, clk2.1.1) = |Latency(clk1.0.1) − Latency(clk2.1.1)|


Clock Skew (Cont’d)

[Figure: clock tree from clk0 to the leaf clocks, with waveforms of clk0 and the six leaf clocks clk1.0.1, clk1.1.1, clk1.2.1, clk2.0.1, clk2.1.1, and clk2.2.1 showing the skew for the circuit]

The clock skew for a circuit is the maximum skew between any two flip-flops.
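The definitions of latency and skew can be sketched numerically. The leaf names below follow the clock tree in the figure, but the latency values (in picoseconds) are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical latencies from clk0 to each leaf clock, in picoseconds.
# These numbers are assumptions for illustration, not values from the slides.
latency = {
    "clk1.0.1": 1200, "clk1.1.1": 1300, "clk1.2.1": 1500,
    "clk2.0.1": 1100, "clk2.1.1": 1600, "clk2.2.1": 1400,
}

def skew(a: str, b: str) -> int:
    """Skew between two points: |Latency(a) - Latency(b)|."""
    return abs(latency[a] - latency[b])

def circuit_skew() -> int:
    """Skew for the circuit: the maximum skew between any two flip-flop clocks."""
    return max(skew(a, b) for a, b in combinations(latency, 2))

print(skew("clk1.0.1", "clk2.1.1"))  # skew between one pair of leaves
print(circuit_skew())                # worst-case skew over all pairs
```

Note that adding the same amount to every latency leaves both results unchanged, which is consistent with the remark that latency alone does not limit the clock period.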


8.1.2.3 Clock Jitter

[Figure: an ideal clock with periods 10, 10, 10, 10 compared with a clock with jitter whose periods are labelled 10, 11, 8, 9]

Definition Clock Jitter: Difference between actual clock period and ideal clock period.

Clock jitter is caused by:

• temperature and voltage variations over time

• temperature and voltage variations across different locations on a chip

• manufacturing variations between different parts
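As a small illustration of the definition, the sketch below computes the per-cycle jitter for the period labels that appear in the figure (ideal period 10; actual periods 10, 11, 8, 9). The order of the actual periods is an assumption, and the helper name is hypothetical.

```python
IDEAL_PERIOD = 10
ACTUAL_PERIODS = [10, 11, 8, 9]  # labels from the figure; order assumed

def period_jitter(actual, ideal):
    """Jitter per cycle: actual clock period minus ideal clock period."""
    return [p - ideal for p in actual]

jitter = period_jitter(ACTUAL_PERIODS, IDEAL_PERIOD)
worst_case = max(abs(j) for j in jitter)

print(jitter)      # signed deviation of each cycle
print(worst_case)  # largest deviation in either direction
```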


Clock Tree Design

Clock tree design is critical in high-performance designs to minimize clock skew. Sophisticated synthesis tools put lots of effort into clock tree design, and the techniques for clock tree design still generate PhD theses.


8.1.3 Storage-Related Timing Definitions

8.1.3.1 Flops and Latches

[Figure: waveforms of d, clk, and q showing flop behaviour and latch behaviour]

Storage devices have two modes: load mode and store mode.

Flops are edge sensitive; they are in load mode just before the clock edge.

Latches are level sensitive; they are in load mode while their enable signal is asserted high (low for active low latches).


8.1.3.2 Timing Parameters

In the pictures below, the goal is to load the data α into the flip-flop or latch.

[Figure: timing diagrams for a flip-flop, an active-high latch, and an active-low latch. Each shows d changing from ω to α, clk, and q, with the setup, hold, and clock-to-Q intervals marked]

Setup and hold define the window in which input data are required to be constant in order to guarantee that the storage device will store data correctly.

Clock-to-Q defines the delay from the clock edge to when the output is guaranteed to be stable.


8.1.3.3 Timing Parameters for a Flop

Good, Slow, and Fast

[Figure: three timing diagrams of clk, b, and c for flops a, b, c, d, each with the setup (TSU) and hold (THO) window marked: good timing; too slow = setup violation; too fast = hold violation]


8.1.4 Propagation Delays

Propagation delay: the time it takes a signal to travel from the source (driving) flop to the destination flop

propagation delay = load delay + interconnect delay

Load delay: the delay through the combinational gates between the flops

Interconnect delay: the delay through the wires between gates and flops
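A minimal sketch of the sum above, assuming a path described by a list of gate delays and a list of wire delays. All numbers are hypothetical picosecond values, and the function name is illustrative.

```python
def propagation_delay(gate_delays, wire_delays):
    """propagation delay = load delay + interconnect delay."""
    load_delay = sum(gate_delays)          # combinational gates between the flops
    interconnect_delay = sum(wire_delays)  # wires between gates and flops
    return load_delay + interconnect_delay

# Hypothetical path: three gates joined by four wire segments (picoseconds).
gates = [400, 400, 600]
wires = [50, 70, 30, 40]

print(propagation_delay(gates, wires))
```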


8.1.5 Timing Constraints

Minimum Clock Period

[Figure: flops a, b, c, and d clocked by clk0, clk1, and clk2, with waveforms of clk1, clk2, b, and c over one clock period. Legend: signal may change, signal is stable, signal may rise, signal may fall]

ClockPeriod >


Hold Constraint

[Figure: circuit with signals b and c, and three timing diagrams of clk, b, and c: simple clock-to-Q (tco), realistic clock-to-Q (tco.min and tco.max), and a hold violation relative to the TSU and THO window, with values α and β]

Hold constraint


Review: Timing Parameters

1. Setup: Time ____ transition that input data is ____ to being stable

2. Hold: Time ____ transition that input data is ____ to stable

3. Clock-to-Q-Min: Time when output data is ____ to stable with old data

4. Clock-to-Q-Max: Time when output data is ____ to stable with new data


5. How do you fix a setup violation?

6. How do you fix a hold violation?

7. Draw a timing diagram for a flip-flop with the following timing parameters:

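setup: 2 ns
hold: 1 ns
clock-to-Q: 3 ns

[Blank timing diagram for clk, d, and q, with a 1 ns grid]

8. Draw a timing diagram for an active-high latch with the following timing parameters:

setup: 2 ns
hold: 1 ns
clock-to-Q: 3 ns

[Blank timing diagram for clk, d, and q, with a 1 ns grid]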


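8.2 Timing Analysis of Simple Latches

In this section, each gate has a delay of 1 time unit.

8.2.1 Structure and Behaviour of Multiplexer Latch

[Figure: multiplexer latch in load mode (select ’1’) and in store mode (select ’0’), with input i and output o]

[Figure: multiplexer symbol and implementation (inputs a and b, select sel, output o), and latch implementation (inputs d and clk, output o)]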


Latch Glitching

[Figure: correct latch (left) and buggy latch (right), each with inputs d and clk and output o]

The functionality of storage devices depends on timing.

• For functionality at the register transfer level, we ignore timing (combinational logic has zero delay).

• For storage devices, functionality depends on timing (delays through combinational logic).

• Ignoring delays, the circuits above are equivalent, but the circuit on the right is actually incorrect.

• The pair of inverters on the clk signal is needed. Together, they prevent a glitch on the OR gate when clk is deasserted. If there were only one inverter, a glitch would occur. For more on this, see section 8.2.7.


Loading and Storing Values

[Figure: signal values inside the latch for four cases: loading ’0’ (d=’0’, clk=’1’), loading ’1’ (d=’1’, clk=’1’), storing ’0’ (clk=’0’, o=’0’), and storing ’1’ (clk=’0’, o=’1’)]


8.2.2 Strategy for Timing Analysis of Storage Devices

The key to calculating setup and hold times of a latch, flop, etc. is to identify:

1. how the data is stored when in storage mode (often a combinational loop with a pair of inverters)

2. the gate(s) that the clock uses to turn on the load path (allow the input to affect the internals of the storage element)

3. the gate(s) that the clock uses to turn on the storage loop (allow the stored data to continue to circulate through the storage loop)

4. the gate where the load path and storage loop join


8.2.3 Clock-to-Q Time of a Latch

[Figure: six copies of the latch circuit with signals clk, d, l1, l2, qn, q, s1, s2, cn, and c2, for tracing the clock-to-Q behaviour]


8.2.4 From Load Mode to Store Mode

[Figure: sequence of snapshots of the latch circuit with d = α, one per time step, annotated with signal values]

Circuit is stable in load mode
t=0: Clk transitions from load to store
t=1: Clk transitions from load to store
t=2: s1 propagates to s2, because cn turns on AND gate
t=3: l2 is set to 0, because c2 turns off AND gate
t=4: α from store path propagates to q
t=5: α from store path completes cycle


8.2.5 Setup Time Analysis

1. When the latch is in store mode, there must be consistent values in the storage loop. Otherwise, the loop will be metastable.

2. As the circuit transitions from load mode to store mode, there must be consistent values at the point where the load and store paths join.

3. We must saturate the storage loop with the current value (α) before we turn on the storage loop, otherwise some instances of the old value (ω) will remain in the loop.

4. Setup time is the time before the clock edge that the d-input must be stable with the current value (α).

5. The setup time must be sufficient to saturate the storage loop with the current value (α) and flush out all of the old values (ω) before the storage loop is turned on.

6. Paths for this specific circuit:

• Path to saturate storage loop = d → s2

• Path to turn on storage loop = clk → s2


7. Equation for this specific circuit:

TSU = delay(d → s2) − delay(clk → s2)
    = 6 − 2
    = 4
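To make the meaning of the setup window concrete, the sketch below checks whether a change on d at a given time (relative to the clock edge at t = 0) falls inside a setup time of 6 − 2 = 4 time units. The function name and the discrete-time convention are assumptions made for illustration.

```python
def violates_setup(t_change: int, t_setup: int) -> bool:
    """Return True if d changes inside the setup window.

    Times are relative to the clock edge at t = 0. A change made t_setup or
    more time units before the edge (t_change <= -t_setup) is safe.
    """
    return -t_setup < t_change <= 0

T_SU = 6 - 2  # delay(d -> s2) - delay(clk -> s2)

for t in (-4, -3, -1):
    print(t, violates_setup(t, T_SU))
```

With unit gate delays, a change at t = −4 just meets the requirement, while changes at t = −3 and t = −1 fall inside the window.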


Setup Violation

[Figure: sequence of snapshots of the latch circuit during a setup violation, one per time step, annotated with signal values ω and α]

Circuit is stable in load mode with ω
t=-1: D transitions from ω to α
t=0: α propagates through inverter; Clk transitions from load to store
t=1: α propagates through AND gate for load path; ω is on input to AND gate for storage loop; Clk propagates through inverter
t=2: old ω propagates through AND
Trouble: inconsistent values on load path and store path. Old value (ω) still in store path when store path is enabled.
t=3: l2 is set to 0, because c2 turns off AND gate


t=4: ω/α from store path propagates to q
t=5: ω/α from store path completes cycle
t=5: Illustrate instability with ω=0, α=1

[Figure: waveforms of d, l1, l2, qn, q, s1, s2, clk, cn, and c2 for t = -3 to 6, with values marked ω, α, and α/ω, showing setup with negative margin]


We now repeat the analysis of setup violation, but illustrate the minimum violation (input transitions from ω to α 3 time-units before the clock edge).

[Figure: sequence of snapshots of the latch circuit, one per time step, annotated with signal values ω and α]

Circuit is stable in load mode with ω
t=-3: D transitions from ω to α
t=-2: α propagates through inverter
t=-1: α propagates through AND
t=0: Clk transitions from load to store
t=1: Clk propagates through inverter


t=2: old ω propagates through AND
Trouble: inconsistent values on load path and store path. Old value (ω) still in store path when store path is enabled.
t=3: l2 is set to 0, because c2 turns off AND gate
t=4: ω/α from store path propagates to q
t=5: ω/α from store path completes cycle
t=5: Illustrate instability with ω=0, α=1

[Figure: waveforms of d, l1, l2, qn, q, s1, s2, clk, cn, and c2 for t = -3 to 6, with values marked ω, α, and α/ω, showing setup with negative margin]


8.2.6 Hold Time of a Multiplexer Latch

Hold Time Behaviour

[Figure: six copies of the latch circuit with signals clk, d, l1, l2, qn, q, s1, s2, cn, and c2, for tracing the hold time behaviour]


Analysis

1. When the latch is in store mode, there must be consistent values in the storage loop. Otherwise, the loop will be metastable.

2. As the circuit transitions from load mode to store mode, there must be consistent values at the point where the load and store paths join.

3. We must turn off the load path before the next value (β) affects the storage loop.

4. Hold time is the time after the clock edge that the d-input must be stable with the current value (α).

5. The hold time must be sufficient to turn off the load path before the next data value (β) is able to affect the internal circuitry such that it will affect the storage loop.

6. Paths for this specific circuit:

• Path to turn off load path = clk → l2

• Path to affect internals = d → l2


7. Equation for this specific circuit:

THO = delay(clk → l2) − delay(d → l2)
    = 2 − 1
    = 1


8.2.7 Example of a Bad Latch

Build a Bad Latch

[Figure: four versions of the latch circuit with signals clk, d, l1, l2, qn, q, s1, s2, cn, and c2 (c2 is absent from versions 3 and 4)]

1: Original latch
2: Push inverter from clk through wire-fork
3: Delete pair of back-to-back inverters
4: Compress figure


Behaviour of a Bad Latch

[Figure: snapshots of the bad latch circuit with d = α, annotated with signal values]

Circuit is stable in load mode
t=0: Clk transitions from load to store

[Figure: four unannotated copies of the bad latch circuit with signals clk, d, l1, l2, qn, q, s1, s2, and cn, for tracing the later time steps]


Analysis of Bad Latch

1. When the latch is in store mode, there must be consistent values in the storage loop. Otherwise, the loop will be metastable.

2. As the circuit transitions from load mode to store mode, there must be consistent values at the point where the load and store paths join.

3. The current value (α) must arrive at the join gate for the storage loop and load path before the constant “off” value from the load path arrives at the join gate.

4. Paths for this specific circuit:

• Path from clk to store-enable to join =

• Path from clk to load-enable to join =

5. Equation for this specific circuit:


8.2.8 Summary

1. Test if latch is correct

(a) Find the storage loop
(b) Find the load path
(c) Check that load-mode and storage-mode are mutually exclusive
(d) Find gates for load-enable, store-enable, and paths-join
(e) Check that there is an even number of inversions on the load path when in load mode
(f) Check that there is an even number of inversions in the storage loop when in store mode
(g) Check that the path for clk to store-enable to join is faster than the path for clk to load-enable to join

2. Determine if latch is active high or active low

3. Find clock-to-Q time: delay along path clk to load-enable to output

4. Find setup time:
delay(path for input to saturate storage loop) − delay(path to turn on storage loop)
delay(d to store-enable) − delay(clk to store-enable)

5. Find hold time:
delay(path to turn off load path) − delay(path for input to affect internals)
delay(clk to load-enable) − delay(d to load-enable)
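The setup and hold formulas in the summary can be collected into one small helper. This is a sketch under stated assumptions: the class and field names are hypothetical, and the four delays are the values used for the multiplexer latch in sections 8.2.5 and 8.2.6 (6 and 2 for setup, 2 and 1 for hold).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LatchPathDelays:
    """Path delays of a latch, in gate-delay time units (names are illustrative)."""
    d_to_store_enable: int    # path for input to saturate storage loop
    clk_to_store_enable: int  # path to turn on storage loop
    clk_to_load_enable: int   # path to turn off load path
    d_to_load_enable: int     # path for input to affect internals

    def setup_time(self) -> int:
        return self.d_to_store_enable - self.clk_to_store_enable

    def hold_time(self) -> int:
        return self.clk_to_load_enable - self.d_to_load_enable

mux_latch = LatchPathDelays(
    d_to_store_enable=6, clk_to_store_enable=2,
    clk_to_load_enable=2, d_to_load_enable=1,
)
print(mux_latch.setup_time(), mux_latch.hold_time())
```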


8.3 Advanced Timing Analysis of Storage Elements

This is an advanced section. It is not covered in the course and will not be tested.


8.4 Critical Path

The critical path of a circuit is used to determine the maximum propagation delay of the circuit, which in turn constrains the minimum clock period.

Scenario:

• a purely combinational circuit

• at t = 0, one or more inputs change value

• record the time of the last value change (edge) on an output

The maximum of the times of the last edge is the maximum delay (sometimes, just “delay”) through the circuit.


Example of Max Delay

[Figure: circuit with inputs a and b, internal signal y, and output z]

Each input may be 0, 1, a rising edge, or a falling edge. For n inputs, there are 4^n − 2^n possible input vectors.
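The count of input vectors can be checked by enumeration, assuming the four possible values of an input are 0, 1, a rising edge, and a falling edge, and that vectors containing no edge are excluded because nothing changes at t = 0. The value names and the function name are illustrative.

```python
from itertools import product

VALUES = ("0", "1", "rise", "fall")
EDGES = ("rise", "fall")

def excitation_vectors(n: int):
    """All n-input vectors over VALUES that contain at least one edge."""
    return [v for v in product(VALUES, repeat=n)
            if any(x in EDGES for x in v)]

for n in (1, 2, 3):
    count = len(excitation_vectors(n))
    print(n, count, 4**n - 2**n)  # enumerated count next to the formula
```

The 2^n vectors that are subtracted are exactly those in which every input is a constant 0 or 1.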

[Figure: four timing diagrams of a, b, y, and z on a 0 to 9 ns axis, one per input vector, with delays of 5.5 ns, 6.5 ns, 6.0 ns, and 7.8 ns]


8.4.1 Introduction to Critical and False Paths

Definition critical path: The slowest path on the chip between flops or flops and pins. The critical path limits the maximum clock speed.

Definition false path: a path along which an edge cannot travel from beginning to end.

Throughout our discussion of critical paths, we will use the delay values for gates shown in the table below.

gate   delay
NOT    2
AND    4
OR     4
XOR    6


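8.4.1.1 Example of Critical Path in Full Adder

Question: Find the longest path through the full-adder circuit shown below.

[Figure: full-adder circuit with inputs ci, a, and b, internal signals i, j, and k, and outputs s and co]

Question: Does the excitation ci=1, a=edge, b=0 exercise the longest path?

[Figure: the same full-adder circuit, for tracing the excitation]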


Alternative Excitation

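Question: Does the excitation ci=0, a=edge, b=1 exercise the critical path?

[Figure: full-adder circuit with inputs ci, a, and b, internal signals i, j, and k, and outputs s and co, for tracing the excitation]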


Exercising the Critical Path

Not all transitions on the inputs will exercise the critical path. Using timing simulation to find the maximum delay of a circuit might underestimate the delay, because the input values that you simulate might not exercise the critical path.

8.4.1.2 Longest Path and Critical Path

The longest path through the circuit might not be the critical path, because the behaviour of the gates might prevent an edge from travelling along the path. Using the longest path to find the maximum delay of a circuit might overestimate the delay, because the longest path might be a false path.


Example False Path

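Question: Determine whether the longest path in the circuit below is a false path.

[Figure: four copies of a circuit with inputs a and b and output y, one per excitation: a = 0 with each of the two edges on b, and a = 1 with each of the two edges on b]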


Analytic Approach

Question: How can we determine analytically that this is a false path?

[Figure: circuit with inputs a and b and output y]

Note: False paths are an advanced topic and are covered in section 8.5.


8.4.1.3 Criteria for Critical Path Algorithms

Let Tr be the real (as measured by stimulating the circuit with all possible input vectors) maximum time of the last edge on an output.

Given an algorithm to calculate the critical path, let Ta be the time of the last edge as calculated by the algorithm.

Three criteria for evaluating merits of a critical path algorithm:

1. Correctness: Ta ≥ Tr: The delay given by the algorithm must be at least as long as the real maximum delay.

2. Optimality: The goal is to minimize Ta − Tr. The closer the algorithm's result is to the real maximum delay, the more optimal the algorithm is.

3. Complexity: The goal is to minimize the computational complexity (i.e., run time) of the algorithm.

[Figure: axis with Tr marked]

Question: Is “longest path” a correct critical path algorithm?


8.4.2 Longest Path

8.4.2.1 Algorithm to Find Longest Path

The basic idea is to annotate each signal with the maximum delay from it to an output.

• Start at destination signals and traverse through fanin to source signals.

– Destination signals have a delay of 0

– At each gate, annotate the inputs by the delay through the gate plus the delay of the output.

– When a signal fans out to multiple gates, annotate the output of the source (driving) gate with maximum delay of the destination signals.

• The primary input signal with the maximum delay is the start of the longest path. The delay annotation of this signal is the delay of the longest path.

• The longest path is found by working from the source signal to the destination signals, picking the fanout signal with the maximum delay at each step.
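The annotation procedure above can be sketched as a memoized traversal of a netlist. The gate delays come from the table in section 8.4.1. The netlist is an assumption: a conventional two-XOR full adder using the signal names ci, a, b, i, j, k, s, and co, which may differ from the circuit drawn in the slides. All function names are illustrative.

```python
from functools import lru_cache

GATE_DELAY = {"NOT": 2, "AND": 4, "OR": 4, "XOR": 6}

# Assumed netlist: gate output -> (gate type, gate inputs).
NETLIST = {
    "i":  ("XOR", ("a", "b")),
    "s":  ("XOR", ("i", "ci")),
    "j":  ("AND", ("a", "b")),
    "k":  ("AND", ("i", "ci")),
    "co": ("OR",  ("j", "k")),
}

def fanout(signal):
    """Outputs of the gates that the signal drives."""
    return [out for out, (_, ins) in NETLIST.items() if signal in ins]

def through(out):
    """Delay through the gate driving `out` plus the annotation of `out`."""
    return GATE_DELAY[NETLIST[out][0]] + delay_to_output(out)

@lru_cache(maxsize=None)
def delay_to_output(signal):
    """Maximum delay from the signal to any destination signal.

    Destination signals drive no gates, so their annotation is 0.
    A signal with several fanouts takes the maximum over them.
    """
    return max((through(out) for out in fanout(signal)), default=0)

def longest_path():
    """Return (path, delay) starting at the primary input with the largest annotation."""
    driven = {x for _, ins in NETLIST.values() for x in ins}
    primary_inputs = sorted(driven - set(NETLIST))
    path = [max(primary_inputs, key=delay_to_output)]
    while fanout(path[-1]):
        path.append(max(fanout(path[-1]), key=through))
    return path, delay_to_output(path[0])

print(longest_path())
```

For the assumed netlist, the annotations are 8 for i and ci and 14 for a and b, so the traversal reports the path a, i, k, co with a delay of 14. When two inputs tie, the sketch picks the first one in alphabetical order.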


8.4.2.2 Longest Path Example

Question: Find the longest path through the circuit below.

[Figure: circuit with signals a, b, c, d, e, f, g, h, i, j, k, l, and m]


8.4.3 Monotone Speedup

Variability

Variability:

• The delay through a gate can change over time

– temperature

– supply voltage

– other effects

• The delay through two “identical” gates can be different

– manufacturing variability

– load

Example: measure the delay 1000 times for each of 1000 “identical” AND gates.

[Figure: distribution plot with Delay on the horizontal axis and Population on the vertical axis]


Timing Models

When you design a circuit, you do not know the precise delays through the physical gates that will be on the chips.

The manufacturer will discard all chips whose gates are too fast or too slow.

Manufacturers give min/max bounds on the delay for each gate in the cell library.

Critical path analysis with min/max delays is very complex.

Goal: do critical path analysis with just the max delay through a gate.

Problem: using the maximum delay through each gate might not cause the maximum delay in the circuit.


Slow Gates, Fast Chip

Behaviour with maximum delay through each gate

[Figure: circuit with signals a, b, c, d, e, and f annotated with edge times, shown for rising edge excitation and for falling edge excitation]

Behaviour with minimum delay through b and d

[Figure: the same circuit annotated with edge times, shown for rising edge excitation and for falling edge excitation]


Monotonicity

Definition monotonic: A function (f) is monotonic if increasing its input causes the output to increase or remain the same. Mathematically: x < y ⟹ f(x) ≤ f(y).

Definition monotonous: A lecture is monotonous if increasing the length of the lecture increases the number of people who are asleep.

Definition monotone speedup: The maximum clock speed of a circuit should be monotonic with respect to the speed of any gate or sub-circuit. That is, if we increase the speed of part of the circuit, we should either increase the clock speed of the circuit, or leave it unchanged.

Definition monotonous speedup: A lecture has monotonous speedup if increasing the pace of the lecture increases the number of people who are awake.

Monotone speedup criteria for a critical path algorithm: if we decrease the delay through any part of the circuit (speedup), then the delay calculated by the critical path algorithm will decrease or stay the same.
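The definition of a monotonic function can be tested on sample points. The sketch below is only a numerical check on a finite set of inputs, not a proof, and the helper name and the two example functions are hypothetical.

```python
def is_monotonic(f, xs):
    """Check x < y => f(x) <= f(y) on the sample points xs.

    Comparing neighbours in sorted order is enough, because <= is transitive.
    """
    pts = sorted(xs)
    return all(f(x) <= f(y) for x, y in zip(pts, pts[1:]))

def saturating(x):
    return min(x, 5)        # increases, then stays the same

def dip(x):
    return (x - 3) ** 2     # decreases before it increases

print(is_monotonic(saturating, range(10)))
print(is_monotonic(dip, range(7)))
```

A critical path algorithm could be exercised the same way, by treating the delay of one gate as the input and the calculated circuit delay as the output.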


Review: Critical Path Analysis

1. If we say that the delay along the longest path is the delay of the circuit, will this algorithm be correct with respect to monotone speedup?


8.5 False Paths

This is an advanced section. It is not covered in the course and will not be tested.


8.6 Analog Timing Model

Goal: define how to compute the delay of a gate or FPGA cell.

[Figure: gate with input a and output b]

• section 8.6: precise differential equations; complex because no closed-form solutions for realistic circuits

• section 8.7: Elmore’s approximation to precise differential equations

Objectives

• How to define the delay through a gate or circuit. (section 8.6.1)

• How to model a circuit as an RC-network. (section 8.6.2)

• How to calculate delay of an RC-network. (section 8.6.3)


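8.6.1 Defining Delay

Goal: define “the delay through a gate”.

[Figure: a gate with input a and output b]

Easy:

[Figure: waveforms of Va and Vb (axes: time, voltage) with the delay marked between them]

Reality:

1. The slope of the output (Vb) is dependent upon the slope of the input (Va):

[Figure: sloped waveforms (axes: time, voltage), annotated “which delay?”]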


Defining Delay (cont’d)

2. The more gates that a gate drives (the larger the load), the slower the output voltage will rise.

[Figure: two panels showing Va and Vb (axes: time, voltage) with the delay marked. Panel titles: “Load of 1 gate: short delay” and “Load of 4 gates: long delay”.]


Defining Delay (cont’d)

3. Because the output waveform is sloped, we must choose the voltage level for Vb at which we will measure the delay.

[Figure: Va and Vb (axes: time, voltage), annotated “which delay?”. The voltage axis is marked at 0, 0.35 Vdd, 0.65 Vdd, and Vdd. Two curves are shown for Vb: the actual waveform and the discretized waveform.]

Definition Trip Points: A high or ’1’ trip point is the voltage level where an upwards transition means the signal represents a ’1’.

A low or ’0’ trip point is the voltage level where a downwards transition means the signal represents a ’0’.
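The two trip points can be illustrated with a short Python sketch that turns a sampled analog waveform into a discretized one. The function name, the sample values, and the choice of 0.65 Vdd and 0.35 Vdd as defaults (taken from the figure above) are illustrative assumptions, not part of the course material.

```python
def discretize(samples, vdd, high_trip=0.65, low_trip=0.35, initial=0):
    """Map sampled voltages to logic values using two trip points.

    The signal becomes 1 only when it rises to the high trip point and
    becomes 0 only when it falls to the low trip point. Between the two
    trip points it keeps its previous logic value.
    """
    level = initial
    result = []
    for v in samples:
        if level == 0 and v >= high_trip * vdd:
            level = 1  # upward transition through the '1' trip point
        elif level == 1 and v <= low_trip * vdd:
            level = 0  # downward transition through the '0' trip point
        result.append(level)
    return result


# Hypothetical waveform: rises from 0 to Vdd, then falls back to 0.
vdd = 1.0
waveform = [0.0, 0.3, 0.5, 0.7, 1.0, 0.6, 0.4, 0.3, 0.0]
logic = discretize(waveform, vdd)
print(logic)
```

Note that the samples at 0.6 and 0.4 on the falling edge still count as ’1’, because the signal has not yet reached the low trip point.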


Summary: Analog Delay Model

The standard approach to define the delay through a gate is:

Input waveform: Step function
Load circuit: 4 copies of the gate
Output trip points: 0.65 Vdd for a 0 to 1 transition; 0.35 Vdd for a 1 to 0 transition.

The delay through a circuit is identical, except that the load is either a standard load such as 4 NAND gates, specified by the user, or the module in which the circuit is used.


8.6.2 Modeling Circuits for Timing

• Model the delay of wires, gates, and connections between wires (via, switch-box, or antifuse).

• Dominant factors in delay are resistance and capacitance

• Resistance and capacitance are affected by different parameters.

                                 Resistance            Capacitance
Wires                            • Material            • Material (of wire and dielectric)
                                 • Cross section       • Cross section
                                 • Length              • Length
                                                       • Distance to nearest wire
Gates                            • Usually negligible  • Size
Vias, anti-fuses, switch-boxes   • Usually large       • Usually negligible


Resistance and Capacitance of Wires

• Material

– Typically aluminum or copper

– Copper has less resistance than aluminum

– Copper is more difficult to work with, typically used only for long wires

– For capacitance, the material of the dielectric (material surrounding the wire) is also important

• Cross section

– Larger cross section has lower resistance

– Tall narrow wires require less area, but have higher capacitance

• Length: longer length has greater resistance and capacitance


Components and Models

[Figure: for each component (gate, switchbox, wire), three views: physical chip, physical model, and RC network]


8.6.2.1 Example: Two Buffers with Complex Wiring

[Figure: four views of buffer G1 driving buffer G2. Schematic: G1, G2. Physical chip (one of many possible layouts): G1, S1, W1, S2, W2, S3, W3, S4, G2. Physical model: G1, S1, W1, S2, W2, S3, W3, S4, G2. RC network: source Vi; capacitors CG1, CW1, CW2, CW3, CG2; resistors RS1, RW1, RS2, RW2, RS3, RW3, RS4.]


8.6.2.2 Example: Two Buffers with Simple Wiring

[Figure: four views of buffer G1 driving buffer G2. Schematic: G1, G2. Physical chip: G1, S1, W1, S2, G2. Physical model: G1, S1, W1, S2, G2. RC network.]


8.6.3 Calculate Delay

[Figure: RC network trimmed for timing analysis: source V0; resistors RS1, RW1, RS2; capacitors CW1, CG2]

• Even this simple example is too complex for our first simple example

• Simplify the simple example by simply assuming that the capacitance of the wire is much less than the capacitance of the gate ($C_{W1} \ll C_{G2}$).

[Figure: simplified RC network: VG1, resistors RS1, RW1, RS2, capacitor CG2, VG2]

• Another simplification: collapse the line of resistors into a single resistor (this just makes the algebra simpler; it does not affect the precision of the analysis).

$$R = R_{S1} + R_{W1} + R_{S2}$$

[Figure: simplified RC network: VG1, single resistor R, capacitor C, VG2]


Two Bufs: Derivation of Delay Equation

Goal: calculate delay from VG1 to VG2

[Figure: RC network (VG1, R, C, VG2) and waveforms of VG1 and VG2 versus time, with the delay marked where VG2 reaches 0.65 VDD]

Derivation of equation for VG2:

$$V_{G2}(t) = V_{G1}(t) - \text{voltage drop from } V_{G1} \text{ to } V_{G2} = V_{G1}(t) - I(t)R$$

Equation for the current through the capacitor:

$$I(t) = C\,\frac{dV_{G2}(t)}{dt}$$


$$V_{G2}(t) = V_{G1}(t) - \left(C\,\frac{dV_{G2}(t)}{dt}\right)R = V_{G1}(t) - RC\,\frac{dV_{G2}(t)}{dt}$$


Delay Analysis

With initial condition $V_{G2}(0) = 0$ and forcing function $V_{G1}(t)$ = step function to $V_{DD}$, the closed form solution is:

$$V_{G2}(t) = V_{DD} - V_{DD}\,e^{-t/RC}$$

Find delay for VG2 to reach 0.65 VDD:

$$\begin{aligned}
0.65\,V_{DD} &= V_{DD} - V_{DD}\,e^{-t/RC} \\
0.35 &= e^{-t/RC} \\
\ln 0.35 &= \ln(e^{-t/RC}) \\
-1.05 &= -t/RC \\
-1.0 &\approx -t/RC \\
t &= RC
\end{aligned}$$
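As a numeric check of the derivation above, the following Python sketch evaluates the closed-form step response and the time at which it reaches 0.65 VDD. The resistance and capacitance values are hypothetical and chosen only for illustration; the function names are not from the course material.

```python
import math


def rc_step_response(t, r, c, vdd):
    """Closed-form VG2(t) for a single RC stage driven by a step to VDD."""
    return vdd * (1.0 - math.exp(-t / (r * c)))


def rc_delay(r, c, trip_fraction=0.65):
    """Time for the output to reach trip_fraction * VDD, starting from 0."""
    return -r * c * math.log(1.0 - trip_fraction)


# Hypothetical component values (ohms and farads).
r_total = 100.0 + 50.0 + 100.0  # stands in for RS1 + RW1 + RS2
c_gate = 10e-15                 # stands in for CG2

tau = r_total * c_gate
delay = rc_delay(r_total, c_gate)
print(f"RC time constant:      {tau:.3e} s")
print(f"delay to 0.65 VDD:     {delay:.3e} s")
print(f"delay / time constant: {delay / tau:.3f}")
```

The ratio printed on the last line is -ln(0.35), which is the 1.05 that the slide rounds to 1.0.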


With a step function input, the delay for VG2 to reach 0.65 VDD is:

$$(R_{S1} + R_{W1} + R_{S2})\,C_{G2}$$

which is commonly known as the RC time constant.

[Figure: waveforms of VG1 and VG2 versus time, with the delay marked where VG2 reaches 0.65 VDD]


8.6.4 Ex: Two Bufs with Both Caps

To calculate voltages and currents correctly, we number the nodes of the circuit consistently.

• The voltage source and the top of each capacitor is a node.

• We number the nodes, capacitors, and resistors.

• Resistors are numbered according to the capacitor to their right.

• Multiple resistors in series without an intervening capacitor are lumped into a single resistor.

• Vi is the voltage at node i.

• IRi is the current flowing through Ri

• ICi is the current flowing into/out-of Ci

[Figure: the original RC network (V0, RS1, CW1, RW1, RS2, CG2) and the same network with node numbers]


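[Figure: RC network with nodes 0, 1, 2: source V0, resistors R1 and R2, capacitors C1 and C2, currents IR1, IR2, IC1, IC2; waveforms of VG1 and VG2 versus time, with the delay marked where VG2 reaches 0.65 VDD]

• Calculate the delay from V0 to V2.

• Derive the equation for V2:

$$V_2(t) = V_0(t) - \text{voltage drop from } V_0 \text{ to } V_2$$

The voltage drop is the sum of the voltage drops across the resistors on the path from V0 to V2:

$$V_2(t) = V_0(t) - I_{R1}(t)\,R_1 - I_{R2}(t)\,R_2$$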


The current through a resistor is the sum of the currents through the downstream capacitors:

$$I_{R1}(t) = I_{C1}(t) + I_{C2}(t) \qquad I_{R2}(t) = I_{C2}(t)$$

$$V_2(t) = V_0(t) - (I_{C1}(t) + I_{C2}(t))\,R_1 - I_{C2}(t)\,R_2$$

Group by currents:

$$V_2(t) = V_0(t) - R_1\,I_{C1}(t) - (R_1 + R_2)\,I_{C2}(t)$$

Equation for the current through a capacitor:

$$I_C(t) = C\,\frac{dV(t)}{dt}$$

$$V_2(t) = V_0(t) - R_1\left(C_1\,\frac{dV_1(t)}{dt}\right) - (R_1 + R_2)\left(C_2\,\frac{dV_2(t)}{dt}\right)$$

$$V_2(t) = V_0(t) - R_1 C_1\,\frac{dV_1(t)}{dt} - (R_1 + R_2)\,C_2\,\frac{dV_2(t)}{dt}$$

Problem: no closed form solution!


Summary: Precise Equations

Measure delay from source node (node 0) to a particular destination (node i)

• Initial condition: node i is at GND: Vi(0) = 0
• Voltage at source node is step function to VDD
• Measure time for Vi to reach 0.65 VDD
• Vi(t) = V0(t) − voltage drop across resistors on path from 0 to i
• Voltage drop across resistor: $V = I_R R$
• Current through resistor is sum of currents through downstream capacitors

Result is a differential equation without a closed-form solution.

To calculate Vi(t) precisely:

• Write one equation for each node
• Have set of n coupled differential equations for n variables
• Use numerical methods to calculate Vi(t)
• Note: to calculate the voltage at one node, need to calculate the voltage at every node.
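The numerical approach in the summary above can be sketched in Python for the two-capacitor network (nodes 0, 1, 2). This is a minimal illustration using forward Euler integration, which is an assumed choice of method; a production circuit simulator would use a more robust integrator. The component values and the function name are hypothetical.

```python
def simulate_two_node(r1, c1, r2, c2, vdd=1.0, trip=0.65, dt=1e-3, t_max=50.0):
    """Forward Euler integration of the two-node RC network.

    Node equations (current into each capacitor):
        C1 * dV1/dt = IR1 - IR2,  with IR1 = (V0 - V1) / R1
        C2 * dV2/dt = IR2,        with IR2 = (V1 - V2) / R2
    Returns the time at which V2 first reaches trip * vdd, or None.
    """
    v1 = 0.0  # initial condition: all nodes at GND
    v2 = 0.0
    t = 0.0
    while t < t_max:
        i_r1 = (vdd - v1) / r1  # V0 is a step to VDD at t = 0
        i_r2 = (v1 - v2) / r2
        # Both node voltages are updated in every step: V2 cannot be
        # computed without also tracking V1.
        v1 += dt * (i_r1 - i_r2) / c1
        v2 += dt * i_r2 / c2
        t += dt
        if v2 >= trip * vdd:
            return t
    return None


# Hypothetical normalized values: R1 = R2 = 1, C1 = C2 = 1.
delay = simulate_two_node(1.0, 1.0, 1.0, 1.0)
print(f"simulated delay to 0.65 VDD: {delay:.3f} time units")
```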


8.7 Elmore Delay Model

8.7.1 Elmore Delay as an Approximation

To avoid solving a set of differential equations, Elmore proposed a simple, but effective approximation.

[Figure: RC network with nodes 0, 1, 2: source V0, resistors R1 and R2, capacitors C1 and C2, currents IR1, IR2, IC1, IC2]

Exact equation:

$$V_2(t) = V_0(t) - R_1 C_1\,\frac{dV_1(t)}{dt} - (R_1 + R_2)\,C_2\,\frac{dV_2(t)}{dt}$$

Template with a closed-form solution:

$$V(t) = V_0(t) - k\,\frac{dV(t)}{dt}$$

Elmore’s approximation:

$$\frac{dV_1(t)}{dt} = \frac{dV_2(t)}{dt}$$


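$$\begin{aligned}
V_2(t) &= V_0(t) - R_1 C_1\,\frac{dV_2(t)}{dt} - (R_1 + R_2)\,C_2\,\frac{dV_2(t)}{dt} \\
       &= V_0(t) - (R_1 C_1 + (R_1 + R_2)\,C_2)\,\frac{dV_2(t)}{dt}
\end{aligned}$$

With initial condition $V_2(0) = 0$ and forcing function $V_0(t)$ = step function to $V_{DD}$:

$$V_2(t) = V_{DD} - V_{DD}\,e^{-t/(R_1 C_1 + (R_1 + R_2)\,C_2)}$$

Time for V2 to go from GND to 0.65 VDD:

$$t = R_1 C_1 + (R_1 + R_2)\,C_2$$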
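The result above generalizes naturally to a chain of RC stages: each capacitor is multiplied by the total resistance between it and the source. The Python sketch below assumes a simple unbranched ladder and computes the time constant at the last node; the function name and the numeric values are illustrative assumptions.

```python
def elmore_ladder(resistors, capacitors):
    """Elmore time constant at the last node of an unbranched RC ladder.

    resistors[k] connects node k to node k+1, and capacitors[k] sits
    between node k+1 and ground. For two stages this evaluates
    R1*C1 + (R1 + R2)*C2.
    """
    total = 0.0
    upstream_resistance = 0.0
    for r, c in zip(resistors, capacitors):
        upstream_resistance += r           # resistance from source to this node
        total += upstream_resistance * c   # this capacitor's contribution
    return total


# Hypothetical normalized values.
print(elmore_ladder([1.0, 1.0], [1.0, 1.0]))
print(elmore_ladder([2.0, 3.0], [4.0, 5.0]))
```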


Plots

[Figure: plot with horizontal axis from 0 to 50 and vertical axis from 0.0 to 1.0]


8.7.2 A More Complicated Example

[Figure: RC network with resistors R1, R2, R3, R4 and capacitors C1, C2, C3, C4]

Definition path: The path from the source node to a node i is the set of all resistors between the source and i.

Example: path(2) =

Definition down: The set of capacitors downstream from a resistor is the set of all capacitors where current would flow through the resistor to charge the capacitor. You can think of this as the set of capacitors that are between the node and ground.

Example: down(R2) =

Example: down(R4) =


Definition Elmore time constant: Simple formula:

$$T_{D_i} = \sum_{r \,\in\, \mathrm{path}(i)} R_r \sum_{c \,\in\, \mathrm{down}(r)} C_c$$

The conventional formula is more complex syntactically, but equivalent mathematically:

$$T_{D_i} = \sum_{k \,\in\, \mathrm{Nodes}} C_k \sum_{r \,\in\, (\mathrm{path}(i) \,\cap\, \mathrm{path}(k))} R_r$$

The equivalence of the two formulations can be shown by observing that if c is downstream from r, then r is on the path to c:

$$c \in \mathrm{down}(r) \iff r \in \mathrm{path}(c)$$
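Both formulas can be checked against each other with a small Python sketch. The tree topology below is a hypothetical example (node 1 hangs off the source, nodes 2 and 3 hang off node 1, node 4 hangs off node 3) and is not claimed to match the figure in these slides; the component values are likewise made up. Resistor Ri connects node i to its parent, and capacitor Ci connects node i to ground.

```python
# Hypothetical RC tree: parent[i] is the node upstream of node i (0 = source).
parent = {1: 0, 2: 1, 3: 1, 4: 3}
R = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}  # R[i] sits between parent[i] and i
C = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}  # C[i] sits between node i and ground


def path(i):
    """Indices of the resistors between the source and node i."""
    resistors = set()
    while i != 0:
        resistors.add(i)
        i = parent[i]
    return resistors


def down(r):
    """Indices of the capacitors charged by current flowing through R[r]."""
    return {k for k in C if r in path(k)}


def elmore_simple(i):
    """Simple formula: sum over path resistors of R times downstream C."""
    return sum(R[r] * sum(C[c] for c in down(r)) for r in path(i))


def elmore_conventional(i):
    """Conventional formula: sum over nodes of C times shared path R."""
    return sum(C[k] * sum(R[r] for r in path(i) & path(k)) for k in C)


for node in sorted(C):
    print(node, elmore_simple(node), elmore_conventional(node))
```

The `down` function is written directly from the equivalence c ∈ down(r) ⟺ r ∈ path(c), which is why the two formulas agree for every node.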


Calculate Elmore Delay

Calculate the Elmore delay to node 2.

[Figure: RC network with resistors R1, R2, R3, R4 and capacitors C1, C2, C3, C4]


Summary of Analog and Elmore Delay

1.

2.

3.

4.


8.8 Practical Usage of Timing Analysis

This is an advanced section. It is not covered in the course and will not be tested.


Chapter 9

Power Analysis and Power-Aware Design


9.1 Overview

9.1.1 Importance of Power and Energy

• Laptops, PDAs, cell phones, etc.: obvious!

• For microprocessors in personal computers, every watt above 40W adds $1 to manufacturing cost

• Approx 25% of operating expense of server farm goes to energy bills

• (Dis)Comfort of Unix labs in E2

• Sandia Labs had to build a special sub-station when they took delivery of Teraflops massively parallel supercomputer (over 9000 Pentium Pros)

• High-speed microprocessors today can run so hot that they will damage themselves: Athlon reliability problems, Pentium 4 processor thermal throttling

• In 2000, information technology consumed 8% of total power in US.

• Future power viruses: cell phone viruses cause cell phone to run in full power mode and consume battery very quickly; PC viruses that cause CPU to melt down batteries


9.1.2 Power vs Energy

Most people talk about “power” reduction, but sometimes they mean “power” and sometimes “energy.”

• Power minimization is usually about heat removal

• Energy minimization is usually about battery life or energy costs

Type     Units    Equivalent Types   Equations
Energy   Joules   Work               = Volts × Coulombs
                                     = 1/2 × C × Volts^2
Power    Watts    Energy / Time      = Volts × Current
                                     = Joules / sec


9.1.3 Batteries, Power and Energy

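9.1.3.1 Do Batteries Store Energy or Power?

$$\text{Energy} = \text{Volts} \times \text{Coulombs}$$

$$\text{Power} = \frac{\text{Energy}}{\text{Time}}$$

Batteries are rated in Amp-hours at a voltage.

$$\begin{aligned}
\text{battery} &= \text{Amps} \times \text{Seconds} \times \text{Volts} \\
               &= \frac{\text{Coulombs}}{\text{Seconds}} \times \text{Seconds} \times \text{Volts} \\
               &= \text{Coulombs} \times \text{Volts} \\
               &= \text{Energy}
\end{aligned}$$

Batteries store energy.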
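The unit analysis above translates directly into a small calculation. The Python sketch below converts a battery rating into joules and then into a run time at a given power draw; the function names and the example rating (2 Ah at 12 V, with a 6 W load) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
SECONDS_PER_HOUR = 3600.0


def battery_energy_joules(amp_hours, volts):
    """Energy = Amps * Seconds * Volts, with the rating given in Amp-hours."""
    return amp_hours * SECONDS_PER_HOUR * volts


def runtime_seconds(amp_hours, volts, watts):
    """Time = Energy / Power, assuming a constant power draw."""
    return battery_energy_joules(amp_hours, volts) / watts


# Hypothetical battery and load.
energy = battery_energy_joules(2.0, 12.0)
runtime = runtime_seconds(2.0, 12.0, 6.0)
print(f"stored energy: {energy:.0f} J")
print(f"run time at 6 W: {runtime / SECONDS_PER_HOUR:.1f} hours")
```

This sketch ignores real battery behaviour such as capacity that varies with discharge rate.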


9.1.3.2 Battery Life and Efficiency

To extend battery life, we want to increase the amount of work done and/or decrease energy consumed.

Work and energy are same units, therefore to extend battery life, we truly want to improve efficiency.

“Power efficiency” of microprocessors normally measured in MIPS/Watt. Is this a real measure of efficiency?

$$\frac{\text{MIPS}}{\text{Watts}} = \frac{\text{millions of instructions}}{\text{Seconds}} \times \frac{\text{Seconds}}{\text{Energy}} = \frac{\text{millions of instructions}}{\text{Energy}}$$

Both instructions executed and energy are measures of work, so MIPS/Watt is a measure of efficiency.

Question: What is the weakness of this analysis?


9.1.3.3 Battery Life and Power

Question: Running a VHDL simulation requires executing an average of 1 million instructions per simulation step. My computer runs at 1.5GHz, has a CPI of 0.67, and burns 18W of power. My battery is rated at 11V and 5.6Ah. Assuming all of my computer’s clock cycles go towards running VHDL simulations, how many simulation steps can I run on one battery charge?


Battery Life and Power

Question: In low-power mode, the clock speed is reduced to 1.0GHz and the power consumption is 10W. In low-power mode, how much longer can I run the computer on one battery charge?


Battery Life and Power

Question: In low-power mode, how many more simulation steps can I run on one battery?


9.2 Power Equations

$$\text{Power} = \underbrace{\text{SwitchPower} + \text{ShortPower}}_{\text{DynamicPower}} + \underbrace{\text{LeakagePower}}_{\text{StaticPower}}$$

Dynamic Power dependent upon clock speed

Switching Power useful — charges up transistors

Short Circuit Power not useful — both N and P transistors are on

Static Power independent of clock speed

Leakage Power not useful — leaks around transistor


Dynamic Power

Dynamic power is proportional to how often signals change their value (switch).

•Roughly 20% of signals switch during a clock cycle.

• Need to take glitches into account when calculating activity factor. Glitches increase the activity factor.

• Equations for dynamic power contain clock speed and activity factor.

$$\text{Activity factor} = \frac{\text{number of value changes}}{\text{number of signals} \times \text{number of clock cycles}}$$
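The activity factor formula can be applied to recorded signal traces. The Python sketch below is an illustration under a simplifying assumption: each signal is sampled once per clock edge, so it counts at most one change per signal per cycle and does not see glitches within a cycle. The function name and the traces are hypothetical.

```python
def activity_factor(traces):
    """Value changes divided by (number of signals * number of clock cycles).

    traces maps a signal name to a list of values sampled once per clock
    edge, so a list of n + 1 samples covers n clock cycles.
    """
    num_signals = len(traces)
    num_cycles = len(next(iter(traces.values()))) - 1
    changes = sum(
        sum(1 for before, after in zip(values, values[1:]) if before != after)
        for values in traces.values()
    )
    return changes / (num_signals * num_cycles)


# Hypothetical traces covering 4 clock cycles.
traces = {
    "a": [0, 1, 0, 1, 0],  # changes in every cycle: 4 changes
    "b": [0, 0, 1, 1, 0],  # 2 changes
    "c": [1, 1, 1, 1, 1],  # never changes
}
factor = activity_factor(traces)
print(factor)
```

To account for glitches, the traces would need to be sampled more finely than once per clock cycle, with the cycle count still taken from the clock.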


9.2.1 Switching Power

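[Figure: two gates, each driving a capacitive load CapLoad, with transitions labelled 1->0 and 0->1. Panel titles: “Charging a capacitor” and “Discharging a capacitor”.]

$$\text{energy to (dis)charge capacitor} = \tfrac{1}{2} \times \text{CapLoad} \times \text{VoltSup}^2$$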


Switching Power

When a capacitor C is charged to a voltage V, the energy stored in the capacitor is $\tfrac{1}{2}CV^2$.

The energy required to charge the capacitor from 0 to V is $CV^2$. Half of the energy ($\tfrac{1}{2}CV^2$) is dissipated as heat through the pullup resistance. Half of the energy is transferred to the capacitor.

When the capacitor discharges from V to 0, the energy stored in the capacitor ($\tfrac{1}{2}CV^2$) is dissipated as heat through the pulldown resistance.

ClockSpeed   clock speed
ActFact      average number of times that signal switches from 0→1 or from 1→0 during a clock cycle

$$\text{average switching power} = \tfrac{1}{2} \times \text{ActFact} \times \text{ClockSpeed} \times \text{CapLoad} \times \text{VoltSup}^2$$
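A direct evaluation of the average switching power formula is sketched below in Python. The activity factor of 0.2 follows the rough 20% figure quoted earlier in this chapter; the clock speed, load capacitance, and supply voltage are hypothetical values picked for easy arithmetic.

```python
def switching_power(act_fact, clock_speed_hz, cap_load_farads, volt_sup):
    """Average switching power in watts: 1/2 * ActFact * ClockSpeed * CapLoad * VoltSup^2."""
    return 0.5 * act_fact * clock_speed_hz * cap_load_farads * volt_sup ** 2


# Hypothetical design: 100 MHz clock, 1 nF total switched capacitance, 1 V supply.
power = switching_power(act_fact=0.2, clock_speed_hz=100e6,
                        cap_load_farads=1e-9, volt_sup=1.0)
print(f"average switching power: {power * 1e3:.1f} mW")
```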


9.2.2 Short-Circuited Power

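[Figure: inverter with input Vi, output Vo, and short-circuit current IShort; plot of gate voltage between GND and VoltSup, marking VoltThresh and VoltSup − VoltThresh, the regions where the P-transistor and the N-transistor are on, and the interval TimeShort]

$$\text{PwrShort} = \text{ActFact} \times \text{ClockSpeed} \times \text{TimeShort} \times \text{IShort} \times \text{VoltSup}$$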


9.2.3 Leakage Power

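[Figure: cross section of inverter showing parasitic diode (N-substrate, P and N regions, Vi, Vo); I-V curve showing the leakage current ILeak through the parasitic diode]

$$\text{PwrLk} = \text{ILeak} \times \text{VoltSup}$$

$$\text{ILeak} \propto e^{\left(\frac{-q \times \text{VoltThresh}}{k \times T}\right)}$$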


9.2.4 Glossary

This section reserved for your reading pleasure

9.2.5 Note on Power Equations

This section reserved for your reading pleasure

9.3 Overview of Power Reduction Techniques

We can divide power reduction techniques into two classes: analog and digital.


Analog Parameters

Power reduction parameters at the analog level.

capacitance for example: Silicon on Insulator (SOI) and high-K dielectrics

resistance for example: copper wires rather than aluminum

voltage low-voltage circuits


Analog Techniques

Power reduction techniques at the analog level.

dual-VDD Two different supply voltages: high voltage for performance-critical portions of design, low voltage for remainder of circuit. Alternatively, can vary voltage over time: high voltage when running performance-critical software and low voltage when running software that is less sensitive to performance.

dual-Vt Two different threshold voltages: transistors with low threshold voltage for performance-critical portions of design (can switch more quickly, but more leakage power), transistors with high threshold voltage for remainder of circuit (switches more slowly, but reduces leakage power).

exotic circuits Special flops, latches, and combinational circuitry that run at a high frequency while minimizing power

adiabatic circuits Special circuitry that consumes power on 0→1 transitions, but not 1→0 transitions. These sacrifice performance for reduced power.

clock trees Up to 30% of total power can be consumed in clock generation and clock tree


Digital Parameters

Power-reduction parameters at the digital level.

capacitance (number of gates)

activity factor

clock frequency


Digital Techniques

Power-reduction techniques at the digital level.

multiple clocks Put a high speed clock in performance-critical parts of design and a low speed clock for remainder of circuit

clock gating Turn off clock to portions of a chip when it’s not being used

data encoding Gray coding vs one-hot vs fully encoded vs ...

glitch reduction Adjust circuit delays or add redundant circuitry to reduce or eliminate glitches.

asynchronous circuits Get rid of clocks altogether....


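9.4 Voltage Reduction for Power Reduction

If our goal is to reduce power, the most promising approach is to reduce the supply voltage, because, from:

$$\begin{aligned}
\text{Power} = {} & (\text{ActFact} \times \text{ClockSpeed} \times \tfrac{1}{2}\,\text{CapLoad} \times \text{VoltSup}^2) \\
& + (\text{ActFact} \times \text{ClockSpeed} \times \text{TimeShort} \times \text{IShort} \times \text{VoltSup}) \\
& + (\text{ILeak} \times \text{VoltSup})
\end{aligned}$$

we observe:

$$\text{Power} \propto \text{VoltSup}^2$$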
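To see where the quadratic dependence comes from, the three terms of the power equation can be evaluated separately. The Python sketch below uses hypothetical parameter values and treats IShort, TimeShort, and ILeak as fixed when the supply voltage changes, which is a simplifying assumption made only for this illustration. Under that assumption the switching term scales with the square of VoltSup, while the other two terms scale linearly, so the quadratic observation holds best when switching power dominates.

```python
def power_terms(act_fact, clock_speed, cap_load, volt_sup,
                time_short, i_short, i_leak):
    """Return (switching, short-circuit, leakage) power in watts."""
    switching = 0.5 * act_fact * clock_speed * cap_load * volt_sup ** 2
    short_circuit = act_fact * clock_speed * time_short * i_short * volt_sup
    leakage = i_leak * volt_sup
    return switching, short_circuit, leakage


# Hypothetical parameters; only the supply voltage differs between the two runs.
params = dict(act_fact=0.2, clock_speed=100e6, cap_load=1e-9,
              time_short=1e-10, i_short=1e-3, i_leak=1e-4)
high = power_terms(volt_sup=2.0, **params)
low = power_terms(volt_sup=1.0, **params)

for name, p_high, p_low in zip(("switching", "short-circuit", "leakage"), high, low):
    print(f"{name:14s} ratio after halving VoltSup: {p_low / p_high:.2f}")
```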


Reducing Difference Between Supply and Threshold Voltage

As the supply voltage decreases, it takes longer to charge up the capacitive load, which increases the load delay of a circuit.

In the chapter on timing analysis, we saw that increasing the supply voltage will decrease the delay through a circuit. (From V = IR, increasing V causes an increase in I, which causes the capacitive load to charge more quickly.) However, it is more accurate to take into account both the value of the supply voltage, and the difference between the supply voltage and the threshold voltage.

$$\text{LoadDelay} \propto \frac{\text{VoltSup}}{(\text{VoltSup} - \text{VoltThresh})^2}$$
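Because the relation is a proportionality, it is used by taking the ratio of the expression at the new and old supply voltages. The Python sketch below shows this with hypothetical numbers (a 10 ns delay at 3.0 V with a 1.0 V threshold, reduced to 2.0 V); the function names are illustrative only.

```python
def load_delay_factor(volt_sup, volt_thresh):
    """The quantity that LoadDelay is proportional to."""
    return volt_sup / (volt_sup - volt_thresh) ** 2


def scaled_delay(old_delay, old_volt_sup, new_volt_sup, volt_thresh):
    """Scale a known delay to a new supply voltage at a fixed threshold voltage."""
    ratio = (load_delay_factor(new_volt_sup, volt_thresh)
             / load_delay_factor(old_volt_sup, volt_thresh))
    return old_delay * ratio


# Hypothetical circuit: 10 ns at 3.0 V, threshold 1.0 V, supply lowered to 2.0 V.
new_delay = scaled_delay(10.0, 3.0, 2.0, 1.0)
print(f"delay at 2.0 V: {new_delay:.2f} ns")
```

In this example the old factor is 3.0/4.0 and the new factor is 2.0/1.0, so the delay grows by a factor of 8/3.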


Effect of Decreasing Supply Voltage on Delay

Question: If the delay through a circuit is 20 ns, the supply voltage is 2.8 V, and the threshold voltage is 0.7 V, calculate the delay if the supply voltage is dropped to 2.2 V.


Sacrifice or Optimization?

• Decreasing the supply voltage increases the delay through the circuit

• Increasing the clock period allows us to:


Clock Speed and Power Consumption

Question: In the question on high-performance / low-power for VHDL simulation earlier in the chapter, the laptop was able to execute 27.7 million simulation steps in high power mode and 33.2 million simulation steps in low-power mode. What percentage of the additional simulation steps was due to reducing the clock speed and what percentage was due to reducing the supply voltage?


Reducing Threshold Voltage Increases Leakage Current

If we reduce the supply voltage, we want to also reduce the threshold voltage, so that we do not increase the delay through the circuit. However, as threshold voltage drops, leakage current increases:

$$\text{ILeak} \propto e^{\left(\frac{-q \times \text{VoltThresh}}{k \times T}\right)}$$

And increasing the leakage current increases the power:

$$\text{Power} \propto \text{ILeak}$$

So, we need to strike a balance between reducing VoltSup (which has a quadratic effect on reducing power), and increasing ILeak, which has a linear effect on increasing power.


9.5 Data Encoding for Power Reduction

9.5.1 How Data Encoding Can Reduce Power

Data encoding is a technique that chooses data values so that normal execution will have a low activity factor.

The most common example is “Gray coding” where exactly one bit changes value each clock cycle when counting.


Decimal  Gray  Binary
0        0000  0000
1        0001  0001
2        0011  0010
3        0010  0011
4        0110  0100
5        0111  0101
6        0101  0110
7        0100  0111
8        1100  1000
9        1101  1001
10       1111  1010
11       1110  1011
12       1010  1100
13       1011  1101
14       1001  1110
15       1000  1111
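The Gray column of the table can be generated from the binary column with the standard conversion n XOR (n shifted right by one). The Python sketch below does this for the 4-bit case and counts how many bits flip over one full counting cycle, including the wrap-around from 15 back to 0. The helper names are illustrative and not part of the course material.

```python
def binary_to_gray(n):
    """Standard reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def bit_flips(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def flips_per_cycle(codes):
    """Total bit flips over one pass through codes, including wrap-around."""
    return sum(bit_flips(codes[i], codes[(i + 1) % len(codes)])
               for i in range(len(codes)))


binary = list(range(16))
gray = [binary_to_gray(n) for n in binary]

for n in binary:
    print(f"{n:2d}  {gray[n]:04b}  {n:04b}")

print("binary flips per 16 counts:", flips_per_cycle(binary))
print("gray flips per 16 counts:  ", flips_per_cycle(gray))
```

The flip counts relate to the activity factor of the counter bits only; they do not include the logic that computes the next count value.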


8-bit Counter

Question: For an eight-bit counter, how much more power will a binary counter consume than a Gray-code counter?


Random Data

Question: For completely random eight-bit data, how much more power will a binary circuit consume than a Gray-code circuit?


9.5.2 Example Problem: Sixteen Pulser

9.5.2.1 Problem Statement

Your task is to do the power analysis for a circuit that should send out a one-clock-cycle pulse on the done signal once every 16 clock cycles. (That is, done is ’0’ for 15 clock cycles, then ’1’ for one cycle, then repeat with 15 cycles of ’0’ followed by a ’1’, etc.)

[Figure: required behaviour: waveforms of clk and done, with clock cycles labelled 1, 2, 3, 15, 16, 17, 31, 32, 33]

You have been asked to consider three different types of counters: a binary counter, a Gray-code counter, and a one-hot counter. (The table below shows the values from 0 to 15 for the different encodings.)

Question: What is the relative amount of power consumption for the different options?


9.5.2.2 Additional Information

Your implementation technology is an FPGA where each cell has a programmable combinational circuit and a flip-flop. The combinational circuit has 4 inputs and 1 output. The capacitive load of the combinational circuit is twice that of the flip-flop.

[Figure: FPGA cell containing a PLA]

1. You may neglect power associated with clocks.

2. You may assume that all counters:

(a) are implemented on the same fabrication process

(b) run at the same clock speed

(c) have negligible leakage and short-circuit currents


Data Encoding

Decimal  Gray  One-Hot           Binary
0        0000  0000000000000001  0000
1        0001  0000000000000010  0001
2        0011  0000000000000100  0010
3        0010  0000000000001000  0011
4        0110  0000000000010000  0100
5        0111  0000000000100000  0101
6        0101  0000000001000000  0110
7        0100  0000000010000000  0111
8        1100  0000000100000000  1000
9        1101  0000001000000000  1001
10       1111  0000010000000000  1010
11       1110  0000100000000000  1011
12       1010  0001000000000000  1100
13       1011  0010000000000000  1101
14       1001  0100000000000000  1110
15       1000  1000000000000000  1111
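The three encodings in the table can be generated mechanically, which is handy for checking a row. This is a small Python sketch, not taken from the slides; the function name `encodings` is illustrative.

```python
def encodings(i):
    """Return (Gray, one-hot, binary) bit strings for a count value 0..15."""
    gray = format(i ^ (i >> 1), "04b")   # reflected binary Gray code
    one_hot = format(1 << i, "016b")     # exactly one of 16 bits set
    binary = format(i, "04b")            # plain unsigned binary
    return gray, one_hot, binary


for i in range(16):
    print(i, *encodings(i))
```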


9.5.2.3 Answer

Sketch the Circuitry

Name the output “done” and the count digits “d()”.


Capacitance

                      cap    number    subtotal cap
Gray    d()   PLAs
              Flops
        done  PLAs
              Flops
1-Hot   d()   PLAs
              Flops
        done  PLAs
              Flops
Binary  d()   PLAs
              Flops
        done  PLAs
              Flops


Activity Factors

Gray Coding Activity Factor

[Figure: Gray coding. Waveforms of clk, d(0), d(1), d(2), d(3), and done. The annotated activity factors are 8/16, 4/16, 2/16, 2/16, and 2/16.]


One-Hot Activity Factor

[Figure: One-hot coding. Waveforms of clk, d(0), d(1), d(2), and done. Every annotated activity factor is 2/16.]


Binary Coding Activity Factor

[Figure: Binary coding. Waveforms of clk, d(0), d(1), d(2), d(3), and done. The annotated activity factors are 16/16, 8/16, 4/16, 2/16, and 2/16.]
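The per-signal activity factors can be cross-checked by counting transitions of each bit over one 16-cycle period. The Python sketch below is not from the slides; it assumes d(0) is the least significant bit and counts both rising and falling transitions, so each result is the numerator of an "n/16" activity factor. The function name `toggles_per_bit` is illustrative.

```python
def toggles_per_bit(codes, width):
    """Transitions of each bit over one period of codes, including wrap-around.

    Index 0 of the result is the least significant bit.
    """
    following = codes[1:] + codes[:1]
    return [sum(((a ^ b) >> bit) & 1 for a, b in zip(codes, following))
            for bit in range(width)]


steps = list(range(16))
gray_toggles = toggles_per_bit([i ^ (i >> 1) for i in steps], 4)
binary_toggles = toggles_per_bit(steps, 4)
one_hot_toggles = toggles_per_bit([1 << i for i in steps], 16)
done_toggles = toggles_per_bit([0] * 15 + [1], 1)  # one pulse per 16 cycles

print(gray_toggles, binary_toggles, one_hot_toggles, done_toggles)
```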


Putting it all Together

                      subtotal cap    act fact    power
Gray    d()   PLAs
              Flops
        done  PLAs
              Flops
        Total
1-Hot   d()   PLAs
              Flops
        done  PLAs
              Flops
        Total
Binary  d()   PLAs
              Flops
        done  PLAs
              Flops
        Total


9.6 Clock Gating

The basic idea of clock gating is to reduce power by turning off the clock when a circuit isn’t needed. This reduces the activity factor.

Related to clock gating:

Chip enable: Use the chip-enable on a flip-flop to hold the output constant when not needed.

Operand gating: Use AND gates and an enable signal to set data (operand) values to zero when a datapath circuit is not needed.

Power gating: Turn off supply voltage to part of the chip.


9.6.1 Introduction to Clock Gating

Examples of Clock Gating

Condition                                          Circuitry turned off
O/S in standby mode                                Everything except “core” state (PC, registers, caches, etc)
No floating point instructions for k clock cycles  Floating point circuitry
Instruction cache miss                             Instruction decode circuitry
No instruction in pipe stage i                     Pipe stage i


9.6.2 Implementing Clock Gating

Clock gating is implemented by adding a component that disables the clock when the circuit isn’t needed.

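[Figure, two panels. Without clock gating: a circuit with signals clk, i_data, i_valid, o_data, and o_valid. With clock gating: the same circuit plus a Clock Enable State Machine block, with signals clk, i_wakeup, clk_en, cool_clk, i_data, i_valid, o_data, and o_valid.]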


9.6.3 Design Process

This section reserved for your reading pleasure


9.6.4 Effectiveness of Clock Gating

PctClk = Percentage of clock cycles that clock toggles.
PctBusy = Percentage of clock cycles when busy doing useful work.
A = Activity factor without clock gating
A′ = Activity factor with clock gating
A′min = Activity factor if clock is on only when busy
Eff = Effectiveness of clock gating

[Figure: plot of activity factor (0, A′min, A) against PctClk (0%, PctBusy, 100%), with a second axis for Eff (0% to 100%).]

Eff = 0% ⟹ A′ =

Eff = 100% ⟹ A′ =

Effectiveness measures the percentage of clock cycles when the circuit is idle (contains only bubbles) that the clock is turned off.


Effectiveness (Cont’d)

[Figure: three plots relating activity factor (0, A′min, A), PctClk (0%, PctBusy, 100%), and Eff (0% to 100%).]

Eff =

PctClk =

A′ =
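The formula sheet in Section 10.7 gives A′ = (1 − E(1 − Pv))A. The Python sketch below evaluates it, reading E as Eff and Pv as PctBusy; that reading is an assumption based on the definitions above, and the function names are illustrative. All percentages are written as fractions between 0 and 1.

```python
def pct_clk(eff, pct_busy):
    """Fraction of cycles in which the gated clock toggles.

    The clock runs whenever the circuit is busy, plus the share of idle
    cycles that the gating scheme fails to turn off.
    """
    return 1.0 - eff * (1.0 - pct_busy)


def gated_activity_factor(a, eff, pct_busy):
    """A' = (1 - Eff * (1 - PctBusy)) * A."""
    return pct_clk(eff, pct_busy) * a


# Illustrative numbers only: A = 0.4, circuit busy 50% of the time.
for eff in (0.0, 0.75, 1.0):
    print(eff, gated_activity_factor(0.4, eff, 0.5))
```

With Eff = 0 the function returns A, and with Eff = 1 it returns PctBusy × A, which correspond to the two end points of the plot.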


Clock Gating Effectiveness Questions

Question: What is the effectiveness if the clock toggles only when the circuit contains a parcel?

Question: What is the effectiveness of a clock that always toggles?


Clock Gating Effectiveness Questions

Question: What does it mean for a clock gating scheme to be 75% effective?

Question: What happens if PctClk < PctBusy?


9.6.5 Example: Reduced Activity Factor with Clock Gating

Question: How much power will be saved in the following clock-gating scheme?

• 70% of the time the main circuit contains at least one parcel

• clock gating circuit is 90% effective

• clock gating circuit has 10% of the area of the main circuit

• clock gating circuit has same activity factor as main circuit

• neglect short-circuiting and leakage power
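One possible way to set up the calculation, as a Python sketch rather than the official solution. It assumes that capacitance (and hence switching power) is proportional to area, and that the clock-gating circuit itself is never gated, so it runs with the main circuit's original activity factor A. Power is normalised so that the ungated main circuit consumes 1.0.

```python
PCT_BUSY = 0.70      # main circuit holds at least one parcel 70% of the time
EFF = 0.90           # effectiveness of the clock-gating scheme
GATING_AREA = 0.10   # gating circuit area relative to the main circuit

# Relative power of the main circuit once gated: A'/A = 1 - Eff * (1 - PctBusy)
main_gated = 1.0 - EFF * (1.0 - PCT_BUSY)

# Assumption: the gating circuit is always clocked, at activity factor A.
gating_overhead = GATING_AREA * 1.0

total_gated = main_gated + gating_overhead
saving = 1.0 - total_gated

print(main_gated, total_gated, saving)
```

Under these assumptions the main circuit drops to 73% of its original power, the gating logic adds 10%, and the net saving is about 17%.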


9.6.6 Calculating PctBusy

9.6.6.1 Valid Bits and Busy

Use valid bits to determine when a circuit is busy.

[Figure: a circuit with inputs clk, i_valid, and i_data and outputs o_data and o_valid, with waveforms in which parcels α, β, γ appear on i_data and later on o_data.]


Microscopic Analysis

Which clock edges are needed?

[Figure: waveforms of clk, i_valid, and o_valid over clock edges 1 to 5, repeated with clk_en and cool_clk added.]


9.6.6.2 Calculating LenBusy

[Figure: seven waveform examples (clock cycles 1 to 7) of i_valid, o_valid, and clk_en, with LenBusy marked in terms of Latency and NumPcls.]


9.6.6.3 From LenBusy to PctBusy

Find the core of a repeating pattern of parcels and bubbles.

LenCore =

LenBusy =

PctBusy =

=

Question: What happens if Lat > NumBubbles?


9.6.7 Example: Pipelined Circuit with Clock-Gating

Design a “clock enable state machine” for the pipelined component described below.

• area of pipelined component = 100

• latency varies from 5 to 10 clock cycles, uniform distribution of latencies

• contains a maximum of 6 parcels

• 60% of clock cycles have a parcel on the inputs

• average length of continuous sequence of valid parcels is 80

• area of clock-enable state machine = 13

• use input and output valid bits for wakeup

• leakage current is negligible

• short-circuit current is negligible


Waveforms for Parcel Count

[Figure: waveforms over clock cycles 1 to 24 of i_valid, i_data (parcels α, β, γ, δ, ε), o_valid, o_data (α, β, γ, δ, ε), parcel_count, and parcel_clk_en.]

Waveforms for Cycle Count

[Figure: waveforms over clock cycles 1 to 24 of i_valid, i_data (parcels α, β, γ, δ, ε), o_valid, o_data (α, β, γ, δ, ε), cycle_count, and cycle_clk_en.]
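The two waveform sets suggest two styles of clock-enable state machine: one that counts parcels inside the pipeline and one that counts cycles since the last input. The Python sketch below is a behavioural model of each under assumptions that are not stated in the slides: the parcel counter increments on i_valid and decrements on o_valid, the cycle counter keeps the clock on for the maximum latency after every valid input, and the clock is kept running through the cycle in which a parcel leaves. Function names are illustrative.

```python
def parcel_count_enable(i_valid, o_valid):
    """clk_en per cycle: on while a parcel is arriving or still inside."""
    count = 0
    clk_en = []
    for iv, ov in zip(i_valid, o_valid):
        clk_en.append(bool(iv) or count > 0)
        count += int(bool(iv)) - int(bool(ov))  # parcels currently in flight
    return clk_en


def cycle_count_enable(i_valid, max_latency):
    """clk_en per cycle: on for max_latency cycles after each valid input."""
    remaining = 0
    clk_en = []
    for iv in i_valid:
        if iv:
            remaining = max_latency + 1  # arrival cycle plus worst-case latency
        clk_en.append(remaining > 0)
        if remaining > 0:
            remaining -= 1
    return clk_en


# Hypothetical trace: one parcel enters at cycle 0 and leaves 2 cycles later,
# while the worst-case latency of the component is 4 cycles.
i_valid = [1, 0, 0, 0, 0, 0]
o_valid = [0, 0, 1, 0, 0, 0]
parcel_en = parcel_count_enable(i_valid, o_valid)
cycle_en = cycle_count_enable(i_valid, max_latency=4)
print(sum(parcel_en), sum(cycle_en))
```

In this trace the parcel-count scheme turns the clock off as soon as the pipeline empties, while the cycle-count scheme waits out the worst-case latency. This is the kind of behavioural difference the power comparison has to weigh against the area of each state machine.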


Behavioural Analysis

Question: Without further detailed analysis, can we determine which design is the better option?


Question: Which design option has lower power and how much lower is it?


9.6.8 Clock Gating in ASICs

[Figure: circuit and waveforms for a gated clock, with signals en, clk, en_clk, and q.]

A register with a chip-enable can be synthesized into a register with a gated clock.

process (clk) begin
  -- a and b are updated only on a rising clock edge while en is '1'
  if rising_edge(clk) and en = '1' then
    a <= i_a;
    b <= i_b;
  end if;
end process;

Synthesis tools have commands and flags that cause this type of code to be synthesized into a circuit with a gated clock.

FPGAs do not support clock gating; use the alternatives in Section 9.6.9.

9.6.9 Alternatives to Clock Gating

9.6.9.1 Use Chip Enables

Same coding as with clock gating, just do not enable clock-gating in the synthesis tool.

This technique is used with both ASICs and FPGAs.


9.6.9.2 Operand Gating

[Figure: operand gating circuit with enable en, n-bit operands a and b, and n-bit output z.]

Advantages

Disadvantages


Chapter 10

Review

This chapter lists the major topics of the term. The “Topics List” section for each major area is meant to be relatively complete.

10.1 Overview of the Term

• The purely digital world

– VHDL

– design and optimization methods

– performance analysis

• Analog effects in the digital world

– timing analysis

– power


10.2 VHDL

10.2.1 VHDL Topics

• simple syntax and semantics: things that you should know simply by having done the labs and project

• behavioural semantics of VHDL

• synthesis semantics of VHDL

• classifying VHDL code as legal, synthesizable, and good practice


10.2.2 VHDL Example Problems

• identify whether a particular signal will be the output of combinational circuitry or a flop

• identify whether a particular process is combinational or clocked

• legal, synthesizable, and good code

• perform delta-cycle simulation of VHDL

• perform RTL simulation of VHDL

• identify whether two VHDL fragments have same behaviour

• analyze area, approximate clock period, latency, throughput, etc. of VHDL code


10.3 RTL Design Techniques

10.3.1 Design Topics

• coding guidelines

• generic FPGA hardware

• area estimation

• finite state machines

– implicit

– explicit-current

– explicit-current+next

• from algorithm to hardware

– dependency graph

– dataflow diagram

– scheduling

– allocation

– hardware block diagram

– state machine

• memory dependencies

• memory arrays and dataflow diagrams

• pipelining

• retiming

• area and performance optimizations


10.3.2 Design Example Problems

• estimate area to implement a circuit in an FPGA

• calculate resource usage for a dataflow diagram

• calculate performance data for a dataflow diagram

• given an algorithm, design a dataflow diagram

• given a dataflow diagram, draw a control table and do resource allocation

• optimize a dataflow diagram to improve performance or reduce area

• analyze and compare the functionality, area, and performance of

– VHDL code

– pseudocode

– state machine

– dataflow diagram

– waveforms

– schematics

• use retiming to improve clock speed or area

• use retiming to determine if two circuits have the same behaviour

10.4 Performance Analysis and Optimization

10.4.1 Performance Topics

• time to execute a program

• definition of performance

• speedup

• n% bigger, smaller

• calculating performance of different tasks and of average task

• changing frequency of task and overall performance

• choosing which task to optimize to best improve overall performance

• performance increase over time

• design tradeoffs (CPI vs NumInsts vs ClockSpeed vs time-to-market)

• clock speed vs. performance

• optimality: performance / area tradeoffs


10.4.2 Performance Example Problems

• calculate tradeoffs between performance, area, schedule, and power

• evaluate performance criteria


10.5 Timing Analysis

10.5.1 Timing Topics

• circuit parameters that affect delay

– clock period

– clock skew

– clock jitter

– propagation delay

– load delay

– setup time

– hold time

– clock-to-Q time

• timing analysis of latch

• concepts of critical path vs false path, timing models, monotonic speedup

• Elmore timing model


10.5.2 Timing Example Problems

• timing parameters for minimum clock period

• timing parameters for hold constraint

• determine if a latch will work correctly

• compute timing parameters of a latch

• identify timing violation, suggest remedy

• find the longest path

• test if an excitation excites a particular path

• compute the Elmore delay constant

• use concepts of Elmore delay to compare delay of two circuits

• compare accuracy of different timing models

• suggest design change to increase clock speed


10.6 Power

10.6.1 Power Topics

• power vs energy

• equations for power

– dynamic power

– static power

– switching power

– short circuit power

– leakage power

– activity factor

– leakage current

– threshold voltage

– supply voltage

• analog power reduction techniques

• RTL power reduction techniques

– clock gating


10.6.2 Power Example Problems

• predict effect of new fabrication process (supply voltage, threshold voltage, capacitance, circuit delay) on power

• predict effect of environment change (temp, supply voltage, etc) on power consumption

• predict effect of design change on power consumption (capacitance, activity factor, clock speed)

• design clock gating scheme for a circuit, predict effect on power consumption

• assess validity of various power- or energy-consumption metrics


10.7 Formulas to be Given on Final Exam


$P = \frac{1}{2}(A \times C \times V^2 \times F) + (\tau \times A \times V \times I_{Sh} \times F) + (V \times I_L)$

$T = \frac{Ins \times C}{F}$

$F \propto \frac{(V - V_t)^2}{V}$

$P = V \times I$

$P = \frac{W}{T}$

$I_L \propto e^{\frac{-q \times V_t}{k \times T}}$

$S = \frac{T_1}{T_2}$

$M = \frac{F/10^6}{\sum_{i=0}^{n} PI_i \times C_i}$

$A' = (1 - E(1 - P_v))A$

$q = 1.60218 \times 10^{-19}\ \mathrm{C}$

$k = 1.38066 \times 10^{-23}\ \mathrm{J/K}$

$\log_x y = \frac{\log y}{\log x}$

$(x^y)^z = x^{(yz)}$

$(x^y)(x^z) = x^{(y+z)}$

$a = b^c$ is equivalent to: $a^{1/c} = b$
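As a worked illustration of the first formula, the Python sketch below evaluates the three terms of the power equation separately. It is not from the slides. The function and argument names are illustrative, the labels switching, short-circuit, and leakage follow the topic list in Section 10.6.1, and the numbers in the example call are made up for easy arithmetic rather than taken from any real process.

```python
def total_power(a, c, v, f, tau=0.0, i_sh=0.0, i_l=0.0):
    """P = 1/2 (A C V^2 F) + (tau A V I_Sh F) + (V I_L)."""
    switching = 0.5 * a * c * v ** 2 * f      # charging and discharging capacitance
    short_circuit = tau * a * v * i_sh * f    # current while both networks conduct
    leakage = v * i_l                         # static current through "off" devices
    return switching + short_circuit + leakage


# Made-up values: halving the activity factor halves the switching term,
# but leaves the leakage term unchanged.
print(total_power(a=1.0, c=1.0, v=2.0, f=1.0, i_l=0.25))
print(total_power(a=0.5, c=1.0, v=2.0, f=1.0, i_l=0.25))
```

This makes visible why clock gating, which lowers A, reduces only the first two terms of the equation.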