George Mason University
ECE 448 – FPGA and ASIC Design with VHDL
ECE 448 Lecture 20
ASIC Front-End Design
Two competing implementation approaches
ASIC (Application Specific Integrated Circuit)
• designed all the way from behavioral description to physical layout
• designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
FPGA (Field Programmable Gate Array)
• no physical layout design; design ends with a bitstream used to configure a device
• bought off the shelf and reconfigured by designers themselves
FPGAs vs. ASICs
ASICs
• High performance
• Low power
• Low cost (but only in high volumes)
FPGAs
• Off-the-shelf
• Short time to the market
• Low development costs
• Reconfigurability
ASIC Design Example – Factoring circuit/GMU
[Figure: block diagram of the factoring circuit, showing Local Memory and Global Memory]
ASIC 130 nm vs. Virtex II 6000 (Factoring/GMU)
• Area of Xilinx Virtex II 6000 FPGA: 19.80 mm x 19.68 mm (estimation by R.J. Lim Fong, MS Thesis, VPI, 2004)
• Area of an ASIC with equivalent functionality: 2.7 mm x 2.82 mm
• Area ratio: 51x
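The 51x figure can be checked from the die dimensions on the slide. The sketch below is only an arithmetic check; the 19.68 mm value is read from a garbled dimension label in the figure, and the variable names are illustrative.

```python
# Die dimensions in mm, as read from the slide.
# Assumption: the second FPGA dimension is 19.68 mm (the label was garbled).
fpga_w, fpga_h = 19.80, 19.68   # Xilinx Virtex II 6000 (estimated)
asic_w, asic_h = 2.7, 2.82      # 130 nm ASIC with equivalent functionality

fpga_area = fpga_w * fpga_h     # mm^2
asic_area = asic_w * asic_h     # mm^2
ratio = fpga_area / asic_area

print(f"FPGA area : {fpga_area:.2f} mm^2")
print(f"ASIC area : {asic_area:.2f} mm^2")
print(f"Area ratio: {ratio:.0f}x")
```

The ratio rounds to the 51x quoted on the slide.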
ASICs vs. FPGAs
Source: I. Kuon, J. Rose, University of Toronto, “Measuring the Gap Between FPGAs and ASICs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 2, Feb 2007.
ASICs vs. FPGAs
23 representative circuits implemented using FPGAs and ASICs
- computer arithmetic (booth, cordic18, cordic8, etc.)
- digital signal processing (rs_encoder, fir3, fir24, etc.)
- communications (ethernet, mac1, atm, etc.)
- cryptography (des_area, des_perf, aes, aes192, etc.)
- scientific computations (molecular, raytracer, etc.)
Simplified ASIC Design Flow
• Synthesis
• Floorplanning
• Placement
• Clock Tree Synthesis
• Routing
• Timing Analysis
• Design for Manufacturing
The flow is divided into Front-End Design and Back-End Design.
Major ASIC Toolsets
Cadence
Magma
Simplified ASIC Design Flow with Synopsys Tools
Flow steps: Synthesis, Floorplanning, Placement, Clock Tree Synthesis, Routing, Timing Analysis, Design for Manufacturing (Front-End Design and Back-End Design)
Synopsys Tools
• Design Compiler
• PrimeTime
• Astro
A Complete Placed and Routed Chip
[Figure: layout of a complete placed and routed chip, with an IP block labeled]
What is “Physical Layout”?
Physical Layout – Topography of devices and interconnects, made up of polygons that represent different layers of material (diffusion, polysilicon, metal, contact, etc.)
[Figure: the same circuit in two views. Transistor or Device View: PMOS and NMOS transistors between VDD and GND, with IN and OUT. Physical or Layout View: the PMOS and NMOS devices, IN, OUT, VDD and GND drawn as layout polygons.]
Process of Device Fabrication
• Devices are fabricated vertically on a silicon substrate wafer by layering different materials in specific locations and shapes on top of each other
• Each of many process masks defines the shapes and locations of a specific layer of material (diffusion, polysilicon, metal, contact, etc.)
• Mask shapes, derived from the layout view, are transferred to silicon via photolithographic and chemical processes
[Figure: Layout or Mask (aerial) view next to the Wafer (cross-sectional) view on the silicon substrate]
Wafer Representation of Layout Polygons
[Figure: Aerial or Layout View next to the Wafer Cross-sectional View of a PMOS/NMOS pair with Input, Output, VDD and GND; 0.25 um feature size marked]
Front-End Design Flow
Simplified RTL Synthesis
1. Write RTL HDL code
2. Simulate the HDL. OK? No: revise and repeat. Yes: continue
3. Synthesize RTL code to gates. Constraints met? No: revise and repeat. Yes: gate-level netlist
4. Gate-level testing. OK? No: revise and repeat. Yes: proceed with back-end processing
VHDL vs. Verilog
VHDL                    Verilog
Government developed    Commercially developed
Ada based               C based
Strongly typed          Mildly typed
Difficult to learn      Easy to learn
More powerful           Less powerful
Logic Synthesis: VHDL description -> Circuit netlist

architecture MLU_DATAFLOW of MLU is
    signal A1: STD_LOGIC;
    signal B1: STD_LOGIC;
    signal Y1: STD_LOGIC;
    signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;
begin
    A1 <= A when (NEG_A='0') else
          not A;
    B1 <= B when (NEG_B='0') else
          not B;
    Y  <= Y1 when (NEG_Y='0') else
          not Y1;

    MUX_0 <= A1 and B1;
    MUX_1 <= A1 or B1;
    MUX_2 <= A1 xor B1;
    MUX_3 <= A1 xnor B1;

    with (L1 & L0) select
        Y1 <= MUX_0 when "00",
              MUX_1 when "01",
              MUX_2 when "10",
              MUX_3 when others;
end MLU_DATAFLOW;
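To make the behavior of the dataflow description concrete, here is a small bit-level reference model in Python. It is only a sketch for checking expected outputs by hand or in a testbench; the function name `mlu` and the argument order are assumptions, and all values are plain 0/1 integers rather than STD_LOGIC.

```python
def mlu(a, b, neg_a, neg_b, neg_y, l1, l0):
    """Bit-level model of MLU_DATAFLOW; every argument is 0 or 1."""
    a1 = a ^ neg_a               # A1 <= A when NEG_A='0' else not A
    b1 = b ^ neg_b               # B1 <= B when NEG_B='0' else not B
    sel = (l1 << 1) | l0         # (L1 & L0): L1 is the left (most significant) bit
    mux = [
        a1 & b1,                 # "00" -> MUX_0 (and)
        a1 | b1,                 # "01" -> MUX_1 (or)
        a1 ^ b1,                 # "10" -> MUX_2 (xor)
        1 - (a1 ^ b1),           # others -> MUX_3 (xnor)
    ]
    y1 = mux[sel]
    return y1 ^ neg_y            # Y <= Y1 when NEG_Y='0' else not Y1


# Print the four basic operations for A=1, B=0 with no negation.
for l1, l0, name in [(0, 0, "and"), (0, 1, "or"), (1, 0, "xor"), (1, 1, "xnor")]:
    print(name, mlu(1, 0, 0, 0, 0, l1, l0))
```

Negating both inputs and the output of the AND function gives OR (De Morgan), which is one way the three NEG_ controls extend the four basic functions.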
Basic Synthesis Flow
Synthesis using Design Compiler
Script Language: TCL – Tool Command Language
• Created by John Ousterhout of UC Berkeley
• Scripting Language
  • Very simple to automate routine tasks.
• Extension Language
  • Used to customize tools with user/company specific applications.
• Nearly all modern EDA tools have a TCL interface.
• Very simple to learn and use.
TCL References
• Practical Programming in Tcl and Tk
  • Brent B. Welch
  • Ken Jones
• Tcl/Tk in a Nutshell
  • Paul Raines
  • Jeff Tranter
Synthesis script (1)

designer = "Pawel Chodowiec"
company = "George Mason University"
search_path = "./opt3/synopsys/TSMCHOME/digital/Front_End/timing_power/tcb013ghp_200a "
link_library = "* tcb013ghptc.db"   /* Typical case library */
target_library = "tcb013ghptc.db "
symbol_library = "tcb013ghp.sdb "

/* Directory configuration */
src_directory = ~/exam1/vhdl/
report_directory = ~/exam1/reports/
db_directory = ~/exam1/db/
Synthesis script (2)

/* Packages can only be read */
read_file -format vhdl -rtl src_directory + "components.vhd"

blocks = {regne, upcount, RAM_16Xn_DISTRIBUTED, exam1}

foreach (block, blocks) {
    block_source = src_directory + block + ".vhd"
    read_file -format vhdl -rtl block_source
    analyze -format vhdl -lib WORK block_source
}

current_design block
/* All commands now apply to the entity "exam1" */
Synthesis script (3)

uniquify
/* Creates unique instances of multiply referenced entities */

link
check_design
/* Checks the current design for consistency */

/*******************************************/
/* apply block attributes and constraints  */
/*******************************************/
create_clock -period 10 clk
/* Defines the port "clk" of the current design as the clock for the design.
   Period = 10 ns, 50% duty cycle.
   Use the -waveform option to define a duty cycle other than 50% */

set_operating_conditions NCCOM
/* Normal Case Commercial Operating Conditions */
Synthesis script (4)

/***************************************************/
/* Apply these constraints to the top-level entity */
/***************************************************/

set_max_fanout 100 block
set_clock_latency 0.1 find(clock, "clk")
set_clock_transition 0.01 find(clock, "clk")
set_clock_uncertainty -setup 0.1 find(clock, "clk")
set_clock_uncertainty -hold 0.1 find(clock, "clk")
set_load 0 all_outputs()
set_input_delay 1.0 -clock clk -max all_inputs()
set_output_delay -max 1.0 -clock clk all_outputs()
set_wire_load_model -library tcb013ghptc -name "TSMC8K_Fsg_Conservative"
Wireload model basics (1)
Wireload model basics (2)
Synthesis script (5)

set_dont_touch block

compile -map_effort medium

change_names -rules vhdl
vhdlout_architecture_name = "sort_syn"
vhdlout_use_packages = {"IEEE.std_logic_1164"}
write -f db -hierarchy -output db_directory + "exam1.db"
/* write -f vhdl -hierarchy -output db_directory + "exam1_syn.vhd" */
report -area > report_directory + "exam1.report_area"
report -timing -all > report_directory + "exam1.report_timing"
Results of synthesis
Area report after synthesis (1)

report_area
Information: Updating design information... (UID-85)
****************************************
Report : area
Design : exam1
Version: V-2003.12-SP1
Date   : Tue Nov 15 20:39:06 2005
****************************************

Library(s) Used:
    tcb013ghptc (File: /opt3/synopsys/TSMCHOME/digital/Front_End/timing_power/tcb013ghp_200a/tcb013ghptc.db)
Area report after synthesis (2)

Number of ports:          75
Number of nets:          346
Number of cells:         107
Number of references:     28

Combinational area:       10593.477539
Noncombinational area:    14295.521484
Net Interconnect area:    undefined  (Wire load has zero net area)

Total cell area:          24888.976562
Total area:               undefined
Critical Path (1)
• Critical Path – The Longest Path From Outputs of Registers to Inputs of Registers
[Figure: two flip-flops (D, Q) clocked by clk, with combinational logic of delay t_logic between "in" and "out"]
t_Critical = t_FF-P + t_logic + t_FF-setup
Critical Path (2)
• Min. Clock Period = Length of The Critical Path
• Max. Clock Frequency = 1 / Min. Clock Period
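As a worked example of the two relations above, the sketch below adds up the critical path terms and converts the result to a maximum clock frequency. The delay values are made up for illustration and the function names are hypothetical; only the formulas come from the slides.

```python
def critical_path_ns(t_ff_p, t_logic, t_ff_setup):
    """t_Critical = t_FF-P + t_logic + t_FF-setup (all in ns)."""
    return t_ff_p + t_logic + t_ff_setup


def max_clock_mhz(min_period_ns):
    """Max. clock frequency = 1 / min. clock period (ns -> MHz)."""
    return 1000.0 / min_period_ns


# Hypothetical delays in ns: clock-to-output, logic, setup.
t_crit = critical_path_ns(t_ff_p=0.5, t_logic=4.0, t_ff_setup=0.5)
f_max = max_clock_mhz(t_crit)

print(f"Min. clock period   : {t_crit:.2f} ns")
print(f"Max. clock frequency: {f_max:.1f} MHz")
```

With these assumed numbers the minimum period is 5 ns, which corresponds to 200 MHz.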
Clock Jitter
• Rising Edge of The Clock Does Not Occur Precisely Periodically
• May cause faults in the circuit
[Figure: clk waveform with varying edge positions]
Clock Skew
• Rising Edge of the Clock Does Not Arrive at Clock Inputs of All Flip-flops at The Same Time
[Figure: two pairs of flip-flops (D, Q, in, out) driven by clk, each with a delay element in the clock path]
Timing report after synthesis (1)

****************************************
Report : timing
        -path full
        -delay max
        -max_paths 1
Design : exam1
Version: V-2003.12-SP1
Date   : Tue Nov 15 20:39:06 2005
****************************************

Operating Conditions: NCCOM   Library: tcb013ghptc
Wire Load Model Mode: segmented
Timing report after synthesis (2)

Startpoint: in_addr(1) (input port clocked by clk)
Endpoint: RegSUM/Q_reg[34] (rising edge-triggered flip-flop clocked by clk)
Path Group: clk
Path Type: max

Des/Clust/Port          Wire Load Model            Library
------------------------------------------------------------
exam1                   TSMC8K_Fsg_Conservative    tcb013ghptc
RAM_16Xn_DISTRIBUTED    ZeroWireload               tcb013ghptc
exam1_DW01_cmp2_32_0    ZeroWireload               tcb013ghptc
exam1_DW01_cmp2_32_1    ZeroWireload               tcb013ghptc
exam1_DW01_add_35_0     ZeroWireload               tcb013ghptc
regne_1                 ZeroWireload               tcb013ghptc
regne_2                 ZeroWireload               tcb013ghptc
regne_n35               ZeroWireload               tcb013ghptc
Timing report after synthesis (3)

Point                                           Incr    Path
--------------------------------------------------------------
clock clk (rise edge)                           0.00    0.00
clock network delay (ideal)                     0.10    0.10
input external delay                            1.00    1.10 f
in_addr(1) (in)                                 0.00    1.10 f
U98/Z (CKMUX2D1)                                0.13    1.23 f
Memory/ADDR[1] (RAM_16Xn_DISTRIBUTED)           0.00    1.23 f
Memory/U41/ZN (INVD1)                           0.08    1.31 r
Memory/U343/Z (OR3D1)                           0.10    1.41 r
Memory/U338/ZN (INVD2)                          0.20    1.61 f
Memory/U40/ZN (MOAI22D0)                        0.17    1.78 f
Memory/U350/Z (OR4D1)                           0.26    2.03 f
Memory/DATA_OUT[0] (RAM_16Xn_DISTRIBUTED)       0.00    2.03 f
Timing report after synthesis (4)

add_96xplusxplus/B[0] (exam1_DW01_add_35_0)     0.00    2.03 f
add_96xplusxplus/U9/Z (AN2D0)                   0.12    2.15 f
add_96xplusxplus/U1_1/CO (CMPE32D1)             0.10    2.25 f
add_96xplusxplus/U1_2/CO (CMPE32D1)             0.10    2.34 f
add_96xplusxplus/U1_3/CO (CMPE32D1)             0.10    2.44 f
add_96xplusxplus/U1_4/CO (CMPE32D1)             0.10    2.54 f
add_96xplusxplus/U1_5/CO (CMPE32D1)             0.10    2.63 f
add_96xplusxplus/U1_6/CO (CMPE32D1)             0.10    2.73 f
add_96xplusxplus/U1_7/CO (CMPE32D1)             0.10    2.82 f
add_96xplusxplus/U1_8/CO (CMPE32D1)             0.10    2.92 f
add_96xplusxplus/U1_9/CO (CMPE32D1)             0.10    3.02 f
add_96xplusxplus/U1_10/CO (CMPE32D1)            0.10    3.11 f
add_96xplusxplus/U1_11/CO (CMPE32D1)            0.10    3.21 f
add_96xplusxplus/U1_12/CO (CMPE32D1)            0.10    3.31 f
add_96xplusxplus/U1_13/CO (CMPE32D1)            0.10    3.40 f
add_96xplusxplus/U1_14/CO (CMPE32D1)            0.10    3.50 f
Timing report after synthesis (5)

add_96xplusxplus/U1_15/CO (CMPE32D1)            0.10    3.60 f
add_96xplusxplus/U1_16/CO (CMPE32D1)            0.10    3.69 f
add_96xplusxplus/U1_17/CO (CMPE32D1)            0.10    3.79 f
add_96xplusxplus/U1_18/CO (CMPE32D1)            0.10    3.88 f
add_96xplusxplus/U1_19/CO (CMPE32D1)            0.10    3.98 f
add_96xplusxplus/U1_20/CO (CMPE32D1)            0.10    4.08 f
add_96xplusxplus/U1_21/CO (CMPE32D1)            0.10    4.17 f
add_96xplusxplus/U1_22/CO (CMPE32D1)            0.10    4.27 f
add_96xplusxplus/U1_23/CO (CMPE32D1)            0.10    4.37 f
add_96xplusxplus/U1_24/CO (CMPE32D1)            0.10    4.46 f
add_96xplusxplus/U1_25/CO (CMPE32D1)            0.10    4.56 f
add_96xplusxplus/U1_26/CO (CMPE32D1)            0.10    4.66 f
add_96xplusxplus/U1_27/CO (CMPE32D1)            0.10    4.75 f
add_96xplusxplus/U1_28/CO (CMPE32D1)            0.10    4.85 f
add_96xplusxplus/U1_29/CO (CMPE32D1)            0.10    4.94 f
add_96xplusxplus/U1_30/CO (CMPE32D1)            0.10    5.04 f
add_96xplusxplus/U1_31/CO (CMPE32D1)            0.10    5.14 f
Timing report after synthesis (6)

add_96xplusxplus/U7/Z (AN2D0)                   0.10    5.24 f
add_96xplusxplus/U5/Z (AN2D0)                   0.08    5.32 f
add_96xplusxplus/U4/Z (CKXOR2D0)                0.15    5.47 f
add_96xplusxplus/SUM[34] (exam1_DW01_add_35_0)  0.00    5.47 f
RegSUM/R[34] (regne_n35)                        0.00    5.47 f
RegSUM/U32/Z (AO21D0)                           0.11    5.57 f
RegSUM/Q_reg[34]/D (EDFQD1)                     0.00    5.57 f
data arrival time                                       5.57
Timing report after synthesis (7)

clock clk (rise edge)                          10.00   10.00
clock network delay (ideal)                     0.10   10.10
clock uncertainty                              -0.10   10.00
RegSUM/Q_reg[34]/CP (EDFQD1)                    0.00   10.00 r
library setup time                             -0.12    9.88
data required time                                      9.88
--------------------------------------------------------------
data required time                                      9.88
data arrival time                                      -5.57
--------------------------------------------------------------
slack (MET)                                             4.31
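The last part of the report can be reproduced by hand. The sketch below recomputes the data required time and the slack from the numbers printed in the report; the variable names are illustrative, and the final period estimate is an assumption that applies only to this one path under the same constraints.

```python
# Values in ns, copied from the timing report above.
clock_period = 10.00         # create_clock -period 10 clk
clock_network_delay = 0.10   # set_clock_latency 0.1
clock_uncertainty = 0.10     # set_clock_uncertainty -setup 0.1
library_setup_time = 0.12    # setup time of the EDFQD1 flip-flop
data_arrival_time = 5.57     # end of the reported path

data_required_time = (clock_period + clock_network_delay
                      - clock_uncertainty - library_setup_time)
slack = data_required_time - data_arrival_time

# Rough estimate: the period at which this particular path would have zero slack.
zero_slack_period = clock_period - slack

print(f"data required time: {data_required_time:.2f} ns")
print(f"slack             : {slack:.2f} ns")
print(f"zero-slack period : {zero_slack_period:.2f} ns")
```

A positive slack (MET) means the data arrives before it is required; here the margin is 4.31 ns out of a 10 ns period.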
Static Timing Analysis
Static Timing Analysis Review
• Tools will calculate all paths from sequential start point to sequential end point.
• The worst case path will be used for Setup analysis, and the best case path will be used for hold analysis.
• All paths are considered for design rule checking
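The setup and hold rules above can be sketched as follows. This is a simplified model, not how any particular tool is implemented: it assumes a single clock, no clock skew between the launching and capturing flip-flops, and made-up path delays; the function names are hypothetical.

```python
def setup_slack(period, path_delays, t_setup, uncertainty=0.0):
    """Setup check: the longest (worst-case) path must arrive
    before the next clock edge minus the setup time."""
    required = period - uncertainty - t_setup
    return required - max(path_delays)


def hold_slack(path_delays, t_hold, uncertainty=0.0):
    """Hold check: the shortest (best-case) path must not arrive
    before the hold window after the same clock edge has passed."""
    return min(path_delays) - (t_hold + uncertainty)


# Hypothetical register-to-register path delays in ns (clock-to-output included).
paths = [0.8, 2.4, 5.5]

s = setup_slack(period=10.0, path_delays=paths, t_setup=0.12, uncertainty=0.1)
h = hold_slack(path_delays=paths, t_hold=0.05, uncertainty=0.1)

print(f"setup slack: {s:.2f} ns ({'MET' if s >= 0 else 'VIOLATED'})")
print(f"hold slack : {h:.2f} ns ({'MET' if h >= 0 else 'VIOLATED'})")
```

Note that only the 5.5 ns path matters for setup and only the 0.8 ns path matters for hold, which is why the two checks use opposite ends of the delay range.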
Review of Setup and Hold Checks
False and Multicycle paths
• False path
  • Very slow signals, like reset and test mode enable, that are not used under normal conditions are classified as false paths.
• Multicycle path
  • Paths that take more than one clock cycle are known as multicycle paths.
  • You have to define the multicycle paths in the analyzer, and the tool takes those constraints into account when synthesizing.
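The effect of declaring a multicycle path can be illustrated with a small calculation. The sketch assumes an ideal clock and hypothetical delay values; `multicycle_setup_slack` is an illustrative helper, not a tool command.

```python
def multicycle_setup_slack(period, arrival, t_setup, cycles=1):
    """Setup slack when the path is allowed `cycles` clock periods."""
    required = cycles * period - t_setup
    return required - arrival


# Hypothetical path that needs 17 ns with a 10 ns clock.
single = multicycle_setup_slack(period=10.0, arrival=17.0, t_setup=0.12)
double = multicycle_setup_slack(period=10.0, arrival=17.0, t_setup=0.12, cycles=2)

print(f"checked as single-cycle path: {single:.2f} ns")
print(f"checked as two-cycle path   : {double:.2f} ns")
```

Checked as a single-cycle path, this path is reported as a violation; once it is declared as a two-cycle path, the same delay meets timing, so the tool does not spend area trying to speed it up.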
Multicycle path - Example
Optimization criteria
Degrees of freedom and possible trade-offs
• speed
• area
• power
• testability
Degrees of freedom and possible trade-offs
• speed
• area
• latency
• throughput
VHDL Coding for Synthesis
Recommended rules for Synthesis
• When implementing combinational paths do not use hierarchy
• Register all outputs
• Do not implement glue logic between blocks, partition them well
• Separate designs on functional boundary
• Keep block sizes to a reasonable size
Avoid hierarchical combinational blocks
Not recommended design practice
[Figure: reg1 -> Combinatorial Logic1 (Block A) -> Combinatorial Logic2 (Block B) -> Combinatorial Logic3 (Block C) -> reg2]
• The path between reg1 and reg2 is divided between three different blocks.
• Due to hierarchical boundaries, optimization of the combinational logic cannot be achieved.
• Synthesis tools (Synopsys) maintain the integrity of the I/O ports, so combinational optimization cannot be achieved between blocks (unless “grouping” is used).
Recommended way to handle Combinational Paths
Recommended practice
[Figure: reg1 (Block A) -> Combinatorial Logic1 & Logic2 & Logic3 -> reg2 (Block C)]
• All the combinational circuitry is grouped in the same block that has its output connected to the destination flip-flop.
• This allows optimal minimization of the combinational logic during synthesis.
• It also allows a simplified description of the timing interface.
Register all outputs
[Figure: Block X with output register reg1 driving two instances of Block Y, with registers reg2 and reg3]
• Simplifies the synthesis design environment: inputs to the individual block arrive within the same relative delay (caused by wire delays).
• There is no real need to specify output requirements, since paths start at flip-flop outputs.
• Take care of fanouts: as a rule of thumb, keep the fanout to 16 (dependent on technology and the components that are being driven by the output).
NO GLUE LOGIC between blocks
[Figure: Top level containing Block X (reg1) and Block Y (reg3), with glue logic placed between them]
• No glue logic between blocks, no matter what the temptation.
• Under time pressure, a bug may be found that could be fixed simply by adding some glue logic. RESIST THE TEMPTATION!!!
• At this level in the hierarchy, this implementation will not allow the glue logic to be absorbed within any lower-level block.
Separate design with different goals
[Figure: Top level containing a time critical path and slow logic, with registers reg1 and reg3]
• reg1 may be driven by a time critical function, hence will have different optimization constraints
• reg3 may be driven by slow logic, hence no need to constrain it for speed
Optimization based on design requirements
[Figure: Top level split into a speed optimized block (time critical path, reg1) and an area optimized block (slow logic, reg3)]
• Use different entities to partition design blocks
• Allows different constraints during synthesis to optimize for area or speed or both.
Separate FSM with random logic
• Separation of the FSM and the random logic allows you to use FSM optimized synthesis
[Figure: Top level containing Random Logic (standard optimization techniques used) and an FSM (use FSM optimization tool), with registers reg1 and reg3]
Maintain a reasonable block size
• Partition your design such that each block is between 1000 and 10000 gates (this is strictly tool and technology dependent)
• The larger the blocks, the longer the run time -> quick iterations cannot be done.