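This course was produced with the support of C & S Technology. Because it carries no copyright, anyone may copy and distribute it, for non-profit purposes only. The laboratory homepage offers many more courses on high-performance microprocessors, all of which can be downloaded free of charge.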
Top-down Design Methodology
December 2002
Yonsei University, Department of Electrical and Electronic Engineering
Processor Laboratory
Ph.D. student 정우경, E-mail: [email protected]
Homepage: http://mpu.yonsei.ac.kr
Phone: 02-2123-2872
Advances in Semiconductor
Moore's Law: the number of transistors doubles every 18 months.
– UltraSPARC III: 87.5M transistors, Pentium 4: 55M transistors, Itanium 2: 221M transistors
Top-down Design Methodology
– Short time to market
– Reduced NRE cost
– Design reuse
– Increased flexibility
– Alternative technology libraries
– Alternative architectures
Design Methodology
Bottom-up
– Full custom
– Small area, high performance
Top-down
– HDL-based design
– Synthesis using automatic CAD tools
– Easy development and verification
Figure: levels of abstraction, from top to bottom: system concept, algorithm, architecture, RTL, gate, transistor. Moving up increases behavioral abstraction; moving down increases detailed realization and complexity.
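Top-down Design Methodology
Figure: a system is partitioned into boards (PCB1, PCB2, PCB3), each board into components (uP, ROM, RAM, ASIC, peripheral, FPGA), and each chip (A, B) is written as RTL code, converted to gates by RTL synthesis, and converted to layout by layout synthesis.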
Design Automation Tools
HDL simulation:
– Verilog-XL, NC-Verilog, NC-VHDL (Cadence), VSS, VCS (Synopsys), ModelSim (Mentor)
Synthesis:
– Design Compiler (Synopsys), BuildGates (Cadence), Leonardo (Mentor)
Verification:
– PrimeTime, EPIC (Synopsys), Calibre (Mentor), Star-Sim, Hercules (Avanti), Diva, Dracula (Cadence)
Layout:
– Apollo (Avanti), Silicon Ensemble, Virtuoso (Cadence), IC Station (Mentor)
Design Flow
1. Behavioral HDL model: algorithm verification (NC-Verilog)
2. RTL HDL model: functional verification (NC-Verilog)
3. Synthesis (Design Compiler): gate-level netlist
4. Dynamic/static timing verification (NC-Verilog, PrimeTime)
5. Layout: place & route (Apollo)
6. Post-layout timing verification (NC-Verilog, PrimeTime)
7. Fabrication
HDL (Hardware Description Language)
Description aspects
– Abstract behavior modeling
– Hardware structure modeling
VHDL
– 1980 U.S. Department of Defense
– 1987 IEEE Standard 1076
Verilog
– 1981 Gateway Design Automation
– 1995 IEEE Standard 1364
HDL Description
Behavioral model: an abstraction of how the design works, with little regard to implementation; similar to a programming language
Structural model: describes the constituent modules and their interconnections; supports hierarchical design
RTL (Register Transfer Level) model: specifies the registers that store data and interconnects them through logic equations
– Register => Combinational Logic => Register
– Reveals the hardware structure
– Synthesizable
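As a minimal sketch of the RTL style just described (register => combinational logic => register), the hypothetical module `rtl_add4` below registers two 4-bit inputs, adds them combinationally, and registers the 5-bit sum. The module name, its ports, and the active-low asynchronous reset `resetb` are assumptions chosen for illustration, not part of the original design.

```verilog
// Hypothetical example: a 4-bit registered adder written at RTL.
// Register (a_q, b_q) => combinational logic (a_q + b_q) => register (sum_q)
module rtl_add4 (
  input  wire       clk,
  input  wire       resetb,   // active-low asynchronous reset
  input  wire [3:0] a,
  input  wire [3:0] b,
  output reg  [4:0] sum_q     // registered output
);
  reg  [3:0] a_q, b_q;        // input registers
  wire [4:0] sum_d;           // combinational result (5 bits keeps the carry)

  // Combinational logic between the two register stages
  assign sum_d = a_q + b_q;

  // Both register stages share one clock and one reset
  always @(posedge clk or negedge resetb) begin
    if (!resetb) begin
      a_q   <= 4'd0;
      b_q   <= 4'd0;
      sum_q <= 5'd0;
    end else begin
      a_q   <= a;
      b_q   <= b;
      sum_q <= sum_d;
    end
  end
endmodule
```

The sum appears at `sum_q` two clock edges after the inputs are applied: one edge to load `a_q`/`b_q`, one to load `sum_q`.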
HDL Guidelines
Technology independence
Clock logic
– Clock logic (clock gating logic, reset generation) should be kept in one block.
– Avoid multiple clocks per block.
– Use meaningful names for clocks.
– For DFT scan, clocks should be controllable from primary inputs.
No glue logic at the top level
Module name same as file name
Pads separate from core logic
Minimize unnecessary hierarchy
Register all outputs
Memory Element Inference
Incomplete sensitivity lists: may cause a mismatch between RTL simulation and the synthesized logic; incompletely specified assignments infer a latch
Latch vs. flip-flop
– Latch: level-sensitive, small area, more troublesome
  always @(enable or data)
– Flip-flop: edge-sensitive
  Synchronous reset
  always @(posedge clk)
  Asynchronous reset
  always @(posedge clk or negedge reset)
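The three memory elements above can be illustrated in one small module. This is a sketch under assumptions: the module name `mem_elements`, the separate `reset` (synchronous, active-high) and `resetb` (asynchronous, active-low) inputs, and the `data` input are all hypothetical. The latch lists both `enable` and `data`, so its sensitivity list is complete.

```verilog
// Hypothetical module showing a latch and two flip-flop reset styles.
module mem_elements (
  input  wire clk,
  input  wire enable,
  input  wire reset,    // active-high synchronous reset (sync_q)
  input  wire resetb,   // active-low asynchronous reset (async_q)
  input  wire data,
  output reg  latch_q,
  output reg  sync_q,
  output reg  async_q
);
  // Level-sensitive latch: transparent while enable is high; with no
  // else branch, latch_q holds its value while enable is low.
  always @(enable or data)
    if (enable)
      latch_q <= data;

  // Edge-sensitive flip-flop with synchronous reset:
  // reset is sampled only at the rising clock edge.
  always @(posedge clk)
    if (reset)
      sync_q <= 1'b0;
    else
      sync_q <= data;

  // Edge-sensitive flip-flop with asynchronous reset:
  // resetb is in the event list, so it clears async_q immediately.
  always @(posedge clk or negedge resetb)
    if (!resetb)
      async_q <= 1'b0;
    else
      async_q <= data;
endmodule
```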
Synthesis of if statements
if without else: infers a latch
if-else: implies a multiplexer
if-else if: implies priority
Figure: priority logic with inputs int0, int1, int2, int3 and outputs int0_active, int1_active, int2_active, int3_active.
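A possible Verilog description of the priority logic in the figure, assuming int0 has the highest priority and at most one output is active at a time (the module name `int_priority` is hypothetical). Default assignments at the top of the block ensure every output is written on every pass, which avoids the latch that an if without else would otherwise infer.

```verilog
// Hypothetical priority logic for four requests; int0 has the highest priority.
module int_priority (
  input  wire int0, int1, int2, int3,
  output reg  int0_active, int1_active, int2_active, int3_active
);
  always @(int0 or int1 or int2 or int3) begin
    // Default assignments: no output is left unassigned, so no latch
    int0_active = 1'b0;
    int1_active = 1'b0;
    int2_active = 1'b0;
    int3_active = 1'b0;
    // The if / else if chain is what creates the priority structure
    if (int0)      int0_active = 1'b1;
    else if (int1) int1_active = 1'b1;
    else if (int2) int2_active = 1'b1;
    else if (int3) int3_active = 1'b1;
  end
endmodule
```

When the inputs are known to be one-hot, the priority chain is unnecessary hardware, which is what the case (1'b1) form with parallel_case is meant to remove.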
Synthesis of case statement
Synopsys synthesis directives: parallel_case declares the case items mutually exclusive, which removes the priority logic; full_case declares unlisted combinations as don't-cares
always @(int0 or int1 or int2 or int3)
begin
  int0_active = 1'b0;
  int1_active = 1'b0;
  int2_active = 1'b0;
  int3_active = 1'b0;
  case (1'b1) // synopsys full_case parallel_case
    int0: int0_active = 1'b1;
    int1: int1_active = 1'b1;
    int2: int2_active = 1'b1;
    int3: int3_active = 1'b1;
  endcase
end
Procedural Assignment
Blocking versus non-blocking
– Blocking assignment (=): order dependent; may cause a simulation mismatch
– Non-blocking assignment (<=): order independent; simulation matches the synthesized hardware
always @(posedge clk)
begin
  firstReg  <= data;
  secondReg <= firstReg;
  thirdReg  <= secondReg;
end
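To see the order dependence of blocking assignments, the sketch below codes the same three-register chain both ways (the module names `shift_nonblocking` and `shift_blocking` are hypothetical). In simulation, the nonblocking version delays `data` by three clock edges, while the blocking version, with the statements in the same order, passes `data` to `thirdReg` after a single edge.

```verilog
// Nonblocking: every right-hand side is sampled before any register
// updates, so this is a true 3-stage shift register.
module shift_nonblocking (input wire clk, input wire data, output reg thirdReg);
  reg firstReg, secondReg;
  always @(posedge clk) begin
    firstReg  <= data;
    secondReg <= firstReg;
    thirdReg  <= secondReg;
  end
endmodule

// Blocking: each assignment completes before the next one runs, so
// data ripples through all three variables on the same clock edge.
module shift_blocking (input wire clk, input wire data, output reg thirdReg);
  reg firstReg, secondReg;
  always @(posedge clk) begin
    firstReg  = data;
    secondReg = firstReg;
    thirdReg  = secondReg;
  end
endmodule
```

Reversing the statement order in the blocking version (thirdReg first, firstReg last) would restore the shift-register behavior, which is the order dependence the slide warns about; the nonblocking version gives the same result in any order.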
HDL Verification
Figure: dynamic functional test. An HDL test bench reads stimulus and reference vectors from a test vectors file, generates waveforms for the model under test, compares its output vectors with the reference vectors, and writes a results file with a pass/fail indication.
Test Bench
Objective
– Instantiate the hardware model under test
– Generate stimulus waveforms and apply them
– Generate waveforms of reference vectors and compare
– Automatically provide a pass/fail indication
Writing in the same HDL as the hardware model
– No need to learn a special tool
– Transportable across different design tools
– Wide variety of test bench coding styles
– Also used for functional verification of synthesis results
Waveform viewer: Signalscan
– $shm_open, $shm_probe
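A minimal self-checking test bench following the objectives above might look like this. It is a sketch under assumptions: the model under test `add4` and the test bench `tb_add4` are hypothetical, the stimulus and reference values are computed inside the test bench instead of being read from a test vectors file, and waveforms are dumped with the standard `$dumpfile`/`$dumpvars` tasks rather than the Signalscan `$shm_open`/`$shm_probe` tasks.

```verilog
// Hypothetical model under test: a 4-bit combinational adder.
module add4 (input wire [3:0] a, input wire [3:0] b, output wire [4:0] sum);
  assign sum = a + b;
endmodule

// Self-checking test bench written in the same HDL as the model.
module tb_add4;
  reg  [3:0] a, b;        // stimulus
  wire [4:0] sum;         // model output
  reg  [4:0] expected;    // reference value
  integer i, j, errors;

  // Instantiate the model under test
  add4 dut (.a(a), .b(b), .sum(sum));

  initial begin
    // Standard VCD waveform dump for a waveform viewer
    $dumpfile("tb_add4.vcd");
    $dumpvars(0, tb_add4);
    errors = 0;
    // Apply every input pair and compare against the reference
    for (i = 0; i < 16; i = i + 1)
      for (j = 0; j < 16; j = j + 1) begin
        a = i;
        b = j;
        expected = i + j;
        #1;
        if (sum !== expected) begin
          errors = errors + 1;
          $display("Mismatch: a=%0d b=%0d sum=%0d expected=%0d", a, b, sum, expected);
        end
      end
    // Automatic pass/fail indication
    if (errors == 0) $display("TEST PASSED");
    else             $display("TEST FAILED: %0d mismatches", errors);
  end
endmodule
```

The same test bench can later be rerun against the synthesized gate-level netlist of `add4`, which is the reuse the slide mentions.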
Signalscan
Synthesis
Convert RTL-level HDL models into gate-level netlists
– Translation + Optimization + Mapping
Utilize standard cell libraries
Physical macro cells: PLL, memory
Synopsys Design Compiler
Design Analyzer & Design Compiler
Figure: two interfaces to Design Compiler: Design Analyzer, a menu-driven interface (new users, debugging), and dc_shell, a command-line interface (experienced users).
Initial Setup
.synopsys_dc.setup
– Path information
– Specify libraries: target library, link library, symbol library
– Conditions: worst-case condition (-10% VDD, worst-case process, 85~125℃)
– Naming rule
– Aliases
Technology Library
A set of primitive cells
– Timing and electrical characteristics
– Net delay and net parasitic information
– Definition of capacitance, time, and resistance units
Provided by the silicon vendor
Synopsys settings
– target_library: cells to be mapped (.db)
– link_library: instanced cells, wire load or operating condition models (.db)
– symbol_library: symbols for the GUI schematic viewer (.sdb)
Synthesizing a Design
1. Bring in the design
– Translate using read
2. Constrain the design
– Timing, area, environmental
3. Synthesize the design
– Optimize and map to gates with compile
4. Inspect the design
– View the synthesized schematic
– Area, timing, and constraint reports
5. Save the design
– Write the netlist to a file
DC Shell Script
A command file for Design Compiler that can be run interactively or in batch mode
Contains:
– Setup information (.synopsys_dc.setup)
– Attribute and constraint information
– Synthesis commands (read, compile, write, ...)
– Control flow commands
Constraining the Design
Area goal
– set_max_area: specifies the area target for current_design
Timing goal
– Define constraints for all paths: all input paths, internal paths, and all output paths
Defining a Clock
Clock
– Source (port or pin)
– Period
– Duty cycle
– Offset/skew
Creating a clock constrains the timing paths between registers
Preserve the clock tree: set_dont_touch_network
Constraining Timing Paths
Figure: the block TO_BE_SYNTHESIZED sits between flip-flops clocked by clk. set_input_delay constrains paths from the input ports, create_clock (period) constrains register-to-register paths, and set_output_delay constrains paths to the output ports.
Environmental Attributes
Figure: environmental attributes applied around the block TO_BE_SYNTHESIZED.
– set_operating_conditions: defines operating conditions for the current design
– set_load: sets load values on ports and nets
– set_driving_cell: models a library cell driving input ports
– set_wire_load: sets the wire load model for the current design
Operating Conditions
Figure: cell delay as a function of temperature, voltage, and process, each with best, nominal, and worst cases.
Design Rule Constraints
Maximum transition time
– set_max_transition
– Ports or designs
Maximum fanout
– set_max_fanout
– Input ports or designs
Maximum capacitance
– set_max_capacitance
Minimum capacitance
– set_min_capacitance
Report Out
Area
– report_area: hardware area in equivalent gates (2-input NAND gate)
Timing
– report_timing: reports the path with the worst slack
Power
– report_power: estimated power consumption
Constraints
– report_constraint: displays constraint violators
allsum.scr:
active_design = allsum
read -format verilog active_design + ".v"
current_design active_design
link
check_design
set_wire_load_mode "enclosed"
set_operating_conditions "V270WTP0850" -library "std90"
create_clock -name clk -period 2 -waveform {0 1} find(port, "clk")
set_dont_touch_network {clk resetb}
set_clock_skew -plus_uncertainty 0.2 -minus_uncertainty 0.2 clk
set_fix_hold find(clock, "clk")
set_input_delay 0.5 -clock clk -max all_inputs()
set_output_delay 0.5 -clock clk -max all_outputs()
set_max_area 0
set_max_fanout 1 all_inputs()
set_max_transition 3 current_design
set_drive 1 all_inputs()
set_drive 0 {clk resetb}
set_fix_multiple_port_nets -feedthroughs -constants
compile -map medium
remove_unconnected_ports find(-hierarchy cell, "*")
change_names -hierarchy -rules sec_verilog
set_dont_touch current_design
report_constraint -all_violators -verbose > active_design + ".cons"
report_timing > active_design + ".time"
report_area > active_design + ".area"
report_power > active_design + ".pow"
write -format db -hierarchy -output active_design + ".db"
write -format verilog -hierarchy -output active_design + ".vnet"
quit

dc_shell -f allsum.scr > allsum.log
****************************************
Report : area
Design : allsum
Version: 2000.05
Date   : Tue Nov 26 15:28:58 2002
****************************************

Library(s) Used:
    std90 (File: /user3/samsung_design_kit/secstd90_synopsys/syn/STD90/std90.db)

Number of ports:              15
Number of nets:               26
Number of cells:               9
Number of references:          4

Combinational area:        104.666656
Noncombinational area:      65.000031
Net Interconnect area:       7.268500

Total cell area:           169.666687
Total area:                176.935181
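The area figures in the report are consistent with each other: the total cell area is the sum of the combinational and noncombinational areas, and the total area adds the net interconnect area (all values in the std90 library's area units).

```latex
\begin{aligned}
A_{\text{cell}}  &= A_{\text{comb}} + A_{\text{noncomb}} = 104.666656 + 65.000031 = 169.666687 \\
A_{\text{total}} &= A_{\text{cell}} + A_{\text{net}} = 169.666687 + 7.268500 = 176.935187
\end{aligned}
```

The last digits differ from the reported 176.935181, presumably because of rounding inside the tool.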
****************************************
Report : timing
        -path full
        -delay max
        -max_paths 1
Design : allsum
Version: 2000.05
Date   : Tue Nov 26 15:28:58 2002
****************************************

Operating Conditions: V270WTP0850   Library: std90
Wire Load Model Mode: enclosed

  Startpoint: U2/a_reg_1A
              (rising edge-triggered flip-flop clocked by clk)
  Endpoint: sumout_reg_5A
            (rising edge-triggered flip-flop clocked by clk)
  Path Group: clk
  Path Type: max

  Des/Clust/Port     Wire Load Model       Library
  ------------------------------------------------
  allsum             std90_5000_t          std90
  adder4             std90_5000_t          std90

  Point                                    Incr       Path
  ---------------------------------------------------------
  clock clk (rise edge)                    0.00       0.00
  clock network delay (ideal)              0.00       0.00
  U2/a_reg_1A/CK (fd2qd2)                  0.00       0.00 r
  U2/a_reg_1A/Q (fd2qd2)                   0.85       0.85 r
  U2/a[1] (allsum_cnt)                     0.00       0.85 r
  ...
  U1/c[5] (adder4)                         0.00       2.50 r
  sumout_reg_5A/D (fd2q)                   0.00       2.50 r
  data arrival time                                   2.50

  clock clk (rise edge)                    2.00       2.00
  clock network delay (ideal)              0.00       2.00
  clock uncertainty                       -0.20       1.80
  sumout_reg_5A/CK (fd2q)                  0.00       1.80 r
  library setup time                      -0.43       1.37
  data required time                                  1.37
  ---------------------------------------------------------
  data required time                                  1.37
  data arrival time                                  -2.50
  ---------------------------------------------------------
  slack (VIOLATED)                                   -1.12
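The slack in the timing report can be recomputed from its columns (the library time unit is 1 ns). This worked check uses only the numbers shown in the report:

```latex
\begin{aligned}
t_{\text{req}} &= T_{\text{clk}} + t_{\text{clk,net}} - t_{\text{unc}} - t_{\text{setup}}
                = 2.00 + 0.00 - 0.20 - 0.43 = 1.37\ \text{ns} \\
\text{slack}   &= t_{\text{req}} - t_{\text{arr}} = 1.37 - 2.50 = -1.13\ \text{ns}
\end{aligned}
```

The result differs from the reported -1.12 ns by 0.01 ns, presumably because the tool computes slack from unrounded delays and prints each column rounded to two decimals. Since the required time grows one-for-one with the clock period, this path alone would need a period of roughly 3.12 ns (about 320 MHz) instead of 2 ns, assuming the path delay, clock uncertainty, and setup time stay unchanged.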
****************************************
Report : power
        -analysis_effort low
Design : allsum
Version: 2000.05
Date   : Tue Nov 26 15:28:59 2002
****************************************

Library(s) Used:
    std90 (File: /user3/samsung_design_kit/secstd90_synopsys/syn/STD90/std90.db)

Operating Conditions: V270WTP0850   Library: std90
Wire Load Model Mode: enclosed

Global Operating Voltage = 2.7
Power-specific unit information :
    Voltage Units = 1V
    Capacitance Units = 1.000000pf
    Time Units = 1ns
    Dynamic Power Units = 1mW    (derived from V,C,T units)
    Leakage Power Units = 1mW

  Cell Internal Power  =   0.0000 mW    (0%)
  Net Switching Power  =   2.8393 mW  (100%)
                         ---------
Total Dynamic Power    =   2.8393 mW  (100%)

Cell Leakage Power     =   0.0000 mW
Synthesis Results
Partitioning
Objectives
– Separate distinct functions
– Achieve workable size and complexity
– Manage the project in a team environment
– Design reuse
Advantages
– Better results: smaller and faster
– Easier synthesis: simplified constraints and scripts
– Faster compiles: quicker turnaround
Group and Ungroup
Group
– Creates a new hierarchical block
Ungroup
– Removes unnecessary hierarchy
– Logic optimization cannot cross block boundaries, so combinational logic in different blocks cannot be merged
No combinational path should cross hierarchy boundaries
Partitioning Strategies
No hierarchy in combinational paths
Place hierarchy boundaries at register outputs
Limit block size for reasonable runtimes (20K~100K gates)
Keep related combinational logic in the same module
Partition for design reuse
Separate structural logic from random logic
Separate core logic, pads, clocks, and JTAG
Remove glue logic
Isolate state machines from other logic
Think of the layout style
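The sketch below applies several of these strategies to a hypothetical two-block design (`blk_count`, `blk_double`, and `top_part` are illustrative names): each block output comes directly from a register, the combinational logic feeding a register sits in the same block as that register, and the top level contains only instantiations, with no glue logic.

```verilog
// Block 1: a counter whose output is the state register itself.
module blk_count (input wire clk, input wire resetb, output reg [3:0] count_q);
  always @(posedge clk or negedge resetb)
    if (!resetb) count_q <= 4'd0;
    else         count_q <= count_q + 4'd1;
endmodule

// Block 2: the combinational logic (din + din) and its destination register
// live in the same block, so no combinational path crosses a boundary.
module blk_double (input wire clk, input wire resetb, input wire [3:0] din,
                   output reg [4:0] dout_q);
  always @(posedge clk or negedge resetb)
    if (!resetb) dout_q <= 5'd0;
    else         dout_q <= din + din;   // 5-bit context keeps the carry
endmodule

// Top level: instantiation and wiring only, no glue logic.
module top_part (input wire clk, input wire resetb, output wire [4:0] result);
  wire [3:0] count;
  blk_count  u_count  (.clk(clk), .resetb(resetb), .count_q(count));
  blk_double u_double (.clk(clk), .resetb(resetb), .din(count), .dout_q(result));
endmodule
```

Because every boundary is at a register output, each block can be constrained and compiled on its own with a full clock period available for its internal logic.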
Compile a Hierarchical Design
Top-down hierarchical compile: small designs
– Only top-level constraints
– Optimization across the entire design
– Long compile times (memory intensive)
– Changes to sub-blocks require complete re-synthesis
– Does not perform well for multiple clocks
Time-budgeting compile (bottom-up): medium to large designs
– Specify timing requirements for each block
– Easier to manage
– Changes to sub-blocks do not require re-synthesis of the entire design
– Does not suffer from design style
– Good quality results in general
– Tedious to update and maintain multiple scripts
– Critical paths at the top level may not appear critical at the lower level
Multiple Instances
Two ways to resolve multiple instances before optimization
– uniquify: creates a unique design definition for each instance, so each one can be mapped to its specific environment
– compile + dont_touch: compiles the design once and prevents further modification, giving an identical copy of the design in N places
uniquify is recommended
– Better optimization results
– Clock tree insertion
Dynamic Timing Simulation
SDF: Standard Delay Format
– Timing information of each cell in the design
– Provides timing information for simulating a gate-level netlist
– Used for pre-layout and post-layout simulation
Verilog netlist + SDF: dynamic timing simulation
– Use Verilog simulation tools
– Use the same test vectors as the functional test
SDF File
Timing data
– IOPATH delay
– INTERCONNECT delay
– SETUP timing check
– HOLD timing check
Generating a pre-layout SDF file
– Approximate the post-route clock tree: clock delay, skew, transition time
write_timing -format sdf-v2.1 -output <filename>
SDF Generation Example
allsum_sdf.scr:
active_design = allsum
read "db/" + active_design + ".db"
current_design active_design
link
write_timing -format sdf-v2.1 -output active_design + ".sdf"
quit

dc_shell -f allsum_sdf.scr   => allsum.sdf is generated
Timing Simulation Example
Modify the Verilog test bench
– `include "std90.v"   - library simulation file
– `include "allsum.vnet"   - synthesized Verilog netlist
– $sdf_annotate("allsum.sdf", U0, , , "MAXIMUM", , "FROM_MAXIMUM");
Run the simulations the same way as the Verilog RTL functional simulations
Timing SimulationTiming Simulation
Static Timing Analysis
Dynamic simulation of gate-level designs
– Uses input vectors and a logic simulator
– No false paths; broad support for design styles
– Long run times: a bottleneck for large, complex designs
– Relies on the quality and coverage of the test bench
Static timing analysis
– Exhaustive method of analyzing, debugging, and validating the performance of a design
– Identifies critical paths
– Much faster than dynamic simulation
– Verifies all parts of a gate-level design for timing
– False paths can be reported as violations
PrimeTime
Synopsys stand-alone, full-chip static timing analyzer for gate-level designs
– Analyzes the timing of modules in the context of the full chip
– Identifies inter-module timing problems
– Analyzes the entire chip, including non-synthesized blocks
– Creates block-level constraints for re-optimization
Interface
– primetime: GUI, showing the details of a single timing path
– pt_shell: command-line interface, for scripts and batch mode
PT Shell Script
PrimeTime flow
– Read in and link the design and libraries
– Specify attributes, environment, constraints, and timing exceptions
– Perform analysis: check_timing, reports, visual analysis
– Characterize the context and write a script for Design Compiler; perform mode analysis and case analysis (optional)
.synopsys_pt.setup: PrimeTime initial setup file
pt_shell script: script file for static timing analysis
Transcript: automatically converts a dc_shell script file into a pt_shell script file
allsum_sta.scr:
read_verilog allsum.vnet
current_design allsum
link_design allsum
set_wire_load_mode enclosed
set_operating_conditions -library {std90} -min V360BTP0000 -max V270WTP0850
create_clock -period 2 -waveform {0 1} {clk}
set_clock_uncertainty -setup 0.2 clk
set_clock_uncertainty -hold 0.2 clk
set_input_delay -clock clk -max 0.5 [all_inputs]
set_output_delay -clock clk -max 0.5 [all_outputs]
set_max_fanout 1 [all_inputs]
set_max_transition 3 [current_design]
set_drive 1 [all_inputs]
set_drive 0 [list clk resetb]
report_timing > allsum.time
report_constraint -all_violators -verbose > allsum.cons
quit
****************************************
Report : timing
        -path full
        -delay max
        -max_paths 1
Design : allsum
Version: 1999.10-4
Date   : Wed Nov 27 18:04:52 2002
****************************************

Startpoint: U2/a_reg_1A (rising edge-triggered flip-flop clocked by clk)
Endpoint: sumout_reg_5A (rising edge-triggered flip-flop clocked by clk)
Path Group: clk
Path Type: max

Point                                  Incr       Path
--------------------------------------------------------
clock clk (rise edge)                  0.00       0.00
clock network delay (ideal)            0.00       0.00
U2/a_reg_1A/CK (fd2qd2)                0.00       0.00 r
U1/U27/Y (ivd2)                        0.10       1.66 f
U1/U28/Y (ao21d2)                      0.25       1.91 r
U1/U31/Y (oa21d2)                      0.20       2.11 f
U1/U25/Y (xn2)                         0.39       2.50 r
U1/c[5] (adder4)                       0.00       2.50 r
sumout_reg_5A/D (fd2q)                 0.00       2.50 r
data arrival time                                 2.50

clock clk (rise edge)                  2.00       2.00
clock network delay (ideal)            0.00       2.00
clock uncertainty                     -0.20       1.80
sumout_reg_5A/CK (fd2q)                           1.80 r
library setup time                    -0.43       1.37
data required time                                1.37
--------------------------------------------------------
data required time                                1.37
data arrival time                                -2.50
--------------------------------------------------------
slack (VIOLATED)                                 -1.12
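The bottom half of the report is simple arithmetic: the required time is the capture edge plus the clock network delay, minus the clock uncertainty and the library setup time, and slack is required time minus arrival time. The sketch below, with hypothetical function names, recomputes this from the values read off the report and also derives the smallest clock period at which this particular path would just meet setup.

```python
def setup_slack(period, clock_latency, uncertainty, setup_time, arrival):
    """Setup check for a single-cycle path captured on the next rising edge.

    required = capture edge + capture clock latency - uncertainty - setup time
    slack    = required - arrival   (negative slack means a violation)
    """
    required = period + clock_latency - uncertainty - setup_time
    return required, required - arrival


def min_period(clock_latency, uncertainty, setup_time, arrival):
    """Smallest clock period at which this path would just meet setup (slack = 0)."""
    return arrival + uncertainty + setup_time - clock_latency


# Values in ns taken from the report (ideal clock network, so latency is 0).
required, slack = setup_slack(period=2.00, clock_latency=0.00,
                              uncertainty=0.20, setup_time=0.43, arrival=2.50)
print(f"required = {required:.2f} ns, slack = {slack:.2f} ns")
print(f"minimum period for this path = {min_period(0.00, 0.20, 0.43, 2.50):.2f} ns")
```

Recomputing from the displayed numbers gives 1.37 ns required and -1.13 ns slack rather than the -1.12 ns printed by PrimeTime; the report values are rounded to two decimals, so the tool presumably works from unrounded internal values.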
Critical Path
Waveform
Path Profiler
Layout
Place & route
– Floorplanning
– Clock tree insertion
– Routing the database
Floorplanning
The most critical step
Place cells and macros in proper locations to reduce net RC delays and routing capacitance
Achieve the minimum possible area while meeting timing requirements
Divide the design into manageable blocks for hierarchical placement and routing
TDL: Timing-Driven Layout
– Forward-annotates timing information to the layout tool
– Places cells with timing priority so that path constraints are not violated
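One reason placement matters so much is that wire delay grows faster than linearly with wire length. A common first-order model treats a net as a uniformly distributed RC line whose Elmore delay is about half of the total resistance times the total capacitance, so doubling the length roughly quadruples the wire delay (driver and load effects ignored). The per-micron resistance and capacitance values below are illustrative assumptions, not data for any real process.

```python
def wire_delay_ps(length_um, r_ohm_per_um=0.1, c_ff_per_um=0.2):
    """Elmore delay of a uniformly distributed RC wire: 0.5 * R_total * C_total.

    r_ohm_per_um and c_ff_per_um are illustrative assumptions.
    """
    r_total = r_ohm_per_um * length_um        # total wire resistance in ohms
    c_total_ff = c_ff_per_um * length_um      # total wire capacitance in fF
    return 0.5 * r_total * c_total_ff * 1e-3  # ohm * fF = fs; scale fs -> ps


for length in (500, 1000, 2000):
    print(f"{length:5d} um -> {wire_delay_ps(length):6.2f} ps")
```

With these assumed values a 1000 um net has about 10 ps of wire delay and a 2000 um net about 40 ps, which is why keeping connected cells close together pays off on critical paths.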
Clock Tree Insertion
CTS: Clock Tree Synthesis
– Controls clock latency and skew
– Performed after cell placement and before routing
Recommendations
– Use a balanced tree structure with the minimum number of levels
– Use high-drive-strength buffers (or inverters)
– First level: a single buffer driven by the clock pad, placed near the center and connected to the next level through equal-length interconnect wires
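The "minimum number of levels" recommendation can be quantified: if every buffer drives at most k loads, a balanced tree with L buffer levels (counting the single root buffer) can reach k**L sinks, so the smallest L with k**L >= N is the minimum depth. Skew is then the spread between the latest and earliest clock arrivals at the sinks. The sink count, fanout limit, and latency values below are hypothetical.

```python
def tree_levels(num_sinks: int, max_fanout: int) -> int:
    """Minimum buffer levels (root buffer included) so max_fanout**levels >= num_sinks."""
    levels, reach = 0, 1
    while reach < num_sinks:
        reach *= max_fanout  # each extra level multiplies the reachable sinks by the fanout
        levels += 1
    return levels


def clock_skew(latencies_ns):
    """Skew = latest clock arrival minus earliest clock arrival at the sinks."""
    return max(latencies_ns) - min(latencies_ns)


# Hypothetical example: 5000 flip-flops, each buffer drives at most 16 loads.
print(tree_levels(5000, 16))  # 16**3 = 4096 < 5000, so a 4th level is needed
# Hypothetical clock latencies measured at four flip-flops after CTS.
print(round(clock_skew([1.02, 1.05, 0.98, 1.01]), 2))
```

The integer loop avoids the rounding pitfalls of computing ceil(log(N) / log(k)) in floating point.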
Routing
Global routing
– Assigns each net a general pathway
– Divides the layout surface into several regions
Detailed routing
– Uses the information gathered by the global router
– Routes the actual geometric wires within each region of the layout surface
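For intuition about routing on a grid of regions, the classic Lee maze-routing algorithm finds a shortest Manhattan path between two points by breadth-first search while avoiding blocked cells. This textbook technique is shown only as an illustration; it is not claimed to be the algorithm inside any particular commercial router, and the grid and blockages are hypothetical.

```python
from collections import deque


def lee_route(grid, src, dst):
    """Shortest Manhattan path on a grid via breadth-first search (Lee's maze routing).

    grid[r][c] == 1 marks a blocked cell (for example a macro or a full region).
    Returns the list of (row, col) cells from src to dst, or None if unroutable.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}                 # visited cells and the cell we came from
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            break
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    if dst not in prev:
        return None
    path, cell = [], dst               # backtrace from the target to the source
    while cell is not None:
        path.append(cell)
        cell = prev[cell]
    return path[::-1]


# Hypothetical 4x5 grid of routing regions; 1 = blocked.
grid = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0],
]
route = lee_route(grid, (2, 0), (2, 4))
print(len(route) - 1, route)
```

Here the blockage at (2, 3) forces a detour through row 3, so the route takes 6 steps instead of the unobstructed 4.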
Extraction
Wire-load model
– Statistical estimate: inaccurate
Extraction
– Produces delay values from the actual layout
– Back-annotated to PT for static timing analysis:
    Net RC delays in SDF format
    Capacitive net loading values in set_load format
    Parasitic information for clock and other critical nets
– Back-annotated to DC for further optimization:
    Net RC delays in SDF format
    Capacitive net loading values in set_load format
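The reason a wire-load model is only a statistical estimate is that it predicts net parasitics from fanout alone, using a table from the technology library. The sketch below uses a made-up table in the general style of a Liberty wire_load group (fanout-to-length pairs, a slope for fanouts past the table, and per-unit-length capacitance and resistance); all numbers and names are assumptions for illustration.

```python
# Hypothetical wire-load table: fanout -> estimated net length.
FANOUT_LENGTH = {1: 20.0, 2: 42.0, 3: 66.0, 4: 92.0}
SLOPE = 28.0            # extra length per fanout beyond the largest table entry
CAP_PER_LENGTH = 0.002  # capacitance per unit length (pF), illustrative
RES_PER_LENGTH = 0.01   # resistance per unit length (kohm), illustrative


def wire_load_estimate(fanout: int):
    """Return (length, capacitance_pF, resistance_kohm) estimated from fanout only."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    last = max(FANOUT_LENGTH)
    if fanout <= last:
        length = FANOUT_LENGTH[fanout]
    else:
        # Linear extrapolation beyond the table using the slope.
        length = FANOUT_LENGTH[last] + SLOPE * (fanout - last)
    return length, length * CAP_PER_LENGTH, length * RES_PER_LENGTH


print(wire_load_estimate(3))
print(wire_load_estimate(6))
```

Every net with fanout 3 receives the same estimate no matter where its cells end up after placement; extraction replaces these guesses with RC values taken from the routed layout.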
Routing & Extraction Flow
[Figure: routing and extraction flow] Synthesis and Optimization → Floorplanning, Placement and Clock Tree Insertion → Global Routing → Extract Estimated Delays → Timing OK? → Detailed Routing → Extract Real Delays → Timing OK?
Each "Timing OK?" decision continues on Yes; the No branches are labeled Major Timing Violations and Minor Timing Violations.
Post-Layout Optimization
Major violations: full synthesis
Minor violations: IPO (In-Place Optimization)
Back-annotation to DC
– Net RC delays in SDF format
– Capacitive net loading in a set_load file
– Physical placement information in PDEF
Fixing hold-time violations
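The triage above and hold fixing can both be sketched as small calculations. The "major versus minor" threshold below (a violation larger than 10% of the clock period) is a made-up rule for illustration, not something the source specifies. For hold, the data must not arrive before the capture clock edge plus the hold time, so hold slack is data arrival minus (capture clock skew + hold time); a common fix is inserting delay buffers on the data path. All times are integer picoseconds to avoid rounding surprises, and the function names are hypothetical.

```python
def classify_setup_violation(slack_ps, period_ps, major_fraction=0.1):
    """Hypothetical triage: violations larger than major_fraction of the period
    are 'major' (re-synthesize); smaller ones are 'minor' (IPO)."""
    if slack_ps >= 0:
        return "met"
    return "major" if -slack_ps > major_fraction * period_ps else "minor"


def hold_buffers_needed(arrival_ps, capture_skew_ps, hold_time_ps, buffer_delay_ps):
    """Number of delay buffers to add on a data path so that hold slack >= 0.

    hold slack = data arrival - (capture clock skew + hold time)
    """
    slack = arrival_ps - (capture_skew_ps + hold_time_ps)
    if slack >= 0:
        return 0
    return (-slack + buffer_delay_ps - 1) // buffer_delay_ps  # integer ceiling


# The -1.12 ns setup slack from the earlier report, at a 2 ns period.
print(classify_setup_violation(-1120, 2000))
# Hypothetical hold case: 120 ps arrival, 150 ps skew, 60 ps hold, 40 ps buffers.
print(hold_buffers_needed(120, 150, 60, 40))
```

Under this made-up rule the earlier -1.12 ns path counts as a major violation, and the hold example needs three 40 ps buffers to cover its 90 ps deficit. Delay added for hold also consumes setup margin on the same path, so setup timing has to be rechecked after the fix.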
DFT (Design-For-Test)
Merge testability features early in the design cycle
Fault model: stuck-at fault model
– High fault coverage correlates with high defect coverage
$\text{Fault coverage} = \dfrac{\text{number of detectable faults}}{\text{total number of possible faults}}$
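The coverage formula can be made concrete with a toy single-stuck-at fault simulation. The circuit y = (a AND b) OR c, its net names, and the test vectors are hypothetical. Every net gets a stuck-at-0 and a stuck-at-1 fault, and a fault counts as detected when some vector makes the faulty output differ from the fault-free output. No fault collapsing is done here, which real ATPG tools typically perform.

```python
NETS = ("a", "b", "c", "n1", "y")


def simulate(a, b, c, fault=None):
    """Evaluate y = (a AND b) OR c; fault=(net, value) forces one net to 0 or 1."""
    def force(net, value):
        return fault[1] if fault and fault[0] == net else value

    a, b, c = force("a", a), force("b", b), force("c", c)
    n1 = force("n1", a & b)   # internal net: output of the AND gate
    return force("y", n1 | c)


def fault_coverage(vectors):
    """Fraction of single stuck-at faults detected by the given test vectors."""
    faults = [(net, v) for net in NETS for v in (0, 1)]
    detected = {f for f in faults for vec in vectors
                if simulate(*vec, fault=f) != simulate(*vec)}
    return len(detected) / len(faults), detected


cov, _ = fault_coverage([(1, 1, 0), (0, 1, 0)])
print(cov)
cov, _ = fault_coverage([(1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)])
print(cov)
```

The first two vectors detect 8 of the 10 faults (80% coverage); b stuck-at-1 and c stuck-at-0 need the extra vectors (1, 0, 0) and (0, 0, 1), which raise coverage to 100%.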
DFT Methods
Types
– Scan insertion
    Link multiplexed flip-flops (scan flops) to form a scan chain
– Memory BIST (Built-In Self-Test) insertion
– Boundary-scan insertion (tests board connections)
Synopsys Test Compiler (TC)
– Scan insertion, test pattern generation, JTAG or boundary-scan insertion, and generation of the JTAG controller and surrounding logic
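How a chain of multiplexed scan flip-flops is used can be sketched behaviorally: with scan enable high, each flop's input mux selects the previous flop's output, so a test pattern shifts in serially; one clock with scan enable low captures the combinational logic's response; shifting again moves that response out. The class, its methods, and the inverter "logic under test" are hypothetical modeling choices, not the structure Test Compiler generates.

```python
from typing import List, Optional


class ScanChain:
    """Behavioral model of N multiplexed (mux-D) scan flip-flops in one chain."""

    def __init__(self, length: int):
        self.q: List[int] = [0] * length  # q[0] is nearest scan_in, q[-1] drives scan_out

    def clock(self, scan_enable: int, scan_in: int = 0,
              functional_d: Optional[List[int]] = None) -> int:
        """Apply one rising clock edge; return the scan_out value seen before the edge."""
        scan_out = self.q[-1]
        if scan_enable:                    # shift mode: mux selects the previous flop's Q
            self.q = [scan_in] + self.q[:-1]
        else:                              # capture mode: mux selects the functional D input
            self.q = list(functional_d)
        return scan_out

    def load(self, pattern: List[int]) -> None:
        """Shift a test pattern in so that q[i] == pattern[i] afterward."""
        for bit in reversed(pattern):      # the bit for the last flop must enter first
            self.clock(scan_enable=1, scan_in=bit)

    def unload(self) -> List[int]:
        """Shift the captured state out; return it in q[0..N-1] order."""
        out = [self.clock(scan_enable=1) for _ in range(len(self.q))]
        return out[::-1]


# Hypothetical logic under test: each flop's functional D is NOT its own Q.
chain = ScanChain(4)
chain.load([1, 0, 1, 1])
chain.clock(scan_enable=0, functional_d=[1 - b for b in chain.q])  # capture cycle
print(chain.unload())
```

Loading and unloading each take N shift cycles for an N-flop chain, which is why long chains are often split into several shorter ones to reduce test time.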