Optimization of VHDL Intermediate Forms
Keith D. Cooper, John Bennett & Linda Torczon
Rice University
[Figure: circuit built from two cells with pins D, G, Q, QN and one cell with pins S, R, Q, QN, reduced by optimization to a single cell with pins S, R, Q, QN]
New Ideas
• Work from standard intermediate forms
• Develop benchmarks & metrics
• Develop and apply analogs of classic code optimizations
Impact
• Smaller, faster circuits from software specifications
• Wider range of application for VHDL
• Method for assessing quality of different compilers
Schedule (1998 to 2000)
• Benchmark and optimization activities: gather examples, develop metrics, select IF, get tools, IF measurement tools, prototype optimizations, distribute code, continue optimization
Project Vision
Set out to improve the quality of VHDL-derived circuits
• View circuit as a graph
→Nodes perform operations
→Edges represent flow of values
• Resembles compiler internal representations from the 60’s & 70’s
Project relies on analogy to compilation
• Cells in the design resemble operations in code
• Simple set of values
• Derive knowledge & use it to improve implementation
• Adapt classic compiler optimizations to these graphs
Motivating Example (one last time)
This VHDL

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY rs IS
  PORT (s, r  : IN  STD_LOGIC := '0';
        q, qn : OUT STD_LOGIC);
END rs;
--
ARCHITECTURE behavior OF rs IS
BEGIN
  PROCESS (s, r)
  BEGIN
    -- the ff will not change if r and s are both '0'
    IF s = '0' AND r = '1' THEN
      q  <= '0';
      qn <= '1';
    ELSIF s = '1' AND r = '0' THEN
      q  <= '1';
      qn <= '0';
    ELSIF s = '1' AND r = '1' THEN
      q  <= 'X';
      qn <= 'X';
    END IF;
  END PROCESS;
END behavior;

Produced this design
[Figure: two cells with pins D, G, Q, QN and one cell with pins S, R, Q, QN]
Instead of this
[Figure: a single cell with pins S, R, Q, QN]
We now know more about this simple flip-flop than anyone should
To try this, we needed a fulcrum
• Commercial VHDL systems export & import standard IRs
• Our idea: transform those IRs and re-insert them into system
• Using commercial tools lends credibility & realism to experiment
Project Vision
[Figure: VHDL code enters the vendor tools, which produce the circuit design and circuit simulation; the prototype optimizer exchanges standard IRs with the vendor tools]
Looks like a classic compiler
Project Team
Keith D. Cooper
• Compiler-based optimization
• Analysis of programs
John Bennett
• DSM multiprocessing & performance
• Safety & reliability of latex gloves
Linda Torczon
• Back end compiler techniques
• Programming environments & tools
We propose to apply the insights derived in classic scalar compiler research to the problems of compiling the hardware description language VHDL.
Two main subprojects
• Prototype optimizer for some VHDL intermediate representation
> choice of IR generated inordinate interest
> must be used in both simulation & synthesis
• Collection of VHDL codes to evaluate translation & optimization
> small examples to demonstrate correctness & effectiveness
> large examples to observe & understand interactions
If we can demonstrate that the analogy to scalar optimization is valid, we will create a connection to forty years of research in code improvement.
The Big Picture (Chicago PI Meeting, 11/97)
Two distinct world views
• View circuit as a set of logic equations
> rearrange & simplify with Boolean algebra
> fewer terms ⇒ smaller circuits
> shorter paths ⇒ faster circuits
• View circuit as a graph that specifies a computation
> analyze flow of values through the graph
> transform the graph to improve it
> makes “state” explicit & exposed
We will focus on the latter approach
• successful optimizer will incorporate both styles
• transformations provide framework for simplification
Our Approach (Chicago PI Meeting, 11/97)
Project Schedule
Year 1 Select an IR
Get VHDL & IR tools in house & running
Collect test suite of VHDL Examples
Develop metrics for IR files
Year 2 Build IR measurement tools
Build prototype IR-to-IR transformations
Distribute test suite & measurement tools
Year 3 Continue optimizer development
Distribute prototype optimizer
We are 2.5 years into a 3-year project
Administrative Issues
Fiscal Matters
• Contract began 10/97, scheduled to run through 9/00
• Have received $670,000; committed $620,000 (out of $751,000)
• Under current plan, could spend most of the money by October
Personnel Changes
• John Bennett is leaving Rice in June
→E. Speight & B. Balabanos are also leaving
→The E&CE portion of the team will be gone
• Keith Cooper & Linda Torczon are staying
Road Map for Rest of Talk
Research Progress
• Roughly chronological
• Hit the insights rather than the details
Perspective
• What have we learned?
• Where should we go from here?
Choosing an IR
A Critical Decision, Made in Year 1
• Wanted available commercial tools
• Wanted low-level, detailed view of designs
• Multiple vendors would be an advantage
Examined EDIF, SAVANT/AIRE, Ocean, & Alliance
• None were ideal
• Ocean & Alliance make strong assumptions (& subset EDIF)
• SAVANT/AIRE lacked commercial-strength synthesis tools
• EDIF had the best blend of coverage & tools
Selected EDIF
• Expected to find EDIF tools available (EIA standard)
• Knew it provided pathways in & out of commercial tools
Focus on analysis & improvement from low-level facts, as opposed to source-to-source approach
Reality: available open source tools handled small EDIF subsets
Graph builder instantiated designs from EDIF files
• Significant expansion & renaming (EDIF is hierarchical)
• Graph properties measured by walking the graph
We identified several design properties as metrics
Can walk graph and measure most of these ...
Number of gates    Number of nets      Placer time
Levels of logic    Outputs per net     Router time
Gates per level    Number of cycles    Connectedness
Choosing Design Metrics for Evaluation
From Santa Fe Poster
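As a rough illustration of measuring design properties by walking the graph, the sketch below counts gates and nets and computes levels of logic and gates per level. The netlist layout, the cell names, and the `measure` function are assumptions made for this example, not the project's graph package, and the walk assumes an acyclic design.

```python
from collections import defaultdict

# Hypothetical netlist: cell name -> (cell type, input nets, output net).
# Nets that no cell drives ("a", "b", "c") are treated as primary inputs.
NETLIST = {
    "g1": ("AND2", ["a", "b"], "n1"),
    "g2": ("OR2", ["n1", "c"], "n2"),
    "g3": ("INV", ["n2"], "out"),
    "g4": ("AND2", ["a", "c"], "n3"),
}


def measure(netlist):
    """Walk an acyclic netlist and return a few of the metrics."""
    driver = {out: cell for cell, (_, _, out) in netlist.items()}
    nets = set(driver)
    for _, ins, _ in netlist.values():
        nets.update(ins)

    level = {}

    def depth(cell):
        # Level of logic = 1 + deepest driving cell (0 for primary inputs).
        if cell not in level:
            _, ins, _ = netlist[cell]
            level[cell] = 1 + max(
                (depth(driver[n]) for n in ins if n in driver), default=0
            )
        return level[cell]

    per_level = defaultdict(int)
    for cell in netlist:
        per_level[depth(cell)] += 1

    return {
        "gates": len(netlist),
        "nets": len(nets),
        "levels_of_logic": max(per_level, default=0),
        "gates_per_level": dict(per_level),
    }


METRICS = measure(NETLIST)
```

Metrics such as placer time and router time cannot be read off the graph this way; they would still come from running the vendor tools.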
Acquiring and Building the Tools
Lack of open source tools meant much more development
• Designed & built parser for EDIF 2.0
• Distributed via WWW, used in other projects
• Needed (& built) many other components
→Graph package, pretty-printer, AST→graph, graph→EDIF
→Value numbering, reordering tool, peephole pass
→Tool to understand libraries & build a database
• More development than we anticipated
• If these have value to others, we will distribute them
About 5,000 lines of specs + C code, becomes 11,000
[Figure: prototype optimizer structure: Parser, Graph, Opt’ns, Printer]
Building Prototype Optimizer & Measurement Tools
Designed & built a value numbering tool
• Extended Balke’s classic algorithm for boolean circuits
• Assigns a name to each cell output s.t. same name ⇔ same value
• Finds many value identities
Built a small peephole optimization phase
• Identifies cascaded operations that can be combined
• Common in EDIF generated from behavioral codes
• Broaden the range of inputs that generate good designs
Measurement tools involve graph traversals
• Most built into graph package & infrastructure
• Can pull them out and package them separately
Vendor tools make same measurements
Gathering Examples
Benchmarks for translator performance
• Small examples to illustrate specific principles
• Large examples to get realistic results
Built up a collection of VHDL/EDIF codes
• Some developed in-house, some from outside sources
• Include various ACS Program benchmarks
• Do not have permission to redistribute some of them
• Range from 100 to 40,000 lines of EDIF
Have used these extensively in our debugging efforts
Diversion on Place & Route Time
Dr. Munoz asked us to look at place & route problem
• Did a small study on the impact of order on place & route time
• Reversing generated order showed small, consistent improvements
• Raised some skepticism, but has a simple rational basis
Our examples are not hard enough to be conclusive
• Need larger circuits with very long place & route times
• Need to explore more reordering metrics, along with selective replication to reduce connectivity
• Need tools that handle larger examples (more on this later)
Generating the nodes in hierarchical order places the most constrained nodes last
Diversion on Optimizing Mapped Circuits
We were asked about optimizing mapped circuits (4/99)
Developed technique to address this problem
• Analysis to isolate sequential circuits set off by latches
• Extract equations for these sequential portions
• Use external simplifier & resynthesize circuit
• Hand-simulation removed 1 of 18 CLBs from pipelined DFA
• Balabanos is working to improve methods & broaden their applicability
Status Today
Several working tools
• Parser, pretty-printer, graph-builder, IR measurements
• Value numbering appears to work
• Still debugging graph→EDIF translation
→Difficulties getting big examples through translation
• Minimal ability to work with larger codes
→Limits place & route work, testing of benchmark codes
Have proved that the ideas, in concept, have merit
• Can optimize original motivating example
• Find redundancy in many of the examples
• Our analogy to programming languages worked
Stymies work on almost all fronts
Building a full-scale prototype is a huge task
EDIF Issues
• Level of detail in EDIF (and in the designs) is enormous
• EDIF encodes much of that detail into vendor-specific libraries
→Requires a deep understanding of libraries
→New examples involve new & obscure corners of the library
• Scaling up entails changing devices and libraries (larger FPGAs)
Debugging Issues
• Debugging has been the Achilles’ heel of this project
• Inherent problem in the design of the system
Perspective
Forced to reimplement many of the internal details of the synthesis tools
Debugging the Optimizer
[Figure: VHDL code enters the vendor tools, which produce the circuit design and circuit simulation; the prototype optimizer exchanges standard IRs with the vendor tools]
Building the graph makes the design unrecognizable
If it all works, it works
• New EDIF becomes new design
If the optimizer has bugs, ...
• It destroyed the name space
• It removed any landmarks
• Vendor tools no longer help
Perspective
This was a high-risk effort
• We succeeded in proving the concept
• Prototype works on modest examples
• Prototype does not scale to large examples (details)
To build a full-scale prototype
• Should work with internals of an existing commercial system
• Substantial programming effort (4-5 people)
⇒ Better done by a vendor than by an academic project
Vendors are moving in this direction
• Synopsys, for example, has been hiring classical compiler writers
Our options include
• Continue on our current path until the end of the contract
→Focus on debugging and scaling the prototype
→May succeed, or may not
• Clean up what is there & make it available on the WWW
→Caveats on prototype & scaling
→Focus on publishing algorithms, problems, & solutions
We’re looking for guidance
Where do we go from here?
Do Not Cross This Line
The following slides are the bits and pieces used to put the main talk together. These are not intended for public consumption, and are not part of the main talk. However, they may be useful in any subsequent discussion.
The Really Big Picture
If we succeed
• Behavioral VHDL might become practical
• Larger pool of designers, larger set of applications
• PC-based FPGA board used for low-volume applications
VHDL generated from other tools becomes more attractive
• LabView
• High-level programming languages
• All require better optimization
And, faster place & route times
There are several lessons here.
1) The example turns out to be quite subtle. We need to look at more diverse examples, large and small.
2) The translator should be cautious about mapping indeterminate values to equal values. (In data-flow analysis, a similar issue arises with uninitialized values. We end up representing them with a “top” in lattice theory rather than a “bottom”. Something similar might be useful here.)
3) Hand simulation of value numbering pays off on an appropriate circuit.
4) We need a parser to look at larger circuits, such as the FPU.
Lessons
Lessons We’ve Learned
Need full power of “global” algorithms
• Cyclic graphs abound
• Regional methods are NOT sufficient
Our “language” analogy holds up moderately well
• Value numbering, peephole optimization carry over
• Database on library resembles target-specific knowledge
There is no substitute for studying examples
• Understand the weaknesses of translation
• Recognize opportunities for improvement
An Aside about EDIF
Adopted LISP-like syntax model
• Parentheses as brackets
• Operator immediately follows opening parenthesis
• Should simplify parsing
However
• Keyword options are syntactically constrained
• Many options, many “statements”, each idiosyncratic
• Grammar has roughly 1,000 productions
→ Twice the size of Fortran, three times the size of C
• Throws away many of the benefits of LISP-like syntax
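To show why the bracket structure alone is easy to read, here is a minimal s-expression reader, assuming well-formed input. The function name `read_sexpr` and the sample fragment are illustrative; this is not the project's parser, and it does none of the keyword-specific checking that makes the full EDIF grammar large.

```python
import re

# A token is a parenthesis, a double-quoted string, or a run of other characters.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def read_sexpr(text):
    """Return nested lists mirroring the parentheses in `text`.

    Assumes balanced parentheses; unbalanced input raises IndexError.
    """
    tokens = TOKEN.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok
        items = []
        while tokens[pos] != ")":
            items.append(parse())
        pos += 1  # consume the closing parenthesis
        return items

    return parse()


SAMPLE = "(cell AND2 (cellType GENERIC) (view v (viewType NETLIST)))"
TREE = read_sexpr(SAMPLE)
```

The reader knows nothing about which keywords may appear where. Enforcing those constraints is the part that takes a grammar of roughly 1,000 productions.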
Library Knowledge Base
Optimizer needs to “understand” the library
• Encoded in a 2-way hash table
• Similar to a relational database
• Records # of ports, their names, commutativity, function code
• Cell with >1 output needs a hash encoding for value numbering
Building the knowledge base
• Version 1 was hand-coded C function (5 to 10 lines per cell type)
• Version 2 automates much of the work
• Tool walks designs & generates the “generic” information
• Still need to generate some custom information
→ Commutativity, hash encodings for multi-output cells
→ Note which “dead” cells are critical (pads, buffers,…)
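One way to picture such a knowledge base is sketched below. The record fields follow the slide (ports, names, commutativity, function code, critical "dead" cells), but the `CellInfo` class, the cell names, the two lookup tables, and `hash_key` are all assumptions for illustration; the actual encoding of the 2-way hash table is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellInfo:
    name: str
    inputs: tuple
    outputs: tuple
    commutative: bool
    function: str           # function code used when hashing
    critical: bool = False  # "dead" cell that must be kept (pad, buffer, ...)


# Hypothetical library entries; real cell names depend on the vendor library.
LIBRARY = {
    info.name: info
    for info in (
        CellInfo("AND2", ("I0", "I1"), ("O",), True, "and"),
        CellInfo("INV", ("I",), ("O",), False, "not"),
        CellInfo("IBUF", ("I",), ("O",), False, "buf", critical=True),
    )
}

# Second index, from function code back to the cell types that implement it.
BY_FUNCTION = {}
for info in LIBRARY.values():
    BY_FUNCTION.setdefault(info.function, []).append(info.name)


def hash_key(cell_type, input_vns, output_index=0):
    """Build the value-numbering key for one output of a cell."""
    info = LIBRARY[cell_type]
    nums = tuple(sorted(input_vns)) if info.commutative else tuple(input_vns)
    # The output index keeps the outputs of a multi-output cell distinct.
    return (info.function, output_index, nums)
```

Keying on the function code rather than the cell name lets two library cells with the same function hash alike, which is one reason to record it.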
Place & Route Time
Many things affect place & route time
Some things we are doing may help, too
• Optimization shrinks & simplifies circuits
• Are there other simple tricks we can use?
This suggested a simple experiment
• Examine impact of order on place & route time
• Might lead to preprocessor that speeds up place & route
• Experiment is a proof of concept, not a finished work
Caveat: We are not working on place & route algorithms
Place & Route Time
Many heuristic techniques are order-sensitive
• Simple experiment - try routing in permuted orders
• Built a tool to reorder presentation of circuit in EDIF
• Generates four orders (original, reverse, ⇑ net size, ⇓ net size)
• Ran a series of codes through test
Goal was to determine efficacy of approach
• Any significant variation => should choose an order carefully
• Experiment was a proof of concept
Results suggest that we should investigate better orderings
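The four presentation orders can be sketched as follows. This assumes the tool reorders nets and that "net size" means the number of ports on a net; the `orderings` function and the sample data are hypothetical, and the real tool rewrote EDIF rather than Python dictionaries.

```python
def orderings(nets):
    """Return the four orders tried in the experiment.

    nets: dict mapping net name -> list of connected ports, in file order.
    """
    names = list(nets)

    def size(name):
        return len(nets[name])

    return {
        "original": names,
        "reverse": names[::-1],
        "ascending": sorted(names, key=size),                 # smallest nets first
        "descending": sorted(names, key=size, reverse=True),  # largest nets first
    }


# Hypothetical example: three nets with 3, 1, and 2 connected ports.
EXAMPLE_NETS = {
    "n1": ["u1.O", "u2.I0", "u3.I0"],
    "n2": ["u2.O"],
    "n3": ["u3.O", "u4.I"],
}
ORDERS = orderings(EXAMPLE_NETS)
```

Each order would then be written back out as EDIF and run through place & route to compare times.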
[Figure: bar chart of place + route time in seconds (axis 0 to 600) for each code under four orderings: Orig, Asc, Des, Rev. The codes include Scalability fft1 to fft4, Timing, Versatility, Interfacing Erode, Interfacing Dilate, Capacity, smallshift, mantmux, expo, sel3, swap, mux4_10, and mux4_5.]
Optimizing VHDL Intermediate Forms, DARPA/USAFRL/Rice University, February 1999
These codes may be too small to matter
Place & Route Time
Summary
• Experiment shows that order matters
• Need to explore other ordering schemes
• Potential improvement is significant
Future work
• Need examples that are hard to place & route
• Need ability to work with different device libraries
→Automatic generation of knowledge base
• Try orders based on locality, on adjacency, on difficulty, …
• Need tools to scale to large examples
Value Numbering Circuits
Have built a prototype value numbering pass
• Based on Balke’s 1968 algorithm
• Walks graph & assigns an integer to each value (a value number)
• Detects redundancies, knowable values
• Need extensions to handle cyclic graphs
• Operates on acyclic graphs (basic blocks)
Discovers identical values in linear-time pass over code
• Natural framework for algebraic identities
• Recognizes value identity, not lexical identity
• Easily handles commutativity
• Encodes equivalence into hashing operation
5,000 lines of code
Value Numbering Circuits
Results:
• Can eliminate one D-latch from our RS flip-flop
• Finds redundancies in other circuits
• Need to make large-scale measurements
Need to build the transformer to rewrite the circuit
Value Numbering Circuits
Balke’s original algorithm
For each instruction i in the block
1. Get vn’s for each operand
2. Hash operand & operators to get i’s vn
3. Already exists => replace i with a reference
4. Operands all constant => evaluate & replace with value
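The four steps above can be sketched as a hash-based pass over one block. This is a minimal sketch under stated assumptions: every operation is a two-input commutative boolean operator, each name is assigned once, and the tuple format, the `value_number` function, and the `COPY`/`CONST` markers are invented for the example rather than taken from the prototype.

```python
import itertools
import operator

# Assumed operator set; all three are commutative.
OPS = {"AND": operator.and_, "OR": operator.or_, "XOR": operator.xor}


def value_number(block):
    """block: list of (dest, op, (arg1, arg2)); args are names or 0/1 constants."""
    fresh = itertools.count()
    vn = {}       # name or ("const", value) -> value number
    expr = {}     # (op, sorted operand value numbers) -> value number
    const = {}    # value number -> constant, when the value is known
    holder = {}   # value number -> first name that holds the value
    rewritten = []

    def number_of(x):
        key = ("const", x) if isinstance(x, int) else x
        if key not in vn:
            vn[key] = next(fresh)
            if isinstance(x, int):
                const[vn[key]] = x
            else:
                holder[vn[key]] = x
        return vn[key]

    for dest, op, args in block:
        nums = [number_of(a) for a in args]          # step 1
        if all(n in const for n in nums):            # step 4
            value = OPS[op](*(const[n] for n in nums))
            vn[dest] = number_of(value)
            rewritten.append((dest, "CONST", (value,)))
            continue
        key = (op, tuple(sorted(nums)))              # step 2 (sorting = commutativity)
        if key in expr:                              # step 3
            vn[dest] = expr[key]
            rewritten.append((dest, "COPY", (holder[expr[key]],)))
        else:
            expr[key] = vn[dest] = next(fresh)
            holder[vn[dest]] = dest
            rewritten.append((dest, op, tuple(args)))
    return rewritten, vn


BLOCK = [
    ("t1", "AND", ("a", "b")),
    ("t2", "AND", ("b", "a")),   # same value as t1, different spelling
    ("t3", "OR", (1, 0)),        # both operands constant
]
REWRITTEN, NUMBERS = value_number(BLOCK)
```

Here `t2` is rewritten as a copy of `t1` because sorting the operand value numbers makes `AND(b, a)` hash to the same key as `AND(a, b)`.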
Value Numbering Circuits
Required several extensions to classic approach
• Propagation of negation
→ Use positive & negative integers
→ Invert(x) => -x, rather than a new VN
→ Simplify double negatives arithmetically
• Variable number of inputs & outputs
• Features like IN_OUT ports and UNKNOWN ports
Points way to further extensions
• Algebraic identities & simplifications
• Constant (or known) values
• Controlled replication
Work should be of interest to compiler community
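The negation extension can be illustrated with signed value numbers. In this sketch, which assumes single-output cells and treats every non-inverter cell as commutative, an inverter's output gets the negated number of its input, so a double inversion cancels arithmetically. The cell names and the `number_cells` function are hypothetical.

```python
import itertools


def number_cells(cells):
    """cells: list of (output net, cell type, input nets) in topological order.

    Returns a dict from net name to a signed value number.
    """
    fresh = itertools.count(1)  # start at 1 so that -vn never equals vn
    vn = {}
    table = {}  # (cell type, sorted input value numbers) -> value number

    def number_of(net):
        if net not in vn:
            vn[net] = next(fresh)  # primary input seen for the first time
        return vn[net]

    for out, kind, ins in cells:
        nums = [number_of(n) for n in ins]
        if kind == "INV":
            vn[out] = -nums[0]  # Invert(x) => -x, no new value number
            continue
        key = (kind, tuple(sorted(nums)))
        if key not in table:
            table[key] = next(fresh)
        vn[out] = table[key]
    return vn


CELLS = [
    ("nb", "INV", ["b"]),
    ("bb", "INV", ["nb"]),      # double inversion of b
    ("x", "AND2", ["a", "b"]),
    ("y", "AND2", ["bb", "a"]),  # same value as x
    ("nx", "INV", ["x"]),
    ("ny", "INV", ["y"]),
]
VN = number_cells(CELLS)
```

Because `bb` receives the same number as `b`, the two AND2 cells hash to the same key, and their inverted outputs match as well.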
Value Numbering Circuits
Need for “optimism”
• Allow propagation of values into a cycle
• In classic data-flow analysis, add top to the lattice
• In value numbering, use DFA minimization algorithm (partitioning)
Use Simpson’s SCC technique
• Use two tables, hopeful and proven
• Iterate over cycles, hopefully, then record truth
• Much faster than partitioning algorithm (1:4:10)
• Allows algebraic identities & simplifications (not in partitioning)
• Uses Balke’s algorithm as the base step
Cycles introduced by designs & by libraries (XC7000 vs. XC4000E)
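The effect of optimism on a cycle can be shown with a simplified hash-based pass that is repeated until nothing changes. This is not Simpson's full technique: it skips SCC discovery and the separate proven table, rebuilds the hopeful table on every pass, and treats every cell type as commutative. The names `TOP` and `optimistic_value_number` and the sample cells are assumptions for the example.

```python
TOP = "<top>"  # optimistic initial assumption: value not yet known


def optimistic_value_number(cells):
    """cells: list of (output net, cell type, input nets); cycles are allowed.

    Returns a dict from each cell output to a representative net name.
    Outputs with the same representative are assumed to carry the same value.
    """
    vn = {out: TOP for out, _, _ in cells}
    changed = True
    while changed:
        changed = False
        hopeful = {}  # rebuilt on every pass
        for out, kind, ins in cells:
            # Primary inputs are not in vn and simply number to themselves.
            nums = tuple(sorted(vn.get(n, n) for n in ins))
            rep = hopeful.setdefault((kind, nums), out)
            if vn[out] != rep:
                vn[out] = rep
                changed = True
    return vn


# Two self-feeding OR cells on the same input, and one on a different input.
CYCLIC = [
    ("x", "OR", ["a", "x"]),
    ("y", "OR", ["a", "y"]),
    ("z", "OR", ["b", "z"]),
]
RESULT = optimistic_value_number(CYCLIC)
```

A pessimistic pass would have to give the feedback inputs of `x` and `y` distinct numbers and could never merge them; starting both at `TOP` lets the first pass hope they are equal, and the second pass confirms it.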
Value Numbering Circuits
The implementation is a prototype
• Uses a hard-coded database for specific target
• Uses a worklist iterative algorithm
• Uses classical (pessimistic) analysis (5,000 lines of code)
It should be refined and (perhaps) rewritten
• Deriving database from library specification
• Implement efficient optimism (Simpson’s SCC)
• Build code to re-generate the EDIF
We expect to distribute the code later this year
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY rs IS
  PORT (s, r  : IN  STD_LOGIC := '0';
        q, qn : OUT STD_LOGIC);
END rs;
--
ARCHITECTURE behavior OF rs IS
BEGIN
  PROCESS (s, r)
  BEGIN
    -- the ff will not change if r and s are both '0'
    IF s = '0' AND r = '1' THEN
      q  <= '0';
      qn <= '1';
    ELSIF s = '1' AND r = '0' THEN
      q  <= '1';
      qn <= '0';
    ELSIF s = '1' AND r = '1' THEN
      q  <= '1';
      qn <= '0';
    END IF;
  END PROCESS;
END behavior;
Prototype pass reduces this to use a single latch
RS Flip-Flop Example
[Figure: circuit for this example with each net labeled by its value number]
Peephole Optimization Pass
Analogy is to instruction selection
• Discover rough edges in translation
• Clean up leftovers from other passes
• Place to apply limited pattern matching
• Find operations that can be combined (AND2 & AND2)
• May be place to do re-association
Implementation
• Knowledge base about target library
• Linear traversal of circuit graph with limited window
• Uses logical rather than physical adjacency
• Code is quite small and efficient (100s of lines of code)
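A peephole rule of the AND2 & AND2 kind might look like the sketch below. It assumes the library offers a three-input AND (called `AND3` here, a hypothetical name), that the intermediate net feeds only the second gate, and that it is not a design port. The dictionary layout and `combine_and2` are invented for the example.

```python
def combine_and2(cells):
    """Merge an AND2 that feeds only another AND2 into a single AND3.

    cells: dict mapping output net -> (cell type, list of input nets).
    Returns a new dict; the input is left unchanged.
    """
    # Count how many cell inputs each net drives.
    fanout = {}
    for _, ins in cells.values():
        for net in ins:
            fanout[net] = fanout.get(net, 0) + 1

    result = dict(cells)
    for out, (kind, ins) in cells.items():
        if kind != "AND2" or out not in result:
            continue
        for i, net in enumerate(ins):
            inner = result.get(net)  # the cell driving this input, if any
            if inner and inner[0] == "AND2" and fanout.get(net, 0) == 1:
                other = ins[1 - i]
                result[out] = ("AND3", [*inner[1], other])
                del result[net]  # the inner gate is no longer needed
                break
    return result


BEFORE = {
    "n1": ("AND2", ["a", "b"]),
    "n2": ("AND2", ["n1", "c"]),
    "n3": ("OR2", ["n2", "d"]),
}
AFTER = combine_and2(BEFORE)
```

The match follows net connections rather than the order in which cells appear in the file, which is what logical rather than physical adjacency refers to.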
EDIF Parser
We built & distributed an EDIF parser
• At last review, issue of a parser on the EDIF CD
• CD includes an EBNF grammar, not a parser
• We evaluated the publicly available parsers, none was acceptable
• This took about six weeks
Our parser
• Full EDIF 2.0.0 grammar
• Builds an abstract syntax tree, serves as basis for tools
• 3,400 lines of specification => 11,000 lines of code
• Additional 1,500 lines of support code
• Available from our web site (Release 2)
Brent Nelson (BYU) is using our parser in his ACS project (active user)
Design Graph
We designed & implemented a design graph
• Constructed by walking the AST & instantiating
• Serves as basis for analysis & transformation
• Enables “whole design” analysis and transformation
• Have working implementation
• 4,800 lines of code (1,000 interface + 3,800 low-level details)
• Will distribute more robust version this summer
This will serve as basis for transformations
• Used by value numbering & peephole passes
• Used for interim evaluation of design properties
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY rs IS
  PORT (s, r  : IN  STD_LOGIC := '0';
        q, qn : OUT STD_LOGIC);
END rs;
--
ARCHITECTURE behavior OF rs IS
BEGIN
  PROCESS (s, r)
  BEGIN
    -- the ff will not change if r and s are both '0'
    IF s = '0' AND r = '1' THEN
      q  <= '0';
      qn <= '1';
    ELSIF s = '1' AND r = '0' THEN
      q  <= '1';
      qn <= '0';
    ELSIF s = '1' AND r = '1' THEN
      q  <= 'X';
      qn <= 'X';
    END IF;
  END PROCESS;
END behavior;
This version cannot be optimized down to 1 latch
(It’s not a flip-flop!)
Original Example
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY rs IS
  PORT (s, r  : IN  STD_LOGIC := '0';
        q, qn : OUT STD_LOGIC);
END rs;
--
ARCHITECTURE behavior OF rs IS
BEGIN
  PROCESS (s, r)
  BEGIN
    -- the ff will not change if r and s are both '0'
    IF s = '0' AND r = '1' THEN
      q  <= '0';
      qn <= '1';
    ELSIF s = '1' AND r = '0' THEN
      q  <= '1';
      qn <= '0';
    ELSIF s = '1' AND r = '1' THEN
      q  <= '1';
      qn <= '0';
    END IF;
  END PROCESS;
END behavior;
Prototype pass reduces this to use a single latch
Slight Variation
[Figure: circuit for this variation with each net labeled by its value number]
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY rs IS
  PORT (s, r  : IN  STD_LOGIC := '0';
        q, qn : OUT STD_LOGIC);
END rs;
--
ARCHITECTURE behavior OF rs IS
BEGIN
  PROCESS (s, r)
  BEGIN
    -- the ff will not change if r and s are both '0'
    IF s = '0' AND r = '1' THEN
      q  <= '0';
      qn <= '1';
    ELSIF s = '1' AND r = '0' THEN
      q  <= '1';
      qn <= '0';
    ELSIF s = '1' AND r = '1' THEN
      q  <= '0';
      qn <= '1';
    END IF;
  END PROCESS;
END behavior;
Value numbering also finds a single-latch version of this circuit
The Other Variation