A Design Flow for the Development, Characterization, and Refinement of System Level Architectural Services
Douglas Densmore
Dissertation Talk and DES/CHESS Seminar
May 15th, 2007
Committee: Prof. Alberto Sangiovanni-Vincentelli (EECS), Chair; Prof. Jan Rabaey (EECS); Prof. Lee Schruben (IEOR)
2/60
Objective
• To demonstrate that architecture service modeling in system level design (SLD) can allow abstraction and modularity while maintaining accuracy and efficiency.
Factors and solutions:
• Heterogeneity → Modularity
• Complexity → Abstraction
• Time to Market → Accuracy and Efficiency
Techniques:
#1 Event Based Architecture Service Modeling
#2 Architecture Service Characterization
#3 Architecture Service Refinement Verification
3/60
Outline
1. Problem Statement
2. Approach
3. Contribution
• Motivating Factors
• Design Trends and EDA Growth
• Software Solutions
• Programmable Platforms
• Naïve Approach
• My Improved Approach
4/60
Motivating Factors, Factor 1: Heterogeneity
Problem Statement | Approach | Contribution
Solution 1: Modularity
[Figure: Existing and predicted first integration of SoC technologies with standard CMOS processes, by year (1).]
[Figure: System on a Chip (SoC) block diagram of the Intel PXA270, showing various component types and various communication types (system bus, PCMCIA, USB, SRAM, Quick Capture Interface), as used in the Mypal A730 PDA (digital camera and a VGA-TFT display). Courtesy: http://www.intel.com/design/embeddedpca/applicationsprocessors/302302.htm]
1. D. Edenfeld, et al., 2003 Technology Roadmap for Semiconductors, IEEE Computer, January 2004.
5/60
Motivating Factors, Factor 2: Complexity
Solution 2: Abstraction
[Figure: Potential design complexity and designer productivity, 1981 to 2009. Top curve: logic transistors per chip (millions); bottom curve: productivity (thousands of transistors per staff-month). Complexity grows at a 58%/yr compounded rate, productivity at only 21%/yr compounded; the region between the curves is labeled "Equivalent Added Complexity". Courtesy: 1999 International Technology Roadmap for Semiconductors (ITRS).]
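The two compounded rates on this chart can be turned into a concrete gap factor with a few lines of arithmetic. The Python sketch below is illustrative only: it assumes both curves start from the same normalized value, which the chart does not state, and the function names are hypothetical.

```python
def growth_factor(rate: float, years: int) -> float:
    """Compounded growth multiplier, e.g. rate=0.58 for 58%/yr."""
    return (1.0 + rate) ** years


def productivity_gap(years: int,
                     complexity_rate: float = 0.58,
                     productivity_rate: float = 0.21) -> float:
    """Ratio of complexity growth to productivity growth after `years`,
    assuming (hypothetically) both start from the same normalized value."""
    return growth_factor(complexity_rate, years) / growth_factor(productivity_rate, years)


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(f"after {years:2d} years the gap factor is {productivity_gap(years):.1f}x")
```

Under this assumption the gap is about 1.3x after one year and roughly 14x after ten years, which is the sense in which productivity cannot keep pace without raising the level of abstraction.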
6/60
Motivating Factors, Factor 3: Time to Market
Solution 3: Accuracy and Efficiency
Challenge: remain modular and abstract.
• A year late effectively ends any chance of revenue!
• Nine months late: 50%+ revenue loss.
• Three months late: still 15%+ revenue loss.
(Courtesy: http://www.ibm.com)
• 37% of new digital products were late to market! (Ivo Bolsens, CTO, Xilinx)
[Figure: Bar chart in months (0 to 18) for digital consumer devices, set-top equipment, and automotive in 1991, 2000, and 2005; values shown: 16, 11, and 10.7. Source: Gartner DataQuest, Market Trends: ASIC and FPGA, Worldwide, 1Q05 Update edition, 2002-2008.]
7/60
Design Trends and EDA Growth
[Figure: Embedded systems market in $ millions for embedded software, embedded ICs, and embedded boards (left, center, right) in 2003, 2004, and 2009; total shown: ~$78.7 billion. Source: Ravi Krishnan, Future of Embedded Systems Technology, BCC Research, June 2005.]
[Figure: Design complexity (# transistors) over time, with the design trend and the gate level, RTL, and ESL methodologies; labels: today, design gap, maximum tolerable design gap, methodology gap.]
[Figure: Gartner Dataquest projections of EDA industry revenue and of ESL revenues; labels: ~22% growth for 2007, tremendous growth. Source: Richard Goering, ESL May Rescue EDA, Analysts Say, EE Times, June 2005.]
8/60
Software Tools Solution
Platform Based Design (1) is composed of three aspects:
1. Top Down Application Development
2. Platform Mapping
3. Bottom Up Design Space Exploration
[Figure: An application instance from the application space and a platform instance (HW and SW) from the architectural space meet at the platform through platform mapping and platform design-space export, yielding the system.]
Orthogonalization of concerns (2):
• Functionality and Architecture
• Behavior and Performance Indices
• Computation, Communication, and Coordination
Design Technology Improvements and Impact on Designer Productivity (3):
DT Improvement | Year | Productivity Delta | Productivity (Gates/Design-Year) | Cost of Component Affected | Description of Improvement
Electronic System Level (ES-Level) Methodology | 2005 | +60% | 200K | SW Development, Verification | Level above RTL including both HW and SW design.
Metropolis:
• The Metropolis Meta Modeling (MMM) language (4) and compiler are the core components.
• Back end tools provide various operations for manipulating designs and performing analysis.
[Figure: The meta model language is parsed by the front end (meta model compiler) into abstract syntax trees, which back ends 1 to N (simulator tool, synthesis tool, verification tools, ...) consume.]
1. A. Sangiovanni-Vincentelli, Defining Platform-based Design, EE Design, March 5, 2002.
2. K. Keutzer, J. Rabaey, et al., System Level Design: Orthogonalization of Concerns and Platform-Based Design, IEEE Transactions on Computer-Aided Design, Vol. 19, No. 12, December 2000.
3. 2004 International Technology Roadmap for Semiconductors (ITRS).
4. F. Balarin, et al., Metropolis: an Integrated Electronic System Design Environment, IEEE Computer, Vol. 36, No. 4, April 2003.
9/60
Programmable Platforms
What? A programmable platform is a system for implementing an electronic design, distinguished by its ability to be programmed regarding its functionality.
• At the extremes: software programmable (GPPs, DSPs) and bit programmable (FPGAs, CPLDs).
What devices should the tools target? Programmable platforms.
Why? The next "digital wave" will require programmable devices (1).
[Figure: Alternation between customization and standardization; eras shown: standard discretes, custom LSIs, memories and microprocessors, ASICs, field programmability ('57, '67, '77, '87, '97). Field programmability is the modeling focus of this work. Source: Electronics Weekly, Jan 1991. Courtesy: K. Keutzer.]
Platform FPGAs (FPGA fabric and embedded computation elements):
• Strengths: rapid time-to-market; versatile and flexible (increases product lifespan); in-field upgradeability; performance 2-100x compared to GPPs.
• Weaknesses: performance 2-6x slower than ASICs; power 13x compared to ASICs.
One set of models represents a very large design space of individual instantiations.
1. Tsugio Makimoto, Paradigm Shift in the Electronics Industry, UCB, March 2005.
10/60
Programmable Platform Focus
Classification | Description
Granularity | Abstraction level: CLB, functional unit, ISA
Host Coupling | Coupling to host processor: I/O, direct communication, same chip
Reconfiguration Methodology | How the device is programmed: static, dynamic, partial
Memory Organization | How computations access memory: large block, distributed
(K. Bondalapati, V. Prasanna, Reconfigurable Computing Systems, USC)

Design Level | Communication | Storage | Processing
Implementation | Switches, MUXes | RAM organization | CLB / IP block
uArch | Crossbar, bus | Register file size | Execution unit type
ISA | Address size | Register set | Custom instructions
System Arch | Interconnection network | Buffer size | Number/types of tasks
(P. Schaumont, et al., A Quick Safari Through the Reconfigurable Jungle, DAC, June 2001.)

What do MY system level models need to capture?
[Figure: Xilinx Virtex II ML310 board with a Xilinx Virtex II XC2VP30, IBM's CoreConnect architecture, MicroBlaze, and PowerPC.]
11/60
Naïve Approach
[Figure: An abstract, modular SLD tools architecture model is annotated with estimated performance data drawn from datasheets and expertise, then used manually for (1) simulation-based design space exploration and (2) synthesis. The implementation platform's "C" model and RTL "golden model" sit across an implementation gap, reached through an inflexible automatic tool flow with lengthy feedback.]
• Disconnected: inaccurate!
• Inefficient: miss time to market!
Bridge the gap!
12/60
My Improved Approach
Technique 1: Modeling style and characterization for programmable platforms.
• Abstract, modular SLD models are built from functional level blocks of programmable components.
• Real performance data from the characterization flow replaces estimated performance data.
• The model yields an actual programmable platform description.
Technique 2: Refinement verification.
• Formal checking methods replace manual, informal checks that a refined model is a correct implementation of its abstract counterpart.
Narrow the gap.
The new approach improves accuracy and efficiency by relating programmable devices and their tool flow to SLD (Metropolis), while retaining modularity and abstraction.
13/60
Approach Statement
Problem: SLD of architecture service models is potentially inaccurate and inefficient.
My approach: a PBD approach to architecture service modeling that allows modularity and abstraction. By relating service models to
• programmable platforms,
• platform characterization,
• and refinement verification,
they will retain accuracy and efficiency.
Design flow:
1. Select architecture services from libraries (Xilinx Virtex II, FLEET, general purpose, special purpose).
2. Assemble an SLD, transaction based architecture from services (abstract, modular).
3. Augment the model with real performance data.
4. Simulation based design space exploration.
   4a. Based on DSE results, modify the architecture model if needed.
   4b. Perform a refinement check (event based, interface based, compositional component based) between the abstract and refined models: yes or no?
5. Structure extractor: produce an actual programmable platform description (i.e., an MHS file).
6. Program the actual device directly.
Functional modeling is not discussed in this work.
Chapter 2: System Level Architecture Services (steps 1, 2, 5, 6). Chapter 3: Architecture Services Characterization (step 3). Chapter 4: System Level Service Refinement (steps 4a, 4b).
Narrow the gap.
14/60
Outline Revisited
1. Problem Statement
2. Approach
3. Contribution
• My Improved Approach
• Approach Statement
• Architecture Service Descriptions
• Metropolis Overview
• Programmable Architecture Service Modeling
• Programmable Platform Characterization
• Example of Techniques
Focus: Modularity
15/60
Architecture Service Taxonomy
Services are library elements, <F, C>, where F is a set of interface functions (capabilities) and C is a set of cost models.
• Single Component, Single Interface (SCSI): one provided interface and one simple cost model.
• Multiple Component, Multiple Interface (MCMI): two or more provided interfaces, zero or more internal interfaces, one or more simple cost functions, and zero or more complex cost functions.
• Multiple Component, Single Interface (MCSI): one provided interface, one or more internal interfaces, and one or more complex cost functions.
Services are also classified as active or passive.
[Figure: SCSI: a single component behind one provided interface with one cost. MCMI: components C1 and C2 behind several provided interfaces, with costs CostA(C1, C2) and CostB(C2). MCSI: components C1, C2, C3 joined by internal interfaces behind one provided interface, with Cost(C1, C2, C3). Abstract examples: a general purpose processor (CPU with Add and Multi) and a Xilinx Virtex II Pro (DCT, FFT, bus), each operation carrying a cost function (CF).]
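The three class definitions are mechanical enough to check in code. The following Python sketch encodes a service as a hypothetical `Service` record holding its provided interfaces, internal interfaces, and cost models, and classifies it by counting them. It is an illustration, not Metropolis code; the one added assumption is that an SCSI service has no internal interfaces and no complex cost functions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Service:
    """A library element <F, C>: interface functions F plus cost models C."""
    name: str
    provided: List[str]                                  # provided interfaces
    internal: List[str] = field(default_factory=list)    # internal interfaces
    simple_costs: List[str] = field(default_factory=list)
    complex_costs: List[str] = field(default_factory=list)
    active: bool = False                                 # active vs. passive


def classify(s: Service) -> str:
    """Apply the SCSI / MCMI / MCSI definitions from the taxonomy."""
    if (len(s.provided) == 1 and not s.internal
            and len(s.simple_costs) == 1 and not s.complex_costs):
        return "SCSI"
    if len(s.provided) >= 2 and len(s.simple_costs) >= 1:
        return "MCMI"   # internal interfaces and complex costs may be zero or more
    if len(s.provided) == 1 and s.internal and s.complex_costs:
        return "MCSI"
    return "unclassified"


adder = Service("Add", provided=["execute"], simple_costs=["CF"])
bus = Service("Bus", provided=["read", "write"], simple_costs=["CostB(C2)"],
              complex_costs=["CostA(C1, C2)"])
dct_pipeline = Service("DCT", provided=["dct"], internal=["i1", "i2"],
                       complex_costs=["Cost(C1, C2, C3)"])

if __name__ == "__main__":
    for s in (adder, bus, dct_pipeline):
        print(s.name, classify(s))
```

The example services mirror the figure's labels (CostA, CostB, Cost(C1, C2, C3)); their interface names are invented for illustration.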
16/60
Service Based Architecture Styles
[Figure: Architecture Style 1 (Branching) and Architecture Style 2 (Ring), assembled from SCSI, SCMI, MCSI, and MCMI services. Ovals: passive services; squares: active services.]
• Branching style: allows the use of all types of services.
• Ring style: allows the use of single interface (SI) services only.
• Both styles allow the use of active/passive and single/multiple component services.
Assemble collections of services to provide larger sets of capabilities and cost functions.
Hierarchy: each style can be abstracted into composite (MCMI) services.
17/60
Metropolis Objects
• Metropolis elements adhere to a "separation of concerns" ideology:
  - Processes (computation): active objects, each a sequentially executing thread.
  - Media (communication): passive objects that implement interface services.
  - Quantity managers (coordination): schedule access to resources and quantities.
[Figure: Process Proc1 with ports P1 and P2, connected through interfaces I1 and I2 to medium Media1; quantity manager QM1.]
18/60
Metropolis Netlists and Events
Metropolis architectures are created via two netlists:
• Scheduled: generates events (1) for services in the scheduled netlist.
• Scheduling: allows these events access to the services and annotates the events with quantities.
[Figure: Scheduled netlist (Proc1 and Proc2 with ports P1 and P2, interfaces I1 and I2, Media1) and scheduling netlist (QM1, GlobalTime).]
1. E. Lee and A. Sangiovanni-Vincentelli, A Unified Framework for Comparing Models of Computation, IEEE Trans. on Computer Aided Design of Integrated Circuits and Systems, Vol. 17, No. 12, pp. 1217-1229, December 1998.
19/60
Services in Design Flow (Process Expanded: Chapter 2, System Level Architecture Services)
1. Select architecture services from libraries: the Xilinx Virtex II libraries of Metropolis media, such as PowerPC, MicroBlaze (uBlaze), SynMaster, PLB, OPB, BRAM, and the mapping process.
2. Assemble an SLD, transaction based architecture from services.
3. Produce an actual programmable platform description (i.e., an MHS file) through structure extraction.
4. Program the actual device directly.
Characterization data is an input to this process (Chapter 3).
20/60
Programmable Architecture Modeling
• Computation services
• Communication services
• Other services
Services shown: PPC405, MicroBlaze, SynthSlave, SynthMaster, Processor Local Bus (PLB), On-Chip Peripheral Bus (OPB), OPB/PLB Bridge, BRAM, and the mapping process.
Computation interfaces: Read(addr, offset, cnt, size), Write(addr, offset, cnt, size), Execute(operation, complexity).
Communication interfaces: addrTransfer(target, master), addrReq(base, offset, transType, device), addrAck(device), dataTransfer(device, readSeq, writeSeq), dataAck(device).
Example: a task calls Read(addr, offset, cnt, size) before mapping; after mapping the call is bound to concrete values, e.g., Read(0x34, 8, 10, 4).
Each service captures the transaction level, IP parameters, and I/O interfaces.
Services are organized by orthogonal aspects of the system. All services created here are XCMI (SCMI or MCMI) with more than two provided interfaces each.
They leverage function level granularity, with a 1-to-1 correspondence between model and IP.
21/60
Sample Metropolis Service: uBlaze
Interface Function | Assumptions | Cycle Count
cpuRead(int bus) | Bus dependent | 1 (LMB), 7 (OPB) cycles
cpuWrite(int bus) | Bus dependent | 1 (LMB), 2 (OPB) cycles
fslRead(int size) | Transfer size | (1 * size) cycles
fslWrite(int size) | Transfer size | (1 * size) cycles
execute(int inst, int comp) | Valid INST field | (1 * complexity) cycles
Each service is profiled manually and given a set of cost models.

    public medium uBlaze implements uBlazeISA, GPPOperation, OPBMaster {
        // Ports
        port OPBTrans _portOPB;
        // connection to characterizer
        port cycleLookup _portChar;
        // FSL ports
        port FSLMasterInterface[] _portMFSL;
        port FSLSlaveInterface[] _portSFSL;
        // connection to StateMedia
        port SchedReq _portSM;
        // StateMedia to global time
        port GTimeSMInterface _portGT;

        // Parameters
        private int C_FSL_LINKS;
        private int C_FSL_DATA_SIZE;
        private int C_USE_BARREL;
        private int C_USE_DIV;
        private int C_USE_HW_MUL;
        ...
    }
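The uBlaze cost table can be read as a set of small cost functions. This Python sketch restates the table's cycle counts and sums them over a trace of interface calls. The Python function names and the trace format are hypothetical; in the Metropolis model these costs come through the characterizer port rather than hard-coded tables.

```python
LMB, OPB = "LMB", "OPB"

# Bus-dependent cycle counts from the uBlaze table.
_READ_CYCLES = {LMB: 1, OPB: 7}
_WRITE_CYCLES = {LMB: 1, OPB: 2}


def cpu_read_cycles(bus: str) -> int:
    return _READ_CYCLES[bus]


def cpu_write_cycles(bus: str) -> int:
    return _WRITE_CYCLES[bus]


def fsl_transfer_cycles(size: int) -> int:
    # fslRead / fslWrite: (1 * size) cycles
    return 1 * size


def execute_cycles(complexity: int) -> int:
    # execute(inst, comp): (1 * complexity) cycles, assuming a valid INST field
    return 1 * complexity


def total_cycles(trace):
    """Sum the cost of a trace of (interface function, argument) pairs."""
    cost = {"cpuRead": cpu_read_cycles, "cpuWrite": cpu_write_cycles,
            "fslRead": fsl_transfer_cycles, "fslWrite": fsl_transfer_cycles,
            "execute": execute_cycles}
    return sum(cost[fn](arg) for fn, arg in trace)


trace = [("cpuRead", OPB), ("execute", 5), ("fslWrite", 4), ("cpuWrite", LMB)]

if __name__ == "__main__":
    print(total_cycles(trace))  # 7 + 5 + 4 + 1 = 17
```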
22/60
Programmable Architecture Modeling
• Coordination services: PPC Sched, MicroBlaze Sched, OPB Sched, PLB Sched, BRAM Sched, General Sched, and GTime (global time).
Each scheduler provides:
• Request(event e): adds the event to the pending queue of requested events.
• Resolve(): uses an algorithm to select an event from the pending queue.
• PostCond(): augments the event with information (annotation); this is typically the interaction with the quantity manager.
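The Request/Resolve/PostCond cycle can be sketched as a small Python class. This is not Metropolis code: the class name, the fixed-priority policy inside resolve(), and the `cycles` argument to postcond() are assumptions made for illustration, since each scheduler's resolve() is service specific.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class Event:
    name: str
    task_id: int
    priority: int                      # lower value wins (assumed policy)
    annotations: dict = field(default_factory=dict)


class FixedPriorityQM:
    """Hypothetical quantity manager with Request / Resolve / PostCond phases."""

    def __init__(self) -> None:
        self._pending = []             # heap of (priority, arrival order, event)
        self._arrival = count()        # FIFO tie-break among equal priorities
        self._selected: Optional[Event] = None
        self.global_time = 0           # stands in for a GTime quantity

    def request(self, e: Event) -> None:
        """Request: add the event to the pending queue."""
        heapq.heappush(self._pending, (e.priority, next(self._arrival), e))

    def resolve(self) -> Optional[Event]:
        """Resolve: pick one pending event (here, the highest priority)."""
        self._selected = heapq.heappop(self._pending)[2] if self._pending else None
        return self._selected

    def postcond(self, cycles: int) -> None:
        """PostCond: annotate the selected event with the updated global time."""
        if self._selected is not None:
            self.global_time += cycles
            self._selected.annotations["GTime"] = self.global_time


qm = FixedPriorityQM()
qm.request(Event("busRead", task_id=1, priority=2))
qm.request(Event("busWrite", task_id=2, priority=1))
first = qm.resolve()
qm.postcond(cycles=7)
second = qm.resolve()
qm.postcond(cycles=1)
```

Here busWrite wins arbitration and is annotated with GTime 7; busRead follows with GTime 8. Swapping in a different resolve() body is the only change needed to model another arbiter.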
23/60
Sample Metropolis Quantity Manager Interfaces
Quantity manager:

    public quantity PLBArb implements QuantityManager {…}
    public quantity SeqQM implements QuantityManager {…}

    // Ports
    port StateMediumSched[] portTaskSM;

    // Interfaces
    public eval void request(event e, RequestClass rc) {…}
    public update void resolve() {…}
    public update void postcond() {…}
    public eval boolean stable() {…}

Request class:

    public event getRequestEvent() {…}
    public int getserviceType() {…}
    public int getTaskId() {…}
    public int getComplexity() {…}
    public void setTaskId(int id) {…}
    public int getFlag() {…}
    public void setFlag(int flag) {…}
    public int getDeviceId() {…}

Each resolve() function is unique.
24/60
Architecture Extensions for Preemption
• Some services are naturally preempted: CPU context switches, bus transactions.
• Notion of atomic transactions:
  - Prior to dispatching events to a quantity manager via the request() method, decompose events in the scheduled netlist into non-preemptable chunks.
  - Maintain status with an FSM object (counter) and a controller.
1. A transaction (e.g., a Read) is introduced into the architecture model.
2. The decoder (a process) transforms the transaction into atomic transactions (A, B, C).
3. Dispatch the atomic transaction (AT) to the quantity manager (the individual events that make up the AT).
4. Update the FSM (initial state, then S1, S2, S3) to track the state of the transaction.
5. Communicate with preempted processes through StateMedia (SM) using setMustDo() and setMustNotDo().
6. Use a stack data structure to store transactions and their FSMs (Trans0/FSM0, Trans1/FSM1, ...).
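Steps 2, 3, 4, and 6 can be illustrated with a decoder, a counter FSM per transaction, and a stack. The Python sketch below is a simplified stand-in: `AtomicFSM`, `decode`, and `Controller` are hypothetical names, the StateMedia signaling of step 5 is omitted, and preemption is only allowed between atomic chunks.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AtomicFSM:
    """Counter FSM for one transaction: `state` indexes the next atomic chunk."""
    transaction: str
    chunks: List[str]
    state: int = 0

    def done(self) -> bool:
        return self.state >= len(self.chunks)


def decode(transaction: str, chunks: List[str]) -> AtomicFSM:
    """Decoder: split a transaction into non-preemptable atomic transactions."""
    return AtomicFSM(transaction, list(chunks))


class Controller:
    def __init__(self) -> None:
        self.stack: List[AtomicFSM] = []             # top = currently active transaction
        self.dispatched: List[Tuple[str, str]] = []  # what the quantity manager sees

    def start(self, fsm: AtomicFSM) -> None:
        """A new transaction preempts the current one at a chunk boundary."""
        self.stack.append(fsm)

    def step(self) -> None:
        """Dispatch one atomic chunk of the active transaction via request()."""
        fsm = self.stack[-1]
        self.dispatched.append((fsm.transaction, fsm.chunks[fsm.state]))
        fsm.state += 1                               # update the FSM
        if fsm.done():
            self.stack.pop()                         # resume the preempted transaction


ctrl = Controller()
ctrl.start(decode("Read", ["A", "B", "C"]))
ctrl.step()                                   # Read/A
ctrl.start(decode("Write", ["X", "Y"]))       # preempts Read after chunk A
ctrl.step()
ctrl.step()
ctrl.step()
ctrl.step()
```

The Write runs to completion between chunks A and B of the Read, and the stack's last-in, first-out order makes nested preemptions resume in the right order.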
25/60
Architecture Extensions for Mapping
• Programmable platforms allow both SW and HW implementations of a function.
• We need to express which architecture components can provide which services, and with what affinity.
Potential mapping strategies: greedy, best average, task specific.
Each service exports information (the operations available and its ability to perform them) to its associated mapping process:

    public HashMap getCapabilityList()

Service | DCT affinity | FFT affinity | Execute affinity
HW DCT | 100/100 | 0/100 | 0/100
uBlaze (general purpose uProc) | 20/100 | 2/100 | 50/100
• HW DCT: can only perform DCT!
• uBlaze: can perform multiple operations.
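A greedy mapping over exported capability lists might look like the Python sketch below. `get_capability_list` only mimics the role of getCapabilityList(); the affinity numbers are read from the slide's figure, where the pairing of the uBlaze values with tasks is an interpretation, and the other two strategies (best average, task specific) are not shown.

```python
from typing import Dict, Iterable

# Affinity (out of 100) per service and operation, as read from the figure.
AFFINITY: Dict[str, Dict[str, int]] = {
    "HW_DCT": {"DCT": 100, "FFT": 0, "Execute": 0},
    "uBlaze": {"DCT": 20, "FFT": 2, "Execute": 50},
}


def get_capability_list(service: str) -> Dict[str, int]:
    """Stand-in for getCapabilityList(): operations the service can perform."""
    return {op: a for op, a in AFFINITY[service].items() if a > 0}


def greedy_map(tasks: Iterable[str], services: Iterable[str]) -> Dict[str, str]:
    """Greedy strategy: bind each task to the service with the highest affinity."""
    services = list(services)
    mapping = {}
    for task in tasks:
        affinity, best = max((get_capability_list(s).get(task, 0), s) for s in services)
        if affinity == 0:
            raise ValueError(f"no service can perform {task}")
        mapping[task] = best
    return mapping
```

With these numbers, DCT goes to the dedicated hardware while FFT and Execute fall back to the uBlaze, which is the only service able to perform them.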
26/60
Programmable Architecture Modeling
• Compose the scheduling and scheduled netlists in a top level netlist.
• Extract structure for the programmable platform tool flow.
1. Assemble netlists. Top level netlist:

    public netlist XlinxCCArch {
        XilinxCCArchSched schedNetlist;
        XilinxCCArchScheduling schedulingNetlist;
        SchedToQuantity[] _stateMedia;
    }

2. Provide service parameters.
3. Simulate the model and decide on the final topology.
4. Structure extractor script tasks:
   A. Identify parameters for each service, e.g., MHz, cache settings (type, parameters, etc.).
   B. Examine port connections to determine topology.
   C. Examine address mapping for bus, I/O, etc.
   D. Check port names, instance names, etc. for instantiation.
5. Gather the information and parse it into the appropriate tool format: a file for the programmable platform tool flow (MHS).
[Figure: Scheduled netlist (mapping process, MicroBlaze, OPB) and scheduling netlist (MicroBlaze Sched, OPB Sched); the extractor reads connections and topology.]
Modular modeling style: accurate and efficient.
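Tasks D and 5 of the extractor can be sketched as a consistency check followed by text generation. The Python below emits simplified MHS-style BEGIN/END blocks; the `Instance` record, the IP type names (`microblaze`, `opb_v20`), and the `DOPB` bus interface name are assumptions for illustration, and the output is not guaranteed to be accepted by the Xilinx tools.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instance:
    ip_type: str                                    # IP core name (assumed)
    name: str                                       # instance name
    parameters: Dict[str, object] = field(default_factory=dict)
    bus_interfaces: Dict[str, str] = field(default_factory=dict)  # port -> bus instance


def check_topology(instances: List[Instance]) -> None:
    """Task D: every bus connection must name an instantiated component."""
    names = {inst.name for inst in instances}
    for inst in instances:
        for port, bus in inst.bus_interfaces.items():
            if bus not in names:
                raise ValueError(f"{inst.name}.{port} connects to unknown instance {bus}")


def to_mhs(instances: List[Instance]) -> str:
    """Task 5: emit simplified MHS-style BEGIN/END blocks."""
    check_topology(instances)
    lines: List[str] = []
    for inst in instances:
        lines.append(f"BEGIN {inst.ip_type}")
        lines.append(f" PARAMETER INSTANCE = {inst.name}")
        for key, value in inst.parameters.items():
            lines.append(f" PARAMETER {key} = {value}")
        for port, bus in inst.bus_interfaces.items():
            lines.append(f" BUS_INTERFACE {port} = {bus}")
        lines.append("END")
        lines.append("")
    return "\n".join(lines)


netlist = [
    Instance("microblaze", "microblaze_0", {"C_USE_BARREL": 1, "C_USE_DIV": 0},
             {"DOPB": "mb_opb"}),
    Instance("opb_v20", "mb_opb"),
]

if __name__ == "__main__":
    print(to_mhs(netlist))
```

The parameters reuse names from the uBlaze medium (C_USE_BARREL, C_USE_DIV), which is the kind of information task A would pull from the model.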
27/60
Characterization in Design Flow (Process Expanded: Chapter 3, Architecture Services Characterization)
1. Select a device or family.
2. Create systems (System Creator: S1, S2, S3, ..., SN).
3. Extract data from the systems (Data Extractor).
4. Categorize and store the data in the characterizer database: physical timing, execution time for processing, and transaction cycles.
The result is real performance data for the architecture model.
Work with Xilinx Research Labs:
1. Douglas Densmore, Adam Donlin, A. Sangiovanni-Vincentelli, FPGA Architecture Characterization in System Level Design, Submitted to CODES 2005.
2. Adam Donlin and Douglas Densmore, Method and Apparatus for Precharacterizing Systems for Use in System Level Design of Integrated Circuits, Patent Pending.
28/60
Programmable Platform Characterization
Need to tie the model to actual implementation data! Starting from the process used for structure extraction:
1. Create a template system description.
2. Generate many permutations of the architecture using this template and run them through the programmable platform tool flow.
3. Extract the desired performance information from the tool reports for database population.
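The three steps amount to a parameter sweep. This Python sketch enumerates every combination of a hypothetical template (knob names borrowed from the P, B, U, and Addr. columns of the characterization table, with assumed meanings), passes each system to a caller-supplied tool-flow function, and stores the extracted figures under a hashable key. The real flow runs the Xilinx synthesis and place-and-route tools; the stub in the usage below returns placeholder numbers, not measured data.

```python
import itertools
from typing import Callable, Dict, Iterator, Tuple

# Hypothetical template: each key is a knob of the template system description.
TEMPLATE = {
    "P": [1],            # meanings of P, B, U are assumed, not from the source
    "B": [2, 3],
    "U": [0, 1],
    "Addr": ["T", "L"],  # tight or loose address range
}


def permutations(template: Dict[str, list]) -> Iterator[Dict[str, object]]:
    """Step 2: enumerate every combination of template settings."""
    keys = list(template)
    for values in itertools.product(*(template[k] for k in keys)):
        yield dict(zip(keys, values))


def characterize(template, run_tool_flow: Callable[[dict], dict]) -> Dict[Tuple, dict]:
    """Steps 2-3: run each permutation through the tool flow and keep the
    performance figures, keyed by a hashable description of the system."""
    database = {}
    for system in permutations(template):
        report = run_tool_flow(system)
        key = tuple(sorted(system.items()))
        database[key] = {"area_slices": report["slices"], "max_mhz": report["max_mhz"]}
    return database


def stub_tool_flow(system: dict) -> dict:
    """Placeholder for synthesis/place-and-route; NOT real characterization data."""
    return {"slices": 1000 + 100 * system["B"], "max_mhz": 100.0}


db = characterize(TEMPLATE, stub_tool_flow)
```

Because the database is built once, before any simulation, the cost of the sweep is paid only once per device family.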
29/60
Programmable Platform Characterization
Create the database ONCE prior to simulation and populate it with independent (modular) information:
1. Data detailing performance based on physical implementation (from the characterization flow shown).
2. Data detailing the composition of communication transactions (from the Metropolis model design).
3. Data detailing the processing elements' computation (from an ISS for the PPC).
30/60
Characterized Data Organization
• Each characterized system interface function has an entry. These indices can be hashed if appropriate.
• Entries can share data or be independent.
• Entries can have all, partial, or no information.
Example entries (Metropolis model and characterizer, linked by an index method):
• Physical timing, per system (System 1 ... System N): 4.2 ns, 4 ns, 3.8 ns, 3.2 ns.
• Computation timing: ISS uProc1: FFT 20 cycles, Filter 35 cycles; ISS uProc2: FFT 10 cycles, Filter 30 cycles.
• Transaction timing: Read = ACK, Trans, Data; Write = ACK, Data, ACK; NULL (no information).
Open question: how is the data associated with each service interface function?
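One possible organization is a hashed dictionary keyed by (system or processor, interface function), with entries that may be complete, partial, or shared. The Python sketch below uses the slide's example numbers, but which physical timing value belongs to which system, and the rule that each transaction phase costs one clock period, are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

READ = ("ACK", "Trans", "Data")     # transaction composition from the slide
WRITE = ("ACK", "Data", "ACK")


@dataclass
class Entry:
    period_ns: Optional[float] = None          # physical timing (assumed: clock period)
    phases: Optional[Tuple[str, ...]] = None   # transaction composition
    cycles: Optional[int] = None               # computation timing from an ISS


# Keys are (system or processor, interface function); the dict hashes them.
DB: Dict[Tuple[str, str], Entry] = {
    ("uProc1", "FFT"): Entry(cycles=20),
    ("uProc1", "Filter"): Entry(cycles=35),
    ("uProc2", "FFT"): Entry(cycles=10),
    ("uProc2", "Filter"): Entry(cycles=30),
    ("System1", "busRead"): Entry(period_ns=4.2, phases=READ),   # shares READ
    ("System1", "busWrite"): Entry(period_ns=4.2, phases=WRITE),
    ("SystemN", "busRead"): Entry(phases=READ),                  # partial entry
}


def transaction_ns(system: str, function: str) -> Optional[float]:
    """Assumes one clock period per phase; None when data is missing."""
    e = DB.get((system, function))
    if e is None or e.period_ns is None or e.phases is None:
        return None
    return e.period_ns * len(e.phases)
```

Returning None for partial or missing entries lets a simulator fall back to a default cost model instead of failing.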
31/60
Programmable Platform Characterization
[Figure: Combined frequency and resource usage. Slice count (area) and MHz (performance) for 64 samples of increasing system complexity (added BRAM: 1s; added uBlaze: 2s). Observations: high spikes in adjacent (similar) samples; frequency decreasing but neither monotonic nor linear; the area measure often plateaus.]
[Figure: PowerPC system address changes. Slice count and MHz for loose and tight address ranges over 15 samples; the top two curves are the Table 3.3 data. Observations: periodic changes; 10%+ delta in MHz; the area curves overlap.]
Table 3.3 data (Addr.: T = tight, L = loose):
P | B | U | Addr. | Area | Max MHz | MHz (%) | Area (%)
1 | 2 | 1 | T | 1611 | 119 | 16.17% | 39.7%
1 | 2 | 1 | L | 1613 | 102 | -14.07% | 0.12%
1 | 3 | 0 | T | 1334 | 117 | 14.56% | -17.29%
1 | 3 | 0 | L | 1337 | 95 | -18.57% | 0.22%
1 | 3 | 1 | T | 1787 | 120 | 26.04% | 33.65%
• As resource usage increases, system frequency generally decreases.
• The relationship is neither linear nor monotonic.
• A 15% change corresponds to a speed grade for the devices.
PLB Write Transfer Performance Comparison: why can't you just use a static estimation?
• Design from rows 1, 3, and 5 of the table.
• Three abstraction levels: 1, 3, and 10 cycle transactions.
• Metropolis JPEG version: 112,500 write transactions for a 3 megapixel, 24-bit color depth, 95% compressed image.
• 19% difference between intuition and characterization.
• The database is created once prior to simulation.
Modular characterization: accurate and efficient.
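The raw arithmetic behind the PLB write comparison is simple. The Python sketch below computes total write cycles at the three abstraction levels and converts them to time at the Max MHz of table rows 1, 3, and 5. It assumes back-to-back, non-overlapping transfers and does not attempt to reproduce the 19% figure, which comes from the full characterized model.

```python
TRANSACTIONS = 112_500               # PLB writes for the 3 MP JPEG example
LEVELS = (1, 3, 10)                  # cycles per transaction at each abstraction level
MAX_MHZ = {1: 119, 3: 117, 5: 120}   # Max MHz of table rows 1, 3, and 5


def write_cycles(cycles_per_transaction: int, n: int = TRANSACTIONS) -> int:
    return n * cycles_per_transaction


def elapsed_ms(cycles: int, mhz: float) -> float:
    """Time for `cycles` at `mhz`, assuming back-to-back, non-overlapping transfers."""
    return cycles / (mhz * 1e6) * 1e3


if __name__ == "__main__":
    for level in LEVELS:
        cycles = write_cycles(level)
        times = ", ".join(f"row {r}: {elapsed_ms(cycles, f):.2f} ms" for r, f in MAX_MHZ.items())
        print(f"{level:2d}-cycle transactions: {cycles:,} cycles ({times})")
```

For example, the 3-cycle level gives 337,500 cycles, about 2.84 ms at 119 MHz; a static estimate that ignores the characterized frequency for each design would miss differences of this kind.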
32/60
Modeling and Characterization Review: Branching Architecture Example
[Figure: Scheduled netlist with processes Task1 to Task4 mapped onto media PPC, PLB, BRAM, and DEDICATED HW, plus the characterizer; scheduling netlist with quantity managers PPC Sched, PLB Sched, BRAM Sched, DedHW Sched, and GlobalTime. Services are labeled SCSI and MCMI. Legend: media (scheduled), process, quantity manager, quantity, enabled event, disabled event.]
33/60
Outline Revisited
1. Problem Statement
2. Approach
3. Contribution
• Architecture Refinement Verification
• Vertical Refinement
• Horizontal Refinement
• Surface Refinement
• Depth Refinement
• Design Flow Examples
• Summary and Conclusions
Focus: Abstraction
34/60
Architecture Refinement Verification
• Architectures often involve hierarchy and multiple abstraction levels.
  - Such architectures are of limited use if it is not possible to check whether elements in the hierarchy, or less abstract components, are implementations of their counterparts.
• Refinement asks "Can I substitute M1 for M2?" Reasons to ask:
  1. Representing the internal structure of a component.
  2. Recasting an architectural description in a new style.
  3. Applying tools developed for one style to another style.
Refinement Technique | Description | Metropolis
Style/Pattern Based | Define template components. Prove once that they have a desired relationship. Build the architecture from them. | Potential: TTL, YAPI
Event Based | Properties (behaviors) are expressed as event lists; explicitly look for these event patterns. | Discussed
Interface Based | Create a structure capturing all behavior of a component's interface. Compare two models. | Discussed
Compositional Component Based | Create structures capturing local behavior. Compose larger systems by synchronizing these smaller pieces. | Discussed
D. Garlan, Style-Based Refinement for Software Architectures, SIGSOFT 96, San Francisco, CA, pp. 72-75.
35/60
Refinement Verification in Design Flow
Process Expanded (Chapter 4, System Level Service Refinement):
1. Identify the changes to be made (structural or component):
   A. Inter-component structural changes (compositional component based).
   B. Structural changes between scheduled and scheduling components (event based).
   C. Intra-component changes (interface based).
2. Run the verification tools: is the refined model a refinement of the abstract model? Yes or no.
[Figure: An abstract netlist (P1, P2, P3, M1, M2) refined structurally (adding P4 and M3) or by component (splitting P3 into P31 and P32 connected through MN); a scheduled/scheduling split for P1, M1, P3; and an intra-component change giving P2 more functionality (A, B, C); events are observed across each change.]
Refinement question:
• Surface refinement (1): interface based, using a control flow graph. Focuses on introducing new behaviors (reason 1).
• Vertical and horizontal refinement (1): event based, using event based properties. Focus on abstraction and synthesis (reasons 2 and 3).
• Depth refinement: compositional component based, using labeled transition systems. Focuses on reasons 1, 2, and 3.
1. Douglas Densmore, Metropolis Architecture Refinement Styles and Methodology, University of California, Berkeley, UCB/ERL M04/36, 14 September 2004.
36/60
Vertical Refinement
Definition: a manipulation of the scheduled netlist structure that introduces or removes events, changing the number or origin of events as seen by the scheduling netlist.
[Figure: Scheduled netlist (mapping processes, PPC405, Rtos, Cache, PLB, BRAM) and scheduling netlist (PPC Sched, Rtos Sched, Cache Sched, PLB Sched, BRAM Sched); sequential and concurrent refinements produce new origins and amounts of events scheduled and annotated.]
Original | Sequential | Concurrent 1 | Concurrent 2
E1 (CPURead) | E1 (RTOSRead) | E1 (CPURead) | E1 (CPURead)
E2 (BusRead) | E2 (CPURead) | E2 (CacheRead) | E2 (CacheRead)
E3 (MemRead) | E3 (BusRead) | - | E3 (BusRead)
- | E4 (MemRead) | - | E4 (MemRead)
37/60
Horizontal Refinement
Definition: a manipulation of both the scheduled and scheduling netlists that changes the possible ordering of events as seen by the scheduling netlist.
[Figure: The architecture extended with a second PPC405 and PPC Sched, and an arbiter (Arb) with a control thread in the scheduling netlist; the ordering of event requests is changed.]
Original* | Refined (interleaved)
E1 (BusRead) from CPU1 | E1 (BusRead) from CPU1
E2 (BusRead) from CPU1 | E3 (BusRead) from CPU2
E3 (BusRead) from CPU2 | E2 (BusRead) from CPU1
E4 (BusRead) from CPU2 | E4 (BusRead) from CPU2
*The original contains all possible orderings if it is abstract enough.
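The footnote suggests a simple check: if the abstract model admits every ordering that keeps each CPU's own request order, a horizontal refinement is acceptable when its ordering is one of those. The Python sketch below makes that assumption explicit and brute-forces the set of admissible interleavings, which is only practical for a handful of events.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str]   # (event name, source CPU)


def per_source_order(seq) -> Dict[str, List[str]]:
    """Order of events issued by each source."""
    order: Dict[str, List[str]] = {}
    for name, source in seq:
        order.setdefault(source, []).append(name)
    return order


def possible_orderings(seq) -> Set[Tuple[Event, ...]]:
    """Abstract model (assumed): any interleaving that keeps each source's order."""
    target = per_source_order(seq)
    return {p for p in permutations(seq) if per_source_order(p) == target}


original = (("E1", "CPU1"), ("E2", "CPU1"), ("E3", "CPU2"), ("E4", "CPU2"))
refined = (("E1", "CPU1"), ("E3", "CPU2"), ("E2", "CPU1"), ("E4", "CPU2"))

if __name__ == "__main__":
    print(refined in possible_orderings(original))
```

The interleaved ordering from the table is accepted, while an ordering that lets E2 overtake E1 from the same CPU is not.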
38/60
Event Based Properties
• Properties are expressed as event sequences as seen by the scheduling netlist.
Resource utilization (E1, E2, E3: CPUExe; E4: CPURead):
Resource | Bad Resolve() | Good Resolve()
CPU | E1, E2, E3, E4 | E4, E1, E2, E3
Bus | X, X, X, X, E4 | X, E4
Mem | X, X, X, X, X, E4 | X, X, E4
Latency (pending requests → selected event at each step):
Resource (step) | Bad Resolve() | Good Resolve()
CPU (0) | E1, E2 → E1 | E1, E2 → E1
CPU (1) | E2, E3 → E2 | E2, E3 → E2
Bus (1) | E1, EX → EX | E1, EX → EX
CPU (2) | E3 → E3 | E3 → E3
Bus (2) | E1, E2 → E2 | E1, E2 → E1
CPU (3) | - | -
Bus (3) | E1, E3 → E3 | E2, E3 → E2
39/60
Macro and Micro Properties
• MicroProperty: the combination of one or more attributes (or quantities) and an event relation defined with these attributes.
• MacroProperty: a property that implies a set of MicroProperties, defined by the property that ensures their adherence. Satisfaction of the MacroProperty (i.e., the property holds) ensures that all MicroProperties it covers are also satisfied.
Example properties, with the level shown for each:
• Level 0: Snoop Complete (SC), Data Valid (DV), Sufficient Space (SS)
• Level 1: Read Access (RA), Write Access (WA), No Overflow (NO); MacroProperty Data Coherency (DCo)
• Level 2: Sufficient Bits (SB); MacroProperty Data Consistency (DC)
• Level 3: MacroProperty Data Precision (DP)
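The practical value of a MacroProperty is that verifying it discharges everything it covers. The Python sketch below computes the properties implied by a MacroProperty (transitively) and prunes a set of required checks accordingly. The coverage sets in `COVERS` are hypothetical, chosen only to respect the levels above; the source does not list them.

```python
from typing import Dict, Set

# Hypothetical coverage: which properties each MacroProperty implies.
COVERS: Dict[str, Set[str]] = {
    "DCo": {"SC", "DV"},
    "DC": {"RA", "WA", "DCo"},
    "DP": {"NO", "SB", "SS", "DC"},
}


def implied(prop: str) -> Set[str]:
    """Every property guaranteed once `prop` holds (transitive closure)."""
    result: Set[str] = set()
    stack = [prop]
    while stack:
        for q in COVERS.get(stack.pop(), set()):
            if q not in result:
                result.add(q)
                stack.append(q)
    return result


def minimal_check_set(required: Set[str]) -> Set[str]:
    """Drop any required property already implied by another required one."""
    return {p for p in required
            if not any(p in implied(q) for q in required if q != p)}
```

Under this coverage, a requirement set of {DP, RA, SC} reduces to checking DP alone.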
40/60
Event Petri Net
[Figure: A Model EPN and a Property EPN. The model EPN has transitions tE1 to tE6 for events of interest on interface functions (CPURead, CPUWrite, CPUExecute, BusRead, BusWrite, MemRead, MemWrite), places pC1 to pC6, and start1 to start3. Connection transitions tC1 to tC3 link it to the property EPN, whose places SC, DV, SS, NO, SB, RA, WA feed transitions t1 to t14 and the MacroProperty places pDCo, pDC, and pDP.]
• Two Petri nets: one for the service model and one for the events of interest.
• Model Event Petri Net: one transition set represents the events of interest, tEN. Transitions are also used to indicate interface functions.
• Property Event Petri Net: the initial marking vector is empty, and there is one place per MacroProperty, p<prop>. It is constructed so that, to create a token in each MacroProperty place, all transitions must fire once and only once.
• Link the two event Petri nets so that selected tENs feed connection transitions, tCN, which produce the tokens needed by the property EPN.
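The linking idea can be tried on a tiny net. The Python sketch below implements ordinary Petri net firing (a transition is enabled when each input place holds a token) and a check that a MacroProperty place is marked with every transition fired exactly once. The net `FRAGMENT` is hypothetical: two model events feed connection transitions that mark SC and DV, and t1 then marks pDCo. The model-event transitions are simplified to have no input places.

```python
from collections import Counter
from typing import Dict, Tuple

Transition = Tuple[Tuple[str, ...], Tuple[str, ...]]   # (input places, output places)


class PetriNet:
    def __init__(self, transitions: Dict[str, Transition]) -> None:
        self.transitions = transitions
        self.marking: Counter = Counter()   # empty initial marking
        self.fired: Counter = Counter()

    def enabled(self, t: str) -> bool:
        inputs, _ = self.transitions[t]
        return all(self.marking[p] >= 1 for p in inputs)

    def fire(self, t: str) -> None:
        if not self.enabled(t):
            raise RuntimeError(f"{t} is not enabled")
        inputs, outputs = self.transitions[t]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] += 1
        self.fired[t] += 1

    def macro_satisfied(self, place: str) -> bool:
        """Token in the MacroProperty place, with every transition fired exactly once."""
        return self.marking[place] >= 1 and all(
            self.fired[t] == 1 for t in self.transitions)


# Hypothetical fragment: model events tE1/tE2 feed connection transitions
# tC1/tC2, which put tokens in property places SC and DV; t1 then marks pDCo.
FRAGMENT: Dict[str, Transition] = {
    "tE1": ((), ("pC1",)),
    "tE2": ((), ("pC2",)),
    "tC1": (("pC1",), ("SC",)),
    "tC2": (("pC2",), ("DV",)),
    "t1": (("SC", "DV"), ("pDCo",)),
}
net = PetriNet(FRAGMENT)
```

Firing tE1, tE2, tC1, tC2, and t1 in that order leaves a token in pDCo with each transition fired once; firing t1 before both SC and DV are marked raises RuntimeError.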
41/60
Surface Refinement Def.
• Defined as in Hierarchical Verification [1]:
– Model: an object that can generate a set B of finite sequences of behaviors.
– Trace: an element $a \in B$.
– Given a model X and a model Y, X refines Y, denoted $X \preceq Y$, if for every trace $a$ of X the projection $a[\mathit{Obs}_Y]$ is a trace of Y.
– Two models are trace equivalent, $X \simeq Y$, if $X \preceq Y$ and $Y \preceq X$.
• The answer to the refinement problem (X, Y) is YES if X refines Y, and NO otherwise.
1. T. Henzinger, S. Qadeer, S. K. Rajamani, "You Assume, We Guarantee: Methodology and Case Studies", 10th International Conference on Computer Aided Verification (CAV), Lecture Notes in Computer Science 1427, Springer-Verlag, 1998, pp. 440-451.
[Figure: a component with interfaces (ports) P1, P2, P3 that provides and requires services. Its internal operation, under an unknown MoC (dataflow, KPN, etc.), is not visible; only the interface calls on its ports, which form the component's surface, are observable.]
Surface: a restriction on the location and information available to define component behavior.
Trace_M (trace in Metropolis): a finite set of function calls to media via interfaces.
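The refinement definition above can be read as a check over explicit, finite trace sets. This sketch assumes traces are tuples of interface call names and that Obs_Y is a set of call names; real models may have trace sets too large to enumerate, so this only illustrates the definition.

```python
from typing import Iterable, Set, Tuple

Trace = Tuple[str, ...]  # a finite sequence of interface function calls


def project(trace: Trace, observable: Set[str]) -> Trace:
    """a[Obs_Y]: keep only the calls Y can observe, preserving order."""
    return tuple(call for call in trace if call in observable)


def refines(traces_x: Iterable[Trace], traces_y: Set[Trace], obs_y: Set[str]) -> bool:
    """X refines Y iff every trace of X, projected onto Obs_Y, is a trace of Y."""
    return all(project(a, obs_y) in traces_y for a in traces_x)


def trace_equivalent(tx: Set[Trace], ty: Set[Trace],
                     obs_x: Set[str], obs_y: Set[str]) -> bool:
    """X and Y are trace equivalent if each refines the other."""
    return refines(tx, ty, obs_y) and refines(ty, tx, obs_x)
```

For example, X traces `("read", "log")` and `("read", "log", "write")` refine Y traces `("read",)` and `("read", "write")` when Obs_Y is `{"read", "write"}`, because the internal `log` call is projected away.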
42/60
Control Flow Graph
• Defined much like in Henzinger et al.*
• Tuple $\langle Q, q_0, X, Op, \rightarrow \rangle$:
– Q: control locations
– $q_0$: initial control location
– X: set of variables
– Op: function calls to media, basic block start and end
– $\rightarrow$: transition relation

Sample code:
process example {
    port Read port1;
    port Write port2;

    void thread() {
        int x = 0;
        while (x < 2) {
            port1.callRead();
            x++;
        }
        port2.callWrite();
    }
}
[Figure: Graph for Model, the control flow graph of the sample process. Control locations:
CL 1: ProcessDeclNode, initial control location
CL 2: LoopNode (while loop)
CL 3: ThisPortAccessNode
CL 4: none, end of basic block
CL 5: collection of variable nodes
CL 6: variable node (collection), end
CL 7: ThisPortAccessNode
CL 8: none, end of basic block
CL 9: none, sink state
CL 10: none, bookend of LoopNode
Edge labels: X = 0; X < 2 and X >= 2; Port1.callRead()+ and Port1.callRead()-; X++(+) and X++(-); Port2.callWrite()+ and Port2.callWrite()-.
Inset: hypothetical automaton for the X variable, with states 1, 2, 3 for X = 0, X = 1, X = 2.]
* "Temporal Safety Proofs for Systems Code", Henzinger et al.
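A small executable version of a control flow automaton for the sample process, assuming a simplified numbering (locations 1 to 6, not the figure's 1 to 10). It also assumes each call is a single edge rather than the figure's separate "+" and "-" markers. Operations are guards, assignments, or media calls. Running it yields the interface-call trace the CFA generates, i.e. the Trace_M that surface refinement compares.

```python
from typing import List, Tuple

# Edge: (source location, (op kind, argument), target location).
# Op kinds: "assign" (env -> env), "guard" (env -> bool), "call" (media call name).
Edge = Tuple[int, Tuple[str, object], int]

Q0 = 1
EDGES: List[Edge] = [
    (1, ("assign", lambda v: {**v, "x": 0}), 2),
    (2, ("guard", lambda v: v["x"] < 2), 3),
    (3, ("call", "port1.callRead"), 4),
    (4, ("assign", lambda v: {**v, "x": v["x"] + 1}), 2),
    (2, ("guard", lambda v: v["x"] >= 2), 5),
    (5, ("call", "port2.callWrite"), 6),  # location 6 has no outgoing edge: sink
]


def run(edges: List[Edge] = EDGES, q0: int = Q0, max_steps: int = 100) -> List[str]:
    """Execute the (deterministic) CFA and return its trace of media calls."""
    q, env, trace = q0, {}, []
    for _ in range(max_steps):
        for src, (kind, arg), dst in edges:
            if src != q:
                continue
            if kind == "guard" and not arg(env):
                continue  # guard false: try the next edge
            if kind == "assign":
                env = arg(env)
            elif kind == "call":
                trace.append(arg)
            q = dst
            break
        else:
            return trace  # no enabled edge: sink reached
    raise RuntimeError("step bound exceeded")
```

`run()` returns `["port1.callRead", "port1.callRead", "port2.callWrite"]`, matching the loop that reads twice before writing once.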
43/60
Surface Refinement Domains
[Figure: surface refinement domains. Two configurations are shown: Producer and Adder components connected through a Switch Fabric component, and the same system with an added Memory component. Components offer services such as prodLit() (Producer), Add(input1, input2) / Adder(input1, input2) (Adder), and get() / put() (Memory), and communicate through move.source.<component> and move.dest.<component> operations (OP). The figure marks a Communication Ref Domain, Computation Ref Domains 1 and 2, and Storage Ref Domain 1, and carries the tuple <C, P, OP>.]
44/60
Surface Refinement Example
1. Douglas Densmore, Sanjay Rekhi, A. Sangiovanni-Vincentelli, "MicroArchitecture Development via Successive Platform Refinement", Design Automation and Test in Europe (DATE), Paris, France, 2004.
Trace   FIFO scheduler process trace (* abbreviated function calls)
T1      Terminated()
T2      Terminated(), wRnd()*
T3      Terminated(), wRnd()*, wRnd()*
T4      Terminated(), wRnd()*, Tnated()*, qData()*, putPolicy(), PR1S()*
$B_{ref} = \{T1, T3, T4\}$ and $B_{ab} = \{T1, T2, T3, T4\}$. Since $B_{ref} \subseteq B_{ab}$, the refined scheduler refines the abstract one.
[Figure: control flow graphs of FIFO SchedulerRef and FIFO SchedulerAb, each with control locations 1 to 12 and calls terminated(), whatRound(), checked_allterminated(), queryData(), putPolicy(), and putRound1_Status, with branch conditions True/False and Type & !Done / Else. The abstract graph has an additional !Type & !Done branch.]
Trace containment check for single threaded processes
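For single-threaded processes, the check reduces to set containment of traces. This sketch encodes the table's traces as tuples of the slide's abbreviated call names; the call order within T4 is read from the table layout. It also assumes both schedulers observe the same calls, so projection is the identity.

```python
# Traces from the FIFO scheduler table (abbreviated call names as on the slide).
T1 = ("Terminated()",)
T2 = ("Terminated()", "wRnd()")
T3 = ("Terminated()", "wRnd()", "wRnd()")
T4 = ("Terminated()", "wRnd()", "Tnated()", "qData()", "putPolicy()", "PR1S()")

B_ref = {T1, T3, T4}      # behaviors of FIFO SchedulerRef
B_ab = {T1, T2, T3, T4}   # behaviors of FIFO SchedulerAb


def trace_contained(refined: set, abstract: set) -> bool:
    """Single-threaded case: refinement holds iff every refined trace is an abstract trace."""
    return refined <= abstract


print(trace_contained(B_ref, B_ab))  # True
```

The converse check fails, since `B_ab - B_ref == {T2}`: the refined scheduler removes a behavior but adds none.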
45/60
Surface Refinement Flow
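[Figure: tool flow. A Metropolis model (.mmm) passes through the CFG backend (automatic), which produces three outputs:
1. A visual representation (for debugging).
2. A reactive module of the CFG (X) for MOCHA. It is edited and composed in parallel (manual) with a witness module (W) to form X||W, which is checked against the CFA (Y) developed in the previous iteration, giving the answer to $X \preceq Y$.
3. A KISS file of the CFA, turned into a BLIF file by the SIS state_assign script (automatic). After manual edits to the BLIF and NEXLIF2EXE, the resulting Mode.exe file is checked in FORTE against the BLIF file developed in the previous iteration, giving the answer to $X \preceq Y$.]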
Three primary branches:
1. Visual representation for debugging.
2. CFG conversion to a reactive module. Works with the MOCHA tool flow. Requires manual augmentation with a witness module, since Y has private variables.
3. CFG conversion to a KISS file. Works with the SIS and Forte tool flows. Requires manual edits to convert BLIF to EXLIF.
46/60
Depth Refinement - LTS
• Definition: A Labeled Transition System (LTS) is a tuple $\langle Q, Q_0, E, T, l \rangle$ where:
– Q is a set of states,
– $Q_0 \subseteq Q$ is a set of initial states,
– E is a finite set of transition labels or actions,
– $T \subseteq Q \times E \times Q$ is a labeled transition relation, and
– l, defined on Q, is an interpretation of each state on the system variables.
• An LTS, however, has no notion of input signals.
– When LTSs are composed, a transition can instead be triggered when another LTS is in a given state.
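A direct encoding of the LTS tuple, for illustration. The buffer instance is hypothetical: a two-slot buffer in the spirit of the empty/notempty/full figure, with an assumed `count` variable as the state interpretation. The field `interp` plays the role of the definition's l.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple


@dataclass(frozen=True)
class LTS:
    Q: FrozenSet[str]                    # states
    Q0: FrozenSet[str]                   # initial states, a subset of Q
    E: FrozenSet[str]                    # transition labels (actions)
    T: FrozenSet[Tuple[str, str, str]]   # labeled transitions (source, label, target)
    interp: Dict[str, Dict[str, int]]    # the slide's l: state -> variable valuation

    def __post_init__(self) -> None:
        if not self.Q0 <= self.Q:
            raise ValueError("Q0 must be a subset of Q")
        for q, e, r in self.T:
            if q not in self.Q or r not in self.Q or e not in self.E:
                raise ValueError(f"bad transition {(q, e, r)}")

    def enabled(self, q: str) -> Set[str]:
        """Labels of transitions leaving state q."""
        return {e for src, e, _ in self.T if src == q}

    def reachable(self) -> Set[str]:
        seen, stack = set(self.Q0), list(self.Q0)
        while stack:
            q = stack.pop()
            for src, _, dst in self.T:
                if src == q and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen


# Hypothetical two-slot buffer.
BUFFER = LTS(
    Q=frozenset({"empty", "notempty", "full"}),
    Q0=frozenset({"empty"}),
    E=frozenset({"write", "read"}),
    T=frozenset({("empty", "write", "notempty"), ("notempty", "write", "full"),
                 ("notempty", "read", "empty"), ("full", "read", "notempty")}),
    interp={"empty": {"count": 0}, "notempty": {"count": 1}, "full": {"count": 2}},
)
```

Because the LTS has no input signals, `enabled(q)` is the only interface a composing LTS sees: a synchronized event fires when the other components are in suitable states.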
• Depth Refinement: the goal is to make inter-component structural changes.
Olga Kouchnarenko and Arnaud Lanoix. Refinement and Verification of Synchronized Component-Based Systems. In FME 2003: Formal Methods, Lecture Notes in Computer Science, volume 2805/2003, pages 341–358. Springer Berlin / Heidelberg, 2003.
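[Figure: a service with a ReadWrite interface, modeled as an LTS with states empty, notempty, and full and transitions including write, write2, and read2.]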
47/60
Refinement Rule 1
• If there is a transition in the refined LTS from one state to another, then the same transition must exist in the abstract LTS between the corresponding (glued) states.
• Note: The two transitions must have the same label!
Strict transition refinement
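A sketch of Rule 1 as a check over explicit transition sets. It assumes the gluing relation is a function from refined to abstract states and uses a hypothetical toy example: a one-slot buffer refined with an internal state `d1`. The `"tau"` label marks new internal transitions.

```python
from typing import Dict, Set, Tuple

Transition = Tuple[str, str, str]  # (source, label, target)
TAU = "tau"                        # label assumed for new internal transitions

# Hypothetical toy LTSs (names are illustrative, not from the slides).
ABSTRACT: Set[Transition] = {("empty", "write", "full"), ("full", "read", "empty")}
REFINED: Set[Transition] = {("e", "write", "d1"), ("d1", TAU, "f"), ("f", "read", "e")}
GLUE: Dict[str, str] = {"e": "empty", "d1": "full", "f": "full"}  # refined -> abstract


def strict_transition_refinement(refined: Set[Transition], abstract: Set[Transition],
                                 glue: Dict[str, str]) -> bool:
    """Rule 1: each non-tau refined step q -a-> q2 must appear, with the same
    label, as glue[q] -a-> glue[q2] in the abstract LTS."""
    return all((glue[q], a, glue[q2]) in abstract
               for q, a, q2 in refined if a != TAU)
```

Adding a refined step `("f", "write", "e")` breaks the rule, because the abstract LTS has no `write` from `full` to `empty`.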
48/60
Refinement Rule 2
• If there is a new (τ) transition in the refined LTS, then its source and target states must correspond to the same state in the abstract LTS.
Stuttering transition refinement
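Rule 2 in the same sketch style, again assuming a functional gluing relation and the hypothetical toy refinement where `d1` and `f` are both glued to the abstract state `full`.

```python
from typing import Dict, Set, Tuple

Transition = Tuple[str, str, str]  # (source, label, target)
TAU = "tau"                        # label assumed for new internal transitions

# Hypothetical toy refinement of a one-slot buffer.
REFINED: Set[Transition] = {("e", "write", "d1"), ("d1", TAU, "f"), ("f", "read", "e")}
GLUE: Dict[str, str] = {"e": "empty", "d1": "full", "f": "full"}  # refined -> abstract


def stuttering_refinement(refined: Set[Transition], glue: Dict[str, str]) -> bool:
    """Rule 2: a new (tau) transition must start and end in states glued to
    the same abstract state, so the abstract view 'stutters'."""
    return all(glue[q] == glue[q2] for q, a, q2 in refined if a == TAU)
```

A τ step from `e` to `d1` would violate the rule, since it silently moves the abstract view from `empty` to `full`.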
49/60
Refinement Rule 3
• New (τ) transitions in the refinement cannot be taken forever, i.e. there is no infinite path of τ transitions.
Lack of τ-divergence
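In a finite LTS, an infinite τ path exists exactly when the τ transitions contain a cycle, so Rule 3 can be sketched as cycle detection by depth-first search. This conservative sketch checks all states rather than only reachable ones, and it assumes the label `"tau"` for new transitions.

```python
from typing import Dict, List, Set, Tuple

TAU = "tau"  # label assumed for new internal transitions


def tau_divergent(transitions: Set[Tuple[str, str, str]]) -> bool:
    """True if some cycle uses only tau transitions (an infinite tau path exists)."""
    succ: Dict[str, List[str]] = {}
    for q, a, q2 in transitions:
        if a == TAU:
            succ.setdefault(q, []).append(q2)
    on_stack: Set[str] = set()
    done: Set[str] = set()

    def dfs(q: str) -> bool:  # look for a back edge to a state on the DFS stack
        on_stack.add(q)
        for q2 in succ.get(q, []):
            if q2 in on_stack or (q2 not in done and dfs(q2)):
                return True
        on_stack.discard(q)
        done.add(q)
        return False

    return any(q not in done and dfs(q) for q in list(succ))
```

With the toy refinement `e -write-> d1 -tau-> f -read-> e` there is no divergence; adding `f -tau-> d1` creates the τ cycle `d1, f, d1` and the check fails.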
50/60
Refinement Rule 4
• If there is a transition in the abstract LTS, and the corresponding refined state does not have any transition, then:
– there must be another refined state that corresponds to the same abstract state, and
– that refined state must take a transition to another refined state, such that a state exists in the abstract LTS (the target of the abstract transition) to which the two are glued together.
External non-determinism preservation
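A sketch of Rule 4 as read from the slide, assuming a functional gluing relation and a hypothetical extra refined state `x` that is glued to `full` but has no transitions of its own. Treating "no transition at all" as the trigger follows the slide's wording. It may be narrower than the formal rule in the cited paper.

```python
from typing import Dict, Set, Tuple

Transition = Tuple[str, str, str]  # (source, label, target)
TAU = "tau"

ABSTRACT: Set[Transition] = {("empty", "write", "full"), ("full", "read", "empty")}
# Hypothetical refinement; x is glued to "full" but cannot move.
REFINED: Set[Transition] = {("e", "write", "d1"), ("d1", TAU, "f"), ("f", "read", "e")}
GLUE: Dict[str, str] = {"e": "empty", "d1": "full", "f": "full", "x": "full"}


def external_nondeterminism_preserved(refined: Set[Transition], abstract: Set[Transition],
                                      glue: Dict[str, str]) -> bool:
    """Rule 4: if refined state q has no transition but its abstract state a has
    a -e-> a2, some other refined state glued to a must take e to a state glued to a2."""
    movers = {q for q, _, _ in refined}
    for q, a in glue.items():
        if q in movers:
            continue  # q has a transition of its own; the rule does not apply
        for src, e, a2 in abstract:
            if src == a and not any(glue[q2] == a and e2 == e and glue[q3] == a2
                                    for q2, e2, q3 in refined):
                return False
    return True
```

Here `x` is covered by `f`, which performs the abstract `read` from `full` to `empty`. Removing `f`'s `read` transition leaves the abstract choice unmatched, and the check fails.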
51/60
Depth Ref. Design Flow
1. Create a .fts file capturing the LTS for each component of the refined and abstract systems.
   a. Define the observable events, OE.
   b. Transition labels correspond to OE.
2. Define gluing invariants in a .inv file.
[Figure: abstract LTS (states empty, notempty, full; transitions write and read) and its refinement (additional states d1 and d2 and transitions write2, read2, and readwrite), with gluing relations linking refined states to abstract states.]
Transition System
// Two state values
type SIGNAL = {consume, wait}
local con : SIGNAL
// Can only be in one state
Invariant
  (con = consume) \/ (con = wait)
// Initial state
Initially (con = wait)
// Transition to consume ("get" event)
Transition get :
  enable (con = wait) ;
  assign con := consume
// Transition to wait ("stallC" event)
Transition stallC :
  enable (con = consume) ;
  assign con := wait
((con = consume) <--> (conR = consume))/\((con = wait) <--> ((conR = wait) \/ (conR = clean)))
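The gluing invariant above is a Boolean formula over the abstract consumer state `con` and the refined consumer state `conR`. This sketch enumerates which state pairs it glues. The domain of `conR` (consume, wait, clean) is inferred from the formula, not taken from a refined .fts file.

```python
from itertools import product

CON = ("consume", "wait")              # abstract consumer states (from the .fts file)
CON_R = ("consume", "wait", "clean")   # refined consumer states (assumed from the formula)


def glued(con: str, con_r: str) -> bool:
    """((con = consume) <-> (conR = consume)) /\\ ((con = wait) <-> (conR = wait \\/ conR = clean))"""
    return ((con == "consume") == (con_r == "consume")) and \
           ((con == "wait") == (con_r in ("wait", "clean")))


pairs = [(a, r) for a, r in product(CON, CON_R) if glued(a, r)]
print(pairs)  # [('consume', 'consume'), ('wait', 'wait'), ('wait', 'clean')]
```

The new refined state `clean` is glued to the abstract `wait`, which is what lets a τ step through `clean` satisfy the stuttering rule.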
// Buffer Events (reads and writes)
// "write1" event is enabled when the LTSs are in the following states
(write1) when ((prod = produce) /\ (buf = empty) /\ (con != consume)),
(write3) when ((prod = produce) /\ (buf = notempty) /\ (con != consume)),
(read1) when ((prod != produce) /\ (buf = notempty) /\ (con = consume)),
(read3) when ((prod != produce) /\ (buf = full) /\ (con = consume)),
// Producer Events
make when (prod = wait),
stall when (prod = produce),
// Consumer Events
get when (con = wait),
stallC when (con = consume)
3. Define synchronization between LTS in .sync file.
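The .sync guards can be read as predicates over the global state (one local state per component LTS). This sketch transcribes them and lists the enabled events. It covers guards only; the state updates live in each component's .fts transitions and are not modeled.

```python
from typing import Callable, Dict, List

State = Dict[str, str]  # component -> local state: "prod", "buf", "con"

# Guards transcribed from the .sync fragment.
SYNC: Dict[str, Callable[[State], bool]] = {
    "write1": lambda s: s["prod"] == "produce" and s["buf"] == "empty" and s["con"] != "consume",
    "write3": lambda s: s["prod"] == "produce" and s["buf"] == "notempty" and s["con"] != "consume",
    "read1": lambda s: s["prod"] != "produce" and s["buf"] == "notempty" and s["con"] == "consume",
    "read3": lambda s: s["prod"] != "produce" and s["buf"] == "full" and s["con"] == "consume",
    "make": lambda s: s["prod"] == "wait",
    "stall": lambda s: s["prod"] == "produce",
    "get": lambda s: s["con"] == "wait",
    "stallC": lambda s: s["con"] == "consume",
}


def enabled_events(state: State) -> List[str]:
    """Events whose synchronization guard holds in this global state, sorted by name."""
    return sorted(e for e, guard in SYNC.items() if guard(state))


print(enabled_events({"prod": "produce", "buf": "empty", "con": "wait"}))
# ['get', 'stall', 'write1']
```

A buffer event such as `write1` depends on the producer and consumer states, which is how composition triggers transitions without input signals.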
52/60
Example Design
1. Select an application and understand its behavior.
2. Create a Metropolis functional model which models this behavior.
3. Assemble an architecture from library services or create your own services.
4. Map the functionality to the architecture.
5. Extract a structural file from the top level netlist of the architecture created.
[Figure: a JPEG encoder functional model (block level: Preprocessing, DCT, Quantization, Huffman) is mapped through mapping processes onto an architecture assembled from the IP library (MicroBlaze, SynthMaster, SynthSlave, On-Chip Peripheral Bus (OPB), BRAM). The StructureExtractor reads the top level netlist and produces a file for the Xilinx EDK tool flow.]
53/60
Example Design Cont.
1. Feed the captured structural file to the permutation generator.
2. Feed the permutations to the Xilinx tools and extract the data.
3. Capture execution info for software and hardware services.
4. Provide transaction info for communication services.
[Figure: the file for the Xilinx EDK tool flow feeds a permutation generator. Permutations 1 to N pass through the platform characterization tool (Xilinx EDK/ISE tools) into a characterizer database of ISS info, characterization data, and transaction info. Example entries: software routines (int DCT (data){Begin calculate ……}), a 32-bit read transaction (Ack, Addr, Data, Trans, Ack), and hardware routines (DCT1 = 10 cycles, DCT2 = 5 cycles, FFT = 5 cycles).]
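A sketch of how characterized data might be consumed during simulation, assuming the database is a flat map from routine name to cycle count and that costs simply add up. The cycle counts are the slide's hardware-routine examples; `CHAR_DB` and `estimate_cycles` are hypothetical names. Contention and communication costs, which the real flow captures as transaction info, are ignored.

```python
from typing import Dict, List

# Cycle counts from the slide's hardware-routine examples.
CHAR_DB: Dict[str, int] = {"DCT1": 10, "DCT2": 5, "FFT": 5}


def estimate_cycles(invocations: List[str]) -> int:
    """Sum characterized cycles over a sequence of routine invocations."""
    missing = [r for r in invocations if r not in CHAR_DB]
    if missing:
        raise KeyError(f"not characterized: {missing}")
    return sum(CHAR_DB[r] for r in invocations)


print(estimate_cycles(["DCT1", "DCT2", "DCT2", "FFT"]))  # 25
```

Raising on a missing entry, rather than defaulting to zero, keeps an uncharacterized service from silently making an architecture look faster.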
54/60
Example Design Cont.
1. Simulate the design and observe the performance (figure example: execution time 100 ms, 4000 bus cycles, average memory occupancy 500 KB).
2. Refine the design to meet performance requirements.
3. Use refinement verification to check the validity of design changes:
• Vertical or horizontal
• Depth or surface
• Refinement properties
4. Re-simulate to see if your goals are met (figure example: execution time 200 ms, 1000 bus cycles, average memory occupancy 100 KB).
Backend tool process:
1. The Abstract Syntax Tree (AST) retrieves the structure.
2. Control data flow graph: surface refinement (FORTE, an Intel tool; reactive models, UC Berkeley).
3. Event traces: refinement properties.
[Figure: the JPEG encoder functional model (Preprocessing, DCT, Quantization, Huffman) mapped onto the MicroBlaze/OPB/BRAM architecture with ISS info, characterization data, and transaction info. Refinement examples (a BRAM change labeled concurrent vertical refinement; a new algorithm labeled surface; vertical and horizontal refinement) are passed to a verification tool that answers yes or no.]
55/60
MJPEG Encoding
[Figure: MJPEG encoding. The functional model is mapped, through the mapping process, onto four architecture models (Arch 1 to Arch 4) built from MicroBlaze soft processors (uBlaze) connected by Fast Simplex Links (FSL). The architectures shown are: completely sequential; Y, Cr, and Cb components parallelized; DCT and Quant separated; Huffman operations parallelized. Functional key: PreProcessing (P), DCT (D), Quantization (Q), Huffman Encoding (H), Table Modifications (TM), Collector (Col).]
System   Est. Cycles     Char. Cycles     Real Cycles   Rankings
Arch 1   145282 (52%)    228356 (25%)     304585        4, 4, 4
Arch 2   103812 (33%)    145659 (6%)      154217        3, 3, 2
Arch 3   103935 (29%)    145414 (1.2%)    147036        2, 2, 3
Arch 4   103320 (28%)    144432 (<+1%)    143335        1, 1, 1
(Percentages are the error relative to Real Cycles.)
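The error percentages and orderings can be recomputed from the raw cycle counts in the table. This is a small check script; the names `RESULTS`, `error_pct`, and `ranking` are illustrative.

```python
from typing import Dict, List, Tuple

# (estimated, characterized, real) cycle counts from the MJPEG table.
RESULTS: Dict[str, Tuple[int, int, int]] = {
    "Arch 1": (145282, 228356, 304585),
    "Arch 2": (103812, 145659, 154217),
    "Arch 3": (103935, 145414, 147036),
    "Arch 4": (103320, 144432, 143335),
}


def error_pct(predicted: int, real: int) -> float:
    """Absolute error relative to the real cycle count, in percent."""
    return abs(predicted - real) / real * 100


def ranking(column: int) -> List[str]:
    """Architectures ordered from fewest to most cycles (0 = est, 1 = char, 2 = real)."""
    return sorted(RESULTS, key=lambda arch: RESULTS[arch][column])


for arch, (est, char, real) in RESULTS.items():
    print(f"{arch}: est {error_pct(est, real):.1f}%, char {error_pct(char, real):.1f}%")
```

The computed errors should match the table's percentages to within about 0.1 percentage points. Ordering by characterized cycles gives the same ranking as ordering by real cycles (Arch 4, 3, 2, 1). Ordering by estimated cycles swaps Arch 2 and Arch 3.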
56/60
Other case studies
• H.264 Deblocking Filter
– 14 different mappings explored.
– Execution time analysis for computation, waiting, and communication operations.
– The average difference between Metropolis simulation and the actual implementation was 3.48%.
• SPI-5 Packet Processing
– 6 architecture models developed.
– Optimal FIFO length determined.
57/60
Summary and Conclusions
1. Heterogeneity → Modularity
– Functional block level Metropolis models of programmable services.
• Direct structural correspondence aids accuracy. Automatic structure extraction creates efficiency.
– Independent characterization process of actual hardware implementations.
• Shown to be accurate. Independence creates efficiency.
2. Complexity → Abstraction
– Depth/Surface Refinement allows internal changes to the model.
• Trace-based formalism provides accuracy. Automatic checking provides efficiency.
– Vertical/Horizontal Refinement allows structural changes to the model.
• Event-based formalism provides accuracy. Refinement property encapsulation provides efficiency.
58/60
Thanks
• Questions?
• Thanks
– Metropolis Team: Yoshi Watanabe, Felice Balarin, Roberto Passerone, Abhijit Davare, Haibo Zeng, Qi Zhu, Guang Yang, Trevor Meyerowitz, Alessandro Pinto
– Committee: Jan Rabaey, Alberto Sangiovanni-Vincentelli, John Wawrzynek, Lee Schruben
– Industrial: Adam Donlin (Xilinx), Sanjay Rekhi (Cypress)