Upload
usrdresd
View
267
Download
1
Tags:
Embed Size (px)
Citation preview
Time-driven reconfiguration-aware
floorplacer
BY
Alessio Montone
NDDRESD Baceno, Goglio – Jul 29, 2008
2
Rationale and InnovationRationale and Innovation
Problem statementGiven a reconfigurable architecture, find an on-chip position for each functional unit
Innovative contribution: taking into accountTarget Device HeterogeneityTarget Device reconfiguation capabilities Inter-FU Communication
3
AimsAims
Considering the area assignment problem tailored for reconfigurable architectures, providea formalization of the problem, andan approach (in 3 algorithms) for solving
4
OutlineOutline
IntroductionFloorplacementThe Proposed ApproachExperimental ResultsComparison with the state of the artConclusions and Future WorksQuestions
INTRODUCTIONINTRODUCTION
5
Introduction: Simulated Introduction: Simulated AnnealingAnnealing
Metropolis PhysicsAn improving move is always acceptedA peggiorative move accepted with a decreasing probability
6
•N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. "Equations of State Calculations by Fast Computing Machines". Journal of Chemical Physics, 1953
•Sëma Kachalo, Hsiao-Mei Lu, and Jie Liang, Protein Folding Dynamics via Quantification of Kinematic Energy Landscape, Physical Review Letters, 2006
Reconfigurable Architectures - IReconfigurable Architectures - I
On FPGAsReconfigurable DevicesHeterogeneousReconfiguration Limits
Different types of Reconfigurable Architectures:
TotalPartial (Static)Partial (Dynamic)
7
Reconfigurable Architectures - IIReconfigurable Architectures - II
Partial Dynamic
8
Area Assignment ProblemArea Assignment Problem
Let consider a Reconfigurable ArchitectureGiven a scheduled task graph (TG) of the application
Node: Reconfigurable Functional Unit (RFU) [*], A netlist obtained after post synthesis and technology mapping (i.e., before placement and routing)
Aim: find an area assignment for each RFU
9
[*] K. Bazargan, R. Kastner, M.S.: 3-d floorplanning: Simulated annealing and greedy placement methods for reconfigurable computing systems. IEEE Rapid Systems Prototyping (1999)
Related Works - IRelated Works - I
[*] introduced the concept of 3D floorplanning for reconfigurable systems
SA in order to solve HW/SW codesign problemFor each task choose between
HW implementationSW implementation
LimitsNo device limits consideredNo communicationinfrastructure
10
[*] K. Bazargan, R. Kastner, M.S.: 3-d floorplanning: Simulated annealing and greedy placement methods for reconfigurable computing systems. IEEE Rapid Systems Prototyping (1999)
Related Works - IIRelated Works - II
[*] is the state of art in 3D floorplanningSimulated Annealing over Transitive Closure GraphTakes into account device reconfiguration limits
LimitsNo heterogeneity consideredHigh overhead communicationinfrastructure solution [**]
11
[*] Ping-Hung Yuh, Chia-Lin Yang, Yao-Wen Chang, Hsin-Lung Chen: Temporal Floorplanning Using 3D-subTCG, Design Automation Conference, 2004
[**] S. P. Fekete, E. Kohler, and J. Teich: Optimal FPGA Module Placement with Temporal Precedence Constraints, Proc. DATE, 2001.
FLOORPLACEMENTFLOORPLACEMENT
12
Floorplanning vs. PlacementFloorplanning vs. PlacementCharacteristic Floorplanning Placement
# items <100 >10.000
Items (for FPGAs) IP-Core Slice, CLB
Aim Find a position for each item
obj. function depends
mainly on Area mainly on Wirelength
Constraints Items can be positioned everywhere
There is a set of possible positions
13
PlacementFloor plan
Floorplacement - IFloorplacement - I
Hierarchical Approach (Floorplanning + Partitioning)
14
S. N. Adya, I. L. Markov, Fixed-outline Floorplanning: Enabling Hierarchical Design, IEEE Transaction on VLSI System, 2002
S. N. Adya, S. Chaturvedi, J. A. Roy, David A. Papa, I. L. Markov: Unification of Partitioning, Placement and Floorplanning , IEEE Intl. Conf. on CAD, 2004
Floorplacement - IIFloorplacement - II
Reconfigurable Functional Unit (RFU)A netlist obtained after post synthesis and technology mapping (i.e., before placement and routing)
Reconfigurable Region (RR)
15
Floorplacement - IIIFloorplacement - III
Resource Aware (i.e., not all positions are feasible)
Device heterogeneity
Device Reconfigurationcapabilities
16
THE PROPOSED THE PROPOSED APPROACHAPPROACH
17
Proposed Problem DefinitionProposed Problem Definition
18
AimDefine RRsFor each task find
find a RRA position inside RR
Objective FunctionMin. Fragmentation
ConstraintsCommunication issuesDevice limits
Target Devices: Xilinx Virtex 4 - 5Target architecture based on EAPR design flow
Target Architecture and DevicesTarget Architecture and Devices
19
InputInput
20
Proposed Approach: overviewProposed Approach: overview
21
11stst Algorithm: Partitioning into Algorithm: Partitioning into RRRR
Aim: identify the RRs and associate each RFU to one RRHow: partitioning the TG minimizing resource requirement variance of the RRs (moving and swapping nodes)
22
Resource of type t required by RFU n, at static photo p
22ndnd Algorithm:TFiRR - I Algorithm:TFiRR - I
23
Temporal Floorplacement inside RR (TFiRR)Aim: for each RR find a set of feasible width-height pairsHow: floorplacing RFUs inside corresponding RR Assumption: RFUs’ height = height of the RR they belong to
Pseudo Code:
22ndnd Algorithm:TFiRR - II Algorithm:TFiRR - II
24
Let consider an iteration:
22ndnd Algorithm:TFiRR - III Algorithm:TFiRR - III
25
Let consider an iteration:
22ndnd Algorithm:TFiRR - IV Algorithm:TFiRR - IV
26
Let consider an iteration:
22ndnd Algorithm:TFiRR - V Algorithm:TFiRR - V
27
Let consider an iteration:
22ndnd Algorithm:TFiRR - VI Algorithm:TFiRR - VI
28
22ndnd Algorithm: TFiRR – Example Algorithm: TFiRR – Example
29
33rdrd Algorithm: RR floorplacement - I Algorithm: RR floorplacement - I
Simulated AnnealingObjective Function
Data Structure4 Constraint Lists (one per row)
30
33rdrd Algorithm: RR floorplacement - II Algorithm: RR floorplacement - II
Simulated Annealing: movesSwap two RRsMove one RRsSpan over one more rowUn-Span over one less row
After each move packing is performed(i.e., the floorplacement is compressed)
31
33rdrd Algorithm: RR floorplacement – Example - I Algorithm: RR floorplacement – Example - I
32
33rdrd Algorithm: RR floorplacement – Example - II Algorithm: RR floorplacement – Example - II
33
ImplementationImplementation
Three simulated annealers written in C++ STL
34
Output ExamplesOutput Examples
TFiRR
RR Floorplacement
35
EXPERIMENTAL RESULTSEXPERIMENTAL RESULTS
36
Identification of the number of RRs Identification of the number of RRs
37
Partitioning’s impact on TFiRRPartitioning’s impact on TFiRR
TFiRR on Partitioned
TG
TFiRR on TG
Execution Time 125ms 114ms 4m54s
Width (normalized)
1.00 1.19 1.04
38
Increasing the number of RFUs decreases the possibility to pick up the right one
Partitioning is a precondition of the 3rd algorithm in order to better exploit FPGA’s area (2D Floorplacement)
Tests performed directly floorplacing RFUsExecution time about 100 ms (100K iterations)
Floorplacement – Success Floorplacement – Success RateRate
39
Floorplacement – Aspect RatioFloorplacement – Aspect Ratio
40
Tests performed directly floorplacing RFUs
COMPARISON WITH THE COMPARISON WITH THE STATE OF THE ARTSTATE OF THE ART
41
State of the artState of the art
42
Authors Comm. Infrastructure
ResourceAware
Reconfiguration Aware
Device LimitsAware
Bazargan et al. No No Yes No
Yuh et al. Limited, w/ High Overhead
No Yes Yes
Singhal et al. No No Yes No
Feng et al. No Yes No No
NotesNotes
The comparsion is performed with respect to the description given by Yuh et al in [*]
Yuh’s approach does not supportMultiple ResourcesExistence of a Static side
In order perform the comparisonthe case study has been chosen in order to avoid multiple resource limitationYuh’s approach has been extended to support a static side
43
[*] Ping-Hung Yuh, Chia-Lin Yang, Yao-Wen Chang, Hsin-Lung Chen: Temporal Floorplanning Using 3D-subTCG, Design Automation Conference, 2004
The Case StudyThe Case Study
A Reconfigurable Architecture (for Biomedical Purpose) on XC5VLX30T 1. Collecting data from sensor2. Elaborating them3. Sending to a host computer thorough the net
44
The Proposed SolutionThe Proposed Solution
It is a reconfigurable architecture with 2 static photos
45
Area assignments comparisonArea assignments comparison
46
Proposedsolution
Yuh’s solution
Communication performances comparisonCommunication performances comparison
Considering 100 M samples, 32 bit each, at 75 MHz.
The entire data are transferred byThe Proposed Approach in 2.6 secondsYuh’s approach in 416.0 seconds
47
Conclusions - IConclusions - I
An algorithm for the identification of area constraint for reconfigurable architectures has been introduced
Novelties: taking into accountTarget device heterogeneityTarget device reconfiguration capabilitiesCommunication issues
48
Conclusions - IIConclusions - II
Results have been publishedA. Montone, M.D. Santambrogio, D. Sciuto, A Design Workflow for the Identification of Area Constraints in Dynamic Reconfigurable Systems, IEEE International Symposium on Electronic Design, Test and Applications (DELTA), 2008A. Montone, M.D. Santambrogio, Area Constraint Evaluation for FPGAs, The Syndicated Q1-2008, A technical newsletter for FPGA, ASIC Verification and DSP Designers, Synplicity Incorporation
49
Future WorksFuture Works
Take into consideration IOBs and inter modules communications
Partitioning considering clock regions
50
QuestionsQuestions
51