BY
Massimo Morandi
3D DRESD – 29/07/08
Runtime Core Allocation Management for 2D Self Partially and Dynamically Reconfigurable Systems
Rationale and Innovation
Problem statement:
  Providing runtime management support for 2D self partial and dynamical reconfiguration, in particular for Core placement decisions
Innovative contributions:
  A fast and flexible solution
  Low complexity, to avoid introducing too much overhead at runtime
  Support for different scenarios and placement policies, according to user needs
  The possibility to exploit multiple shapes per Core, through integration with the area constraints definition
Aims
Our proposed solution must support different scenarios, placement policies and intervention from the designer
It must be fast compared to the related solutions in the literature
The quality of the placement choices must be high, in terms of percentage of placement success, global application completion time or other metrics, as defined by the user
Outline
Context Definition
Motivations and Goals
  Specific Contributions to Polaris
Area Constraints Definition
  Proposed solution
Runtime Core Allocation Management
  Features and Structure of an Allocation Manager
  Relevant Works
  Proposed Solution
Results
Conclusions and Future Work
Context Definition
Reconfigurable hardware: has the capability of changing its configuration (functionality) according to user needs
Self reconfiguration: the system must be completely autonomous at runtime
Partial reconfiguration: the changes can also involve fractions of the device
Dynamical reconfiguration: if a part of the hardware is reconfigured, the rest can continue its computation
2D reconfiguration: arbitrary rectangular slots can be dynamically reconfigured, as opposed to arbitrary columns in 1D
A bit of Terminology
Motivations and goals
The creation and management of a self partially and dynamically reconfigurable system is a complex problem
  This is even more critical when exploiting the 2D reconfiguration paradigm
  More issues arise in the definition of area constraints and in the core allocation decisions
  Since the system must be autonomous, it also needs runtime management functionalities
Need for automation in those processes
  To reduce the workload on the designer
  To improve the efficiency of the final reconfigurable system
Motivations and goals
Creation of an automated workflow to generate a self dynamically reconfigurable architecture that:
  Has “good” area constraints assigned to cores
  Is autonomous in performing 2D runtime core allocation decisions
  Exploits relocation to ensure that the system can obtain the configuration bitstreams it needs at runtime
  Supports intervention from the designer, to guide or constrain the decisions
  Keeps high flexibility and generality
Specific Contributions to Polaris
Solution identification phase of the flow:
  The definition of area constraints for Cores, when the user does not specify them
  The creation of Core Allocation Management solutions, able to efficiently manage runtime Core placement
This last task includes:
  Offering high versatility, supporting different placement policies and different scenarios
  Keeping low complexity, to avoid too much overhead in the running time of the system
  Experimenting with techniques to improve efficiency, for example allowing multiple shapes per Core
Area Constraints Definition
The designer can choose whether or not to specify the area constraints (AC) for each Core in the application
If not specified, they are automatically computed
The designer can also choose whether to allow multiple shapes per Core (and how many)
Finally, the last parameter represents the tightness of the constraints that will be defined:
  Impacts on feasibility of implementation
  Impacts on performance of the RFU
CORE RFU (or set of RFUs)
Area Constraints Definition
The constraints are defined with a simple heuristic
First a square-like constraint is defined, using these formulae:
Where H is the height (in slices), W is the width, S is the number of slices of the Core and m is the tightness
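The formulae themselves are not reproduced in this text. Purely as an illustration of what a square-like constraint with a tightness margin could look like, the sketch below assumes that the Core area S is inflated by a factor (1 + m) and then shaped into a near-square H x W. This rule, the reading of m as a fractional margin and the name `square_like_constraint` are all assumptions, not the heuristic presented in the talk.

```python
import math


def square_like_constraint(s: int, m: float) -> tuple[int, int]:
    """Return (H, W) in slices for a Core of s slices with margin m.

    ASSUMED rule, for illustration only: reserve s * (1 + m) slices
    and pick the most square rectangle that covers them.
    """
    if s <= 0 or m < 0:
        raise ValueError("s must be positive and m non-negative")
    target = math.ceil(s * (1.0 + m))  # slices to reserve, margin included
    h = math.ceil(math.sqrt(target))   # near-square height
    w = math.ceil(target / h)          # smallest width that still covers target
    return h, w


h, w = square_like_constraint(50, 0.5)
print(h, w, h * w)
```

Under this assumed rule a larger m gives a looser constraint, which is easier to implement but wastes more area, matching the trade-off between feasibility and performance mentioned for the tightness parameter.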
Area Constraints Definition
Then, the constraints are converted from slices to slots
Where Vg is a granularity parameter, Vslices is the number of vertical slices in the device and avgH is the average height of all the RFUs defined with the square-like formula
Finally, the constraints (in slots) are iteratively altered to stretch the Core horizontally or vertically and obtain multiple RFUs
Runtime Core Allocation Management
The Problem:
  Perform the choice of where to place new cores on the reconfigurable area
  In an online scenario: self partial and dynamical reconfiguration
The Goal:
  Allow efficient usage of the FPGA area
  Critical in the 2D reconfiguration case
This requires the creation of a solution for allocation management and suitable policies
Allocation Manager Desired Features
Low Core Rejection Rate (CRR)
  % of cores that are not successfully placed in time
Fast application completion time
  Time from arrival of first Core to completion of last
Low fragmentation grade
  Fraction of area that is unusable because too sparse
Small management overhead
  We want a lightweight solution to run inside the system
High routing efficiency
  If interacting cores are clustered, the system is more efficient
Need to find a good compromise between them
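The first two metrics can be computed directly from a log of what happened to each Core. The following is a minimal sketch under assumed conventions: the `CoreOutcome` record, its field names and the use of integer time steps are hypothetical and only serve to make the two definitions concrete.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CoreOutcome:
    arrival: int           # time step at which the Core arrived
    finish: Optional[int]  # completion time, or None if the Core was rejected


def core_rejection_rate(outcomes: Sequence[CoreOutcome]) -> float:
    """Percentage of Cores that were not successfully placed in time."""
    if not outcomes:
        return 0.0
    rejected = sum(1 for o in outcomes if o.finish is None)
    return 100.0 * rejected / len(outcomes)


def completion_time(outcomes: Sequence[CoreOutcome]) -> int:
    """Time from the arrival of the first Core to the completion of the last."""
    finished = [o.finish for o in outcomes if o.finish is not None]
    if not finished:
        return 0
    return max(finished) - min(o.arrival for o in outcomes)


log = [CoreOutcome(0, 5), CoreOutcome(2, None), CoreOutcome(3, 9), CoreOutcome(4, 7)]
print(core_rejection_rate(log), completion_time(log))
```

Fragmentation grade and routing efficiency are left out on purpose, since their exact definitions depend on how the empty space and the Core interconnections are modelled.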
Example: 2D fragmentation
The 2D-fragmentation problem:
  Area generally more fragmented
  Can nullify the area optimizations obtained
Example: Core Rejection
Bad choices can lead to performance loss and rejection
A: Core C is successfully placed at step 2
B: Core C is delayed (possibly rejected, if deadline=2)
Considered Scenarios
Dynamic Schedule
  Cores can arrive at any time
  Have an ASAP and an ALAP time (dependencies)
  Rejection: failure to respect ALAP for a Core
  Goal: respect the schedule; CRR is the most important metric and should tend to zero
Blind Schedule
  Cores can be either available from the start or arrive at different times, no dependencies assumed
  No ASAP, Cores can optionally have a deadline
  If a Core is not placed, retry later
  Goal: the application must complete as fast as possible; rejection is not the main issue, total time is
Allocation Manager Creation
Choose how to maintain information on empty space
  Keep all information (expensive but more accurate)
  Heuristically prune information (cheaper)
Which placement policy to choose:
  General (First Fit, Best Fit, Worst Fit…)
  Focused (Fragmentation Aware, Routing Aware…)
Define in which scenario(s) the manager will work
It can also be useful to consider and exploit different shapes of a Core (multiple RFUs per Core scenario)
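The general policies differ only in which of the fitting empty rectangles they pick. A minimal sketch, assuming the empty space is available as a plain list of rectangles; the `Rect` type and the function names are hypothetical, and using area as the Best Fit / Worst Fit criterion is one common choice rather than something stated in the slides.

```python
from typing import NamedTuple, Optional, Sequence


class Rect(NamedTuple):
    x: int
    y: int
    h: int
    w: int


def fits(r: Rect, h: int, w: int) -> bool:
    return r.h >= h and r.w >= w


def first_fit(free: Sequence[Rect], h: int, w: int) -> Optional[Rect]:
    """First empty rectangle, in list order, that can host an h x w Core."""
    return next((r for r in free if fits(r, h, w)), None)


def best_fit(free: Sequence[Rect], h: int, w: int) -> Optional[Rect]:
    """Smallest-area empty rectangle that can host the Core."""
    candidates = [r for r in free if fits(r, h, w)]
    return min(candidates, key=lambda r: r.h * r.w, default=None)


def worst_fit(free: Sequence[Rect], h: int, w: int) -> Optional[Rect]:
    """Largest-area empty rectangle that can host the Core."""
    candidates = [r for r in free if fits(r, h, w)]
    return max(candidates, key=lambda r: r.h * r.w, default=None)


free = [Rect(0, 0, 4, 4), Rect(0, 4, 2, 3), Rect(4, 0, 6, 8)]
print(first_fit(free, 2, 2), best_fit(free, 2, 2), worst_fit(free, 2, 2))
```

A focused policy such as Routing Aware would replace the area key with a cost that depends on where the communicating Cores are already placed.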
Relevant Works
Maintain complete information on empty space:
KAMER: K. Bazargan, R. Kastner and M. Sarrafzadeh, ''Fast template placement for reconfigurable computing systems'', IEEE Design and Test of Computers, Vol.17, 2000.
  Keep All Maximally Empty Rectangles
  Apply a general placement policy
CUR: A. Ahmadinia, C. Bobda, S. P. Fekete, J. Teich and J. v.d. Veen, ''Optimal Routing-Conscious Dynamic Placement for Reconfigurable Devices'', Field-Programmable Logic and Applications (FPL'04), 2004.
  Maintain the Contour of a Union of Rectangles
  Apply a focused placement policy
Relevant Works
Heuristically prune part of the information:
KNER: K. Bazargan, R. Kastner and M. Sarrafzadeh, ''Fast template placement for reconfigurable computing systems'', IEEE Design and Test of Computers, Vol.17, 2000.
  Keep Non-overlapping Empty Rectangles
  Apply a general placement policy
2D-HASHING: H. Walder, C. Steiger and M. Platzner, ''Fast Online Task Placement on FPGAs: Free Space Partitioning and 2D-Hashing'', International Parallel and Distributed Processing Symposium (IPDPS'03), 2003.
  Keep Non-overlapping Empty Rectangles in an optimized data structure
  Apply (exclusively) a general placement policy
Example: Empty Space Information
Evaluation
The solutions with higher placement quality also have higher complexity
The fastest solution cannot exploit focused policies, for example routing aware, and adds the overhead of maintaining the 2D hashing structure
CUR does not support all general policies, for example Best Fit is not allowed
Proposed Approach
Choice driven by:
  Need for a low complexity solution, to introduce low overhead at runtime in the self reconfigurable system
  Desire to keep high flexibility, to suit user needs also in terms of placement policies
For these reasons we propose a heuristic (KNER-like) empty space manager:
  Supporting general and focused placement policies (in particular, First Fit, Best Fit and Routing Aware)
  Suitable for both dynamic schedule and blind schedule scenarios
  Exploiting multiple RFUs per Core, to improve results
Data Representation
Core, defined by:
  Arrival time
  Set of RFUs, each one with:
    H, W, Latency
  Optional set of communicating Cores (if using RA)
  ASAP and ALAP (if in dynamic schedule scenario)
Two queues:
  one for new Cores
  one for Cores that were not successfully placed and need reexamination
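The Core description and the two queues map naturally onto small record types. The sketch below is one possible encoding; the class and field names, the `name` identifier and the rule of serving the retry queue before the new-Core queue are assumptions made for illustration, not details given in the slides.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Set


@dataclass
class RFU:
    h: int        # height of this shape
    w: int        # width of this shape
    latency: int  # execution latency when this shape is used


@dataclass
class Core:
    name: str                                             # hypothetical identifier
    arrival: int                                          # arrival time
    rfus: List[RFU]                                       # one or more alternative shapes
    communicating: Set[str] = field(default_factory=set)  # only used by Routing Aware
    asap: Optional[int] = None                            # dynamic schedule only
    alap: Optional[int] = None                            # dynamic schedule only


def next_core(new_cores: Deque[Core], retry_cores: Deque[Core]) -> Optional[Core]:
    """Pick the next Core to examine (ASSUMPTION: retries are served first)."""
    if retry_cores:
        return retry_cores.popleft()
    if new_cores:
        return new_cores.popleft()
    return None


new_cores: Deque[Core] = deque([Core("jpeg", 0, [RFU(4, 4, 10), RFU(2, 8, 12)])])
retry_cores: Deque[Core] = deque([Core("aes", 0, [RFU(3, 3, 5)])])
print(next_core(new_cores, retry_cores).name)
```

A Core that cannot be placed would be appended to `retry_cores` and examined again later, which is the behaviour described for the blind schedule scenario.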
Data Representation
Reconfigurable Device, represented as:
  Binary Tree structure, each node is a Rectangle, each leaf is an empty Rectangle
  Navigation through:
    pointers to left child, right child, next leaf
    a function to find the previous leaf (used for bookkeeping after rectangle split and merge operations)
Rectangle, defined by:
  Coordinates on device: X, Y
  Size: H, W
  Initially one, the root, with:
    (X,Y)=(0,0), H=FPGA Rows, W=FPGA Cols
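To make the tree representation concrete, here is a minimal sketch of a KNER-like split: a Core is placed in the top-left corner of an empty leaf and the remaining L-shaped area is cut into two non-overlapping empty rectangles that become the children. The corner choice, the direction of the cut, the `used` flag and the recursive leaf traversal (instead of the next-leaf pointers mentioned above) are simplifying assumptions; merging on Core removal is not shown.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass
class Node:
    x: int  # column of the top-left corner
    y: int  # row of the top-left corner
    h: int  # height in rows
    w: int  # width in columns
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    used: bool = False  # True once a Core has been placed in this rectangle


def empty_leaves(node: Optional[Node]) -> Iterator[Node]:
    """Yield the empty rectangles of the device, left to right."""
    if node is None:
        return
    if node.used:
        yield from empty_leaves(node.left)
        yield from empty_leaves(node.right)
    else:
        yield node


def place(leaf: Node, h: int, w: int) -> Tuple[int, int]:
    """Place an h x w Core in the corner of an empty leaf and split the rest."""
    if leaf.used or h > leaf.h or w > leaf.w:
        raise ValueError("Core does not fit in this rectangle")
    leaf.used = True
    if leaf.w > w:  # strip to the right of the Core, as tall as the Core
        leaf.left = Node(leaf.x + w, leaf.y, h, leaf.w - w)
    if leaf.h > h:  # strip below the Core, spanning the full width
        leaf.right = Node(leaf.x, leaf.y + h, leaf.h - h, leaf.w)
    return leaf.x, leaf.y


def place_first_fit(root: Node, h: int, w: int) -> Optional[Tuple[int, int]]:
    """Scan the empty leaves once and place the Core in the first that fits."""
    for leaf in empty_leaves(root):
        if leaf.h >= h and leaf.w >= w:
            return place(leaf, h, w)
    return None


device = Node(0, 0, 10, 10)  # root: the whole reconfigurable area
print(place_first_fit(device, 4, 3), place_first_fit(device, 5, 5))
```

Because each placement only scans the current empty leaves and splits one of them, the amount of information kept stays small, which is the point of pruning instead of tracking all maximal empty rectangles.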
The Online Placement Algorithm
The whole processing of a Core is completed in linear time
Evaluation of the proposed solution
To evaluate the quality of the proposed approach in various scenarios and with different metrics, 3 kinds of experiment were performed:
1) A comparison against the presented literature solutions
  In a dynamic schedule scenario
  With a Routing Aware placement policy
  Measuring CRR (and indirectly fragmentation), routing costs and computational overhead
  Results published in:
    M. MORANDI, M. Novati, M. D. Santambrogio, D. Sciuto, “Core allocation and relocation management for a self dynamically reconfigurable architecture”, IEEE Computer Society Annual Symposium on VLSI, 2008
Evaluation of the proposed solution
2) A measure of application completion time
  Composed of real Cores used as benchmarks
  In a blind schedule scenario
  Directly measuring application completion time, gaining some insight on CRR and fragmentation
3) Evaluation of the multiple shapes per Core approach
  Comparison between our solution with multiple shapes and KNER (adapted to blind schedule scenario)
  In a mixed scenario (blind schedule with deadlines and variable arrival times)
  Using both First Fit and Best Fit
  Measure of CRR and running time
Experiment 1: Routing Aware
Version of our general solution:
  Tailored to minimize routing paths
  Compared with close solutions from literature
  Named in the table RALP (Routing Aware Linear Placer)
Benchmark of 100 randomly generated tasks:
  Size (5% to 20% of FPGA), randomly interconnected
Experiment 2: Appl. Completion Time
Benchmark applications composed of cores taken from opencores.org, like JPEG, AES, 3DES
Measure the time instants needed to complete the applications with different amounts of resources
The infinite resources case is shown, to compare against the lower bound
Experiment 3: Multiple Shapes
Similar benchmark, but Cores have deadlines (for CRR)
Shapes defined using the heuristic described previously
Runtime is on average 30% higher for 3 shapes and 40% higher for 5 shapes w.r.t. 1 shape
CRR is more than halved, often reduced to one third
Numerical Example
To give an idea of the quality of the obtained results, it is useful to give some numerical values for reconfiguration
Let us consider a JPEG Core, described by a 690 Kb configuration bitstream for a V4 device and using about 10% of the total area
  Reconfiguration time: 150 ms
  Relocation time: 90 ms
  Placement time: 0.4 ms
The obtained time is low and is suitable for actual usage in a real system
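The relative weight of the placement decision can be read off these three numbers. The short calculation below is a sketch; treating reconfiguration and relocation as sequential, so that their times add up, is an assumption made only for the second ratio.

```python
RECONFIGURATION_MS = 150.0
RELOCATION_MS = 90.0
PLACEMENT_MS = 0.4


def overhead_percent(placement_ms: float, other_ms: float) -> float:
    """Placement time expressed as a percentage of another time."""
    return 100.0 * placement_ms / other_ms


# Placement versus reconfiguration alone: 0.4 / 150, about 0.27 %
vs_reconfiguration = overhead_percent(PLACEMENT_MS, RECONFIGURATION_MS)

# Placement versus relocation plus reconfiguration (ASSUMED sequential):
# 0.4 / 240, about 0.17 %
vs_both = overhead_percent(PLACEMENT_MS, RECONFIGURATION_MS + RELOCATION_MS)

print(f"{vs_reconfiguration:.2f} % of reconfiguration time")
print(f"{vs_both:.2f} % of relocation plus reconfiguration time")
```

In both readings the placement decision costs well under 1% of the time already needed to load the Core.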
Concluding Remarks
The proposed solution offers:
  High versatility, supporting different placement policies and scenarios, designer intervention, multiple shapes
  Low overhead, always processing a Core in linear time and obtaining good results compared with literature
  Good CRR, especially when exploiting multiple shapes
  Fast application completion time, as shown by exp. 2
  Effective routing costs reduction, when used in conjunction with a Routing Aware policy (exp. 1)
The original goals were met
Under Review:
  S. Corbetta, M. MORANDI, M. Novati, M. D. Santambrogio, D. Sciuto, P. Spoletini, “Internal and External Bitstream Relocation for Partial Dynamic Reconfiguration”, IEEE Transactions on VLSI (2nd review)
Future Work
Future work will be in the direction of integration with the rest of the workflow that was briefly introduced
The parts that were described achieved good results stand-alone in the runtime management of the reconfigurable system; it is important to evaluate them also inside the complete workflow
The final goal is to achieve complete automation in the creation process of a self dynamically reconfigurable architecture, from user specification up to bitstreams and processor code generation
Questions