POLITECNICO DI MILANO
Polaris
A workflow to manage allocation and relocation of tasks in a reconfigurable architecture
Final goal: complete architecture (bitstreams) generation
POLITECNICO DI MILANO
Management of 2D Reconfiguration in a Reconfigurable System
Massimo [email protected]
Outline
Introduction
    Problem description
    Project goals and contributions
Project in detail
    Phases
    Results
Future work
Problem Description
New generation of FPGAs
    Virtex-4 and Virtex-5
    Allow bi-dimensional reconfiguration
This makes it possible to:
    Better exploit the reconfigurable area
    Optimize module performance
More complex management:
    Handle one more degree of freedom
    Avoid increased fragmentation
    Make good placement choices to keep the task rejection rate (TRR) low
    Keep intra-module routing paths acceptable
Project Goals and Contributions
Analyze the effects of 2D reconfiguration
    New advantages
    New problems
Examine possible solutions to the new problems
    Explore the literature to find promising ideas
    Evaluate those solutions in various scenarios
Propose a new solution
    Combining ideas from the literature with new ones
    Obtaining a good cost-quality tradeoff
Setting and Advantages Definition
Definition of the setting:
    2D self partial dynamic run-time reconfiguration
Analysis of the advantages of 2D reconfiguration
    In area usage and performance
2D Fragmentation Problem
Analysis of the 2D fragmentation problem
    Area is generally more fragmented
    Can nullify the area optimizations obtained
Placement Decisions
Analysis of the effects of 2D placement choices:
    Again, bad choices can lead to performance loss
Allocation manager
Definition of the desired features of the allocation manager:
    Low TRR
    Low management overhead
    High routing efficiency
    Low fragmentation
Definition of the allocation manager structure:
    Empty space manager
        Complete space
        Heuristic selection
    Fitter
        General (FF, BL, BF, WF…)
        Focused (FA, RA…)
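The general fitters can be illustrated with a minimal sketch, assuming FF, BF and WF stand for first fit, best fit and worst fit, and assuming the empty space manager exposes a plain list of free rectangles. The names `Rect`, `first_fit`, `best_fit` and `worst_fit` are hypothetical and are not taken from the slides.

```python
from typing import NamedTuple, Optional, Sequence


class Rect(NamedTuple):
    """Free rectangle: origin (x, y), height h (rows), width w (columns)."""
    x: int
    y: int
    h: int
    w: int


def fits(r: Rect, h: int, w: int) -> bool:
    """True if a task of h rows and w columns fits inside r."""
    return r.h >= h and r.w >= w


def first_fit(free: Sequence[Rect], h: int, w: int) -> Optional[Rect]:
    """Return the first free rectangle that can host the task."""
    return next((r for r in free if fits(r, h, w)), None)


def best_fit(free: Sequence[Rect], h: int, w: int) -> Optional[Rect]:
    """Return the smallest-area free rectangle that can host the task."""
    candidates = [r for r in free if fits(r, h, w)]
    return min(candidates, key=lambda r: r.h * r.w, default=None)


def worst_fit(free: Sequence[Rect], h: int, w: int) -> Optional[Rect]:
    """Return the largest-area free rectangle that can host the task."""
    candidates = [r for r in free if fits(r, h, w)]
    return max(candidates, key=lambda r: r.h * r.w, default=None)


free = [Rect(0, 0, 10, 10), Rect(10, 0, 4, 6), Rect(0, 10, 5, 5)]
print(first_fit(free, 4, 5), best_fit(free, 4, 5), worst_fit(free, 4, 5))
```

All three strategies look only at rectangle sizes. A focused fitter would instead rank candidates by a goal-specific cost such as fragmentation or routing distance.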
Most relevant works
Maintain complete information on empty space:
    KAMER:
        Keep All Maximally Empty Rectangles
        Apply a general fitting strategy
    CUR:
        Maintain the Contour of a Union of Rectangles
        Apply a focused fitting strategy
Heuristically prune part of the information:
    KNER:
        Keep Non-overlapping Empty Rectangles
        Apply a general fitting strategy
    2D-HASHING:
        Keep Non-overlapping Empty Rectangles in an optimized data structure
        Apply (exclusively) a general fitting strategy
Evaluation and Proposed Approach
Proposed approach
    Heuristic (KNER-like) empty space manager, to keep complexity low for use in a self-reconfigurable system
    Fitting strategy focused on minimizing routing paths, to maintain high performance of the reconfigurable system (chosen metric to minimize: Manhattan distance)
High placement quality => high complexity
Lowest complexity => no focused fitting (bad especially for routing)
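A routing-focused fitting criterion based on Manhattan distance could look like the sketch below. It assumes that distance is measured between rectangle centers and that the cost of a candidate position is the sum of distances to the already placed communicating tasks. Both choices, and the function names, are illustrative assumptions rather than details given in the slides.

```python
from typing import Sequence, Tuple

# A rectangle is (x, y, h, w): origin column/row, height, width.
RectT = Tuple[int, int, int, int]


def center(rect: RectT) -> Tuple[float, float]:
    """Geometric center of a rectangle."""
    x, y, h, w = rect
    return (x + w / 2, y + h / 2)


def manhattan(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Manhattan (L1) distance between two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def communication_cost(candidate: RectT, partners: Sequence[RectT]) -> float:
    """Sum of center-to-center Manhattan distances to communicating tasks."""
    c = center(candidate)
    return sum(manhattan(c, center(p)) for p in partners)


def pick_position(candidates: Sequence[RectT], partners: Sequence[RectT]) -> RectT:
    """Choose the candidate position with the lowest communication cost."""
    return min(candidates, key=lambda r: communication_cost(r, partners))


partners = [(0, 0, 2, 2)]                    # one already placed partner task
candidates = [(8, 8, 2, 2), (2, 0, 2, 2)]    # two feasible positions
print(pick_position(candidates, partners))
```

Here the position adjacent to the partner is chosen, since its cost is 2 against 16 for the far corner.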
Structure of the allocation manager
Task, defined by:
    Arrival time, ASAP, (ALAP), H, W, latency, communicating tasks
    Hosted in a queue, which also adds a pointer to the rectangle where the task is placed
Reconfigurable device, represented as:
    Binary tree structure: each node is a rectangle, each leaf is an empty rectangle
    Navigation through pointers to the left child, right child and next leaf, plus a function to find the previous leaf (for bookkeeping after a split or merge)
Rectangle, defined by:
    X, Y, H, W
    Initially one, with (X,Y)=(0,0), H=FPGA rows, W=FPGA columns
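The device representation can be sketched as a small binary tree of rectangles. This is a simplified illustration under stated assumptions: a task is placed in the corner of the first empty leaf that fits, and the leftover area is split into a right and a lower rectangle. The split rule, the `task` field and the function names are hypothetical. The next-leaf pointer and the merge step mentioned on the slide are omitted.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Node:
    x: int                      # column of the rectangle origin
    y: int                      # row of the rectangle origin
    h: int                      # height in rows
    w: int                      # width in columns
    task: Optional[str] = None  # task placed at the origin, if any
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def empty_leaves(node: Optional[Node]) -> Iterator[Node]:
    """Yield the empty rectangles (unoccupied leaves), left to right."""
    if node is None:
        return
    if node.left is None and node.right is None:
        if node.task is None:
            yield node
        return
    yield from empty_leaves(node.left)
    yield from empty_leaves(node.right)


def place(root: Node, task: str, h: int, w: int) -> Optional[Node]:
    """Place a task of h rows and w columns; return its node or None."""
    for leaf in empty_leaves(root):
        if leaf.h >= h and leaf.w >= w:
            leaf.task = task
            if leaf.w > w:   # free strip to the right of the task
                leaf.left = Node(leaf.x + w, leaf.y, h, leaf.w - w)
            if leaf.h > h:   # free strip below the task, full width
                leaf.right = Node(leaf.x, leaf.y + h, leaf.h - h, leaf.w)
            return leaf
    return None              # no empty rectangle fits: task rejected


root = Node(0, 0, h=8, w=10)   # initially one rectangle covering the device
place(root, "t1", 3, 4)
place(root, "t2", 5, 5)
print([(n.x, n.y, n.h, n.w) for n in empty_leaves(root)])
```

Because the two rectangles produced by a split never overlap, the bookkeeping stays cheap, at the price of sometimes missing a placement that a complete empty space manager would find.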
The Placement Algorithm
Experimental Results
Benchmark of 100 randomly generated tasks:
    Size (5% to 25% of the FPGA), randomly interconnected
Execution time: 3x less than CUR, close to KNER
Communication cost: 3x less than KNER, close to CUR
Task Rejection Rate: all solutions quite close
Future Work
Apply the proposed solution to self-reconfiguration:
    Adapt the algorithm to run on the internal processor
    Create a validation reconfigurable architecture
    Integrate the architecture with relocation
Tune the algorithm to improve results:
    Experiment with techniques to reduce TRR
    Optimize the code to lower the running time of the algorithm
Evaluate other fitting strategies
Questions?
POLITECNICO DI MILANO
Relocation for 2D Reconfigurable Systems
Marco [email protected]
Project Outline
Introduction
    Problem description
    Project goals
Project in detail
    Phases
    Results
What’s next
Problem Description
Self dynamic run-time 2D reconfiguration
    Xilinx Virtex-4 and Virtex-5
Relocation, different solutions
    Software
    Hardware
We chose a hardware solution
    BiRF Square
Project Goals
Study of the new FPGA families
    Examination of Xilinx documentation on V4 and V5
Analysis of the new bitstream structure
    Generation of V4 and V5 bitstreams
Development of the new version of BiRF
    Implementation
    Validation
Frame Addressing (1/2)
New frame addressing:
    Possibility of addressing rows and columns
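Relocation with row and column addressing can be pictured as rewriting two fields of each frame address. The sketch below is only an illustration: the field widths and ordering are placeholders and do not reproduce the Virtex-4 or Virtex-5 layout, which must be taken from the Xilinx configuration documentation. The `minor` field (frame index within a column) and all function names are assumptions introduced here.

```python
# Placeholder field widths, for illustration only (NOT the real device layout).
ROW_BITS, COL_BITS, MINOR_BITS = 5, 8, 6


def pack(row: int, col: int, minor: int) -> int:
    """Build a frame address from its row, column and minor fields."""
    for value, bits in ((row, ROW_BITS), (col, COL_BITS), (minor, MINOR_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError("field out of range")
    return (row << (COL_BITS + MINOR_BITS)) | (col << MINOR_BITS) | minor


def unpack(addr: int) -> tuple:
    """Split a frame address into (row, col, minor)."""
    minor = addr & ((1 << MINOR_BITS) - 1)
    col = (addr >> MINOR_BITS) & ((1 << COL_BITS) - 1)
    row = (addr >> (COL_BITS + MINOR_BITS)) & ((1 << ROW_BITS) - 1)
    return row, col, minor


def relocate(addr: int, d_row: int, d_col: int) -> int:
    """Shift a frame address by d_row rows and d_col columns."""
    row, col, minor = unpack(addr)
    return pack(row + d_row, col + d_col, minor)


moved = relocate(pack(1, 3, 7), d_row=2, d_col=10)
print(unpack(moved))
```

A hardware relocator would apply the same field rewrite to every frame address found in the partial bitstream while leaving the frame data untouched.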
Frame Addressing (2/2)
New Parser
CRC Calculation
Particular CRC value, used by Xilinx tools
Two versions of BiRF Square:
    Using the “predefined” value
    With actual CRC calculation
An optimized algorithm has been used
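The slides do not say which optimization or which polynomial is used. As one common example of an optimized CRC computation, the sketch below contrasts a bit-by-bit CRC with a table-driven one that processes a whole byte per step. It uses the standard CRC-32 polynomial only because the result can be checked against Python's `zlib`. It is not meant to reproduce the CRC of the Xilinx configuration logic.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial, chosen for illustration only


def crc32_bitwise(data: bytes) -> int:
    """Reference implementation: one shift/xor step per input bit."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def make_table() -> list:
    """Precompute the CRC contribution of every possible byte value."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = make_table()


def crc32_table(data: bytes) -> int:
    """Table-driven implementation: one lookup per input byte."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF


sample = b"123456789"
print(hex(crc32_table(sample)), crc32_table(sample) == zlib.crc32(sample))
```

The version that reuses a predefined CRC value avoids this computation entirely, which is consistent with the two BiRF Square variants listed above.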
Synthesis results
On a Virtex-4 with speed grade -12
    General purpose version: max frequency of 160 MHz
    Specific version: max frequency of 290 MHz
Target Device
Validation Architecture
Results (1/2)
BiRF Square
    Enables relocation in a self, partially and dynamically 2D-reconfigurable system
    The occupation ratio is relatively small
    Frequency more than acceptable
    Reduction of internal memory requirements
Results (2/2)
Throughput of 7.3 MB/s
The total configuration file size is about 1 MB
Considering an architecture with:
    1/3 of the area as fixed part
    2/3 as reconfigurable part with 6 slots
Under this hypothesis:
    Size of a partial bitstream will be about 110 KB
    Relocation time of about 15 ms
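The two estimates follow from the figures on this slide. The short calculation below reproduces them, assuming that bitstream size is proportional to area, that the 6 slots are equal in size, and that 1 MB is taken as 1000 KB.

```python
total_size_kb = 1000            # "about 1 MB" full configuration file
reconfigurable_fraction = 2 / 3 # share of the device that is reconfigurable
slots = 6                       # equally sized reconfigurable slots (assumed)
throughput_kb_per_s = 7300      # 7.3 MB/s relocation throughput

# Size of the partial bitstream for one slot.
partial_kb = total_size_kb * reconfigurable_fraction / slots

# Time needed to stream one partial bitstream through the relocator.
relocation_ms = partial_kb / throughput_kb_per_s * 1000

print(f"partial bitstream: {partial_kb:.0f} KB, relocation: {relocation_ms:.1f} ms")
```

The result is roughly 111 KB and 15 ms, matching the rounded values of about 110 KB and about 15 ms quoted on the slide.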
What’s Next
Future improvements:
    Direct access to the memory (DMA)
        Direct manipulation of the bitstream
        Portability
    Integration with ICAP
        Elimination of the relocation overhead
        Relocation time << reconfiguration time
Future work:
    Provide a simulation framework to monitor the evolution of the reconfigurable system and to evaluate different choices
The final goal:
    Creation of a real architecture that exploits self partial and dynamic 2D reconfiguration, with relocation
Questions