Design, Development and Validation Testing of a Versatile PLD Implementable Single-Chip
Heterogeneous, Hybrid and Reconfigurable Multiprocessor Architecture (HDCA)
By J. Robert (Bob) Heath**, Sridhar Hegde, Kanchan Bhide, Paul Maxwell, Xiaohui Zhao and Venugopal Duvvuri
Department of Electrical and Computer Engineering
University of Kentucky
Lexington, Kentucky 40506
Heath MAPLD 2005/247 2
Abstract
There appear to be an increasing number of real-time and non-real-time computer applications that may be described by process and/or data-flow graphs (from here on we use the term “process flow graphs”). Such applications include radar signal processing, sonar signal processing, various system simulation environments utilized within Computer Aided Design (CAD) software systems, communications signal processing, routing, and the collection, processing and storage of data from multiple sensors/instruments. For such applications, a first goal is the availability of a computer system/architecture platform that allows an application described by a process flow graph of any topology to be mapped to and executed on it. The application process flow graph may have single or multiple inputs/outputs and may be cyclic or acyclic; processes are represented by the nodes of the graph. Further, it would be desirable for the computer system/architecture to continue executing the application with minimum interruption if the topology of the application process flow graph were to change dynamically during execution. This goal is referred to as application level reconfigurability. A second goal for the same computer system/architecture is the ability to dynamically, on the fly, configure, move, or assign processors or other physical resources to application processes that may need them at any time (and, conversely, to assign additional copies of a process to additional processors). This goal is referred to as node level reconfigurability. A third goal is that the same computer system/architecture be a single-chip heterogeneous multiprocessor system with the capability to dynamically, on the fly, configure and reconfigure, if and when needed, the architectures of individual processors within the overall multiprocessor architecture. We refer to this goal as processor architecture level reconfigurability.
With proper Operating System (OS) and other system software support, a computer system/architecture platform that meets these three goals should be able to execute, in a fault-tolerant manner, a wide range of real-time and non-real-time applications described by process flow graphs of any topology. This paper describes the research and development behind such a computer system architecture and the current status of its development, testing and evaluation. HDL “virtual prototype” functional and performance simulation results are shown for the architecture executing simple hypothetical applications, and future research, development and testing of the architecture are addressed. The described architecture paradigm and platform is known as a single-chip Hybrid Data/Command Driven Architecture (HDCA) system. A reconfigurable/dynamic production HDCA system would be implemented in Programmable Logic Devices (PLDs).
Goals, Objectives and Functionality of HDCA System
• Applicable to a wide range of applications, especially those modeled by process flow graphs.
• Heterogeneous Shared-Memory Model Multiprocessor Architecture.
• Allows a mix of Simple and Complex Special-Purpose and General-Purpose Processors.
• Single-Chip Architecture Implemented to Programmable Logic Device (PLD) Technology.
• May be used for real-time or non-real-time applications.
• Scalable architecture.
• Fault-tolerant architecture.
• May operate in a data-driven or command-driven environment at the process level.
• The idea is for a small number of short control tokens, rather than more voluminous data, to flow through the architecture.
• Dynamic/Reconfigurable at the “application level”.
• Dynamic/Reconfigurable at the “node level”.
• Dynamic/Reconfigurable at the “processor architecture level”.
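The hybrid data/command-driven idea above, in which short control tokens rather than data move between processes, can be illustrated with a small software model. The sketch assumes a simple firing rule (a process runs once it has received one control token from each predecessor in the process flow graph). The function name `run_flow_graph` and the process names P1 to P4 are hypothetical; the sketch handles acyclic graphs only and is not the HDCA's actual token protocol.

```python
from collections import deque


def run_flow_graph(edges, sources):
    """Hypothetical token-driven executor for an acyclic process flow graph.

    edges: dict mapping each process name to the list of its successors.
    sources: processes with no predecessors (fed by application input).
    Returns the order in which processes fired.
    """
    # Number of control tokens each process must receive before it fires.
    needed = {p: 0 for p in edges}
    for succs in edges.values():
        for s in succs:
            needed[s] = needed.get(s, 0) + 1
    received = {p: 0 for p in needed}

    ready = deque(sources)
    order = []
    while ready:
        p = ready.popleft()
        order.append(p)  # "execute" process p
        for s in edges.get(p, []):
            received[s] += 1  # send one short control token, not the data
            if received[s] == needed[s]:
                ready.append(s)
    return order
```

For a diamond-shaped graph in which P1 feeds P2 and P3, which both feed P4, P4 fires only after both of its tokens have arrived.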
Application Description via Process Flow Graphs and Illustration of Dynamic/Reconfigurability at the “Application Level”
Application Description via Process Flow Graphs and Illustration of Dynamic/Reconfigurability at the “Application Level” (continued)
Another Process Flow Graph Describing an Application With a Different Topology.
Illustration of Dynamic/Reconfigurability at the “Node Level” (Dynamic assignment of a process running on an overloaded Computing Element (CE) processor to additional CE processors, to help out the overloaded CE processor)
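Node-level reconfiguration can be sketched as a bookkeeping step: when every CE holding a process is overloaded, a further copy of that process is placed on the least-loaded CE that does not yet hold one. The threshold test, the choice of helper CE, and the names `help_overloaded_ce`, `copies` and `depths` are assumptions made for illustration, not the HDCA's documented policy.

```python
def help_overloaded_ce(copies, depths, process, threshold):
    """Hypothetical node-level reconfiguration step.

    copies: dict CE id -> set of processes loaded on that CE (mutated).
    depths: dict CE id -> current input-queue depth.
    Returns the CE that received a new copy of `process`, or None if no
    help was needed or every CE already holds a copy.
    """
    holders = [ce for ce, procs in copies.items() if process in procs]
    # Act only if every CE holding the process is at or past the threshold.
    if holders and min(depths[ce] for ce in holders) < threshold:
        return None
    candidates = [ce for ce in copies if process not in copies[ce]]
    if not candidates:
        return None
    # Assumed policy: the least-loaded CE without a copy becomes the helper.
    helper = min(candidates, key=lambda ce: depths[ce])
    copies[helper].add(process)
    return helper
```

After a helper is added, a dynamic load balancer such as the PRT Mapper can begin routing requests for the process to it.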
Dynamic/Reconfigurability at the “Processor Architecture Level”
Goal - Dynamically, while an application is running, be able to reconfigure (restructure) a Processor Architecture to enhance performance as dynamic changes may occur in application data and process algorithmic structure.
HDCA System Organization and Architecture (High-Level Functional View)
Architectural View of a Current Single-Chip HDCA System Instantiation
A Functional Level View of the CE Controller.
Brief Overview of HDCA Functional Units
• Process Request Token (PRT) Mapper.
– A Hardware Dynamic Load-Balancing System.
– For a Process Requested by a Control Token, It Determines the CE Containing a Copy of the Process Where the Wait Time to Execute the Requested Process Is Minimum. CE Input Queue Depth Is Used as the Parameter to Determine Minimum Wait Time (Least Depth) to Execution. CE Queue Depth Is Directly Proportional to Wait Time via Utilization of “Dummy Tokens”.
– Detects Some Faults and System Failures.
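The PRT Mapper's selection rule can be modeled in a few lines. This is a minimal software sketch, assuming a table of which CEs hold which processes and a per-CE queue depth; the names `map_prt` and `enqueue_request` are hypothetical. The slides do not say exactly how dummy tokens are used, so the sketch assumes that a request for a process expected to take k time units occupies k queue slots (one real token plus k - 1 dummies), which is one way to make depth proportional to wait time.

```python
def map_prt(process, copies, depths):
    """Hypothetical PRT Mapper: pick the CE with the least queue depth.

    copies: dict CE id -> set of processes that CE holds.
    depths: dict CE id -> input-queue depth, counted in token slots.
    Raises LookupError if no CE holds the process (one detectable fault).
    """
    holders = [ce for ce in sorted(copies) if process in copies[ce]]
    if not holders:
        raise LookupError(f"no CE holds a copy of {process}")
    # Least depth means least wait time; ties go to the lowest CE id.
    return min(holders, key=lambda ce: depths[ce])


def enqueue_request(ce, depths, exec_time_units):
    """Assumed dummy-token scheme: one real token plus
    (exec_time_units - 1) dummy tokens, so depth tracks wait time."""
    depths[ce] += exec_time_units
```

Because depth is charged in proportion to expected execution time, a CE holding a few long requests is not mistaken for a lightly loaded one.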
High Level Architectural Diagram of the Process Request Token (PRT) Mapper (figure labels: PRT_IN, RAM Tables, Availability, Router, Comparators)
Multifunctional Queue (Functionality: FIFO queue, simultaneous R/W, queue depth indication, signal when a programmable queue threshold depth is reached, switch order of any two entries, report input rate over a programmable time interval, and report change in input rate over a programmable time interval)
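A behavioral software model helps make the listed queue functions concrete. The sketch is an assumption-laden Python analogue, not the VHDL design: the class name `MultifunctionalQueue`, the integer clock ticks supplied by the caller, and the sliding-window definition of input rate are illustrative choices, and simultaneous read/write (a hardware property) is not modeled.

```python
from collections import deque


class MultifunctionalQueue:
    """Hypothetical software model of the multifunctional queue."""

    def __init__(self, threshold):
        self.items = deque()
        self.threshold = threshold  # programmable threshold depth
        self.arrivals = []          # tick of every write, for rate reports

    def write(self, token, tick):
        self.items.append(token)
        self.arrivals.append(tick)
        return self.depth() >= self.threshold  # threshold-reached signal

    def read(self):
        return self.items.popleft()  # FIFO order

    def depth(self):
        return len(self.items)

    def swap(self, i, j):
        # Switch the order of any two entries (index 0 is the head).
        self.items[i], self.items[j] = self.items[j], self.items[i]

    def input_rate(self, now, interval):
        # Writes per tick over the window (now - interval, now].
        count = sum(1 for t in self.arrivals if now - interval < t <= now)
        return count / interval

    def rate_change(self, now, interval):
        # Latest window's rate minus the preceding window's rate.
        return self.input_rate(now, interval) - self.input_rate(now - interval, interval)
```

A rising `rate_change` is the kind of signal that could prompt node-level reconfiguration before the threshold is actually reached.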
Crossbar Interconnect Network (Variable-Priority Memory Contention Resolution Protocol: Priority Based on CE Queue Depths; the Deepest Queue Depth Indicates the “Most-Behind” CE)
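The contention rule can be sketched as a per-cycle arbiter: among the CEs requesting the same shared memory module, the one with the deepest input queue (the most-behind CE) is granted access. The function name `grant_memory`, the one-grant-per-module-per-cycle model, and the tie-break toward the lower CE id are assumptions; the slides do not specify tie handling.

```python
def grant_memory(requests, depths):
    """Hypothetical variable-priority crossbar arbiter for one cycle.

    requests: dict CE id (int) -> shared memory module it wants.
    depths: dict CE id -> input-queue depth (deepest = "most behind").
    Returns dict memory module -> CE granted access this cycle.
    """
    grants = {}
    for ce, module in requests.items():
        current = grants.get(module)
        # Deeper queue wins; assumed tie-break favors the lower CE id.
        if current is None or (depths[ce], -ce) > (depths[current], -current):
            grants[module] = ce
    return grants
```

Since queue depths change every cycle, priorities shift automatically; no CE holds a fixed priority.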
HDCA System CEs (Processors) for Previously Shown Instantiation
• Memory Register Computer Architecture CE
– For ALU Instructions, One Operand in Memory and Another in a Register.
– 16-Bit Wide Words/Operands.
– 16 and 32-Bit Wide Instructions.
– Sixteen Assembly Language Instructions.
– I/O Structure.
– Hardware Vectored Priority Interrupt System, etc.
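A memory-register instruction reads one operand from data memory and the other from a register, and writes the 16-bit result back to the register. The sketch below models that addressing style; the operation mnemonics and the function `mem_reg_alu` are hypothetical stand-ins, since the CE's actual sixteen-instruction set and its encodings are not listed here.

```python
MASK16 = 0xFFFF  # 16-bit words/operands


def mem_reg_alu(op, regs, mem, r, addr):
    """Hypothetical memory-register ALU step: regs[r] <- regs[r] op mem[addr].

    The mnemonics below are illustrative, not the CE's real instruction set.
    Results wrap to 16 bits, as a 16-bit datapath would.
    """
    a, b = regs[r], mem[addr]  # one operand in a register, one in memory
    if op == "ADD":
        result = a + b
    elif op == "SUB":
        result = a - b
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    else:
        raise ValueError(f"unknown op {op}")
    regs[r] = result & MASK16
    return regs[r]
```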
Memory Register Computer Architecture CE Organization
(Figure: CE datapath in which the DIVIDEND and DIVISOR are loaded FROM SHARED DATA MEMORY)
Multiplier CE Organization/Architecture
(Figure labels: Pipelined Multiplier; Multiplicand Reg and Multiplier Reg; Instruction Memory 8x16; Instr MAR; Data loc1 and Data loc2; Controller; R2; adder; several multiplexers; 8-bit and 16-bit data paths)
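The Multiplier CE slide shows a pipelined multiplier fed by 8-bit multiplicand and multiplier registers, with 16-bit result paths. The arithmetic such a unit performs can be sketched as unsigned shift-and-add multiplication. This is an assumption about the algorithm (the slides do not name it), and the sketch computes the product sequentially rather than modeling pipeline stages or timing; `multiply_8x8` is a hypothetical name.

```python
def multiply_8x8(multiplicand, multiplier):
    """Hypothetical model: unsigned 8-bit x 8-bit shift-and-add multiply.

    Each loop iteration is one partial-product step; a pipelined hardware
    multiplier would spread these steps across successive stages.
    """
    if not (0 <= multiplicand <= 0xFF and 0 <= multiplier <= 0xFF):
        raise ValueError("operands must be unsigned 8-bit values")
    product = 0
    for bit in range(8):
        if (multiplier >> bit) & 1:
            product += multiplicand << bit  # add the shifted partial product
    return product & 0xFFFF  # fits in 16 bits: 255 * 255 = 65025
```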
Control-Token Formats
• Important token formats for the HDCA
Token Formats (Continued)
Interface Controller State Diagram (An Interface Controller Within the CE Controller Module of Each CE Is Responsible for Control of the HDCA)
Hardware Description Language (HDL) Description of HDCA System
• VHDL Used as HDL.
• Mostly Behavioral and RTL Level Coding Style Used.
• Top-Down HDCA System Architecture Development and Design Style Used.
• Structural Bottom-Up Coding and Testing Style Used (Lower Level Functional Units First Described and Tested Before Being Integrated Into Higher Level Functional Units).
• Generic and Parameterized Coding Style Used When Applicable.
• Approximately 150 Pages (8.5” x 11”) of Single-Spaced 10-Point Font VHDL Code for the Shown 5-CE Configuration.
CAD Systems Used in Development and Testing of Single-Chip HDCA System (VHDL System Capture, Synthesis, Post-Synthesis Simulation Testing, Implementation, Post-Implementation Simulation Testing and Evaluation (Virtual Prototyping))
• Xilinx ISE 6.2.3 CAD software tool set used for system capture, synthesis and implementation to FPGA technology (Xilinx Virtex-II XC2V8000 FPGA chip).
• ModelSim PE 5.7g was used as the HDL simulator.
• The host PC for the Xilinx and ModelSim CAD software was a high-performance AMD Athlon machine running at 2.16 GHz with 2 GB of RAM under Windows XP (32-bit edition). Input stimuli were added through the HDL Bencher, where timing constraints could also be specified. Post-implementation simulation (after Map, Place and Route) was carried out in ModelSim with test vector sets developed for the different applications, after the Input ROM and the Instruction Memories of the Memory/Register Architecture CEs of an HDCA system had been initialized using the Memory Editor tool provided in Xilinx.
HDCA System Testing, Evaluation and Validation via HDL Virtual Prototyping
Example Simple Applications (All Successfully Executed by HDCA)
1. Acyclic Integer Manipulation Algorithm.
2. Acyclic Matrix Multiplication Algorithm 1.
3. Acyclic Matrix Multiplication Algorithm 2.
4. Acyclic Pipelined Integer Manipulation Algorithm (Will Be Viewed in Some Detail; Uses All Heterogeneous CEs of an Experimental HDCA System).
5. Cyclic Non-Deterministic Value Swap Application.
6. Other Applications.
Acyclic Pipelined Integer Manipulation Algorithm (Will simultaneously execute two copies of the algorithm, each with a different set of data)
Process Flow graph for the Algorithm
Input of the first five of the ten values for the first copy of the application (P1)
Five values of x"02" being input into shared data memory at consecutive locations starting from x"03"
Process P7 for Copy 1 of the Application: Displays the Final Result at Address Location x"0F"
(Waveform callouts: Unsigned 15 at x"0F"; Last Instruction, Copy 1)
Conclusions and Future Research
• Conclusions
– Validation of the Concept of an HDCA Accomplished via Virtual Prototyping: a Parallel Single-Chip Multiprocessor System That Is Hybrid, Heterogeneous, Dynamic/Reconfigurable at the Application and Node Levels, Implementable to PLD Technology, etc.
– Scalable Architecture/Design That Is at the Same Time Also a SoC.
– Can Simultaneously Execute Multiple Copies of an Application, Each With a Different Set of Data.
– Potential for Execution of a Wide Range of Applications (radar signal processing; communications (packet-driven) processing; image (pixel-driven) processing; satellite data-stream processing; embedded computing applications including control applications; collection, processing and storage of data from multiple sensors/instruments, etc.).
– Can Execute More Complex Applications.
• Future Research
– Include More Complex Processors in the Experimental Model of the HDCA, in Addition to an Operating System (Linux, etc.?).
– Further Research Into Development and Refinement of the Concept of “Reconfigurability at the Processor Architecture Level”.
– Identification of, and Adaptation to, Several “Real Applications”!!