Design, Development and Validation Testing of a Versatile PLD Implementable Single-Chip Heterogeneous, Hybrid and Reconfigurable Multiprocessor Architecture (HDCA)

By J. Robert (Bob) Heath**, Sridhar Hegde, Kanchan Bhide, Paul Maxwell, Xiaohui Zhao and Venugopal Duvvuri
Department of Electrical and Computer Engineering
University of Kentucky
Lexington, Kentucky 40506
**[email protected]

Heath MAPLD 2005/247 2

Abstract

There appear to be an increasing number of real-time and non-real-time computer applications that may be described by process and/or data-flow graphs (from here on we use the term “process flow graphs”). Such applications include radar signal processing, sonar signal processing, various system simulation environments used within Computer Aided Design (CAD) software systems, communications signal processing, routing, and the collection, processing and storage of data from multiple sensors/instruments. For such applications, a first goal is a computer system/architecture platform onto which an application described by a process flow graph of any topology can be mapped and executed. The application process flow graph may be single- or multiple-input/output and cyclic or acyclic; processes are represented by the nodes of the graph. Further, the computer system/architecture should be able to continue executing the application with minimum interruption if the topology of the application process flow graph changes dynamically during execution. This goal is referred to as application level reconfigurability. A second goal for the same computer system/architecture is the ability to dynamically, on the fly, configure, move, or assign processors or other physical resources to any application processes that need them at any time (and, vice versa, to assign additional copies of a process to additional processors). This goal is referred to as node level reconfigurability. A third goal is that the same computer system/architecture be a single-chip heterogeneous multiprocessor system able to dynamically, on the fly, configure and reconfigure individual processor architectures within the overall multiprocessor architecture if and when needed. We refer to this goal as processor architecture level reconfigurability.

With proper Operating System (OS) and other system software support, a computer system/architecture platform that meets these three goals should be able to execute, in a fault-tolerant manner, a wide range of non-real-time and real-time applications described by process flow graphs of any topology. This paper describes the research, development and current status of the development, testing and evaluation of such a computer system architecture. HDL “virtual prototype” functional and performance simulation results are shown for the architecture executing simple hypothetical applications, and future research, development and testing of the architecture are addressed. The described architecture paradigm and platform is known as a single-chip Hybrid Data/Command Driven Architecture (HDCA) system. A reconfigurable/dynamic production HDCA system would be implemented in Programmable Logic Devices (PLDs).
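To make the process flow graph terminology concrete, the following Python sketch (a hypothetical model, not part of the HDCA design) stores a process flow graph as an adjacency map from each process to the processes it feeds, finds its input and output processes, and tests whether it is cyclic. The process names P1 to P4 and the function names are illustrative; the sketch assumes every process appears as a key, with output processes mapping to an empty list.

```python
from typing import Dict, List, Set

# Hypothetical representation: each process (node) maps to the processes it feeds.
# Assumption: every process appears as a key; output processes map to [].
ProcessFlowGraph = Dict[str, List[str]]


def input_processes(graph: ProcessFlowGraph) -> Set[str]:
    """Processes with no predecessors, i.e. the application's inputs."""
    targets = {dst for dsts in graph.values() for dst in dsts}
    return set(graph) - targets


def output_processes(graph: ProcessFlowGraph) -> Set[str]:
    """Processes with no successors, i.e. the application's outputs."""
    return {node for node, dsts in graph.items() if not dsts}


def is_cyclic(graph: ProcessFlowGraph) -> bool:
    """Three-colour depth-first search: reaching a GREY node is a back edge (cycle)."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}

    def visit(node: str) -> bool:
        colour[node] = GREY
        for succ in graph.get(node, []):
            state = colour.get(succ, WHITE)
            if state == GREY:
                return True
            if state == WHITE and visit(succ):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)


acyclic = {"P1": ["P2", "P3"], "P2": ["P4"], "P3": ["P4"], "P4": []}
cyclic = {"P1": ["P2"], "P2": ["P3"], "P3": ["P2", "P4"], "P4": []}
print(is_cyclic(acyclic), is_cyclic(cyclic))
```

In this model, application level reconfigurability would correspond to replacing the adjacency map while the application runs; the sketch does not model how the HDCA itself handles such a change.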

Goals, Objectives and Functionality of HDCA System

• Applicable to a wide range of applications, especially those modeled by process flow graphs.
• Heterogeneous Shared-Memory Model Multiprocessor Architecture.
• Allows a Mix of Simple and Complex Special-Purpose and General-Purpose Processors.
• Single-Chip Architecture Implemented in Programmable Logic Device (PLD) Technology.
• May be used for real-time or non-real-time applications.
• Scalable architecture.
• Fault-tolerant architecture.
• May operate in a data-driven or command-driven environment at the process level.
• The idea is for a small number of short control tokens, rather than more voluminous data, to flow through the architecture.
• Dynamic/Reconfigurable at the “application level”.
• Dynamic/Reconfigurable at the “node level”.
• Dynamic/Reconfigurable at the “processor architecture level”.
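The data-driven mode and the short control-token idea can be sketched as follows, under stated assumptions: the `ControlToken` fields (source process, destination process, shared-memory address of the data) and the `DataDrivenScheduler` class are hypothetical and are not the HDCA token formats. A process fires only when tokens from all of its predecessors have arrived, and each token carries an address into shared memory rather than the data itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class ControlToken:
    """Hypothetical short token: names producer and consumer processes and the
    shared-memory address of the data; the data itself never travels."""
    source: str
    dest: str
    data_addr: int


class DataDrivenScheduler:
    """Fires a process only when tokens from all of its predecessors have arrived."""

    def __init__(self, predecessors: Dict[str, Set[str]]) -> None:
        self.predecessors = predecessors
        self.arrived: Dict[str, Dict[str, ControlToken]] = {p: {} for p in predecessors}

    def deliver(self, token: ControlToken) -> List[ControlToken]:
        """Record a token; return the complete input set if the destination can fire."""
        waiting = self.arrived[token.dest]
        waiting[token.source] = token
        if set(waiting) == self.predecessors[token.dest]:
            ready = [waiting[src] for src in sorted(waiting)]
            waiting.clear()  # reset so the process can fire again on new data
            return ready
        return []  # still waiting for at least one predecessor


sched = DataDrivenScheduler({"P4": {"P2", "P3"}})
sched.deliver(ControlToken("P2", "P4", 0x03))
print(sched.deliver(ControlToken("P3", "P4", 0x08)))
```

In a command-driven environment a process would instead fire on receipt of a single command token; that case is omitted here.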

Application Description via Process Flow Graphs and Illustration of Dynamic/Reconfigurability at the “Application Level”

Application Description via Process Flow Graphs and Illustration of Dynamic/Reconfigurability at the “Application Level” (continued)

Another Process Flow Graph Describing an Application With a Different Topology.

Illustration of Dynamic/Reconfigurability at the “Node Level” (Dynamic assignment of a process running on an overloaded Computing Element (CE) processor to additional CE processors, to help out the overloaded CE processor)
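A minimal sketch of node level reconfiguration, assuming an overload is declared when every CE holding a copy of a process has an input-queue depth above a programmable threshold. The function `help_overloaded_ce`, the threshold policy and the choice of the least-loaded CE as the helper are illustrative assumptions, not the HDCA's documented policy.

```python
from typing import Dict, Optional, Set


def help_overloaded_ce(
    process: str,
    hosts: Dict[str, Set[str]],   # process -> CEs currently holding a copy
    queue_depth: Dict[str, int],  # CE -> current input-queue depth
    threshold: int,               # assumed programmable overload threshold
) -> Optional[str]:
    """If every CE running `process` is above `threshold`, assign a copy of the
    process to the least-loaded CE not yet holding it and return that CE."""
    current = hosts[process]
    if min(queue_depth[ce] for ce in current) <= threshold:
        return None  # at least one host still has an acceptable wait time
    candidates = [ce for ce in queue_depth if ce not in current]
    if not candidates:
        return None  # no CE available to help
    # Least queue depth wins; ties go to the lexically smaller CE name (assumption).
    helper = min(candidates, key=lambda ce: (queue_depth[ce], ce))
    current.add(helper)  # later requests for `process` may now be mapped here
    return helper


hosts = {"P3": {"CE1"}}
depth = {"CE1": 9, "CE2": 4, "CE3": 1}
print(help_overloaded_ce("P3", hosts, depth, threshold=6))
```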

Dynamic/Reconfigurability at the “Processor Architecture Level”

Goal: Dynamically, while an application is running, reconfigure (restructure) a processor architecture to enhance performance as dynamic changes occur in application data and in process algorithmic structure.

HDCA System Organization and Architecture (High-Level Functional View)

Architectural View Of a Current Single-Chip HDCA System Instantiation

A Functional Level View of the CE Controller.

Brief Overview of HDCA Functional Units

• Process Request Token (PRT) Mapper.
  – A Hardware Dynamic Load-Balancing System.
  – For a Process Requested by a Control Token, It Determines the CE Containing a Copy of the Process Where the Wait Time to Execute the Requested Process is Minimum. CE Input Queue Depth is Used as the Parameter to Determine Minimum Wait Time (Least Depth) to Execution. CE Queue Depth is Directly Proportional to Wait Time via Utilization of “Dummy Tokens”.
  – Detects Some Faults and System Failures.
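The PRT Mapper selection rule can be sketched in Python. The sketch assumes a hypothetical model in which a request needing k time units occupies k queue entries (one real token plus k - 1 dummy tokens), so that queue depth tracks wait time; `map_process_request`, `enqueue_request` and the tie-break by CE name are assumptions for illustration. Returning `None` stands in for fault detection, here limited to the case where no healthy CE holds a copy of the requested process.

```python
from typing import Dict, FrozenSet, List, Optional


def enqueue_request(queue_depth: Dict[str, int], ce: str, exec_units: int) -> None:
    """Assumed dummy-token scheme: one real token plus (exec_units - 1) dummy
    tokens, so a CE's queue depth grows in proportion to its wait time."""
    queue_depth[ce] += exec_units


def map_process_request(
    process: str,
    copies: Dict[str, List[str]],  # process -> CEs holding a copy
    queue_depth: Dict[str, int],   # CE -> input-queue depth (incl. dummy tokens)
    failed: FrozenSet[str] = frozenset(),
) -> Optional[str]:
    """Pick the healthy CE holding a copy of `process` with the least queue depth.

    Returns None when no usable copy exists, which a caller may treat as a fault.
    """
    usable = [ce for ce in copies.get(process, []) if ce not in failed]
    if not usable:
        return None
    # Least depth wins; ties go to the lexically smaller CE name (an assumption).
    return min(usable, key=lambda ce: (queue_depth[ce], ce))


depth = {"CE1": 3, "CE2": 1, "CE3": 0}
copies = {"P2": ["CE1", "CE2"]}
print(map_process_request("P2", copies, depth))
```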

High Level Architectural Diagram of the Process Request Token (PRT) Mapper

[Figure: diagram labels include PRT_IN, RAM TABLES, AVAILABILITY, ROUTER and COMPARATORS]

Multifunctional Queue (Functionality: FIFO queue, simultaneous R/W, queue depth indication, signal when a programmable queue threshold depth is reached, switch order of any two entries, report input rate over a programmable time interval, and report change in input rate over a programmable time interval)
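A behavioral software model of the queue functions listed above, written as a hypothetical Python class. The class and method names and the per-interval rate bookkeeping are assumptions; the hardware's simultaneous read and write within one clock cycle is not modeled.

```python
from collections import deque
from typing import Deque, Generic, List, Optional, TypeVar

T = TypeVar("T")


class MultifunctionalQueue(Generic[T]):
    """Software model of the listed queue functions (not the VHDL design)."""

    def __init__(self, threshold: int) -> None:
        self.items: Deque[T] = deque()
        self.threshold = threshold          # programmable threshold depth
        self.arrivals_this_interval = 0
        self.rate_history: List[int] = []   # inputs counted per completed interval

    def write(self, item: T) -> None:
        self.items.append(item)
        self.arrivals_this_interval += 1

    def read(self) -> Optional[T]:
        return self.items.popleft() if self.items else None

    def depth(self) -> int:
        return len(self.items)

    def threshold_reached(self) -> bool:
        return self.depth() >= self.threshold

    def swap(self, i: int, j: int) -> None:
        """Switch the order of any two entries (index 0 is the head)."""
        self.items[i], self.items[j] = self.items[j], self.items[i]

    def end_interval(self) -> None:
        """Call once per programmable time interval to latch the input rate."""
        self.rate_history.append(self.arrivals_this_interval)
        self.arrivals_this_interval = 0

    def input_rate(self) -> int:
        return self.rate_history[-1] if self.rate_history else 0

    def rate_change(self) -> int:
        if len(self.rate_history) < 2:
            return 0
        return self.rate_history[-1] - self.rate_history[-2]


q: MultifunctionalQueue[str] = MultifunctionalQueue(threshold=3)
for tok in ["a", "b", "c"]:
    q.write(tok)
print(q.depth(), q.threshold_reached())
```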

Crossbar Interconnect Network (Variable-Priority Memory Contention Resolution Protocol: Priority Is Based on CE Queue Depths; the Deepest Queue Indicates the “Most-Behind” CE.)
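A sketch of the variable-priority contention rule, assuming shared memory is divided into numbered modules and each CE requests at most one module per arbitration cycle; `arbitrate` and the tie-break by CE name are hypothetical. Because queue depths change from cycle to cycle, the winning CE for a given module changes with them, which is what makes the priority variable.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


def arbitrate(requests: Dict[str, int], queue_depth: Dict[str, int]) -> Dict[int, str]:
    """Grant each requested memory module to the requesting CE with the deepest
    input queue (the "most-behind" CE). `requests` maps CE -> module number."""
    contenders: DefaultDict[int, List[str]] = defaultdict(list)
    for ce, module in requests.items():
        contenders[module].append(ce)
    # Deepest queue first; ties go to the lexically smaller CE name (assumption).
    return {m: min(ces, key=lambda ce: (-queue_depth[ce], ce)) for m, ces in contenders.items()}


req = {"CE1": 0, "CE2": 0, "CE3": 1}
depth = {"CE1": 2, "CE2": 5, "CE3": 0}
print(arbitrate(req, depth))
```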

HDCA System CEs (Processors) for Previously Shown Instantiation

• Memory Register Computer Architecture CE
  – For ALU Instructions, One Operand in Memory and Another in a Register.
  – 16-Bit Wide Words/Operands.
  – 16 and 32-Bit Wide Instructions.
  – Sixteen Assembly Language Instructions.
  – I/O Structure.
  – Hardware Vectored Priority Interrupt System, etc.
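A hypothetical illustration of a memory-register ALU instruction on 16-bit words: one operand comes from a register, the other from memory, and the result is written back to the register and wrapped to 16 bits. The mnemonics (ADD, SUB, AND, OR), the function name and the choice of the register as destination are assumptions; the slide does not list the CE's sixteen instructions.

```python
from typing import List

WORD_MASK = 0xFFFF  # 16-bit words/operands


def alu_mem_reg(op: str, regs: List[int], r: int, memory: List[int], addr: int) -> None:
    """Hypothetical memory-register ALU instruction: R[r] <- R[r] op M[addr],
    with the result wrapped to 16 bits."""
    a, b = regs[r], memory[addr]
    if op == "ADD":
        result = a + b
    elif op == "SUB":
        result = a - b
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    else:
        raise ValueError(f"unsupported op {op!r}")
    regs[r] = result & WORD_MASK  # keep only the low 16 bits


regs = [0] * 4
mem = [0] * 16
regs[1], mem[3] = 0xFFFF, 2
alu_mem_reg("ADD", regs, 1, mem, 3)  # 0xFFFF + 2 wraps around to 1
print(regs[1])
```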

Memory Register Computer Architecture CE Organization

[Figure: diagram labels include DIVIDEND, DIVISOR and FROM SHARED DATA MEMORY]

Multiplier CE Organization/Architecture

[Figure: block diagram with a pipelined multiplier, multiplicand and multiplier registers, an 8x16 instruction memory, an instruction MAR, data locations 1 and 2, a controller, register R2, an adder and several multiplexers; bus widths of 8 and 16 bits are marked]
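A cycle-level sketch of how a pipelined multiplier delivers one product per cycle after an initial latency. It assumes 8-bit operands and a 16-bit product, which the 8- and 16-bit bus widths on the diagram suggest, and splits the work into two hypothetical stages (nibble partial products, then shift-and-add); the actual stage structure of the Multiplier CE is not given, and the function name is illustrative.

```python
from typing import List, Optional, Tuple


def pipelined_multiply(pairs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Cycle-level model of a hypothetical two-stage 8x8 -> 16-bit multiplier.

    Stage 1 forms partial products for the low and high nibbles of the multiplier;
    stage 2 shifts and adds them. Returns (cycle, product) for each result.
    """
    stage1: Optional[Tuple[int, int]] = None  # pipeline register between stages
    results: List[Tuple[int, int]] = []
    inputs = list(pairs)
    cycle = 0
    while inputs or stage1 is not None:
        cycle += 1
        # Stage 2 consumes what stage 1 produced on the previous cycle.
        if stage1 is not None:
            low, high = stage1
            results.append((cycle, (low + (high << 4)) & 0xFFFF))
            stage1 = None
        # Stage 1 accepts a new (multiplicand, multiplier) pair every cycle.
        if inputs:
            a, b = inputs.pop(0)
            a, b = a & 0xFF, b & 0xFF
            stage1 = (a * (b & 0xF), a * (b >> 4))
    return results


print(pipelined_multiply([(3, 5), (255, 255), (16, 16)]))
```

After a two-cycle latency for the first product, one product emerges every cycle, which is the throughput benefit of pipelining.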

Control-Token Formats

• Important token formats for the HDCA

Token Formats (Continued)

Interface Controller State Diagram (There Is an Interface Controller Within the CE Controller Module of Each CE, Responsible for Control of the HDCA)

Hardware Description Language (HDL) Description of HDCA System

• VHDL Used as HDL.
• Mostly Behavioral and RTL Level Coding Style Used.
• Top-Down HDCA System Architecture Development and Design Style Used.
• Structural Bottom-Up Coding and Testing Style Used (Lower Level Functional Units First Described and Tested Before Being Integrated Into Higher Level Functional Units).
• Generic and Parameterized Coding Style Used When Applicable.
• Approximately 150 Pages (8.5” x 11”) of Single-Spaced 10-Point Font VHDL Code for the 5-CE Configuration Shown.

CAD Systems Used in Development and Testing of Single-Chip HDCA System (VHDL System Capture, Synthesis, Post-Synthesis Simulation Testing, Implementation, Post-Implementation Simulation Testing and Evaluation (Virtual Prototyping))

• Xilinx ISE 6.2.3 CAD software tool set used for system capture, synthesis and implementation to FPGA technology (Xilinx Virtex-II XC2V8000 FPGA chip).
• ModelSim PE 5.7g was used as the HDL simulator.
• The host PC for the Xilinx and ModelSim CAD software used a high-performance 2.16 GHz AMD Athlon processor with 2 GB of RAM, running 32-bit Windows XP. Input stimuli were added through HDL Bencher, where timing constraints could also be specified. Post-implementation simulation (after Map, Place and Route) was carried out in ModelSim with test vector sets developed for the different applications, after the Input ROM and the Instruction Memories of the Memory/Register Architecture CEs of an HDCA system had been initialized using the Memory Editor tool provided in Xilinx ISE.

HDCA System Testing, Evaluation and Validation via HDL Virtual Prototyping

Example Simple Applications (All Successfully Executed by HDCA)

1. Acyclic Integer Manipulation Algorithm.
2. Acyclic Matrix Multiplication Algorithm 1.
3. Acyclic Matrix Multiplication Algorithm 2.
4. Acyclic Pipelined Integer Manipulation Algorithm. (Will View in Some Detail; Uses All Heterogeneous CEs of an Experimental HDCA System)
5. Cyclic Non-Deterministic Value Swap Application.
6. Other Applications.

Acyclic Pipelined Integer Manipulation Algorithm (Will simultaneously execute two copies of the algorithm, each with a different set of data)

[Figure: Process flow graph for the algorithm]

Input of the First Five of the Ten Values for the First Copy of the Application – P1

Five values of x”02” being input into shared data memory at consecutive locations starting at x”03”

Process P7 for Copy 1 of Application – Displays Final Result at Address Location x”0F”

[Simulation waveform annotations: unsigned 15 at x”0F”; last instruction, Copy 1]

Conclusions and Future Research

• Conclusions
  – Validation of the Concept of an HDCA Accomplished via Virtual Prototyping: Parallel Single-Chip Multiprocessor System, Hybrid, Heterogeneous, Dynamic/Reconfigurable at Application and Node Levels, Implementable to PLD Technology, etc.
  – Scalable Architecture/Design That Is at the Same Time a SoC.
  – Can Simultaneously Execute Multiple Copies of an Application, Each With a Different Set of Data.
  – Potential for Execution of a Wide Range of Applications (radar signal processing; communications (packet-driven) processing; image (pixel-driven) processing; satellite data-stream processing; embedded computing applications including control applications; collection, processing and storage of data from multiple sensors/instruments, etc.).
  – Can Execute More Complex Applications.
• Future Research
  – Include More Complex Processors and an Operating System (Linux, etc.?) in the Experimental Model of HDCA.
  – Further Research Into Development and Refinement of the Concept of “Reconfigurability at the Processor Architecture Level”.
  – Identification of and Adaptation to Several “Real Applications”!!