Hardware Design, Synthesis, and Verification of a Multicore Communication API

Ben Meakin, Ganesh Gopalakrishnan
{meakin, ganesh}@cs.utah.edu
University of Utah School of Computing

Project Objectives

Implementing Inter-core Communication

Multicore Communication API

8-Core MIPS System-on-Chip

MIPS Core Data-path

Custom On-Chip Network Synthesis

Hardware Verification

Future Work

More Information

Project Objectives

• Services provided by modern computer systems
  – Computation oriented
    • Fast, low power cost
  – Communication oriented
    • Slow, high power cost
• Objectives of this project
  – Research and implement efficient means of performing on-chip communication
  – Evaluate the impact of instruction set extensions enabling explicit data transfer
  – Apply these to a modern communication API
  – Study the use of semi-formal HW verification tools to verify realistic multicore HW

Implementing Inter-core Communication

• Physical transport layer
  – Asynchronous network-on-chip
  – Dual networks: one for user traffic, one for cache controllers
• MIPS instruction set extension
  – Enables explicit data transfer
  – Reduces some hardware complexity

Multicore Communication API

• Multicore Association Communication API (MCAPI)
  – Lightweight messaging API designed for embedded multicore systems
• Implementation
  – Messages and packet channels use pointers to shared memory
  – Scalar channels copy data
  – Uses in-line assembly code
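
The difference between the pointer-passing and copying transfers can be illustrated with a toy model. The sketch below is not MCAPI code: the class and method names are hypothetical, and a Python `bytearray` merely stands in for a buffer in shared memory.

```python
from collections import deque


class ToyEndpoint:
    """Hypothetical receive queue; not part of the real MCAPI interface."""

    def __init__(self):
        self.queue = deque()

    def send_by_reference(self, buffer):
        # Message / packet-channel style: only a reference to the shared
        # buffer is enqueued, so no payload bytes are copied.
        self.queue.append(buffer)

    def send_scalar(self, value):
        # Scalar-channel style: the value itself is copied into the queue.
        self.queue.append(int(value))

    def receive(self):
        return self.queue.popleft()


shared = bytearray(b"\x01\x02\x03\x04")  # stands in for shared memory
endpoint = ToyEndpoint()

endpoint.send_by_reference(shared)
shared[0] = 0xFF                 # sender modifies the buffer after sending
received = endpoint.receive()    # receiver sees the same underlying buffer

scalar = 7
endpoint.send_scalar(scalar)
scalar = 8                       # later changes do not affect the copy
received_scalar = endpoint.receive()
```

In the by-reference case the sender must not reuse the buffer until the receiver is done with it; copying avoids that constraint at the cost of moving the data.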

8-Core MIPS System-on-Chip

• 8 processor tiles on a Xilinx Virtex-5 FPGA
  – 16-bit MIPS cores (6-stage pipelines)
  – Private 2KB instruction and 2KB data caches
  – Shared 4KB slice of L2 data cache
  – Network interface unit
  – NUCA
  – MSI directory-based cache coherence
  – Various I/O interfaces
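
To make the cache parameters concrete, the sketch below shows how an address might be split for a direct-mapped private L1 lookup and mapped to an L2 home tile. Only the 2KB L1 size, the 8 tiles, and the direct-mapped, 8-words-per-block organization come from the poster. The 2-byte word size, byte addressing, and block-interleaved home-tile mapping are assumptions made for illustration, and the function names are hypothetical.

```python
# Assumed parameters (only the 2 KB size, 8 words per block, direct mapping
# and the 8 tiles come from the poster; the rest are guesses).
WORD_BYTES = 2            # assumption: 16-bit words, byte addressing
WORDS_PER_BLOCK = 8
L1_BYTES = 2 * 1024
NUM_TILES = 8

BLOCK_BYTES = WORD_BYTES * WORDS_PER_BLOCK      # 16 bytes per block
L1_LINES = L1_BYTES // BLOCK_BYTES              # 128 lines


def split_l1_address(addr):
    """Return (tag, index, offset) for a direct-mapped L1 lookup."""
    offset = addr % BLOCK_BYTES
    block = addr // BLOCK_BYTES
    index = block % L1_LINES
    tag = block // L1_LINES
    return tag, index, offset


def home_tile(addr):
    """Assumed block-interleaved mapping of an address to its L2 home tile."""
    return (addr // BLOCK_BYTES) % NUM_TILES


example = 0x1234
tag, index, offset = split_l1_address(example)
tile = home_tile(example)
```

Under these assumptions, address 0x1234 falls in L1 line 35 with tag 2 and byte offset 4, and its L2 home is tile 3.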

More Information

Wiki page with link to read-only SVN checkout: www.cs.utah.edu/formal_verification/mediawiki
  – Under “MCAPI Hardware Implementation”
Ben Meakin's web-page: www.cs.utah.edu/~meakin
Multicore Association web-page: www.multicore-association.com

• Cache Architecture
  – Direct mapped, 8 words per block
  – L2 physically distributed/logically shared (NUCA)
  – L1 private
  – MSI directory coherence protocol
  – Write invalidate policy
  – Simplified form of modern architecture
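
As an illustration of what an MSI directory with write invalidation has to track, here is a minimal sketch for a single cache block. It is a textbook-style simplification, not the RTL protocol implemented in this project: it ignores network messages, transient states, and evictions, and the class name is hypothetical.

```python
class ToyDirectory:
    """Directory entry for a single block under a simplified MSI protocol."""

    def __init__(self, num_cores):
        self.state = ["I"] * num_cores   # per-core state: "M", "S" or "I"

    def read(self, core):
        if self.state[core] != "I":
            return                       # hit: already readable in S or M
        for other, s in enumerate(self.state):
            if s == "M":
                self.state[other] = "S"  # owner writes back, keeps a shared copy
        self.state[core] = "S"

    def write(self, core):
        for other in range(len(self.state)):
            if other != core:
                self.state[other] = "I"  # write-invalidate: drop all other copies
        self.state[core] = "M"

    def check_invariant(self):
        # At most one Modified copy, and never Modified alongside Shared.
        modified = self.state.count("M")
        shared = self.state.count("S")
        return modified <= 1 and not (modified == 1 and shared > 0)


directory = ToyDirectory(num_cores=8)
trace = []
for op, core in [("read", 0), ("read", 1), ("write", 2), ("read", 3)]:
    getattr(directory, op)(core)
    trace.append(("".join(directory.state), directory.check_invariant()))
```

The `check_invariant` method expresses the kind of single-writer property that a coherence protocol is expected to maintain after every transition.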

Custom On-Chip Network Synthesis

• Workload-driven synthesis of NoC given a model of an MCAPI target application
  – Paper under review for HiPEAC '10
  – Algorithmic objectives
    • Generate custom topology to minimize average hops / flit for application
    • Synthesize deadlock-free routing tables based on shortest path
    • Given approximate node sizes, find a physical placement such that average wire distance is minimized

• Results highly encouraging
  – Relative to the baseline, for a specific application (> 16 cores) our algorithms achieved
    • ~50% reduction in avg. hops / flit
    • ~50% reduction in avg. wire distance / flit
    • ~17% increase in throughput
    • Comparable hardware cost
  – Performed at least as well as the baseline for general-purpose use
  – Better scalability
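
The average hops / flit objective can be made concrete with a small sketch. The code below builds shortest-path next-hop tables with breadth-first search and evaluates the traffic-weighted hop count for two candidate topologies. It is not the synthesis algorithm from the paper under review: the traffic numbers and topologies are hypothetical, and plain shortest-path routing as written here does not by itself guarantee deadlock freedom.

```python
from collections import deque


def shortest_path_tables(links, nodes):
    """BFS from every source: hop counts and first hop toward each destination."""
    neighbors = {n: [] for n in nodes}
    for a, b in links:                   # links are treated as bidirectional
        neighbors[a].append(b)
        neighbors[b].append(a)
    hops, next_hop = {}, {}
    for src in nodes:
        hops[src] = {src: 0}
        next_hop[src] = {}
        frontier = deque([src])
        while frontier:
            cur = frontier.popleft()
            for nxt in neighbors[cur]:
                if nxt not in hops[src]:
                    hops[src][nxt] = hops[src][cur] + 1
                    # First hop out of src: nxt itself when leaving src,
                    # otherwise inherited from the node we came through.
                    next_hop[src][nxt] = nxt if cur == src else next_hop[src][cur]
                    frontier.append(nxt)
    return hops, next_hop


def average_hops_per_flit(hops, traffic):
    """traffic maps (src, dst) to the number of flits sent."""
    total_flits = sum(traffic.values())
    total_hops = sum(hops[s][d] * f for (s, d), f in traffic.items())
    return total_hops / total_flits


nodes = [0, 1, 2, 3]
traffic = {(0, 3): 90, (1, 2): 10}          # hypothetical workload model
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
chain = [(0, 1), (1, 2), (2, 3)]

ring_hops, ring_next = shortest_path_tables(ring, nodes)
chain_hops, chain_next = shortest_path_tables(chain, nodes)
ring_avg = average_hops_per_flit(ring_hops, traffic)
chain_avg = average_hops_per_flit(chain_hops, traffic)
```

For this made-up workload the ring gives 1.0 hops / flit and the chain 2.8, because the heaviest flow (0 to 3) has a direct link only in the ring. A workload-driven synthesis step would search over candidate topologies with a metric of this kind.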

Hardware Verification

• Application of IBM's SixthSense (SXS) semi-formal verification tool to complex multicore hardware
  – Promises simulator usability with MUCH higher coverage
    • Ability to verify large designs due to non-exhaustive state space exploration

[Figure: Simulation vs. Formal Verification vs. Semi-Formal Verification]

• Cache coherence protocol verification at RTL
  – Can SXS find bugs not found by simulation?
  – Further application to pipeline control
  – Work in progress...
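
One general way to read "semi-formal" is that simulation supplies starting states deep in the design, and a bounded exhaustive search is then run from each of them. The toy below illustrates that idea on a saturating counter. It is not SixthSense's algorithm or interface: the design, the bug, and all function names are hypothetical.

```python
def bounded_search(start, step, inputs, depth, is_bad):
    """Exhaustively explore every input sequence up to `depth` steps from `start`."""
    frontier, seen = {start}, {start}
    for _ in range(depth):
        frontier = {step(s, i) for s in frontier for i in inputs} - seen
        seen |= frontier
    return any(is_bad(s) for s in seen)


def semi_formal(reset, step, inputs, stimulus, depth, is_bad):
    """Simulate `stimulus`, then run a bounded search from every state visited."""
    state, seeds = reset, [reset]
    for inp in stimulus:
        state = step(state, inp)
        seeds.append(state)
    return any(bounded_search(s, step, inputs, depth, is_bad) for s in seeds)


INPUTS = ("inc", "hold", "reset")


def step(state, inp):
    # Toy design: a counter that saturates at 15.
    if inp == "inc":
        return min(state + 1, 15)
    if inp == "reset":
        return 0
    return state


def is_bad(state):
    return state == 12   # hypothetical bug that only appears deep in the state space


formal_only = bounded_search(0, step, INPUTS, depth=4, is_bad=is_bad)
hybrid = semi_formal(0, step, INPUTS, ["inc"] * 9, depth=4, is_bad=is_bad)
```

A depth-4 search from reset only reaches counter values 0 to 4 and misses the bad state, while the same depth-4 search started from the states visited by a nine-step simulation reaches it. The exploration is still non-exhaustive, so a clean run is evidence rather than proof.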

Future Work

• Evaluation of SXS and other tools as applied to multicore RTL descriptions
• Extensive benchmarking of MCAPI implementation and interconnect technology
• Research additional applications of proposed ISA extension in parallel programming methods
• Research hardware mechanisms for increasing observability of multicore processors
  – Deterministic replay

