Hardware Design, Synthesis, and Verification of a Multicore Communication API

Ben Meakin, Ganesh Gopalakrishnan

{meakin, ganesh}@cs.utah.edu

University of Utah School of Computing

Project Objectives

Implementing Inter-core Communication

Multicore Communication API

8-Core MIPS System-on-Chip

MIPS Core Data-path

Custom On-Chip Network Synthesis

Hardware Verification

Future Work

More Information

• Services provided by modern computer systems
  – Computation oriented
    • Fast, low power cost
  – Communication oriented
    • Slow, high power cost
• Objectives of this project
  – Research and implement efficient means of performing on-chip communication
  – Evaluate the impact of instruction set extensions enabling explicit data transfer
  – Apply these to a modern communication API
  – Study the use of semi-formal HW verification tools to verify realistic multicore HW

• Physical transport layer
  – Asynchronous network-on-chip
  – Dual networks; one for user, one for cache controllers
• MIPS instruction set extension
  – Enables explicit data transfer
  – Reduces some hardware complexity

• Multicore Association Communication API (MCAPI)
  – Lightweight messaging API designed for embedded multicore systems
• Implementation
  – Messages and packet channels use pointers to shared memory
  – Scalar channels copy data
  – Uses in-line assembly code
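The difference between the two transfer styles listed above can be sketched in a few lines. This is only a toy Python model, not MCAPI code: `SharedMemory`, `PointerChannel`, and `ScalarChannel` are hypothetical names introduced for illustration, and the real implementation is described as using in-line assembly rather than anything shown here.

```python
from collections import deque


class SharedMemory:
    """Toy stand-in for memory visible to every core (hypothetical)."""

    def __init__(self, size):
        self.data = bytearray(size)


class PointerChannel:
    """Models message/packet transfer: only (offset, length) is queued."""

    def __init__(self, shm):
        self.shm = shm
        self.queue = deque()

    def send(self, offset, length):
        # The payload stays in shared memory; only its location travels.
        self.queue.append((offset, length))

    def recv(self):
        offset, length = self.queue.popleft()
        # memoryview slices do not copy, so the receiver sees the same bytes.
        return memoryview(self.shm.data)[offset:offset + length]


class ScalarChannel:
    """Models scalar transfer: the value itself is copied into the channel."""

    def __init__(self):
        self.queue = deque()

    def send(self, value):
        self.queue.append(int(value))

    def recv(self):
        return self.queue.popleft()


shm = SharedMemory(64)
shm.data[0:4] = b"ping"
msg_chan = PointerChannel(shm)
msg_chan.send(0, 4)
view = msg_chan.recv()

scalar_chan = ScalarChannel()
scalar_chan.send(7)
```

In this model a later write to `shm.data` is visible through `view`, which is the aliasing a pointer-based channel implies, while the scalar channel delivers the value that was current at send time.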

• 8 processor tiles on a Xilinx Virtex5 FPGA
  – 16-bit MIPS cores (6-stage pipelines)
  – Private 2KB instruction and 2KB data caches
  – Shared 4KB slice of L2 data cache
  – Network interface unit
  – NUCA
  – MSI Directory based cache coherence
  – Various I/O interfaces
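As a rough illustration of how an address might be interpreted in such a system, the sketch below splits an address into tag, index, and offset for a 2KB direct-mapped L1 and picks a home tile for the distributed L2. Several values are assumptions rather than facts from the poster: byte addressing, 2-byte words (inferred from the 16-bit cores), and block-interleaved assignment of L2 blocks to tiles. The 8-words-per-block figure comes from the poster's cache architecture notes; the function names are hypothetical.

```python
WORD_BYTES = 2                  # assumption: 16-bit cores use 2-byte words
WORDS_PER_BLOCK = 8             # from the poster's cache architecture panel
BLOCK_BYTES = WORD_BYTES * WORDS_PER_BLOCK   # 16 bytes per block
L1_BYTES = 2 * 1024             # private 2KB data cache
L1_LINES = L1_BYTES // BLOCK_BYTES           # 128 direct-mapped lines
NUM_TILES = 8


def split_l1_address(addr):
    """Return (tag, index, offset) for a direct-mapped L1 lookup."""
    offset = addr % BLOCK_BYTES
    block = addr // BLOCK_BYTES
    index = block % L1_LINES
    tag = block // L1_LINES
    return tag, index, offset


def home_tile(addr):
    """Assumed NUCA mapping: consecutive blocks go to consecutive tiles."""
    return (addr // BLOCK_BYTES) % NUM_TILES


tag, index, offset = split_l1_address(0x1234)
tile = home_tile(0x1234)
```

Interleaving at block granularity is only one plausible choice; the actual design may map addresses to L2 slices differently.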

Wiki page with link to read-only SVN checkout: www.cs.utah.edu/formal_verification/mediawiki
  – Under “MCAPI Hardware Implementation”

Ben Meakin's web-page: www.cs.utah.edu/~meakin

Multicore Association web-page: www.multicore-association.com

• Cache Architecture
  – Direct mapped, 8 words per block
  – L2 physically distributed/logically shared (NUCA)
  – L1 private
  – MSI directory coherence protocol
  – Write invalidate policy
  – Simplified form of modern architecture
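The MSI write-invalidate behaviour named above can be sketched as a small state model for a single cache block. This is a textbook-style simplification, not the RTL from this project: `BlockDirectory` is a hypothetical name, and write-backs, transient states, and network messages are not modeled.

```python
class BlockDirectory:
    """Tracks the per-core L1 state (M, S, or I) of one cache block."""

    def __init__(self, num_cores):
        self.state = ["I"] * num_cores

    def read(self, core):
        if self.state[core] != "I":
            return  # hit: the core already holds the block in S or M
        # A modified copy elsewhere is downgraded to shared.
        for other, s in enumerate(self.state):
            if s == "M":
                self.state[other] = "S"
        self.state[core] = "S"

    def write(self, core):
        if self.state[core] == "M":
            return  # hit: the core already owns the block
        # Write-invalidate: every other copy is invalidated before the write.
        for other in range(len(self.state)):
            if other != core:
                self.state[other] = "I"
        self.state[core] = "M"


d = BlockDirectory(4)
d.read(0)
d.read(1)
after_reads = list(d.state)
d.write(2)
after_write = list(d.state)
d.read(0)
after_reread = list(d.state)
```

The invariant worth checking in any such model is that a block never has a modified copy in one core while another core holds a valid copy.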

• Workload-driven synthesis of NoC given a model of an MCAPI target application
  – Paper under review for HiPEAC '10
  – Algorithmic objectives
    • Generate custom topology to minimize average hops / flit for application
    • Synthesize deadlock-free routing tables based on shortest path
    • Given approximate node sizes, find a physical placement such that average wire distance is minimized
• Results highly encouraging
  – Relative to the baseline, for a specific application (> 16 cores) our algorithms achieved
    • ~50% reduction in avg. hops / flit
    • ~50% reduction in avg. wire distance / flit
    • ~17% increase in throughput
    • Comparable hardware cost
  – Performed at least as well as baseline for general purpose
  – Better scalability
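To make the average hops / flit objective concrete, the sketch below computes it for a given topology and traffic model, and builds shortest-path next-hop tables with breadth-first search. It is a minimal illustration, not the synthesis algorithm from the paper: the function names, the example ring, and the traffic numbers are all made up, and shortest-path tables alone do not guarantee deadlock freedom, which is not modeled here.

```python
from collections import deque


def shortest_path_tables(adj):
    """adj maps node -> list of neighbours. Returns (hops, next_hop) dicts."""
    hops, next_hop = {}, {}
    for src in adj:
        dist = {src: 0}
        first = {}  # first hop out of src on a shortest path to each node
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    first[v] = v if u == src else first[u]
                    queue.append(v)
        hops[src] = dist
        next_hop[src] = first
    return hops, next_hop


def average_hops_per_flit(adj, traffic):
    """traffic maps (src, dst) -> number of flits sent by the application."""
    hops, _ = shortest_path_tables(adj)
    total_flits = sum(traffic.values())
    total_hops = sum(f * hops[s][d] for (s, d), f in traffic.items())
    return total_hops / total_flits


# Hypothetical workload: node 0 talks to node 2 far more than 1 talks to 3.
traffic = {(0, 2): 100, (1, 3): 10}
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
custom = {0: [1, 3, 2], 1: [0, 2], 2: [1, 3, 0], 3: [2, 0]}  # adds link 0-2

ring_avg = average_hops_per_flit(ring, traffic)
custom_avg = average_hops_per_flit(custom, traffic)
_, custom_next = shortest_path_tables(custom)
```

Adding one link between the heaviest communicating pair lowers the metric for this workload, which is the intuition behind tailoring the topology to the application.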

• Application of IBM's Sixthsense (SXS) semi-formal verification tool to complex multicore hardware
  – Promises simulator usability with MUCH higher coverage
    • Ability to verify large designs due to non-exhaustive state space exploration

[Figure panels: Simulation, Formal Verification, Semi-Formal Verification]

• Cache coherence protocol verification at RTL
  – Can SXS find bugs not found by simulation?
  – Further application to pipeline control
  – Work in progress...
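One common way to picture non-exhaustive state space exploration is to let random simulation carry the design deep into its state space and then run a bounded exhaustive search from the state it reaches. The sketch below illustrates that general idea on a toy counter; it is an assumption about the style of search, not a description of how SXS works internally, and every name and parameter in it is hypothetical.

```python
import random
from collections import deque


def bounded_search(start, successors, is_bad, depth):
    """Exhaustive breadth-first search limited to `depth` transitions."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        state, d = frontier.popleft()
        if is_bad(state):
            return state
        if d < depth:
            for nxt in successors(state):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, d + 1))
    return None


def semi_formal_search(initial, successors, is_bad, walks, walk_len, depth, seed=0):
    """Random walks pick deep seed states; bounded search explores around them."""
    rng = random.Random(seed)
    for _ in range(walks):
        state = initial
        for _ in range(walk_len):
            state = rng.choice(successors(state))
            if is_bad(state):
                return state  # simulation alone happened to hit the bug
        hit = bounded_search(state, successors, is_bad, depth)
        if hit is not None:
            return hit
    return None


# Toy design: a counter that advances by 1 or 2; state 40 is the "bug".
def successors(s):
    return [s + 1, s + 2]


def is_bad(s):
    return s == 40


shallow = bounded_search(0, successors, is_bad, depth=10)
deep = semi_formal_search(0, successors, is_bad, walks=1, walk_len=20, depth=10)
```

In this toy, a depth-10 search from the initial state cannot reach state 40, while a 20-step random walk followed by the same depth-10 search does.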

• Evaluation of SXS and other tools as applied to multicore RTL descriptions

• Extensive benchmarking of MCAPI implementation and interconnect technology

• Research additional applications of proposed ISA extension in parallel programming methods

• Research hardware mechanisms for increasing observability of multicore processors
  – Deterministic replay