Hardware Design, Synthesis, and Verification of a Multicore Communication API
Ben Meakin, Ganesh Gopalakrishnan
{meakin, ganesh}@cs.utah.edu
University of Utah School of Computing
Project Objectives
Implementing Inter-core Communication
Multicore Communication API
8-Core MIPS System-on-Chip
MIPS Core Data-path
Custom On-Chip Network Synthesis
Hardware Verification
Future Work
More Information
• Services provided by modern computer systems
  – Computation oriented
    • Fast, low power cost
  – Communication oriented
    • Slow, high power cost
• Objectives of this project
  – Research and implement efficient means of performing on-chip communication
  – Evaluate the impact of instruction set extensions enabling explicit data transfer
  – Apply these to a modern communication API
  – Study the use of semi-formal HW verification tools to verify realistic multicore HW
• Physical transport layer
  – Asynchronous network-on-chip
  – Dual networks; one for user, one for cache controllers
• MIPS instruction set extension
  – Enables explicit data transfer
  – Reduces some hardware complexity
• Multicore Association Communication API (MCAPI)
  – Lightweight messaging API designed for embedded multicore systems
• Implementation
  – Messages and packet channels use pointers to shared memory
  – Scalar channels copy data
  – Uses in-line assembly code
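The difference between pointer-based and copy-based transfer noted above can be illustrated with a toy model. This is only a sketch: `Endpoint`, `send_message`, `send_scalar`, and the 64-byte `SHARED_MEMORY` buffer are hypothetical names invented for illustration, not MCAPI functions and not the code of this implementation (which uses in-line assembly).

```python
# Toy model only: none of these names are real MCAPI calls.
from collections import deque

SHARED_MEMORY = bytearray(64)  # stands in for memory visible to every core


class Endpoint:
    """Hypothetical receive queue belonging to one core."""

    def __init__(self):
        self.queue = deque()


def send_message(dest, offset, length):
    # Message / packet-channel style: only a (pointer, length) descriptor
    # travels; the payload stays where it is in shared memory.
    dest.queue.append((offset, length))


def receive_message(endpoint):
    offset, length = endpoint.queue.popleft()
    return bytes(SHARED_MEMORY[offset:offset + length])


def send_scalar(dest, value):
    # Scalar-channel style: the value itself is copied into the channel.
    dest.queue.append(value)


def receive_scalar(endpoint):
    return endpoint.queue.popleft()


def demo():
    rx = Endpoint()

    SHARED_MEMORY[8:13] = b"hello"
    send_message(rx, 8, 5)
    SHARED_MEMORY[8:13] = b"HELLO"  # sender reuses the buffer too early
    via_pointer = receive_message(rx)

    value = 42
    send_scalar(rx, value)
    value = 7  # has no effect on the copy already in the channel
    via_copy = receive_scalar(rx)
    return via_pointer, via_copy
```

In this model the receiver of the pointer-based message observes the sender's later write to the buffer, while the scalar arrives unchanged. That is the usual trade-off: passing pointers avoids moving the payload but requires the sender to leave the buffer alone until it has been consumed.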
• 8 processor tiles on a Xilinx Virtex5 FPGA
  – 16-bit MIPS cores (6-stage pipelines)
  – Private 2KB instruction and 2KB data caches
  – Shared 4KB slice of L2 data cache
  – Network interface unit
  – NUCA
  – MSI Directory based cache coherence
  – Various I/O interfaces
Wiki page with link to read-only SVN checkout: www.cs.utah.edu/formal_verification/mediawiki
  – Under “MCAPI Hardware Implementation”
Ben Meakin's web-page: www.cs.utah.edu/~meakin
Multicore Association web-page: www.multicore-association.com
• Cache Architecture
  – Direct mapped, 8 words per block
  – L2 physically distributed/logically shared (NUCA)
  – L1 private
  – MSI directory coherence protocol
  – Write invalidate policy
  – Simplified form of modern architecture
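As a worked example of what "direct mapped, 8 words per block" implies for a 2KB private data cache, the sketch below splits an address into tag, index, and offset. The 2KB size and the block size come from the poster; the 2-byte word (inferred from the 16-bit cores) and byte addressing are assumptions, so the exact bit widths may differ in the real design.

```python
# Sketch under stated assumptions, not the RTL's actual address layout.
CACHE_BYTES = 2 * 1024   # private L1 data cache size (from the poster)
WORDS_PER_BLOCK = 8      # block size in words (from the poster)
BYTES_PER_WORD = 2       # ASSUMPTION: 16-bit words, byte-addressed memory

BLOCK_BYTES = WORDS_PER_BLOCK * BYTES_PER_WORD   # 16 bytes per block
NUM_BLOCKS = CACHE_BYTES // BLOCK_BYTES          # 128 blocks
OFFSET_BITS = BLOCK_BYTES.bit_length() - 1       # 4
INDEX_BITS = NUM_BLOCKS.bit_length() - 1         # 7


def split_address(addr):
    """Return (tag, index, offset) for a byte address."""
    offset = addr & (BLOCK_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class DirectMappedCache:
    """Tracks tags only; each index holds exactly one block."""

    def __init__(self):
        self.tags = [None] * NUM_BLOCKS

    def access(self, addr):
        """Return True on a hit; on a miss, install the new block."""
        tag, index, _ = split_address(addr)
        hit = self.tags[index] == tag
        self.tags[index] = tag
        return hit
```

Under these assumptions, addresses 0x0000 and 0x0800 map to the same index with different tags, so alternating between them evicts the block on every access. This conflict behavior is the cost of the simple direct-mapped organization.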
• Workload driven synthesis of NoC given a model of an MCAPI target application
  – Paper under review for HiPEAC '10
  – Algorithmic objectives
    • Generate custom topology to minimize average hops / flit for application
    • Synthesize deadlock free routing tables based on shortest path
    • Given approximate node sizes, find a physical placement such that average wire distance is minimized
• Results highly encouraging
  – Relative to the baseline, our algorithms achieved the following for a specific application (> 16 cores)
    • ~50% reduction in avg. hops / flit
    • ~50% reduction in avg. wire distance / flit
    • ~17% increase in throughput
    • Comparable hardware cost
  – Performed at least as well as baseline for general purpose use
  – Better scalability
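The "average hops / flit" objective can be made concrete with a small evaluation sketch. It builds shortest-path routing tables by breadth-first search and weights each route by the number of flits that use it. The two 4-node topologies and the traffic matrix are made-up examples, not the poster's workload or results, and the sketch only evaluates a given topology: it does not perform topology synthesis, placement, or the deadlock-freedom analysis that the actual algorithms address.

```python
# Illustrative evaluation only; topologies and traffic are invented.
from collections import deque


def undirected(edges):
    """Build an adjacency map {node: set(neighbors)} from edge pairs."""
    links = {}
    for a, b in edges:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def routing_table(links, src):
    """Return {dst: (next_hop, hops)} for shortest paths from src (BFS)."""
    table = {src: (src, 0)}
    frontier = deque([src])
    while frontier:
        node = frontier.popleft()
        for neighbor in sorted(links[node]):
            if neighbor not in table:
                first_hop = neighbor if node == src else table[node][0]
                table[neighbor] = (first_hop, table[node][1] + 1)
                frontier.append(neighbor)
    return table


def average_hops_per_flit(links, traffic):
    """traffic maps (src, dst) to the number of flits sent on that route."""
    total_hops = 0
    total_flits = 0
    for (src, dst), flits in traffic.items():
        _, hops = routing_table(links, src)[dst]
        total_hops += hops * flits
        total_flits += flits
    return total_hops / total_flits


# Both topologies use four links, so link count is held constant.
RING = undirected([(0, 1), (1, 2), (2, 3), (3, 0)])
CUSTOM = undirected([(0, 2), (1, 3), (0, 1), (2, 3)])
TRAFFIC = {(0, 2): 10, (1, 3): 10, (0, 1): 2}
```

Calling `average_hops_per_flit(RING, TRAFFIC)` and `average_hops_per_flit(CUSTOM, TRAFFIC)` shows how placing direct links between the heaviest communicating pairs lowers the metric without adding links. Note that shortest-path routing alone does not guarantee deadlock freedom.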
• Application of IBM's SixthSense (SXS) semi-formal verification tool to complex multicore hardware
  – Promises simulator usability with MUCH higher coverage
    • Ability to verify large designs due to non-exhaustive state space exploration
[Figure panels: Simulation, Formal Verification, Semi-Formal Verification]
• Cache coherence protocol verification at RTL
  – Can SXS find bugs not found by simulation?
  – Further application to pipeline control
  – Work in progress...
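To illustrate the kind of property a coherence check targets, the sketch below models an MSI write-invalidate protocol for a single block at the level of atomic transactions and exhaustively explores its reachable states. It is a deliberately tiny abstraction, not the RTL under verification and not how SixthSense works. The `invalidate_on_write` switch is a hypothetical injected bug used to show the checker reporting violations.

```python
# Abstract, atomic-transaction model of MSI for one block; illustration only.
from collections import deque
from itertools import product

OPS = ("read", "write", "evict")


def step(state, cache, op, invalidate_on_write=True):
    """Return the next tuple of per-cache states ('M', 'S' or 'I')."""
    nxt = list(state)
    if op == "read":
        if state[cache] == "I":
            for other in range(len(state)):
                if nxt[other] == "M":
                    nxt[other] = "S"  # owner writes back and downgrades
            nxt[cache] = "S"
    elif op == "write":
        if invalidate_on_write:  # False models a hypothetical bug
            for other in range(len(state)):
                if other != cache:
                    nxt[other] = "I"
        nxt[cache] = "M"
    elif op == "evict":
        nxt[cache] = "I"
    return tuple(nxt)


def coherent(state):
    """At most one Modified copy, and no Shared copies alongside it."""
    modified = state.count("M")
    shared = state.count("S")
    return modified == 0 or (modified == 1 and shared == 0)


def explore(num_caches=2, invalidate_on_write=True):
    """Breadth-first search over all reachable states; collect violations."""
    start = ("I",) * num_caches
    seen = {start}
    frontier = deque([start])
    violations = []
    while frontier:
        state = frontier.popleft()
        if not coherent(state):
            violations.append(state)
        for cache, op in product(range(num_caches), OPS):
            nxt = step(state, cache, op, invalidate_on_write)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen, violations
```

Exhaustive search is feasible here only because the model is so small. For a full multicore RTL design the state space is far too large to enumerate, which is the motivation for tools that explore it non-exhaustively.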
• Evaluation of SXS and other tools as applied to multicore RTL descriptions
• Extensive benchmarking of MCAPI implementation and interconnect technology
• Research additional applications of proposed ISA extension in parallel programming methods
• Research hardware mechanisms for increasing observability of multicore processors
  – Deterministic replay