
A Scalable and Reconfigurable Shared-Memory Graphics Architecture

Michael Manzke∗, Ross Brennan, Keith O’Conor, John Dingliana, Carol O’Sullivan

Trinity College Dublin

Current scalable high-performance graphics systems are either constructed using special-purpose graphics acceleration hardware or built as a cluster of commodity components with a software infrastructure that exploits multiple graphics cards [Humphreys et al. 2002]. Both of these solutions are used in application domains where computational demand cannot be met by a single commodity graphics card, e.g., large-scale scientific visualisation. The former approach tends to provide the highest performance but is expensive, because it requires frequent redesign of the special-purpose graphics acceleration hardware in order to maintain a performance advantage over the commodity graphics hardware used in the cluster approach. The latter approach, while more affordable and scalable, has intrinsic performance drawbacks due to computationally expensive communication between the individual graphics pipelines.

Figure 1: The first prototype of the custom-built high-performance graphics cluster node. The figure shows how a commodity graphics card interfaces with the cluster node. It also depicts the four SCI cables that will interconnect the custom-built GPU interface boards and the PC cluster via a 2D torus topology.

In this sketch we propose a scalable, tightly coupled cluster of custom-built boards that provide an AGP interface for commodity graphics accelerators. This hybrid solution aims to bridge the gap between the two current solutions, offering a minimal custom-built hardware component together with a novel and efficient shared-memory infrastructure that exploits cutting-edge consumer graphics hardware. The boards are supplied with rendering instructions by a cluster of commodity PCs that execute OpenGL graphics applications. All the commodity PCs and custom-built boards are interconnected with an implementation of the IEEE 1596-1992 Scalable Coherent Interface (SCI) standard. This technology provides the system with a high-bandwidth, low-latency, point-to-point interconnect. Our design allows for the implementation of a 2D torus topology with good scalability properties and excellent suitability for parallel rendering. Most importantly, the interconnect implements a Distributed Shared Memory (DSM) architecture in hardware. Figure 2 shows how local memories on the custom-built boards and the PCs become part of the system-wide DSM through the SCI interconnect. Figure 2 also depicts Field Programmable Gate Arrays (FPGAs) on the custom-built boards. These reconfigurable components assist the SCI implementation and provide substantial additional computational resources that may be used to control the commodity graphics accelerators and to perform operations associated with a parallel rendering infrastructure. Beyond these applications of the FPGAs, we envision other graphics-related computation, e.g., ray tracing. These reconfigurable components are an integral part of the scalable shared-memory graphics cluster and consequently increase the programmability of the parallel rendering system, just as vertex and pixel shaders increased the programmability of graphics pipelines.

∗e-mail: Michael.Manzke@cs.tcd.ie

Figure 2: Shared-memory system. Custom-built boards 1 to n (each with FPGAs, a GPU card and local shared memory) and PC boards 1 to n (each with a CPU and local shared memory) are joined into a distributed shared memory with a single address space, implemented through a high-speed interconnect; both sides of the system are scalable.

In this sketch, we describe the design of a tightly coupled, scalable Non-Uniform Memory Access (NUMA) architecture of distributed FPGAs, GPUs and memory that may be constructed with a limited amount of custom-built hardware. A first prototype of the custom-built boards, seen in Figure 1, has been manufactured and is currently being debugged. A second revision will resolve the outstanding problems. We expect this hardware DSM cluster to communicate data at 500 Mbytes/s with low latencies (< 1.5 µs). This hard real-time capable parallel rendering cluster will be connected by the same high-speed interconnect to a commodity PC cluster that will execute the graphics application. We have introduced this novel architecture and estimate, based on the arguments presented, that this solution could out-perform pure commodity implementations without increased hardware cost, while maintaining its adaptability to the most recent generation of commodity graphics accelerators and target applications. Later prototypes will incorporate PCI Express in order to be compatible with the latest commodity graphics accelerators.

References

HUMPHREYS, G., HOUSTON, M., NG, R., FRANK, R., AHERN, S., KIRCHNER, P. D., AND KLOSOWSKI, J. T. 2002. Chromium: a stream-processing framework for interactive rendering on clusters. In SIGGRAPH, 693–702.
