An Application Specific Reconfigurable Graphics Processor

Hans Holten-Lund, IMM, DTU
Graphics Vision Day, 13 June 2003

Overview

  • Currently there is a trend for more programmability in graphics hardware.
    • Where is the graphics industry headed?
    • What about reconfigurable circuits?
    • This could be a comeback for “software” rendering, but not as we know it.
    • Parallel graphics systems?
  • Demo: a working graphics processor implemented in an FPGA.

Current trends

  • Past: fixed-function graphics processors.
  • Present: programmable vertex and pixel processors.
  • Future:
    • Reconfigurable vertex & pixel datapaths? Modify instruction set of vertex & pixel processors?
    • Reconfigurable architecture? Configurable parallelism? Rasterization algorithm optimizations? Support new graphics algorithms?

Why reconfigurable?

  • Reconfigurable graphics processors could allow:
    • Specialized graphics primitives (e.g. subdivision surfaces).
    • Customized pixel processing (e.g. fragment sorting for correct multilayer transparency).
    • Avoiding software emulation (e.g. when you try to run a DX9 program on DX7-only hardware).
  • New ways to process triangles and other primitives:
    • Current GPUs use a fixed triangle processing pipeline.
    • The triangle processing part of the graphics processor should also be programmable, not just vertex processing.
    • Enables: displacement mapping, procedural geometry, deformable models, collision detection, adaptive level-of-detail, soft shadows, physics simulation, etc.
  • New, exciting graphics algorithms appear all the time; we would like to have them running in hardware as well!
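
As an illustration of the customized pixel processing mentioned above, fragment sorting for multilayer transparency can be modeled in software roughly as follows. This is a minimal sketch, not the hardware design from the talk: the `Fragment` layout, the `resolve_pixel` name and the back-to-front "over" blend are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical fragment layout: (depth, (r, g, b), alpha).
# Larger depth means farther from the viewer.
Fragment = Tuple[float, Tuple[float, float, float], float]


def resolve_pixel(fragments: List[Fragment],
                  background: Tuple[float, float, float] = (0.0, 0.0, 0.0)):
    """Blend all fragments covering one pixel in back-to-front order."""
    color = background
    # Sort farthest first so that nearer layers are blended over farther ones.
    for _depth, rgb, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        color = tuple(alpha * c + (1.0 - alpha) * d for c, d in zip(rgb, color))
    return color


# Two half-transparent layers submitted in arbitrary order.
near = (1.0, (1.0, 0.0, 0.0), 0.5)
far = (2.0, (0.0, 0.0, 1.0), 0.5)
print(resolve_pixel([near, far]))
```

Because the fragments are sorted per pixel, the result does not depend on the order in which the triangles were submitted, which is the property a plain Z-buffer cannot give for transparent layers.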

Reconfigurable architecture

  • Alternative rendering architectures:
    • Tile-based rendering, where triangles are sorted according to screen regions.
    • Tiles can also be rendered in parallel, allowing the graphics system performance to scale.
  • Framebuffer organization could be more flexible:
    • How about a distributed framebuffer for large display walls?
    • Direct display of a floating-point framebuffer?
  • Hardware ray tracing, for global illumination.

Reconfigurable hardware

  • Types of hardware:
    • Fixed-function ASIC (Application Specific IC): e.g. older non-programmable hardwired graphics processors.
    • General purpose CPU (Pentium 4, etc.): fixed function but can run any program.
    • Application Specific Processor (ASP): runs specialized code (e.g. vertex shaders).
    • Fixed-function ASIC with multiple embedded ASPs: more flexible, but still mostly hardwired.
    • Field Programmable Gate Array (FPGA): fully reconfigurable chip, can emulate any hardware.

Designing a custom graphics processor

  • The task of designing a custom graphics processor from scratch is much like writing a software renderer.
    • In fact, you should start by writing it in C and testing it thoroughly.
  • When the software model works, the design should be partitioned into hardware modules.
    • Or, we could use similar techniques to create a cache-optimized software renderer.
  • The C models are then translated to an HDL (Hardware Description Language) description in VHDL, Verilog or SystemC.
  • Using logic synthesis and place & route tools (a silicon compiler), the HDL code can be used to create a chip (ASIC) ready to be fabricated.
  • The same tools can be used to create a configuration pattern for an FPGA.

Why is an FPGA so interesting?

  • It allows anybody to build graphics hardware!
    • Tough learning curve.
  • Reconfiguration:
    • In a few seconds the entire chip can be reconfigured by loading a new configuration bit pattern.
    • Partial self-reconfiguration in a running FPGA is also possible.
  • Other interesting uses:
    • An FPGA is used in the European Mars Express lander, the British “Beagle 2”.
      • To save weight and power, the FPGA can only control a single instrument at a time, but reconfigures itself on demand when another instrument is used. Memory chips for storing multiple configurations are lighter than having all the hardware in place at once.
    • Emulation of old hardware: the original Pac-Man game on an FPGA.
    • Real-time verification of large ASICs using multiple FPGAs.

Practical issues with FPGAs

  • Cost issues:
    • At low volume, very low cost compared to ASICs.
    • At high volume, expensive!
  • Speed issues:
    • FPGAs are rapidly catching up with standard-cell based ASICs; there is only a factor of 5 in clock speed difference today.
    • The gap is getting smaller with each generation.
  • Area issues:
    • Major concern, as most of the FPGA's chip area is used for the reconfiguration network, which gives low silicon area utilization.
    • ASIC-based graphics processors are 100+ Mtransistor designs using many floating-point units, etc.

Vertex & Pixel Processors in FPGAs

  • Do we really need to use programmable vertex & pixel processors in an FPGA-based GPU?
    • We should! The size of a pipelined RISC-style processor is quite small in a modern FPGA.
    • A pipelined dedicated T&L datapath is still possible, and will be faster but also larger.
  • In the FPGA, programmable vertex & pixel processors can be customized to suit the specific vertex & pixel programs used in the application.

A Scalable Graphics Architecture

  • Some reasons why we should look at scalable graphics architectures:
    • FPGAs come in many sizes, ranging from e.g. 10,000 to 8 million system gates.
    • Future FPGA generations should be easily utilizable.
    • Currently, FPGA technology is evolving even faster than graphics processors!

FPGA prototyping boards

  • We have two suitable boards at IMM:
    • Celoxica RC-1000 reconfigurable computing platform:
      • 1 million gates, no multipliers, 16 kbytes on-chip RAM, 30 MHz.
      • 4x 32-bit SRAM memory (8 Mbytes).
      • PCI interface.
    • Avnet Xilinx Virtex-II development kit:
      • 4 million gates, 120 multipliers, 270 kbytes on-chip RAM, 100 MHz.
      • Space enough for up to 25 RISC CPU cores, 100 MHz each!
      • Up to 512 Mbytes 64-bit PC2100 DDR SDRAM module.
      • 64-bit 100 MHz PCI-X connector.

Case study: The Hybris Graphics Architecture

  • Developed at IMM, see my Ph.D. thesis.
  • Hybrid-parallel scalable graphics architecture.
  • It is primarily a sort-middle architecture:
    • Tile-based architecture similar to GigaPixel, PowerVR, etc.
    • Screen is divided into 32x32 pixel tiles; triangles are sorted into bins for each tile.
    • Tiles can be rendered in parallel after sorting, which allows multiple tile rendering units to be used.
    • Framebuffer operations are fast (on-chip Z-buffer), but extra memory/bandwidth are needed for binning triangles.
  • Several prototype implementations exist:
    • Parallel software rendering and FPGA-based graphics hardware.
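
The binning step described on this slide (sorting triangles into one bin per 32x32 tile) could be sketched in software as below. This is only an illustration under stated assumptions: it bins by screen-space bounding box, which is conservative and may list a triangle in a tile it does not actually touch, and the names `bin_triangles` and `TILE` are hypothetical, not taken from Hybris.

```python
import math

TILE = 32  # tile edge in pixels, as stated on the slide


def bin_triangles(triangles, width, height):
    """Return {(tx, ty): [triangle index, ...]} using bounding-box overlap.

    Each triangle is a sequence of three (x, y) screen-space vertices.
    """
    tiles_x = (width + TILE - 1) // TILE
    tiles_y = (height + TILE - 1) // TILE
    bins = {(tx, ty): [] for ty in range(tiles_y) for tx in range(tiles_x)}
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the screen and convert to tile coordinates.
        tx0 = max(math.floor(min(xs)) // TILE, 0)
        tx1 = min(math.floor(max(xs)) // TILE, tiles_x - 1)
        ty0 = max(math.floor(min(ys)) // TILE, 0)
        ty1 = min(math.floor(max(ys)) // TILE, tiles_y - 1)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)
    return bins


# A 64x64 screen has 2x2 tiles; the second triangle straddles all four.
example = [
    [(1, 1), (10, 1), (1, 10)],
    [(20, 20), (40, 20), (20, 40)],
]
bins = bin_triangles(example, 64, 64)
```

A triangle that overlaps several tiles is stored once per tile, which is where the extra memory and bandwidth for binning come from.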

Sort-middle parallelism with Tiles

[Figure: sort-middle data flow, with these labels]
  • Geometry database (random distribution)
  • Parallel Geometry Processors (G G G G)
  • Bucket sorted redistribution by screen region
  • Parallel Tile Renderers (R R R R)
  • Final image is assembled from subdivision tiles

Processes in the tile rendering engine

[Figure panels]
  a) Setup Triangle
  b) Draw Triangle
  c) Setup Span
  d) Draw Span
  e) Final result
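
The four processes named above can be mirrored by a small software model of one tile, in the spirit of writing the renderer in software before partitioning it into hardware modules. The sketch below is an assumption-laden illustration, not the Hybris implementation: it uses a plain scanline rasterizer with a Z-test, flat colors, pixel samples at integer coordinates, and vertices given as `(x, y, z)` in tile-local coordinates. All function names are hypothetical.

```python
import math

TILE = 32  # tile edge in pixels


def new_tile():
    """Return empty color and depth buffers for one tile."""
    color = [[0] * TILE for _ in range(TILE)]
    depth = [[math.inf] * TILE for _ in range(TILE)]
    return color, depth


def edge_at(p, q, y):
    """Interpolate (x, z) along the edge p -> q at scanline y."""
    t = (y - p[1]) / (q[1] - p[1])
    return p[0] + t * (q[0] - p[0]), p[2] + t * (q[2] - p[2])


def setup_triangle(v0, v1, v2):
    """Sort vertices by y and find the scanlines covered inside the tile."""
    a, b, c = sorted((v0, v1, v2), key=lambda v: v[1])
    y_first = max(math.ceil(a[1]), 0)
    y_last = min(math.ceil(c[1]) - 1, TILE - 1)
    return a, b, c, y_first, y_last


def setup_span(a, b, c, y):
    """Intersect scanline y with the long edge and the active short edge."""
    long_edge = edge_at(a, c, y)
    short_edge = edge_at(a, b, y) if y < b[1] else edge_at(b, c, y)
    return sorted((long_edge, short_edge))  # left end first


def draw_span(y, span, color, color_buf, depth_buf):
    """Write the pixels of one span that pass the depth test."""
    (xl, zl), (xr, zr) = span
    dz = (zr - zl) / (xr - xl) if xr > xl else 0.0
    for x in range(max(math.ceil(xl), 0), min(math.ceil(xr) - 1, TILE - 1) + 1):
        z = zl + (x - xl) * dz
        if z < depth_buf[y][x]:  # nearer fragment wins
            depth_buf[y][x] = z
            color_buf[y][x] = color


def draw_triangle(tri, color, color_buf, depth_buf):
    """Walk the scanlines of a triangle and draw one span per scanline."""
    a, b, c, y_first, y_last = setup_triangle(*tri)
    for y in range(y_first, y_last + 1):
        draw_span(y, setup_span(a, b, c, y), color, color_buf, depth_buf)


color_buf, depth_buf = new_tile()
draw_triangle(((0, 0, 0.5), (8, 0, 0.5), (0, 8, 0.5)), 1, color_buf, depth_buf)
```

In a hardware version each function would become its own pipeline stage, with triangles, spans and pixels passed between stages instead of function arguments.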

Pipelined tile rendering engine

[Figure: block diagram of the pipelined tile rendering engine. Blocks shown: Input Triangle controller; Setup Triangle; FIFO; Draw Triangle / Setup Span; FIFO; Draw Span / Draw Pixel; 2x2 crossbar switch; two local SRAMs (32x32 pixels, color & depth); Output Tile controller. Triangle, Span and Pixel data pass between the blocks under hand-shake control.]

FPGA floor plan

FPGA place & route floor plan

Hardware rendered image example

Image showing the Stanford “Dragon” laserscan rendered using the FPGA hardware implementation of the tile-based Hybris rendering architecture.

The object contains 870k triangles and was rendered in 2 seconds on an experimental Xilinx Virtex 1000 FPGA board running at 25 MHz.
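
The figures quoted above imply an average throughput that is easy to work out. The short calculation below uses only the numbers on this slide; treating the 2 seconds as pure rendering time is an assumption, since the slide does not say whether data transfer is included.

```python
triangles = 870_000      # triangles in the "Dragon" model (from the slide)
seconds = 2.0            # reported rendering time
clock_hz = 25_000_000    # FPGA clock frequency

triangles_per_second = triangles / seconds
cycles_per_triangle = clock_hz / triangles_per_second

print(f"{triangles_per_second:,.0f} triangles/s")
print(f"{cycles_per_triangle:.1f} clock cycles per triangle on average")
```

That works out to 435,000 triangles per second, or roughly 57 clock cycles per triangle on average.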

“Dragon” rendered by the FPGA

Next generation of our FPGA-GPU

  • Work in progress:
    • Design migration to next-generation FPGA architecture.
    • Change hardware description language from VHDL to SystemC.
    • PCI-X bus interface core on the FPGA.
    • DDR SDRAM memory controller.
    • 2x2 interleaved pixel-parallel tile rendering engine, renders up to 4 pixels/clock.
    • Parallel tile rendering to render multiple tiles in parallel.
    • Programmable floating-point vertex processor to run vertex shaders.
      • Floating-point units and MIPS-style RISC core implemented.
    • A SystemC-based simulator with a nice user interface allows easier debugging and cycle-true simulations.
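
One plausible reading of the 2x2 interleaved pixel-parallel engine is that the four pixels of every 2x2 quad go to four different pixel units, each owning a 16x16 share of the 32x32 tile. The sketch below shows that mapping; the exact bit assignment and the function names are assumptions made for illustration, not details confirmed by the slides.

```python
TILE = 32          # full tile edge in pixels
LOCAL = TILE // 2  # each of the four units stores a 16x16 share


def engine_for_pixel(x, y):
    """Select one of four pixel units from the low bits of x and y."""
    return (y & 1) * 2 + (x & 1)


def local_address(x, y):
    """Address of the pixel inside its unit's 16x16 local memory."""
    return x >> 1, y >> 1


def collect_tile(local_srams):
    """Reassemble four 16x16 buffers into one 32x32 tile."""
    return [
        [local_srams[engine_for_pixel(x, y)][y >> 1][x >> 1] for x in range(TILE)]
        for y in range(TILE)
    ]


# Scatter a test pattern into the four local memories and collect it again.
srams = [[[None] * LOCAL for _ in range(LOCAL)] for _ in range(4)]
for y in range(TILE):
    for x in range(TILE):
        lx, ly = local_address(x, y)
        srams[engine_for_pixel(x, y)][ly][lx] = (x, y)
tile = collect_tile(srams)
```

With this mapping any 2x2 quad touches each unit exactly once, so four pixels can be written per clock without two of them competing for the same local memory.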

2x2 pixel parallel tile rendering engine

[Figure: block diagram of the 2x2 pixel-parallel tile rendering engine. Blocks shown: Input Triangle controller; Setup Triangle; Triangle FIFO; a broadcast bus feeding two Draw Triangle / Setup Span units through FIFOs; broadcast buses feeding four Draw Span / Draw Pixel units through Span FIFOs; four 2x2 crossbars with local SRAM (16x16 pixels, color & depth), each with an output controller; a block that collects pixels into a 32x32 tile; Output Tile controller. The blocks are linked by hand-shake signals.]

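Sparse super-sampling for anti-aliasing

[Figure: sample-position diagrams for sparse super-sampling, with numbered sample points (labels 1 to 8 survive); the sample positions are not recoverable from the extracted text.]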
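
The idea of sparse super-sampling can be illustrated with a small coverage calculation. The sample positions used on the slide are not recoverable here, so the sketch below assumes a common 4-sample pattern in which every row and every column of a 4x4 sub-pixel grid holds exactly one sample; `SPARSE_4` and `coverage` are hypothetical names.

```python
# Assumed pattern (not taken from the slide): 4 samples on a 4x4 sub-pixel
# grid, one per row and one per column, given as offsets inside the pixel.
SPARSE_4 = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]


def coverage(px, py, inside, pattern=SPARSE_4):
    """Fraction of the pixel's samples for which inside(x, y) is true."""
    hits = sum(1 for dx, dy in pattern if inside(px + dx, py + dy))
    return hits / len(pattern)


# A vertical edge at x = 0.5 and a horizontal edge at y = 0.5 each cover
# half of pixel (0, 0).
print(coverage(0, 0, lambda x, y: x < 0.5))
print(coverage(0, 0, lambda x, y: y < 0.5))
```

Because no two samples share a row or a column, near-horizontal and near-vertical edges both get four distinct coverage levels from only four samples, where a regular 2x2 grid would give two.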

Dual geometry / dual tile parallel implementation

[Figure: data-flow diagram. Front-end: object-partitioned data sets 1 and 2 feed geometry engines 1 and 2, which write triangles into bucket-sorted triangle heaps 1 and 2. Back-end: tile renderers 1 and 2 read triangles from the bucket-sorted triangle heaps and write their combined output to the framebuffer.]

Future?

  • Future mass-market graphics processors may incorporate FPGA-like structures to allow:
    • Addition of new hardware features.
    • Fixing hardware bugs “in the field”.
    • A flexible architecture that can be tuned to match the application.
  • FPGAs have revolutionized the way we think about hardware, by introducing “soft” hardware.
    • As with software, there is also an open-source hardware movement (www.opencores.org).
    • The border between software and hardware has been blurred.