Introduction to Modern GPU Hardware

Lan-Da Van (范倫達), Ph. D.

Department of Computer Science

National Chiao Tung University Hsinchu, Taiwan

Fall, 2018

The following content is extracted from the material in the references on the last page. If any citation is wrong or any reference is missing, please contact [email protected]. I will correct the error as soon as possible.

This material is for use in this course only; please do NOT redistribute it. Thank you.

Outline

GPU Pipeline

GPU Hardware History

GPU Hardware Consideration

Modern GPU Hardware Architecture

NVIDIA GeForce

AMD (ATI) Radeon

IMG PowerVR

ARM Mali

GPU Applications

Summary

GPU Fundamentals: Graphics Pipeline

• A simplified graphics pipeline

– Note that pipe widths vary

– Many caches, FIFOs, and so on not shown

[Figure: simplified graphics pipeline. CPU: Application. GPU: Transform & Light, Assemble Primitives, Rasterize, Shade, Video Memory (Textures). Data passed between stages: Vertices (3D), Xformed, Lit Vertices (2D), Screenspace triangles (2D), Fragments (pre-pixels), Final Pixels (Color, Depth). Also labeled: Graphics State, Render-to-texture.]
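The stages named in the figure can be imitated in a few lines of software. The following is a minimal Python sketch under stated assumptions, not a description of how any GPU implements them: the function names (`transform`, `assemble`, `rasterize`, `shade`, `render`), the toy mapping from [-1, 1] coordinates to pixels, and the 8×8 framebuffer are all hypothetical and chosen only for illustration.

```python
# Toy software model of the simplified pipeline:
# vertices (3D) -> transformed vertices (2D) -> triangles -> fragments -> pixels.
# All names and sizes are illustrative assumptions, not taken from the slides.

WIDTH, HEIGHT = 8, 8

def transform(vertices_3d):
    """Transform stage: map x, y in [-1, 1] to pixel coordinates, keep z as depth."""
    return [((x + 1.0) * 0.5 * WIDTH, (y + 1.0) * 0.5 * HEIGHT, z)
            for x, y, z in vertices_3d]

def assemble(vertices_2d):
    """Assemble Primitives stage: group every three vertices into one triangle."""
    return [tuple(vertices_2d[i:i + 3]) for i in range(0, len(vertices_2d) - 2, 3)]

def edge(a, b, p):
    """Signed area term: tells on which side of the edge a->b the point p lies."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(tri):
    """Rasterize stage: emit one fragment (x, y, depth) per covered pixel center."""
    a, b, c = tri
    area = edge(a, b, c)
    if area == 0:
        return []  # degenerate triangle covers nothing
    fragments = []
    for py in range(HEIGHT):
        for px in range(WIDTH):
            p = (px + 0.5, py + 0.5)
            w0, w1, w2 = edge(b, c, p), edge(c, a, p), edge(a, b, p)
            all_pos = w0 >= 0 and w1 >= 0 and w2 >= 0
            all_neg = w0 <= 0 and w1 <= 0 and w2 <= 0
            if all_pos or all_neg:
                # Barycentric interpolation of the depth value.
                depth = (w0 * a[2] + w1 * b[2] + w2 * c[2]) / area
                fragments.append((px, py, depth))
    return fragments

def shade(fragments, color, framebuffer, depthbuffer):
    """Shade stage: write the color where the fragment passes the depth test."""
    for px, py, depth in fragments:
        if depth < depthbuffer[py][px]:
            depthbuffer[py][px] = depth
            framebuffer[py][px] = color

def render(vertices_3d, color="#"):
    """Run all stages and return the final pixels as a grid of characters."""
    framebuffer = [["." for _ in range(WIDTH)] for _ in range(HEIGHT)]
    depthbuffer = [[float("inf")] * WIDTH for _ in range(HEIGHT)]
    for tri in assemble(transform(vertices_3d)):
        shade(rasterize(tri), color, framebuffer, depthbuffer)
    return framebuffer

if __name__ == "__main__":
    for row in render([(-1, -1, 0.0), (0, -1, 0.0), (-1, 0, 0.0)]):
        print("".join(row))
```

Each function consumes exactly the data the figure shows flowing into its stage, which is why the stages can later be swapped for programmable ones without changing the overall structure.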

GPU Fundamentals: Modern Graphics Pipeline

• Programmable vertex processor!

• Programmable pixel processor!

[Figure: modern graphics pipeline. CPU: Application. GPU: Vertex Processor, Rasterize, Fragment Processor, Video Memory (Textures). Data passed between stages: Vertices (3D), Xformed, Lit Vertices (2D), Screenspace triangles (2D), Fragments (pre-pixels), Final Pixels (Color, Depth). Also labeled: Graphics State, Render-to-texture.]

GPU Fundamentals: Modern Graphics Pipeline

• Programmable primitive assembly!

• More flexible memory access!

[Figure: modern graphics pipeline with a Geometry Processor shown at the Assemble Primitives stage]

History of Graphics Hardware (1/3)

… - mid ’90s

SGI mainframes and workstations

PC: only 2D graphics hardware

mid ’90s

Consumer 3D graphics hardware (PC)

- 3dfx, NVIDIA, Matrox, ATI, …

Triangle rasterization (only)

Cheap: pushed by game industry

1999

PC-card with TnL (Transform and Lighting)

- NVIDIA GeForce: Graphics Processing Unit (GPU)

PC-card more powerful than specialized workstations

[Figure: 3dfx Voodoo Graphics, 4 MB, 1997]

History of Graphics Hardware (2/3)

https://www.zhihu.com/question/21980949

History of Graphics Hardware (3/3)

Modern graphics hardware

Graphics pipeline partly programmable

Leaders: AMD(ATI) and NVIDIA

- “AMD Radeon HD 6990” and “NVIDIA GeForce GTX 590”

Game consoles use similar GPUs (e.g., Xbox)

Computational Power (1/2)

• GPUs are fast…

– 3.0 GHz Intel Core2 Duo (Woodcrest Xeon 5160):

• Computation: 48 GFLOPS peak

• Memory bandwidth: 21 GB/s peak

• Price: $874 (chip)

– NVIDIA GeForce 8800 GTX:

• Computation: 330 GFLOPS observed

• Memory bandwidth: 55.2 GB/s observed

• Price: $599 (board)

• GPUs are getting faster, faster

– CPUs: 1.4× annual growth

– GPUs: 1.7× (pixels) to 2.3× (vertices) annual growth
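The figures on this slide can be turned into ratios with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the dictionary layout and the helper names (`ratio`, `gflops_per_dollar`, `compound`) are assumptions for illustration. Note that the comparison is loose: the CPU figures are peak values for a chip, while the GPU figures are observed values for a whole board.

```python
# Numbers quoted on the slide (CPU: peak, chip price; GPU: observed, board price).
cpu = {"gflops": 48.0, "gb_per_s": 21.0, "price_usd": 874.0}    # Xeon 5160
gpu = {"gflops": 330.0, "gb_per_s": 55.2, "price_usd": 599.0}   # GeForce 8800 GTX

def ratio(key):
    """How many times larger the GPU figure is than the CPU figure."""
    return gpu[key] / cpu[key]

def gflops_per_dollar(part):
    """Compute throughput per dollar for one part."""
    return part["gflops"] / part["price_usd"]

def compound(rate, years):
    """Relative performance after `years` of growth at `rate` per year."""
    return rate ** years

if __name__ == "__main__":
    print("compute ratio:", round(ratio("gflops"), 2))
    print("bandwidth ratio:", round(ratio("gb_per_s"), 2))
    print("GFLOPS per dollar (CPU, GPU):",
          round(gflops_per_dollar(cpu), 3), round(gflops_per_dollar(gpu), 3))
    for label, rate in [("CPU", 1.4), ("GPU pixels", 1.7), ("GPU vertices", 2.3)]:
        print(label, "after 5 years:", round(compound(rate, 5), 1))
```

Compounding is what makes the annual growth rates matter: a difference between 1.4× and 1.7× per year looks modest for one year but widens quickly over several.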

Computational Power (2/2)

[Figure: GPU and CPU curves. Courtesy Naga Govindaraju]

[Figure: Flops Comparison on GPU and CPU]

[Figure: Memory Bandwidths Comparison of CPU and GPU]

Motivation

• Why are GPUs getting faster so fast?

– Arithmetic intensity

• the specialized nature of GPUs makes it easier to use additional transistors for computation

– Economics

• multi-billion dollar video game market is a pressure cooker that drives innovation to exploit this property

Flexible and Precise

• Modern GPUs are deeply programmable

– Programmable pixel, vertex, and geometry engines

– Solid high-level language support

• Modern GPUs support “real” precision

– 32-bit/64-bit floating point throughout the pipeline

• High enough for many applications

– DX10-class GPUs add 32-bit integers
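What 32-bit versus 64-bit precision means in practice can be shown on the CPU. The sketch below is an illustration under assumptions, not GPU code: it uses Python's standard `struct` module to round a 64-bit float to 32 bits, and the helper name `to_float32` is hypothetical.

```python
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float and return it."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

big = 16_777_216.0  # 2**24, the limit of exactly representable integers in 32 bits

if __name__ == "__main__":
    # In 64-bit arithmetic, adding 1 to 2**24 changes the value.
    print("64-bit keeps the +1:", big + 1.0 != big)
    # After rounding to 32 bits, the +1 is lost.
    print("32-bit loses the +1:", to_float32(big + 1.0) == big)
    # Values such as 0.5 survive the round trip; 0.1 does not.
    print("0.5 exact in 32 bits:", to_float32(0.5) == 0.5)
    print("0.1 exact in 32 bits:", to_float32(0.1) == 0.1)
```

A 32-bit float carries a 24-bit significand, which is enough for many graphics and signal-processing workloads but not for long accumulations, which is one reason 64-bit support matters for general-purpose computing.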

Graphics Hardware Consideration (1/2)

• GPU = Graphics Processing Unit

– Vector processor

– Operates on 4 tuples

• Position ( x, y, z, w )

• Color ( red, green, blue, alpha )

• Texture Coordinates ( s, t, r, q )

– 4 tuple ops, 1 clock cycle

• SIMD [ Single Instruction Multiple Data ]

– ADD, MUL, SUB, DIV, MADD, …

• Pipelining

– Number of stages

• Parallelism

– Number of parallel processes

• Parallelism + pipelining

– Number of parallel pipelines
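The 4-tuple operations listed on this slide can be written out component by component. The sketch below is a minimal Python illustration; the function names `add`, `mul`, and `madd` are assumptions that mirror the instruction names on the slide. Python evaluates the four components one after another, whereas the point of SIMD hardware is that a single instruction processes all four in one clock cycle.

```python
# A 4-tuple holds a position (x, y, z, w), a color (r, g, b, a),
# or texture coordinates (s, t, r, q).

def add(a, b):
    """ADD: component-wise sum of two 4-tuples."""
    return tuple(x + y for x, y in zip(a, b))

def mul(a, b):
    """MUL: component-wise product of two 4-tuples."""
    return tuple(x * y for x, y in zip(a, b))

def madd(a, b, c):
    """MADD: component-wise multiply-add, a * b + c."""
    return tuple(x * y + z for x, y, z in zip(a, b, c))

if __name__ == "__main__":
    color = (1.0, 0.5, 0.25, 1.0)       # red, green, blue, alpha
    scale = (0.5, 0.5, 0.5, 1.0)        # darken the color, keep alpha
    ambient = (0.1, 0.1, 0.1, 0.0)      # constant offset
    print(madd(color, scale, ambient))
```

Multiply-add is listed separately from MUL and ADD because it performs two arithmetic operations per component in one instruction, which is why peak FLOPS figures usually count it as two operations.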

[Figure: diagrams of pipelining (stages 1, 2, 3), parallelism, and parallel pipelines]
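The benefit of pipelining and of parallel pipelines can be estimated with a simple cycle count. The sketch below assumes an idealized model that is not stated on the slides: every stage takes exactly one cycle, there are no stalls, and work divides evenly across pipelines. The function names are hypothetical.

```python
import math

def cycles_unpipelined(items, stages, pipes=1):
    """Each item occupies its unit for all `stages` cycles before the next starts."""
    return math.ceil(items / pipes) * stages

def cycles_pipelined(items, stages, pipes=1):
    """Once a pipeline is full, it completes one item per cycle."""
    if items == 0:
        return 0
    return stages + math.ceil(items / pipes) - 1

if __name__ == "__main__":
    items, stages = 1000, 3
    for pipes in (1, 4, 8):
        print(pipes, "pipeline(s):",
              cycles_unpipelined(items, stages, pipes), "cycles unpipelined,",
              cycles_pipelined(items, stages, pipes), "cycles pipelined")
```

For large item counts the pipelined time approaches items / pipes, so the number of stages affects only the start-up latency while the number of parallel pipelines sets the throughput.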

Graphics Hardware Consideration (2/2)

Outline

GPU Pipeline

History of GPU Hardware

GPU Hardware Consideration

Modern GPU Hardware Architecture

NVIDIA GeForce

AMD (ATI) Radeon

IMG PowerVR

ARM Mali

Summary

Growth of NVIDIA GPU

• Performance metrics

– Since 2000, the amount of horsepower applied to processing 3D vertices and fragments has been growing at a remarkable rate.

Growth of NVIDIA GPU

NVIDIA GeForce 7900 GTX

NVIDIA Graphics Card Architecture

• GeForce-8 Series

– 12,288 concurrent threads, hardware managed

– 128 Thread Processor cores at 1.35 GHz == 518 GFLOPS peak
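The two GeForce-8 figures can be reproduced from the core count and clock. The sketch below relies on assumptions that are not on the slide: that each core can issue a multiply-add plus a multiply per clock (3 floating-point operations), and that the 128 cores are grouped into 16 multiprocessors holding up to 768 threads each. The constant and function names are hypothetical.

```python
CORES = 128                      # from the slide
CLOCK_GHZ = 1.35                 # from the slide
FLOPS_PER_CORE_PER_CLOCK = 3     # assumption: multiply-add (2) + multiply (1)

MULTIPROCESSORS = 16             # assumption: 8 cores per multiprocessor
THREADS_PER_MULTIPROCESSOR = 768 # assumption: hardware limit per multiprocessor

def peak_gflops(cores, clock_ghz, flops_per_clock):
    """Peak throughput = cores x clock (GHz) x operations per core per clock."""
    return cores * clock_ghz * flops_per_clock

def concurrent_threads(multiprocessors, threads_each):
    """Total number of threads the hardware keeps resident at once."""
    return multiprocessors * threads_each

if __name__ == "__main__":
    print("peak GFLOPS:", round(peak_gflops(CORES, CLOCK_GHZ, FLOPS_PER_CORE_PER_CLOCK), 1))
    print("threads:", concurrent_threads(MULTIPROCESSORS, THREADS_PER_MULTIPROCESSOR))
```

Under these assumptions the product 128 × 1.35 × 3 gives 518.4, matching the peak figure on the slide, and 16 × 768 gives the 12,288 concurrent threads.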

[Figure: GeForce 8 block diagram. Host CPU and Work Distribution at the top; eight processor clusters, each with two SP blocks (each with an IU and Shared Memory), a TF unit, and a TEX L1 cache; six L2 / Memory partitions at the bottom.]

NVIDIA FERMI

FERMI: Streaming Multiprocessor (SM)

• Each SM contains

– 32 cores

– 16 load/store units

– 32,768 registers

• Newer FP representation

– IEEE 754-2008

• Two units

– Floating point

– Integer
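The per-SM resources listed above bound how much each thread can use. The sketch below works through two consequences with simple integer arithmetic. It assumes, beyond what the slide states, that threads are issued in groups (warps) of 32 and that an SM can hold up to 1,536 resident threads; the function names are hypothetical.

```python
REGISTERS_PER_SM = 32_768   # from the slide
LOAD_STORE_UNITS = 16       # from the slide
CORES_PER_SM = 32           # from the slide
WARP_SIZE = 32              # assumption: threads issued together as one warp
MAX_RESIDENT_THREADS = 1536 # assumption: per-SM limit, not stated on the slide

def registers_per_thread(resident_threads):
    """Registers available to each thread if the register file is shared evenly."""
    return REGISTERS_PER_SM // resident_threads

def cycles_per_warp(units):
    """Cycles for `units` parallel units to serve all threads of one warp."""
    return -(-WARP_SIZE // units)  # ceiling division

if __name__ == "__main__":
    for threads in (512, 1024, MAX_RESIDENT_THREADS):
        print(threads, "threads ->", registers_per_thread(threads), "registers each")
    print("load/store cycles per warp:", cycles_per_warp(LOAD_STORE_UNITS))
    print("arithmetic cycles per warp:", cycles_per_warp(CORES_PER_SM))
```

The trade-off this exposes is general: keeping more threads resident hides memory latency better but leaves fewer registers for each thread.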

FERMI: Results

FERMI: Comparison

Kepler: Core Architecture

http://www.weistang.com/article-941-1.html

Maxwell: Core Architecture

http://www.weistang.com/article-941-1.html

http://www.coolaler.com/showthread.php/313295-%E5%8F%B2%E4%B8%8A%E6%9C%80%E9%AB%98%E6%95%88GPU%EF%BC%9ANVIDIA-Maxwell%E6%9E%B6%E6%A7%8B

Kepler vs Maxwell Comparison

http://www.coolaler.com/showthread.php/313295-%E5%8F%B2%E4%B8%8A%E6%9C%80%E9%AB%98%E6%95%88GPU%EF%BC%9ANVIDIA-Maxwell%E6%9E%B6%E6%A7%8B

[Figure labels: Kepler (2012), Maxwell (2014)]

Pascal: Core Architecture

https://read01.com/zh-tw/oemmE4.html#.Wi5F30qWYps

Volta: Core Architecture

http://technews.tw/2017/05/11/nvidia-gpu-volta/

Pascal vs Volta Comparison

http://technews.tw/2017/05/11/nvidia-gpu-volta/

[Figure labels: Pascal (2016), Volta (2017)]

https://zh.wikipedia.org/wiki/CUDA

NVIDIA ULP-GeForce (Tegra2)

NVIDIA ULP-GeForce (Tegra3)

Tegra Roadmap

Mobile Roadmap

http://www.techbang.com/posts/19899-nvidia-shield-rebirths-carrying-kepler-into-the-tablet-market-discarded-palm-machine-changes-to-core-login-table-drawing-tablet?page=2

ATI Radeon X1900 XTX

• Features of ATI Radeon X1900 XTX

– Core speed 650 MHz

– 48 pixel shader processors

– 8 vertex shader processors

– 51 GB/s memory bandwidth

– 512 MB memory

http://product.pcpop.com/000024721/Index.html

ATI Radeon X1900 XTX

• High Memory Bandwidth

[Figure: memory system block diagram. Processor chip: CPU (3 GHz) and cache (½ MB). Main memory: 1 GB. AGP memory: ½ GB. AGP bus: 2 GB/s. Graphics card: GPU (650 MHz, parallel processes), graphics memory (½ GB), and output. Bandwidth labels: 51 GB/s and 77 GB/s (both marked "High bandwidth"), and 3 GB/s.]
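The gap between on-card memory bandwidth and the AGP bus is easiest to appreciate as a transfer time. The sketch below uses the 51 GB/s and 2 GB/s figures and the ½ GB memory size from the slides; the function name `transfer_ms` is hypothetical, and the model assumes the full quoted bandwidth is sustained, with 1 GB taken as 1024 MB.

```python
def transfer_ms(megabytes, gb_per_s):
    """Milliseconds to move `megabytes` at a sustained rate of `gb_per_s`."""
    gigabytes = megabytes / 1024
    return gigabytes / gb_per_s * 1000

GRAPHICS_MEMORY_GB_S = 51.0   # from the slide: GPU memory bandwidth
AGP_BUS_GB_S = 2.0            # from the slide: AGP bus
DATA_MB = 512                 # the ½ GB of graphics memory

if __name__ == "__main__":
    on_card = transfer_ms(DATA_MB, GRAPHICS_MEMORY_GB_S)
    over_bus = transfer_ms(DATA_MB, AGP_BUS_GB_S)
    print("on-card:", round(on_card, 1), "ms")
    print("over AGP:", round(over_bus, 1), "ms")
    print("ratio:", round(over_bus / on_card, 1))
```

This is why data that the GPU reuses, such as textures, is kept in graphics memory rather than fetched across the bus every frame.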

ATI Radeon 9700

• Parallelism + pipelining: ATI Radeon 9700

– 4 vertex pipelines

– 8 pixel pipelines

Radeon Comparison

http://www.pcdiy.com.tw/detail/4275

IMG PowerVR Series5XT (SGXMP)

• Shader-driven Tile-Based Deferred Rendering (TBDR) architecture

• Fully programmable GPU using unique USSE architecture

• All SGX cores support OpenGL ES 2.0/1.1, OpenVG 1.1, OpenGL 2.0/3.0 and DirectX 9/10.1
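The "tile-based" part of TBDR can be illustrated with bounding-box tile binning, the step that decides which screen tiles each triangle may touch. The sketch below is a generic illustration and not PowerVR's actual implementation: the 16-pixel tile size, the function name `bin_triangles`, and the conservative bounding-box test are all assumptions. Hidden-surface removal and shading would then run tile by tile on these lists; that part is not shown.

```python
import math

TILE = 16  # tile edge in pixels (assumption for illustration)

def bin_triangles(triangles, width, height):
    """Map each tile (tx, ty) to the indices of triangles whose bounding box overlaps it."""
    bins = {}
    max_tx = (width - 1) // TILE
    max_ty = (height - 1) // TILE
    for index, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Tile range covered by the bounding box, clamped to the screen.
        tx0 = max(math.floor(min(xs)) // TILE, 0)
        tx1 = min(math.floor(max(xs)) // TILE, max_tx)
        ty0 = max(math.floor(min(ys)) // TILE, 0)
        ty1 = min(math.floor(max(ys)) // TILE, max_ty)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins.setdefault((tx, ty), []).append(index)
    return bins

if __name__ == "__main__":
    scene = [
        ((0, 0), (10, 0), (0, 10)),       # fits inside tile (0, 0)
        ((20, 20), (40, 20), (20, 40)),   # spans four tiles
    ]
    for tile, indices in sorted(bin_triangles(scene, 64, 64).items()):
        print(tile, indices)
```

Because each tile's triangle list is known before any pixel is shaded, a tile's color and depth data can stay in small on-chip memory while it is processed, which is the main attraction of the approach for bandwidth-limited mobile GPUs.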

IMG PowerVR Series6 (Rogue)

• Supports OpenGL ES 3.0, OpenGL ES 2.0, OpenGL 3.x/4.x, OpenCL 1.x, and DirectX 10, with certain family members extending their capabilities to full WHQL-compliant DirectX 11.1 functionality

IMG PowerVR 7XT Plus

http://imgtec.eetrend.com/article/7130

Features of ARM Mali

ARM Mali-200

ARM Mali-300

ARM Mali-400MP

ARM Mali-450MP

ARM Mali-T604

• GPGPU support (OpenCL 1.1)

• Tri-pipe architecture

• The first GPU based on the Midgard architecture

• True IEEE double-precision floating-point math in hardware for Full Profile

• The Job Manager within Mali-T600 Series GPUs offloads task management from the CPU to the GPU

• 5x performance improvement over previous Mali graphics processors

ARM Mali-T624

ARM Mali-T678

• 50% performance improvement compared to the Mali-T658.

ARM Mali-T760

ARM Mali-T880

ARM Mali Comparison

https://zh.wikipedia.org/wiki/Mali_(GPU)

Applications (1/7)

• GPUs are used in many applications

– Ray tracing

– Image segmentation

– FFT/Linear Algebra

http://graphics.stanford.edu/data/3Dscanrep/stanford-bunny-cebal-ssh.jpg

http://f.fwallpapers.com/images/3d-bunny.jpg

Applications (2/7)

http://www.techbang.com/posts/19899-nvidia-shield-rebirths-carrying-kepler-into-the-tablet-market-discarded-palm-machine-changes-to-core-login-table-drawing-tablet?page=2

http://5pit.tw/tech/computer/tid_12880

Applications (3/7)

Applications (4/7)

http://wechatinchina.com/thread-461154-1-1.html

Applications (5/7)

https://read01.com/Pnd3D.html

http://wechatinchina.com/thread-461154-1-1.html

Applications (6/7)

AR and VR Applications

Applications (7/7)

http://www.naipo.com/Portals/1/web_tw/Knowledge_Center/Industry_Economy/publish-482.htm

Can GPUs Solve ALL Problems?

Summary

Understand the GPU pipeline in depth

Understand the motivation for GPU hardware

Understand modern GPU hardware architecture and specifications

Understand GPU/GPGPU applications and key problems

Reference

GPU Architecture & CG, Mark Colbert, 2006

Introduction to Graphics Hardware and GPUs, Yannick Francken, Tom Mertens

GPU Tutorial, Yiyunjin, 2007

Evolution of GPU and Graphics Pipelining, Weijun Xiao

Commercial product websites (NVIDIA, ATI, IMG, ARM)

SIGGRAPH 2005 Course Notes, David Luebke

Adapted from: David Luebke (University of Virginia) and NVIDIA

MCS 572 Lecture 27, Introduction to Supercomputing, Jan Verschelde, 17 March 2014

Acknowledgement:

Thanks for the TA’s help in preparing the material.