UC Regents Spring 2009 © UCB EECS 150 L28: Graphics Processors
2009-4-30 John Lazzaro
(www.cs.berkeley.edu/~lazzaro)
EECS 150 Components & Design Techniques for Digital Systems
Lecture 28: Graphics Processors
www-inst.eecs.berkeley.edu/~cs150/
TAs: CS 194-6 alums - Chris, Ilia, and Chen
Today: Graphics Processors
Computer Graphics. A brief introduction to “the pipeline”.
Stream Processing. Casting the graphics pipeline into hardware.
Unified Pipelines. GeForce 8800, from Nvidia, introduced in 2006.
Larrabee. Intel multi-core graphics architecture, SIGGRAPH 2008.
Personal computer graphics architecture
Case Study: Mac Mini (PowerPC edition)
[Figure 2-1, Block diagram (Mac mini). Labeled parts: PowerPC G4 microprocessor (L2 cache: 512K 1:1); 167 MHz MaxBus; Intrepid memory controller and I/O device controller; 167 MHz memory bus and DDR SDRAM DIMM slot; AGP 4X bus to the Radeon 9200 graphics IC, 32 MB DDR RAM, and a DVI/VGA/composite/S-video output port; PCI bus with boot ROM and a PCI USB 2.0 controller (two 480 Mbps USB 2.0 ports); Ultra ATA/100 bus (hard disk drive, optical drive); 10/100 Mbps Ethernet; FireWire 400; audio codec, headphone/line-out jack, and built-in speaker; PMU power controller, power button, and fan; AirPort Extreme; Bluetooth; 12 Mbps USB; modem module (data pump and DAA) and modem port.]
Main ICs and Buses
The architecture of Mac mini is designed around the PowerPC G4 microprocessor and the Intrepid memory and I/O device controller. The Intrepid occupies the center of the block diagram.
The MaxBus connects the PowerPC G4 microprocessor to the Intrepid ASIC. The MaxBus has 64 data lines, 32 address lines, and a bus clock speed of 167 MHz. The Intrepid ASIC has other buses that connect with the boot ROM, the hard disk drive, the optical drive, the power controller IC, the sound IC, the internal modem module, and the optional wireless LAN module.
The Intrepid I/O controller has a 32-bit PCI bus with a bus clock speed of 33 MHz.
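As a rough check on the bus figures quoted in the Mac mini text, peak bandwidth is (data width in bytes) x (clock rate) x (transfers per clock). This sketch assumes one transfer per clock and ignores protocol overhead, so sustained throughput is lower; `peak_bandwidth` is a hypothetical helper name.

```python
def peak_bandwidth(data_lines: int, clock_hz: float, transfers_per_clock: int = 1) -> float:
    """Peak bytes/second for a parallel bus (assumes no protocol overhead)."""
    bytes_per_transfer = data_lines / 8
    return bytes_per_transfer * clock_hz * transfers_per_clock


# Figures quoted in the Mac mini text: 64-bit MaxBus at 167 MHz, 32-bit PCI at 33 MHz.
maxbus = peak_bandwidth(data_lines=64, clock_hz=167e6)  # CPU <-> Intrepid
pci = peak_bandwidth(data_lines=32, clock_hz=33e6)      # Intrepid PCI bus

print(f"MaxBus peak: {maxbus / 1e9:.2f} GB/s")  # MaxBus peak: 1.34 GB/s
print(f"PCI peak:    {pci / 1e6:.0f} MB/s")     # PCI peak:    132 MB/s
```

The roughly 10x gap between the two explains why the CPU sits on its own fast bus while slower peripherals share PCI.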
(Figure and text from Apple Computer, Inc., Chapter 2, "Architecture," p. 16, "Block Diagram and Buses," 2005-04-05. © 2005 Apple Computer, Inc. All Rights Reserved.)
Mac Mini G4: System block diagram
Processor bus: how the CPU talks to everything else.
CPU: PowerPC G4 (Freescale).
Bus controller: the low-cost Mac Mini has only one. Most PCs have two: a fast North Bridge and a slow South Bridge.
The bus controller talks to everything else.
AGP 4X bus: graphics chip.
PCI bus: boot ROM, USB 2.0.
ATA/100 bus: hard disk, DVD/CD-ROM.
PCI, ATA, and AGP devices can be bus masters, for Direct Memory Access (DMA): the disk can write RAM directly.
Mac Mini: Graphics subsystem
ATI Radeon 9200: Graphics Processing Unit (GPU).
AGP 4X: high-speed graphics bus.
Dedicated graphics RAM.
To display.
Average selling price (ASP) for GPUs: $307
Parts + manufacturing cost: $283.37
Parts cost in volume: $274.69
GPU cost is a significant part of the total "Bill of Materials."
Source: iSuppli Corporation
2560 x 1600 display: about 12 MB/frame (24-bit pixels).
At 24 frames/sec: about 300 MB/second.
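The frame-buffer numbers follow from simple arithmetic, and the same numbers give the per-pixel time budget a CPU would have if it computed every pixel itself. This sketch reuses the slide's figures (2560 x 1600, 24-bit pixels, 24 frames/s) plus the AGP 4X rate quoted on the graphics-card slide; the variable names are illustrative.

```python
WIDTH, HEIGHT = 2560, 1600     # display resolution
BYTES_PER_PIXEL = 3            # 24-bit pixels
FRAMES_PER_SEC = 24
CPU_HZ = 1e9                   # 1 GHz CPU clock
AGP_4X_BYTES_PER_SEC = 1.1e9   # AGP 4X rate quoted for the "dumb" graphics card

pixels_per_frame = WIDTH * HEIGHT                      # 4,096,000 pixels
bytes_per_frame = pixels_per_frame * BYTES_PER_PIXEL   # 12,288,000 bytes ("about 12 MB")
bytes_per_sec = bytes_per_frame * FRAMES_PER_SEC       # 294,912,000 B/s ("about 300 MB/s")

pixels_per_sec = pixels_per_frame * FRAMES_PER_SEC
ns_per_pixel = 1e9 / pixels_per_sec                    # time budget per pixel
cycles_per_pixel = CPU_HZ / pixels_per_sec             # clock cycles per pixel

print(f"{bytes_per_frame / 1e6:.1f} MB/frame, {bytes_per_sec / 1e6:.0f} MB/s")  # 12.3 MB/frame, 295 MB/s
print(f"{ns_per_pixel:.1f} ns/pixel, {cycles_per_pixel:.1f} cycles/pixel")     # 10.2 ns/pixel, 10.2 cycles/pixel
print(f"AGP 4X headroom: {AGP_4X_BYTES_PER_SEC / bytes_per_sec:.1f}x")         # AGP 4X headroom: 3.7x
```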
Anatomy of a "dumb" graphics card ...
AGP 4X: 1.1 GB/s. It can handle 24 frames/s (300 MB/s) for a 2560x1600 display.
[Diagram labels: control logic; two 12 MB frame buffers; DVI formatter; D/A.]
Double Buffering:
CPU writes “next frame” in one buffer.
Control logic sends “this frame” out of other buffer to display.
Problem: the CPU has to compute a new pixel every 10 ns, only 10 clock cycles for a 1 GHz CPU clock.
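A minimal software model of the double-buffering scheme, assuming a 24-bit RGB frame buffer stored row by row. The `DoubleBuffer` class and its methods are hypothetical names; real hardware would swap buffers during vertical blanking rather than on a method call.

```python
from __future__ import annotations


class DoubleBuffer:
    """Two frame buffers: the CPU draws the next frame into `back` while the
    control logic scans the current frame out of `front`."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height
        size = width * height * 3            # 3 bytes (24 bits) per pixel
        self.front = bytearray(size)         # "this frame": sent to the display
        self.back = bytearray(size)          # "next frame": written by the CPU

    def put_pixel(self, x: int, y: int, rgb: tuple[int, int, int]) -> None:
        """CPU writes one pixel of the next frame (row-major layout)."""
        i = (y * self.width + x) * 3
        self.back[i:i + 3] = bytes(rgb)

    def scan_out(self) -> bytes:
        """Control logic reads the frame currently being displayed."""
        return bytes(self.front)

    def swap(self) -> None:
        """Exchange roles once the next frame is complete."""
        self.front, self.back = self.back, self.front


db = DoubleBuffer(4, 2)
db.put_pixel(0, 0, (255, 0, 0))   # drawn off-screen; the display is unaffected
before = db.scan_out()[:3]        # b'\x00\x00\x00'
db.swap()
after = db.scan_out()[:3]         # b'\xff\x00\x00'
```

The display never sees a half-drawn frame, but the CPU still has to fill every pixel of `back`, which is the bottleneck the slide points out.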
Graphics Acceleration
Q. In a multi-core world, why should we use a special processor for graphics?
A. Programmers generally use a certain coding style for graphics. We can design a processor to fit the style.
Q. What kind of graphics are we accelerating?
A. In 2009, interactive entertainment (3-D games). In the 1990s, 2-D acceleration (fast windowing systems, games like Pac-Man).
Next: An intro to 3-D graphics.
The Triangle ...
Simplest closed shape that may be defined by straight edges.
With enough triangles, you can make anything.
A cube whose faces are made up of triangles. This is a 3-D model of a cube: the model includes faces we can't see in this view.
A sphere whose faces are made up of triangles. With enough triangles, the curvature of the sphere can be made arbitrarily smooth.
A teapot (famous object in computer graphics history). A “wire-frame” of triangles can capture the 3-D shape of complex, man-made objects.
A triangle is defined by 3 vertices.
By transforming (v' = f(v)) all vertices in a 3-D object (like the teapot), you can move it in the 3-D world, change its size, rotate it, etc.
vertex v0 = (x0, y0, z0)
vertex v1 = (x1, y1, z1)
vertex v2 = (x2, y2, z2)
If a teapot has 10,000 triangles, we need to transform up to 30,000 vertices to move it in a 3-D scene ... per frame!
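The transform v' = f(v) is commonly a 4x4 matrix applied to each vertex in homogeneous (x, y, z, w) form with w = 1, the same four-component layout the shader slides use later. This sketch assumes that convention; the helper names are hypothetical. Composing the matrices once per object leaves one matrix-vector product per vertex, repeated for every vertex every frame.

```python
import math


def mat_mul(a, b):
    """4x4 product a @ b (applying the result applies b first, then a)."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)] for r in range(4)]


def mat_vec(m, v):
    """Apply a 4x4 matrix to a homogeneous vertex (x, y, z, w)."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def translate(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def scale(s):
    return [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]


def rotate_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def transform_all(m, vertices):
    """v' = f(v) for every vertex: the per-frame work for a moving object."""
    return [mat_vec(m, v) for v in vertices]


# Double the size of a triangle, then move it 10 units along x.
triangle = [(1, 2, 3, 1), (0, 0, 0, 1), (4, 0, 0, 1)]
m = mat_mul(translate(10, 0, 0), scale(2))
moved = transform_all(m, triangle)   # [(12, 4, 6, 1), (10, 0, 0, 1), (18, 0, 0, 1)]
```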
A vertex can have color, lighting info ...
If the vertex colors differ, a smooth gradient of color washes across the triangle.
vertex v0 = (r0, g0, b0)
vertex v1 = (r1, g1, b1)
vertex v2 = (r2, g2, b2)
More realistic graphics models include light sources in the scene. Per-vertex information can carry information about how light hits the vertex.
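One standard way to produce the smooth color gradient is barycentric interpolation: each vertex color is weighted by how close the point is to that vertex, measured with signed triangle areas. This sketch works in 2-D screen coordinates and ignores the perspective correction real pipelines apply; the function names are hypothetical.

```python
def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p) in 2-D."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def interpolate_color(v0, v1, v2, c0, c1, c2, p):
    """Blend per-vertex colors c0, c1, c2 at point p inside triangle v0 v1 v2."""
    area = edge(v0, v1, v2)
    w0 = edge(v1, v2, p) / area   # 1 at v0, 0 on the edge opposite v0
    w1 = edge(v2, v0, p) / area
    w2 = edge(v0, v1, p) / area   # w0 + w1 + w2 == 1 everywhere
    return tuple(w0 * a + w1 * b + w2 * c for a, b, c in zip(c0, c1, c2))


v0, v1, v2 = (0, 0), (4, 0), (0, 4)
red, green, blue = (255, 0, 0), (0, 255, 0), (0, 0, 255)
at_v0 = interpolate_color(v0, v1, v2, red, green, blue, v0)               # (255.0, 0.0, 0.0)
center = interpolate_color(v0, v1, v2, red, green, blue, (4 / 3, 4 / 3))  # about (85, 85, 85)
```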
We see a 2-D window into the 3-D world
Let's follow one 3-D triangle.
From 3-D triangles to screen pixels
First, project each 3-D triangle that might "face" the "eye" onto the image plane.
Then, create "pixel fragments" on the boundary of the image-plane triangle.
Then, create "pixel fragments" to fill in the triangle (rasterization).
Why “pixel fragments”? A screen pixel color might depend on many triangles (example: a glass teapot).
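A compact way to generate fragments is to test each pixel center in the triangle's bounding box against three edge functions (signed areas). This edge-function approach is one common technique, swapped in here for illustration: it produces boundary and interior fragments in a single pass rather than as two separate steps, and it omits the tie-breaking ("top-left") rule real rasterizers use for pixels lying exactly on a shared edge. The names are hypothetical.

```python
import math


def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p); its sign tells which side of a->b p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, width, height):
    """Return (x, y) for every pixel whose center lies inside the 2-D triangle."""
    area = edge(v0, v1, v2)
    if area == 0:
        return []                      # degenerate triangle: no fragments
    xs, ys = (v0[0], v1[0], v2[0]), (v0[1], v1[1], v2[1])
    x_lo, x_hi = max(0, math.floor(min(xs))), min(width - 1, math.ceil(max(xs)))
    y_lo, y_hi = max(0, math.floor(min(ys))), min(height - 1, math.ceil(max(ys)))
    fragments = []
    for y in range(y_lo, y_hi + 1):
        for x in range(x_lo, x_hi + 1):
            p = (x + 0.5, y + 0.5)     # sample at the pixel center
            w = (edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p))
            # Inside when all three edge values share the triangle's sign
            # (this accepts either vertex winding order).
            if all(e * area >= 0 for e in w):
                fragments.append((x, y))
    return fragments


frags = rasterize((0, 0), (4, 0), (0, 4), width=8, height=8)   # 10 fragments
```

Each fragment is independent of the others, which is what makes the later per-fragment shading stage easy to parallelize.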
Process each pixel fragment to "shade" it.
Algorithmic approach: a per-pixel computational model of metal and how light reflects off it. Move the teapot, and what reflects off it changes.
Process each fragment to "shade" it.
Artistic approach: an artist paints the surface of the teapot in Photoshop. We "map" this "texture" onto each pixel fragment during shading.
Final step: Output Merge. Assemble pixel fragments to make the final 2-D image pixels.
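For the glass teapot, several fragments land on the same screen pixel and must be combined. One common approach, assumed here rather than stated on the slide, is to sort a pixel's fragments far to near and blend each with the standard "over" operator using an alpha (opacity) value. The function names and the (depth, rgb, alpha) fragment layout are hypothetical.

```python
def over(src_rgb, src_alpha, dst_rgb):
    """Blend a fragment over the color already behind it (alpha = opacity)."""
    return tuple(src_alpha * s + (1 - src_alpha) * d for s, d in zip(src_rgb, dst_rgb))


def merge_pixel(background_rgb, fragments):
    """fragments: (depth, rgb, alpha) tuples for one pixel; larger depth = farther.
    Composite back to front so nearer fragments are blended last."""
    color = background_rgb
    for depth, rgb, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        color = over(rgb, alpha, color)
    return color


# An opaque red wall seen through half-transparent white glass, black background.
pixel = merge_pixel((0, 0, 0), [(1.0, (255, 255, 255), 0.5),   # glass, near
                                (5.0, (255, 0, 0), 1.0)])      # wall, far
# pixel == (255.0, 127.5, 127.5)
```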
Real-world texture maps: Bike decals
Applying texture maps: Quality matters!
[Figure 1: Regular texture filtering. This scene from Unreal Tournament 2004 shows the limits of texture filtering on today's GPUs. The ramp here is divided into three sections. Section A receives good filtering due to its simple 90 degree projection. However, being on an angle, sections B and C receive little filtering, resulting in blurred textures with little detail.]
[Figure 2: Anisotropic Texture Filtering on the GeForce 8800 GPU. The Lumenex Engine delivers a far more robust anisotropic filtering algorithm that accounts for all surfaces, regardless of their orientation. Above is the same image rendered on the GeForce 8800 GPU. Note how sections B and C are far better defined.]
"Good" algorithm: B and C look blurry.
"Better" algorithm: B and C are detailed.
Putting it All Together ...
Luxo Jr.: short movie made by Pixar, shown at SIGGRAPH in 1986.
First computer graphics movie nominated for an Academy Award.
Graphics Acceleration
Next: Back to architecture ...
The graphics pipeline in hardware (2004)
3-D vertex "stream" sent by CPU
Process each vertex: programmable CPU, "Vertex Shader"
Create pixel fragments: algorithms are usually hardwired
Process pixel fragments: programmable CPU, "Pixel Shader"
Output Merge
To display
Programming language/API? DirectX, OpenGL
Vertex Shader: A "stream processor"
Shader CPU, with:
Input Registers (read only): vertex "stream" from CPU. Only one vertex at a time is placed in the input registers.
Constant Registers (read only): from CPU; changes slowly (per frame, per object).
Working Registers (read/write).
Output Registers (write only): vertex "stream" ready for 3-D to 2-D conversion. The shader creates one vertex out for each vertex in.
Shader Program Memory: short (ex: 128 instr) straight-line code. The same code runs on every vertex.
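A software model of the stream-processor contract: one vertex at a time in the input registers, read-only constants set by the CPU, scratch working registers, and exactly one vertex out per vertex in. The register names, the `run_vertex_stream` driver, and the example program are hypothetical; a real shader is a short straight-line instruction sequence, not a Python function.

```python
from types import MappingProxyType


def run_vertex_stream(program, constants, vertex_stream):
    """Run the same shader program on every vertex of the stream, in order."""
    const_regs = MappingProxyType(dict(constants))  # read-only view for the shader
    for vertex in vertex_stream:
        input_regs = tuple(vertex)      # only this vertex is visible
        working_regs = {}               # read/write scratch, fresh per vertex
        output_regs = program(input_regs, const_regs, working_regs)
        yield output_regs               # one vertex out per vertex in


def scale_and_offset(inp, const, r):
    """Example straight-line 'shader': r0 = inp * scale; out = r0 + offset."""
    x, y, z, w = inp
    s = const["scale"]
    ox, oy, oz = const["offset"]
    r["r0"] = (x * s, y * s, z * s, w)
    return (r["r0"][0] + ox, r["r0"][1] + oy, r["r0"][2] + oz, w)


constants = {"scale": 2.0, "offset": (1.0, 0.0, 0.0)}   # set once per object by the CPU
out = list(run_vertex_stream(scale_and_offset, constants, [(1, 1, 1, 1), (0, 0, 0, 1)]))
# out == [(3.0, 2.0, 2.0, 1), (1.0, 0.0, 0.0, 1)]
```

Because no state survives from one vertex to the next, nothing stops several copies of this loop from running at once.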
Optimized instructions and data formats
[Diagram: from CPU → input registers → shader CPU (with shader program memory) → output registers → to 3-D/2-D.]
128-bit registers, holding four 32-bit floats.
Typical use: (x, y, z, w) representation of a point in 3-D space.
Typical instruction:
rsq dest src
dest.{x,y,z,w} = 1.0/sqrt(abs(src.w)). If src.w = 0, dest = ∞.
The 1/sqrt() function is often used in graphics.
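The rsq semantics, modeled on a four-component (x, y, z, w) register. The model follows the slide: the result is replicated into all four components, and a zero input gives infinity. A typical use is normalizing a vector by scaling it with 1/sqrt(dot(v, v)). The function names are hypothetical (`dp3` here is only a Python stand-in for a 3-component dot product), and real GPU hardware computes an approximation rather than a correctly rounded result.

```python
import math


def rsq(src):
    """dest.{x,y,z,w} = 1.0 / sqrt(abs(src.w)); infinity when src.w == 0."""
    w = src[3]
    value = math.inf if w == 0 else 1.0 / math.sqrt(abs(w))
    return (value, value, value, value)


def dp3(a, b):
    """3-component dot product, replicated into all four components."""
    d = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    return (d, d, d, d)


def normalize(v):
    """Scale (x, y, z) to unit length: v * rsq(dot(v, v)); w is passed through."""
    inv_len = rsq(dp3(v, v))[0]
    return (v[0] * inv_len, v[1] * inv_len, v[2] * inv_len, v[3])


n = normalize((3.0, 4.0, 0.0, 0.0))   # approximately (0.6, 0.8, 0.0, 0.0)
```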
Easy to parallelize: vertices are independent
[Diagram: two shader CPUs, each with its own input and output registers holding (x, y, z, w), both fed from the CPU and both feeding the 3-D/2-D stage.]
Caveat: Care might be needed when merging streams.
Why? 3-D to 2-D may expect triangle vertices in order in the stream.
Shader CPUs are easy to multithread.
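Because each vertex is shaded independently, the work can be spread across several shader units, but results may finish out of order. One way to honor the caveat is a reorder step that writes each result back into its input slot. In this sketch, Python threads stand in for shader CPUs (they illustrate ordering, not real speedup), and `shade` and `parallel_shade` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def shade(vertex):
    """Stand-in vertex program: double x, y, z; keep w."""
    x, y, z, w = vertex
    return (2 * x, 2 * y, 2 * z, w)


def parallel_shade(vertices, program, workers=4):
    """Shade vertices on several workers, then restore the original stream order
    so the 3-D to 2-D stage still sees each triangle's vertices in sequence."""
    results = [None] * len(vertices)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        slot_of = {pool.submit(program, v): i for i, v in enumerate(vertices)}
        for future in as_completed(slot_of):          # completion order is arbitrary
            results[slot_of[future]] = future.result()  # write back to the input slot
    return results


# Two triangles = six vertices; output order must match input order.
stream = [(i, i + 1, i + 2, 1) for i in range(6)]
shaded = parallel_shade(stream, shade)
```

`ThreadPoolExecutor.map` would also return results in input order; the explicit slot table just makes the reordering visible.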
Pixel shader specializations ...
Process each vertex
Create pixel fragments
Output Merge
The pixel shader needs fast access to the map of Europe on the teapot (via graphics card RAM).
Texture maps (look-up tables) play a key role.
Process pixel fragments
"Pixel Shader" CPU
Pixel Shader: Stream processor + Memory
Shader CPU, with:
Input Registers (read only): pixel fragment stream from the rasterizer. Only one fragment at a time is placed in the input registers.
Constant Registers (read only): from CPU; changes slowly (per frame, per object).
Working Registers (read/write): register R0 is the pixel fragment, ready for output merge. The shader creates one fragment out for each fragment in.
Texture Registers: indices into texture maps.
Texture Engine and Memory System: the engine does interpolation.
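One common form of the texture engine's interpolation is bilinear filtering, which blends the four texels nearest the sample point. This sketch assumes texture coordinates u, v in [0, 1], texel centers at half-integer positions, and clamp-to-edge addressing; the function names and the list-of-rows texture layout are hypothetical, and the method is simpler than the anisotropic filtering discussed earlier in the lecture.

```python
import math


def texel(tex, x, y):
    """Fetch one texel with clamp-to-edge addressing."""
    h, w = len(tex), len(tex[0])
    return tex[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def sample_bilinear(tex, u, v):
    """Blend the 4 texels around (u, v); tex is a list of rows of RGB tuples."""
    h, w = len(tex), len(tex[0])
    x, y = u * w - 0.5, v * h - 0.5      # texel centers sit at half-integers
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0              # fractional position between texels
    c00, c10 = texel(tex, x0, y0), texel(tex, x0 + 1, y0)
    c01, c11 = texel(tex, x0, y0 + 1), texel(tex, x0 + 1, y0 + 1)
    return tuple(
        (1 - fx) * (1 - fy) * a + fx * (1 - fy) * b + (1 - fx) * fy * c + fx * fy * d
        for a, b, c, d in zip(c00, c10, c01, c11)
    )


# A 2x1 texture: one black texel, one white texel.
tex = [[(0, 0, 0), (255, 255, 255)]]
mid = sample_bilinear(tex, 0.5, 0.5)   # (127.5, 127.5, 127.5): halfway between
```

Each sample needs four memory reads, which is why the slide pairs the texture engine with its own memory system.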
Example Design: Nvidia GeForce 7900
Vertex Shaders: 8
Pixel Shaders: 24
Other labeled blocks: 3-D to 2-D, Output Merge Units, Texture Cache.
278 million transistors, 650 MHz clock, 90 nm process
Break Time ...
Next: Unified architectures
Unified Architectures
Basic idea: Replace specialized logic (vertex shader, pixel shader, hardwired algorithms) with many copies of one unified CPU design.
Consequence: You no longer “see” the graphics pipeline when you look at the architecture block diagram.
Designed for: DirectX 10 (Microsoft Vista), and new non-graphics markets for GPUs.
!
!"#$%&"'%!()*(+,'-$.!./&(!"+(/0!%#0-(./1+'+/1+/2+()+!3++/(4+#!.52+6( 0/1( )+!3++/( '.,+-6( 7#0&8+/!69( (:%-!.'-+( ./6!0/2+6( $7( 4+#!+,(0/1( '.,+-( 6"01+#6( 0#+( %6+1( !$( '#$2+66( ./1+'+/1+/!( 4+#!.2+6( 0/1('.,+-( 7#0&8+/!6( ./('0#0--+-9( (;0#130#+( .8'-+8+/!0!.$/6( !*'.20--*(./2-%1+( 0( -0#&+#( /%8)+#($7( '.,+-( 6"01+#6( !"0/(4+#!+,( 6"01+#6( #+57-+2!./&(!"+(".&"+#(#0!.$($7('.,+-6(!$(4+#!.2+6(./(0(!*'.20-(#+/1+#./&(3$#<-$01(=:$/!#*8(0/1(:$#+!$/(>??@A9((B".6(2"0#02!+#.6!.2(0-6$(./7-%+/2+6(!"+(2$6!($7('.,+-(6"01+#6(#+-0!.4+(!$(4+#!+,(6"01+#6(6./2+('.,+-(6"01+#6(0#+(8$#+("+04.-*(#+'-.20!+19(
B"+('#$Ͱ)-+('.'+-./+( .6(1.#+2!+1(%6./&(0( -$35-+4+-(0)56!#02!.$/( -0*+#( 6%2"( 06( C'+/DE( $#( F.#+2!GF9( ( B"+( 0)6!#02!.$/(-0*+#(6+#4+6(!$(".1+(!"+(1.77+#+/2+6()+!3++/(40#*./&(.8'-+8+/!05!.$/6($7(!"+('.'+-./+(0/1('#$4.1+(0(8$#+(2$/4+/.+/!('#$X./&(0)6!#02!.$/9((H.,+1('-0!7$#86I(6%2"(06(2$/6$-+6I(1.77+#(7#$8(JK6(./(!"0!(!"+#+(.6($/-*($/+("0#130#+(.8'-+8+/!0!.$/I(6$($7!+/(-$35-+4+-(1+!0.-6($7(!"+("0#130#+(0#+(+,'$6+1(!"#$%&"(!"+(0)6!#02!.$/(-0*+#9((
L+( #+7+#( !$( !"+( 0)6!#02!.$/( -0*+#( 06( 0( "#$%&'(( 0/1( .!( .6( 2$/5!#$--+1(!"#$%&"(.!6(MJN9(B"+(#%/!.8+('#$4.1+6(1+4.2+(./1+'+/1+/!(#+6$%#2+(80/0&+8+/!( O0--$20!.$/I( -.7+!.8+I( ./.!.0-.P0!.$/I( 4.#!%0-5.P0!.$/I(+!2Q(7$#(!+,!%#+(80'6I(4+#!+,()%77+#6I(0/1($!"+#(6!0!+(0/1(.!(2$88%/.20!+6( 3.!"( !"+( "0#130#+( 022+-+#0!$#( !"#$%&"( 1+4.2+51+'+/1+/!( 1#.4+#( 6$7!30#+9( ( B"+( !#0/6.!.$/( !$( 0( '#$Ͱ)-+('.'+-./+( "06( 011+1( !"+( !06<( $7( 0)6!#02!./&( 0/1(80/0&./&( 6"01+#('#$V(!$(!"+(#%/!.8+9(
B"+( -.8.!+1( ./6!#%2!.$/( 6!$#+( $7( +0#-*( '#$Ͱ)-+( '#$2+656$#6( 801+( !"+( 2"$.2+( $7( '#$X./&( ./( 0/( 066+8)-*5-.<+( -0/5&%0&+(=D#0*(>??GA()$!"('#02!.20-(0/1(./(80/*(206+6(/+2+660#*(!$(80,.8.P+( 2$/!#$-( $7( !"+( -.8.!+1( #+6$%#2+69( ( ;$3+4+#I( 8$1+6!(./2#+06+6( ./( 040.-0)-+( "0#130#+( #+6$%#2+6( 2#+0!+1( 0( /++1( 7$#( 0(".&"+#5-+4+-( '#$X./&( 0)6!#02!.$/( !$( 80,.8.P+( '#$X+#('#$1%2!.4.!*9(K5-.<+('#$X./&( -0/&%0&+6(3.!"( 6$8+(2%6!$8.5P0!.$/6( !$( 80!2"( !"+( %/1+#-*./&( #+/1+#./&( '.'+-./+( OR54+2!$#6I(./!#./6.26I(NSC(#+&.6!+#6Q(0/63+#+1(!".6(/++1(=J#$%17$$!(+!(0-9(>??TU(:.2#$6$7!( >??>U( :0#<( >??GU( V+66+/.2"( >??RU( :2K$$-( 0/1( F%(B$.!(>??RUA((M11.!.$/0--*I($!"+#(-0/&%0&+6("04+()++/(1+4+-$'+1(!$(+,'-$#+( !"+( %6+( $7( !"+( 6%)6!0/!.0-( 7-$0!./&5'$./!( '#$2+66./&( 0/1(8+8$#*( )0/13.1!"( $7(DJW6( 7$#( 0''-.20!.$/( 1$80./6( $!"+#( !"0/(
#+/1+#./&(=X%2<(+!(0-9(>??RU(:2K$#8.2<(+!(0-9(>??RAI()%!(3+(3.--(/$!(011#+66(!".6(-0!!+#(6%)Y+2!(7%#!"+#(./(!".6('0'+#9(
L".-+( !"+#+(0#+(6.8.-0#.!.+6( !$( .8'+#0!.4+(KJW('#$X./&(-0/&%0&+6(O/$!0)-*(KQI(!"+#+(0#+(6$8+(6.&/.7.20/!(1+'0#!%#+69((H$#(+,08'-+I(!"+(802"./+(0/1(2$8'.-0!.$/(8$1+-(.6(8$#+(4.#!%0-(8052"./+5-.<+I( 3.!"( !"+( 6"01+#( 066+8)-*( -0/&%0&+( 6+#4./&( 06( 0(8052"./+5./1+'+/1+/!( ./!+#8+1.0!+( -0/&%0&+( ONEQ( #0!"+#( !"0/( 0( 6'+52.7.2(802"./+( -0/&%0&+T9( B"$%&"( 0( ".&"5-+4+-( -0/&%0&+( -.<+(:.52#$6$7!Z6(;E[E(20/()+(2$8'.-+1( !$( NE($77-./+I( !"+( !#0/6-0!.$/( !$(!"+(!0#&+!("0#130#+($22%#6(Y%6!(./(!.8+(O\NBQ(0!(#%/5!.8+(3.!"(!"+(!#0/6-0!$#( .8'-+8+/!+1(06('0#!($7( !"+(1#.4+#( ./7#06!#%2!%#+(7$#( !"+(DJW9( (L+(/$!+( !"0!( !"+(C'+/DE(["01./&(E0/&%0&+( !0<+6(0(1.757+#+/!( 0''#$02"( 3.!"( !"+( +/!.#+( 2$8'.-0!.$/( '#$2+66( $22%#./&( 0!(#%/5!.8+9((
M/$!"+#( 6.&/.7.20/!( 1.77+#+/2+( .6( !"0!( 6"01./&( '#$V( 0#+(/$!( 6!0/10-$/+(0''-.20!.$/6I( 0/1(0#+( ./6!+01(+,+2%!./&( ./(2$/2+#!(3.!"(0('#$(+,+2%!./&($/(!"+(KJW(!"0!($#2"+6!#0!+6(!"+(#+/1+#5./&('.'+-./+9( (B"+(KJW('#$(0-6$( 6%''-.+6( '0#08+!+#6( !$( !"+(6"01./&('#$(./(!"+(7$#8($7(!+,!%#+(80'6($#()*('$'%-0!./&($/52".'(#+&.6!+#6(20--+1(2$/6!0/!69(
L".-+( !".6( '0'+#( 1$+6( /$!( 1+62#.)+( 0( 6'+2.7.2( "0#130#+( +85)$1.8+/!($7( !"+(/+3('.'+-./+(0#2".!+2!%#+I( !"+('.'+-./+(1+6.&/( .6(6"0'+1(6.&/.7.20/!-*()*("0#130#+('#02!.20-.!.+6(0/1(306(1+6.&/+1(2$/2%##+/!-*( 3.!"( 8%-!.'-+( "0#130#+( .8'-+8+/!0!.$/69( :0/*( $7(!"+( 6!#%2!%#0-( %/1+#'.//./&6( 7#$8( 2%##+/!( "0#130#+( .8'-+8+/!05!.$/6( =MBN( >??@U( F$&&+!!( >??@U( :$/!#*8( 0/1( :$#+!$/( >??@A(2$/!./%+(!$()+()$!"(#+-+40/!(0/1(./7-%+/!.0-(./(!".6(1+6.&/9(
!"#$%&"' ''()(''*++(' '''*)+''*++*' ',)+''*++-. ''-)+''*++/'
T>]( >@^( !@T>(./6!#%2!.$/(6-$!6(
R_]` G>_^R` !@T>(
!^RV(
!a^( !>@^( !>@^(2$/6!0/!(#+&.65
!+#6( ]( G>( >>R(
T^,R?a^(
T>( T>( G>(!8'(#+&.6!+#6(
>( T>( G>(
R?a^(
T^( T^( T^( T^(./'%!(#+&.6!+#6(
R_>b ]_>b T?( G>(
#+/1+#(!0#&+!6( T( R( R( ](
608'-+#6( ]( T^( T^( T^(
( ( R(!+,!%#+6(
( ]( T^( T^(
T>](
>F(!+,(6.P+( ( ( >V,>V( ]V,]V(
./!+&+#($'6( ( ( ( !(
-$01($'( ( ( ( !(
608'-+($776+!6( ( ( ( !(
!( !( !(!#0/62+/1+/!0-(
$'6( ( !( !(
!(
(
1+#.40!.4+($'( ( ( !( !(
( 6!0!.2( 6!0!S1*/(7-$3(2$/!#$-(
( ( ( 6!0!S1*/(
1*/08.2(
!"#$%&'(&!"#$%&'()$%*'+%#,-&%'.)(/#&01)2'1-((#&34&&51/%.0+0.#,0)2'&%*%#1%$'02'67768'"#&$9#&%'02'677:;'
<,%=,-&%'*)#$'>'#&0,"(%,0.'
021,&-.,0)21;'?,%=,-&%'>'.)*)&'&%@01,%&1;'$#1"%$'*02%'1%/#&#,%1'A%&,%='1"#$%&'
B#C)A%D'+&)('/0=%*'1"#$%&'BC%*)9D'
)*&!+%&,-.%$-/%&
B"+(F.#+2!GF( T?( '.'+-./+( #+!0./6( !"+( 6!#%2!%#+( $7( !"+( !#01.!.$/0-("0#130#+5022+-+#0!+1( GF( '.'+-./+9( ( B3$( /+3( 6!0&+6( "04+( )++/(011+1(0/1($!"+#(6!0&+6("04+()++/(+.!"+#(6.8'-.7.+1($#(7%#!"+#(&+/5+#0-.P+19((B"+()06.2('.'+-./+(.6(.--%6!#0!+1(./(H.&%#+(T9(H$#(2$/6.65!+/2*(3+(1+62#.)+(+02"($7(!"+('.'+-./+(6!0&+6I(#0!"+#(!"0/(Y%6!(!"+(011.!.$/69(L+( %6+( !#01.!.$/0-( !+#86( 6%2"( 06( 4+#!+,I( !+,!%#+I( 0/1('.,+-( 7$#( 2$/!./%.!*( 3.!"( '#.$#( /$8+/2-0!%#+I( )%!( 02</$3-+1&+(!"0!( !".6( !+#8./$-$&*( #+7-+2!6( 0( 6'+2.7.2( %60&+( $7( 0(8$#+( &+/+#0-('#$2+66./&(20'0).-.!*9(
Input Assembler (IA) gathers 1D vertex data from up to 8 input streams attached to vertex buffers and converts data items to a canonical format (e.g., float32). Each stream specifies an independent vertex structure containing up to 16 fields (called elements). An element is a homogeneous tuple of 1 to 4 data items (e.g., float32s). A vertex is assembled by reading from the currently enabled streams. Normally vertex data is read sequentially from each vertex buffer; however, if an index buffer is specified then each stream uses a shared index to compute the offset into each vertex buffer. Indexing allows additional performance optimizations in that the vertex processor computes a result that is completely determined by the index value, therefore recomputation of results for the same index can be avoided using a result cache indexed by the index value.
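The result cache mentioned above works because a vertex shader's output depends only on its input vertex. The Python sketch below models that reuse with a small FIFO cache; `run_indexed_draw`, the FIFO policy, and the default size of 16 entries are illustrative assumptions, not Direct3D API or a documented hardware design.

```python
from collections import OrderedDict
from typing import Callable, List, Sequence, Tuple

Vertex = Tuple[float, ...]


def run_indexed_draw(indices: Sequence[int],
                     vertex_buffer: Sequence[Vertex],
                     vertex_shader: Callable[[Vertex], Vertex],
                     cache_size: int = 16) -> Tuple[List[Vertex], int]:
    """Shade the vertices named by an index buffer, reusing cached results.

    Returns the shaded vertex stream and the number of shader invocations.
    """
    cache: "OrderedDict[int, Vertex]" = OrderedDict()
    shaded: List[Vertex] = []
    invocations = 0
    for idx in indices:
        if idx in cache:                 # hit: same index, same result
            shaded.append(cache[idx])
            continue
        result = vertex_shader(vertex_buffer[idx])
        invocations += 1
        cache[idx] = result
        if len(cache) > cache_size:      # evict the oldest entry (FIFO)
            cache.popitem(last=False)
        shaded.append(result)
    return shaded, invocations


if __name__ == "__main__":
    vb = [(0.0,), (1.0,), (2.0,), (3.0,)]
    double = lambda v: tuple(2 * c for c in v)
    # Two triangles sharing an edge: indices 1 and 2 are reused.
    _, n = run_indexed_draw([0, 1, 2, 2, 1, 3], vb, double)
    print(n)
```

For the two edge-sharing triangles in the demo, six indices need only four shader runs; with `cache_size=1` the reuse mostly disappears.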
The IA also supports a mechanism that allows the IA to effectively replicate an object n times. This mechanism is an addressing mode referred to as instancing in which a repeat count n is associated with a block of m vertices (corresponding to an object). At the same time, the primitive data is “tagged” with a current instance, primitive, and vertex id and these ids can be accessed in the programmable stages to compute values such as transformations or material parameters based on these ids.
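A toy Python model of instancing as described: the same m-vertex block is replayed n times, and each replay is tagged with an instance id that a shader can use to look up per-instance data. The names `instanced_ids` and `place_instances` and the per-instance offset table are hypothetical.

```python
from typing import Iterator, List, Tuple


def instanced_ids(n_instances: int, m_vertices: int) -> Iterator[Tuple[int, int]]:
    """Yield (instance_id, vertex_id) tags for n copies of an m-vertex block."""
    for instance_id in range(n_instances):
        for vertex_id in range(m_vertices):
            yield instance_id, vertex_id


def place_instances(block: List[Tuple[float, float]],
                    offsets: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Toy 'vertex shader': translate each copy by an offset looked up with
    its instance id (standing in for a transform or material table)."""
    out = []
    for inst, vid in instanced_ids(len(offsets), len(block)):
        x, y = block[vid]
        dx, dy = offsets[inst]
        out.append((x + dx, y + dy))
    return out
```

Only one copy of the block lives in the vertex buffer; the repeat count and the id tags do the rest.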
1 This does contradict the notion that the assembly-level shader programmer has absolute control.
DirectX 10 (Vista): Towards Shader Unity
Earlier APIs: Pixel and Vertex CPUs very different ...
DirectX 10: Many specs are identical for Pixel and Vertex CPUs
37
DirectX 10: New Pipeline Features ...
Vertex Shader (VS) is most commonly used to transform vertices from object space to clip space. The VS reads a single vertex and produces a single vertex as output. The VS and other programmable stages share a common feature set that includes an expanded set of floating-point, integer, control, and memory read instructions allowing access to up to 128 memory buffers (textures) and 16 parameter (constant) buffers. This common core is described in more detail in Section 4.
Geometry Shader (GS) takes the vertices of a single primitive (point, line segment, or triangle) as input and generates the vertices of zero or more primitives. The input and output primitive types need not match, but they are fixed for the shader program. A GS program can amplify the number of input primitives by emitting additional primitives subject to a per-invocation limit of 1024 32-bit values of vertex data. Triangles and lines are output as connected strips of vertices. A GS program can output more than one strip in a single invocation or it can effectively delete an input primitive by not producing an output. A GS program can also simply affix additional attributes to a primitive without generating additional geometry, for example, computing additional uniform-valued attributes for each primitive. Since all of the primitive vertices are available, geometric attributes such as a triangle's plane equation can be readily computed.
In addition to the traditional input primitives, triangle and line primitives may also be processed with their adjacent vertices. A triangle comprises 3 vertices plus 3 adjacent vertices while a line has 2 vertices with 2 adjacent vertices, as shown in Figure 2. Adjacent vertices are included as part of the vertex buffer formats for triangle and line primitives and are extracted by the IA when a primitive topology with adjacency is specified (rendered).
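To make the geometry shader's contract concrete, here is a Python sketch of a GS that passes a triangle through and affixes its plane equation to each vertex, the example use the text mentions. The `StripEmitter` class is a hypothetical stand-in for the output stream; it enforces the quoted budget of 1024 32-bit values per invocation and assumes every emitted scalar counts against that budget.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
MAX_OUTPUT_SCALARS = 1024  # per-invocation GS output limit quoted in the text


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def plane_equation(tri: Tuple[Vec3, Vec3, Vec3]) -> Tuple[float, float, float, float]:
    """Plane (a, b, c, d) with a*x + b*y + c*z + d = 0 through the triangle."""
    n = cross(sub(tri[1], tri[0]), sub(tri[2], tri[0]))
    return (n[0], n[1], n[2], -dot(n, tri[0]))


class StripEmitter:
    """Collects output strips and enforces the 32-bit-value budget."""

    def __init__(self, scalars_per_vertex: int) -> None:
        self.scalars_per_vertex = scalars_per_vertex
        self.strips: List[list] = [[]]
        self.used = 0

    def emit_vertex(self, vertex) -> None:
        if self.used + self.scalars_per_vertex > MAX_OUTPUT_SCALARS:
            raise RuntimeError("GS output limit exceeded")
        self.strips[-1].append(vertex)
        self.used += self.scalars_per_vertex

    def cut_strip(self) -> None:
        """End the current strip so the next vertex starts a new one."""
        if self.strips[-1]:
            self.strips.append([])


def gs_affix_plane(tri, emitter: StripEmitter) -> None:
    """Pass the triangle through unchanged, attaching its plane equation to
    every vertex as an extra uniform-valued attribute."""
    plane = plane_equation(tri)
    for p in tri:
        emitter.emit_vertex((p, plane))
    emitter.cut_strip()
```

A GS that never calls `emit_vertex` models the "delete an input primitive" case; one that emits several strips per call models amplification, until the budget check fires.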
Stream Output (SO) copies a subset of the vertex information output by the GS to up to 4 1D output buffers in sequential order. Ideally the SO should have symmetric output capabilities with the (non-indexed) input capabilities of the IA (8 streams x 16 elements), but the hardware costs were not justified. The SO is limited to either 1 multi-element output stream of up to 16 elements or up to 4 single-element output streams. While the IA can support reading from 8- and 16-bit data types and converting to float32, the SO can only write raw 32-bit data types. However, data conversion and packing can be easily implemented in a GS program reducing the need for fixed-function support.
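The two SO output modes and the raw-32-bit restriction fit in a few lines of Python. `so_config_is_valid`, `pack_half2`, and `unpack_half2` are hypothetical helpers; the packing illustrates the kind of conversion the text says a GS program would do before SO writes data out, with half floats as an assumed example format.

```python
import struct
from typing import Sequence, Tuple

MAX_SO_BUFFERS = 4
MAX_SINGLE_STREAM_ELEMENTS = 16


def so_config_is_valid(elements_per_buffer: Sequence[int]) -> bool:
    """Accept one buffer with up to 16 elements, or 2 to 4 buffers that
    each hold a single element."""
    n = len(elements_per_buffer)
    if n == 1:
        return 1 <= elements_per_buffer[0] <= MAX_SINGLE_STREAM_ELEMENTS
    return 1 < n <= MAX_SO_BUFFERS and all(e == 1 for e in elements_per_buffer)


def pack_half2(a: float, b: float) -> int:
    """Pack two values as IEEE half floats into one raw 32-bit word."""
    return struct.unpack("<I", struct.pack("<2e", a, b))[0]


def unpack_half2(word: int) -> Tuple[float, float]:
    """Inverse of pack_half2, as a later reader of the buffer would do."""
    return struct.unpack("<2e", struct.pack("<I", word))
```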
Set-up and Rasterization Stage (RS) is a fixed-function stage handling clipping, culling, perspective divide, viewport transform, primitive set-up, scissoring, depth offset, and fragment generation. Modern GPU designs invariably include some form of early depth processing (z-cull, hierarchical-z) [ATI 2005; Montrym and Moreton 2005] as well. We explicitly mention this optimization as it is becoming less transparent to application developers. The input of the RS is the vertices and attributes of a single primitive and the output is a series of pixel fragments.
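Fragment generation is commonly done with edge functions. The Python sketch below is one textbook approach, not the Direct3D 10 rules: it samples at pixel centers, scissors to the viewport, and reports barycentric weights, but omits clipping, a tie-breaking fill rule for pixels exactly on a shared edge, and early-depth culling.

```python
import math
from typing import Iterator, Tuple

Point = Tuple[float, float]


def edge(a: Point, b: Point, p: Point) -> float:
    """Twice the signed area of (a, b, p); > 0 when p is left of a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0: Point, v1: Point, v2: Point,
              width: int, height: int) -> Iterator[Tuple[int, int, float, float, float]]:
    """Yield (x, y, b0, b1, b2) for each pixel whose center is covered."""
    area = edge(v0, v1, v2)
    if area == 0:
        return                                # degenerate: no fragments
    sign = 1.0 if area > 0 else -1.0          # accept either winding order
    xs, ys = (v0[0], v1[0], v2[0]), (v0[1], v1[1], v2[1])
    # Bounding box of the triangle, scissored to the viewport.
    x_lo, x_hi = max(0, math.floor(min(xs))), min(width, math.ceil(max(xs)))
    y_lo, y_hi = max(0, math.floor(min(ys))), min(height, math.ceil(max(ys)))
    for y in range(y_lo, y_hi):
        for x in range(x_lo, x_hi):
            p = (x + 0.5, y + 0.5)            # sample at the pixel center
            w0 = sign * edge(v1, v2, p)
            w1 = sign * edge(v2, v0, p)
            w2 = sign * edge(v0, v1, p)
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                total = sign * area
                yield x, y, w0 / total, w1 / total, w2 / total
```

The barycentric weights it yields are what the attribute-interpolation step consumes next.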
The pixel shader program specifies the manner in which vertex attributes are interpolated to produce fragment attributes (no interpolation, non-perspective-corrected interpolation, or perspective-corrected interpolation). Modern GPUs usually support multisample antialiasing [Akeley 1993]. Multisampling requires additional care in specifying attribute evaluation behavior when a fragment does not include the pixel center, since center evaluation may result in an out-of-gamut value. An additional evaluation qualifier (centroid) can be specified to request evaluation within the fragment boundaries.
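The three interpolation modes listed above can be sketched in Python for one scalar attribute. The function name and the use of vertex 0 as the source for the no-interpolation case are assumptions; the perspective-corrected branch uses the standard formula that interpolates a/w and 1/w linearly in screen space and then divides.

```python
from typing import Sequence


def interpolate(attr: Sequence[float], bary: Sequence[float],
                w: Sequence[float], mode: str = "perspective") -> float:
    """Interpolate one vertex attribute at a fragment.

    attr: attribute value at each of the three vertices
    bary: screen-space barycentric weights of the fragment (sum to 1)
    w:    clip-space w of each vertex
    mode: "constant" (no interpolation; vertex 0 assumed to provide the
          value), "linear" (non-perspective-corrected), or "perspective"
    """
    if mode == "constant":
        return attr[0]
    if mode == "linear":
        return sum(b * a for b, a in zip(bary, attr))
    if mode == "perspective":
        num = sum(b * a / wi for b, a, wi in zip(bary, attr, w))
        den = sum(b / wi for b, wi in zip(bary, w))
        return num / den
    raise ValueError(f"unknown mode: {mode}")
```

When every vertex has the same w the two interpolated modes agree; they diverge as soon as depth varies across the primitive.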
Pixel Shader (PS) reads the attributes of a single pixel fragment and produces a single output fragment consisting of 1 to 8 attribute (color) values and optionally a depth value. The attribute values (elements) are each written to a separate color buffer (termed a render target) or the entire result may be discarded (no fragment is output). Normally depth and stencil values are forwarded from the RS. However, the PS can replace the depth value with a computed value, but not the stencil value. Both discarding pixels and replacing the depth value may defeat depth-processing optimizations in the RS since they can change the fragment's visibility.
Figure 1: Direct3D 10 pipeline. Major additions are highlighted.
[Diagram: the Input Assembler (IA), Vertex Shader (VS), Geometry Shader (GS), Stream Output (SO), Rasterizer (RS), Pixel Shader (PS), and Output Merger (OM) stages, connected to memory resources including vertex, index, and stream buffers, textures, samplers, constants, render targets, and depth/stencil.]
Output Merger (OM) takes a fragment from the PS and performs traditional stencil and depth testing operations as well as render target blending. The OM specifies bind points for a single unified depth/stencil buffer and up to 8 other render targets (attribute buffers). The pixel shader must output a separate value for each render target (there is no multicast). While a single blending function is shared across all of the render targets, blending can be enabled or disabled independently for each render target.
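A minimal Python model of the merge step as described: one depth test, one shared blend function, per-target enable bits, and exactly one color per render target. The LESS depth comparison and the source-alpha blend equation are assumed examples; stencil testing is left out to keep the sketch short.

```python
from typing import List, Sequence, Tuple

RGBA = Tuple[float, float, float, float]


def blend_src_over(src: RGBA, dst: RGBA) -> RGBA:
    """The one shared blend function: src * src.a + dst * (1 - src.a)."""
    a = src[3]
    return tuple(s * a + d * (1.0 - a) for s, d in zip(src, dst))  # type: ignore[return-value]


def output_merge(x: int, y: int, frag_depth: float, frag_colors: Sequence[RGBA],
                 depth_buf: List[List[float]], targets: List[List[List[RGBA]]],
                 blend_enable: Sequence[bool]) -> bool:
    """Depth-test one fragment, then write one color per render target.

    Returns True if the fragment survived the depth test.
    """
    if len(frag_colors) != len(targets):
        raise ValueError("need one output value per render target (no multicast)")
    if not frag_depth < depth_buf[y][x]:       # depth test: LESS
        return False                            # fragment rejected
    depth_buf[y][x] = frag_depth
    for rt, color, enabled in zip(targets, frag_colors, blend_enable):
        rt[y][x] = blend_src_over(color, rt[y][x]) if enabled else color
    return True
```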
Figure 2: Triangle and line segment with adjacent vertices.
Memory Structure and Data Flow
Modern GPUs rely heavily on processing retained data structures in the form of vertex and index buffers, texture maps, render targets and depth/stencil buffers. GPUs typically store these in a high-performance memory system attached directly to the GPU. The range of structures includes homogeneous 1D through 3D ...
2 Some implementations traditionally refer to this functionality as “ROP” for raster operations.
Geometry Shader: Lets a shader program create new triangles.
Stream Output: Lets vertex stream recirculate through shaders many times ... (and also, back to CPU)
Also: Shader CPUs are more like RISC machines in many ways.
38
Why? Particle systems ...
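Particle systems fit the stream-output loop on the previous slide: each pass reads a buffer of particles and streams a new buffer back to memory for the next pass. The Python sketch below is a hypothetical model of one pass; the `Particle` fields, the emitter-at-index-0 convention, and the spawn rule are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Particle:
    x: float
    v: float
    life: int


def particle_pass(src: List[Particle], dt: float, spawn_every: int,
                  pass_index: int) -> List[Particle]:
    """One trip through the pipeline: read src, 'stream out' a new buffer.

    Dead particles are dropped by emitting nothing for them. The particle
    at index 0 acts as an emitter and spawns one new particle every
    `spawn_every` passes, standing in for GS amplification.
    """
    dst: List[Particle] = []
    for i, p in enumerate(src):
        if p.life <= 0:
            continue                                  # emit nothing: particle dies
        dst.append(Particle(p.x + p.v * dt, p.v, p.life - 1))
        if i == 0 and pass_index % spawn_every == 0:
            dst.append(Particle(p.x, 1.0, 3))         # amplification: new particle
    return dst
```

Calling `particle_pass` repeatedly on its own output mimics ping-ponging between two stream buffers.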
39
Why? Fractal images ...
40
GeForce 8800 Architecture in Detail
Figure 12. GeForce 8800 GTX Block Diagram
[Slide image from TB-02787-001_v1.0, page 19]
NVidia 8800: Unified GPU, announced Fall 2006
128 Shader CPUs
Streams loop around...
Thread processor sets shader type of each CPU
1.35 GHz Shader CPU Clock, 575 MHz core clock
41
Graphics-centric functionality ...
3-D to 2-D (vertex to pixel)
Pixel fragment output merge
Texture engine and memory system
42
Can be reconfigured with graphics logic hidden ...
128 scalar 1.35 GHz processors: Integer ALU, dual-issue single-precision IEEE floats.
GeForce 8800 Architecture Overview
[Slide image: Figure 9. CUDA Thread Computing Pipeline]
CUDA enables new applications with a standard platform for extracting valuable information from vast quantities of raw data, and provides the following key benefits in this area:
- Enables high density computing to be deployed on standard enterprise workstations and server environments for data intensive applications.
- Divides complex computing tasks into smaller elements that are processed simultaneously in the GPU to enable real-time decision making.
- Provides a standard platform based on industry-leading NVIDIA hardware and software for a wide range of high data bandwidth, computationally intensive applications.
- Combines with multi-core CPU systems to provide a flexible computing platform.
- Controls complex programs and coordinates inherently parallel computation on the GPU processed by thousands of computing threads.
CUDA's high-performance, scalable computing architecture solves complex parallel problems 100x faster than traditional CPU-based architectures:
- Up to 128 parallel 1.35GHz compute cores in GeForce 8800 GTX GPUs harness massive floating point processing power enabling maximum application performance.
- Thread computing scales across NVIDIA's complete line of next generation GPUs, from embedded GPUs to high performance GPUs that support hundreds of processors.
- NVIDIA SLI™ technology allows multiple GPUs to distribute computing to provide unparalleled compute density.
[TB-02787-001_v1.0, page 13]
Texture system set up to look like a conventional memory system (768MB GDDR3, 86 GB/s)
1000s of active threads
Roughly 0.35 to 0.5 TeraFlops peak performance (128 cores × 1.35 GHz × 2 or 3 flops/clock: about 346 to 518 GFLOPS).
Ships with a C compiler.
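The peak and bandwidth figures can be checked with simple arithmetic. This Python sketch assumes "dual-issue" means a multiply-add (2 flops) plus a multiply (1 flop) per core per shader clock, and a 384-bit GDDR3 interface at 1.8 GT/s; neither memory parameter appears on the slide.

```python
SHADER_CORES = 128
SHADER_CLOCK_HZ = 1.35e9
# Assumption: dual issue = one multiply-add (2 flops) + one multiply (1 flop).
FLOPS_PER_CORE_PER_CLOCK = 3
# Assumptions (not on the slide): 384-bit GDDR3 bus at 1.8 GT/s.
MEM_BUS_BITS = 384
MEM_TRANSFERS_PER_S = 1.8e9


def peak_gflops(flops_per_clock: int = FLOPS_PER_CORE_PER_CLOCK) -> float:
    """Peak single-precision rate in GFLOPS."""
    return SHADER_CORES * SHADER_CLOCK_HZ * flops_per_clock / 1e9


def memory_gb_per_s() -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer x transfers per second."""
    return MEM_BUS_BITS / 8 * MEM_TRANSFERS_PER_S / 1e9


if __name__ == "__main__":
    print(f"MAD + MUL: {peak_gflops():.1f} GFLOPS")
    print(f"MAD only:  {peak_gflops(2):.1f} GFLOPS")
    bw = memory_gb_per_s()
    print(f"memory:    {bw:.1f} GB/s, {bw / peak_gflops():.2f} bytes per flop")
```

Run as a script, it prints 518.4 and 345.6 GFLOPS and 86.4 GB/s, which matches the slide's 86 GB/s and leaves about 0.17 bytes of DRAM bandwidth per peak flop. That ratio is why keeping thousands of threads active to hide memory latency matters.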
43
Chip Facts
90nm process
681M Transistors
80 die/wafer (pre-testing)
A big die. Many chips will not work (low yield). Low profits.
4 year design cycle
Design Facts
$400 Million design budget
600 person-years: 10 people at start, 300 at peak
44
GeForce 8800 GTX Card: $599 List Price
PCI-Express 16X Card - 2 Aux Power Plugs!
185 Watts Thermal Design Point (TDP) -- TDP is a “real-world” maximum power spec.
45
Some products are “loss-leaders”
Breakthrough product creates “free” publicity you can’t buy.
(1) Hope: when chip “shrinks” to 65nm fab process, die will be smaller, yields will improve, profits will rise.
(2) Simpler versions of the design will be made to create an entire product family, some very profitable. “We tape out a chip a month”, NVidia CEO quote.
46
And it happened! 2008 nVidia products
9800 GTX
Specs similar to 8800, card sells for $199.
GTX 280
Price similar to 8800, stream CPU count nearly 2X (240 vs. 128) ...
47
48
The face was “scanned” to create a vertex model. The 8800 GTX was used to render the skin, eyes, lips, and hair.
49
History and Graphics Processors
Create standard model from common practice: Wire-frame geometry, triangle rasterization, pixel shading.
Put model in hardware: Block diagram of chip matches computer graphics math.
Evolve to be programmable: At some point, it becomes hard to see the math in the block diagram.
“Wheel of reincarnation” -- Hardwired graphics hardware evolves to look like a general-purpose CPU. EECS visitor Ivan Sutherland co-wrote a paper on this topic in 1968!
50
Intel @ SIGGRAPH 2008: Larrabee
ACM Reference Format: Seiler, L., Carmean, D., Sprangle, E., Forsyth, T., Abrash, M., Dubey, P., Junkins, S., Lake, A., Sugerman, J., Cavin, R., Espasa, R., Grochowski, E., Juan, T., Hanrahan, P. 2008. Larrabee: A Many-Core x86 Architecture for Visual Computing. ACM Trans. Graph. 27, 3, Article 18 (August 2008), 15 pages. DOI = 10.1145/1360612.1360617 http://doi.acm.org/10.1145/1360612.1360617.
Larrabee: A Many-Core x86 Architecture for Visual Computing
Larry Seiler¹, Doug Carmean¹, Eric Sprangle¹, Tom Forsyth¹, Michael Abrash², Pradeep Dubey¹, Stephen Junkins¹, Adam Lake¹, Jeremy Sugerman³, Robert Cavin¹, Roger Espasa¹, Ed Grochowski¹, Toni Juan¹, and Pat Hanrahan³
Abstract
This paper presents a many-core visual computing architecture code named Larrabee, a new software rendering pipeline, a many-core programming model, and performance analysis for several applications. Larrabee uses multiple in-order x86 CPU cores that are augmented by a wide vector processor unit, as well as some fixed function logic blocks. This provides dramatically higher performance per watt and per unit of area than out-of-order CPUs on highly parallel workloads. It also greatly increases the flexibility and programmability of the architecture as compared to standard GPUs. A coherent on-die 2nd level cache allows efficient inter-processor communication and high-bandwidth local data access by CPU cores. Task scheduling is performed entirely with software in Larrabee, rather than in fixed function logic. The customizable software graphics rendering pipeline for this architecture uses binning in order to reduce required memory bandwidth, minimize lock contention, and increase opportunities for parallelism relative to standard GPUs. The Larrabee native programming model supports a variety of highly parallel applications that use irregular data structures. Performance analysis on those applications demonstrates Larrabee’s potential for a broad range of parallel computation.
CR Categories: I.3.1 [Computer Graphics]: Hardware Architecture--Graphics Processors, Parallel Processing; I.3.3 [Computer Graphics]: Picture/Image Generation--Display Algorithms; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism--Color, shading, shadowing, and texture
Keywords: graphics architecture, many-core computing, real-time graphics, software rendering, throughput computing, visual computing, parallel processing, SIMD, GPGPU.
1. Introduction
Modern GPUs are increasingly programmable in order to support advanced graphics algorithms and other parallel applications.
¹ Intel® Corporation: larry.seiler, doug.carmean, eric.sprangle, tom.forsyth, pradeep.dubey, stephen.junkins, adam.t.lake, robert.d.cavin, roger.espasa, edward.grochowski & toni.juan @intel.com
² RAD Game Tools: mikea@radgametools.com
³ Stanford University: yoel & hanrahan @cs.stanford.edu
However, general purpose programmability of the graphics pipeline is restricted by limitations on the memory model and by fixed function blocks that schedule the parallel threads of execution. For example, pixel processing order is controlled by the rasterization logic and other dedicated scheduling logic.
This paper describes a highly parallel architecture that makes the rendering pipeline completely programmable. The Larrabee architecture is based on in-order CPU cores that run an extended version of the x86 instruction set, including wide vector processing operations and some specialized scalar instructions. Figure 1 shows a schematic illustration of the architecture. The cores each access their own subset of a coherent L2 cache to provide high-bandwidth L2 cache access from each core and to simplify data sharing and synchronization.
Larrabee is more flexible than current GPUs. Its CPU-like x86-based architecture supports subroutines and page faulting. Some operations that GPUs traditionally perform with fixed function logic, such as rasterization and post-shader blending, are performed entirely in software in Larrabee. Like GPUs, Larrabee uses fixed function logic for texture filtering, but the cores assist the fixed function logic, e.g. by supporting page faults.
Figure 1: Schematic of the Larrabee many-core architecture: The number of CPU cores and the number and type of co-processors and I/O blocks are implementation-dependent, as are the positions of the CPU and non-CPU blocks on the chip.
This paper also describes a software rendering pipeline that runs efficiently on this architecture. It uses binning to increase parallelism and reduce memory bandwidth, while avoiding the problems of some previous tile-based architectures. Implementing the renderer in software allows existing features to be optimized based on workload and allows new features to be added. For example, programmable blending and order-independent transparency fit easily into the Larrabee software pipeline.
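Binning sorts primitives by screen tile before shading, so each tile can later be processed independently by one core. A small Python sketch of that front end, using a conservative bounding-box test; the function name, tile size, and data layout are assumptions, not the paper's implementation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Tri = Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]


def bin_triangles(tris: List[Tri], width: int, height: int,
                  tile: int) -> Dict[Tuple[int, int], List[int]]:
    """Map each screen tile (tx, ty) to the indices of triangles whose
    bounding box overlaps it. The test is conservative: a triangle may
    land in a bin it does not actually cover."""
    bins: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    for i, tri in enumerate(tris):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        tx0 = max(0, math.floor(min(xs)) // tile)
        tx1 = min(tiles_x - 1, math.floor(max(xs)) // tile)
        ty0 = max(0, math.floor(min(ys)) // tile)
        ty1 = min(tiles_y - 1, math.floor(max(ys)) // tile)
        for ty in range(ty0, ty1 + 1):          # empty if fully off-screen
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(i)
    return dict(bins)
```

Because each bin keeps triangles in submission order, a core can render its tile without locking against other cores, which is one reason binning reduces lock contention.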
Finally, this paper describes a programming model that supports more general parallel applications, such as image processing, physical simulation, and medical & financial analytics. Larrabee’s support for irregular data structures and its scatter-gather capability make it suitable for these throughput applications as demonstrated by our scalability and performance analysis.
ACM Transactions on Graphics, Vol. 27, No. 3, Article 18, Publication date: August 2008.
A GPU that looks a lot like a many-core CPU.
However, cores are specialized for graphics ...
51
Larrabee CPU core microarchitecture
CPUs can also use multi-threading to gain parallelism. Niagara is a multi-core general purpose microprocessor [Kongetira et al. 2005] featuring eight in-order cores, each capable of executing four simultaneous threads, and a shared cache. But given its focus on commercial server workloads, Niagara lacks architectural elements critical for visual computing, such as SIMD floating-point execution, scatter-gather, or fixed function texture support.
3. Larrabee Hardware Architecture
Figure 1 above shows a block diagram of the basic Larrabee architecture. Larrabee is designed around multiple instantiations of an in-order CPU core that is augmented with a wide vector processor (VPU). Cores communicate through a high-bandwidth interconnect network with some fixed function logic, memory I/O interfaces, and other necessary I/O logic, depending on the exact application. For example, an implementation of Larrabee as a stand-alone GPU would typically include a PCIe bus.
The data in Table 1 motivates Larrabee’s use of in-order cores with wide VPUs. The middle column shows the peak performance of a modern out-of-order CPU, the Intel® Core™2 Duo processor. The right-hand column shows a test CPU design based on the Pentium® processor, which was introduced in 1992 and used dual-issue in-order instruction execution [Alpert 1993]. The Pentium processor core was modified to support four threads and a 16-wide VPU. The final two rows specify the number of non-vector instructions that can be issued per clock by one CPU and the total number of vector operations that can be issued per clock. The two configurations use roughly the same area and power.
                     Core 2 Duo (out-of-order)   In-order test design
# CPU cores:         2 out-of-order              10 in-order
Instruction issue:   4 per clock                 2 per clock
VPU per core:        4-wide SSE                  16-wide
L2 cache size:       4 MB                        4 MB
Single-stream:       4 per clock                 2 per clock
Vector throughput:   8 per clock                 160 per clock

Table 1: Out-of-order vs. in-order CPU comparison: designing the processor for increased throughput can result in ½ the peak single-stream performance, but 20x the peak vector throughput with roughly the same area and power. This difference is 40x in FLOPS, since the wide VPU supports fused multiply-add but SSE doesn’t. These in-order cores are not Larrabee, but are similar.
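The ratios in the Table 1 caption follow directly from the table rows. A quick Python check, assuming (as the caption implies) that each core issues one full-width vector op per clock and that fused multiply-add counts as two flops per lane on the wide VPU while SSE ops count as one:

```python
from fractions import Fraction

# Rows of Table 1, per clock.
OOO = {"cores": 2, "issue_per_core": 4, "vpu_lanes": 4}    # out-of-order, 4-wide SSE
INO = {"cores": 10, "issue_per_core": 2, "vpu_lanes": 16}  # in-order test design


def vector_ops_per_clock(cfg: dict) -> int:
    """Vector throughput row: cores x lanes, one vector op per core per clock."""
    return cfg["cores"] * cfg["vpu_lanes"]


single_stream_ratio = Fraction(INO["issue_per_core"], OOO["issue_per_core"])
vector_ratio = vector_ops_per_clock(INO) // vector_ops_per_clock(OOO)
# FMA: 2 flops per lane on the wide VPU; SSE counted as 1 flop per lane.
flops_ratio = (2 * vector_ops_per_clock(INO)) // vector_ops_per_clock(OOO)

print(single_stream_ratio, vector_ratio, flops_ratio)
```

It prints `1/2 20 40`: half the single-stream issue rate buys 20x the vector throughput, or 40x in FLOPS.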
The test design in Table 1 is not identical to Larrabee. To provide a more direct comparison, the in-order core test design uses the same process and clock rate as the out-of-order cores and includes no fixed function graphics logic. This comparison motivates design decisions for Larrabee since it shows that a wide VPU with a simple in-order core allows CPUs to reach a dramatically higher computational density for parallel applications.
Sections 3.1 to 3.5 below describe the key features of the Larrabee architecture: the CPU core, the scalar unit and cache control instructions, the vector processor, the interprocessor ring network, and the choices for what is implemented in fixed function logic.
3.1 Larrabee Core and Caches
Figure 3 shows a schematic of a single Larrabee CPU core, plus its connection to the on-die interconnect network and the core’s local subset of the L2 cache. The instruction decoder supports the standard Pentium processor x86 instruction set, with the addition of new instructions that are described in Sections 3.2 and 3.3. To simplify the design the scalar and vector units use separate register sets. Data transferred between them is written to memory and then read back in from the L1 cache.
Larrabee’s L1 cache allows low-latency accesses to cache memory into the scalar and vector units. Together with Larrabee’s load-op VPU instructions, this means that the L1 cache can be treated somewhat like an extended register file. This significantly improves the performance of many algorithms, especially with the cache control instructions described in Section 3.2. The single-threaded Pentium processor provided an 8KB Icache and 8KB Dcache. We specify a 32KB Icache and 32KB Dcache to support four execution threads per CPU core.
Figure 3: Larrabee CPU core and associated system blocks: the CPU is derived from the Pentium processor in-order design, plus 64-bit instructions, multi-threading and a wide VPU. Each core has fast access to its 256KB local subset of a coherent 2nd level cache. L1 cache sizes are 32KB for Icache and 32KB for Dcache. Ring network accesses pass through the L2 cache for coherency.
Larrabee's global 2nd level (L2) cache is divided into separate local subsets, one per CPU core. Each CPU has a fast direct access path to its own local subset of the L2 cache. Data read by a CPU core is stored in its L2 cache subset and can be accessed quickly, in parallel with other CPUs accessing their own local L2 cache subsets. Data written by a CPU core is stored in its own L2 cache subset and is flushed from other subsets, if necessary. The ring network ensures coherency for shared data, as described in Section 3.4. We specify 256KB for each L2 cache subset. This supports large tile sizes for software rendering, as described in Section 4.1.
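The local-subset policy can be illustrated with a toy Python model. It assumes a write simply invalidates every other subset's copy. The class `PartitionedL2` and its methods are hypothetical. The model tracks only which subsets hold a line, not data, timing, or the ring protocol of Section 3.4.

```python
class PartitionedL2:
    """Toy model of an L2 cache split into one local subset per core.

    Hypothetical sketch: it records which subsets hold a line address and
    nothing else (no data, no timing, no ring-network messages).
    """

    def __init__(self, num_cores):
        # One set of cached line addresses per core's local subset.
        self.subsets = [set() for _ in range(num_cores)]

    def read(self, core, line):
        # Data read by a core is stored in that core's own subset.
        self.subsets[core].add(line)

    def write(self, core, line):
        # A write keeps the line in the writer's subset and flushes other copies.
        for other, subset in enumerate(self.subsets):
            if other != core:
                subset.discard(line)
        self.subsets[core].add(line)

    def holders(self, line):
        """Return the cores whose local subset currently holds `line`."""
        return [core for core, subset in enumerate(self.subsets) if line in subset]
```

In this model, two cores that read the same line each keep a copy. After one core writes the line, only the writer's subset still holds it.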
3.2 Scalar Unit and Cache Control Instructions
Larrabee's scalar pipeline is derived from the dual-issue Pentium processor, which uses a short, inexpensive execution pipeline. Larrabee provides modern additions such as multi-threading, 64-bit extensions, and sophisticated prefetching. The cores support the full Pentium processor x86 instruction set so they can run existing code including operating system kernels and applications.
Larrabee adds new scalar instructions such as bit count and bit scan, which finds the next set bit within a register.
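The following Python sketch shows what bit count and bit scan compute on a 32-bit register value. The function names and the `start` parameter are hypothetical. The real instructions' operands, widths, and flag behavior are not described here.

```python
def bit_count(x):
    """Population count: number of set bits in a 32-bit value."""
    return bin(x & 0xFFFFFFFF).count("1")


def bit_scan_forward(x, start=0):
    """Index of the next set bit at or above `start`, or None if none is set."""
    x = (x & 0xFFFFFFFF) >> start
    if x == 0:
        return None
    # x & -x isolates the lowest set bit; bit_length() - 1 is its index.
    return start + (x & -x).bit_length() - 1


def set_bits(x):
    """Yield the indices of set bits, low to high, by repeated bit scans."""
    i = bit_scan_forward(x)
    while i is not None:
        yield i
        i = bit_scan_forward(x, i + 1)
```

Repeated bit scans like `set_bits` are a common way to walk the set lanes of a mask without testing every bit.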
Larrabee also adds new instructions and instruction modes for explicit cache control. Examples include instructions to prefetch data into the L1 or L2 caches and instruction modes to reduce the priority of a cache line. For example, streaming data typically sweeps existing data out of a cache. Larrabee is able to mark each streaming cache line for early eviction after it is accessed. These cache control instructions also allow the L2 cache to be used similarly to a scratchpad memory, while remaining fully coherent.
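The early-eviction idea can be sketched with a small fully associative LRU cache, which is an assumption rather than Larrabee's cache organization. The `StreamingLRU` class and its `stream` flag are hypothetical stand-ins for the streaming hint. A line accessed with the hint is made the next eviction victim, so a streaming sweep recycles its own lines instead of evicting the working set.

```python
from collections import OrderedDict


class StreamingLRU:
    """Toy LRU cache in which 'streaming' accesses are marked for early eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        # First key = next victim, last key = most recently used.
        self.lines = OrderedDict()

    def access(self, line, stream=False):
        """Touch `line`; return True on a hit, False on a miss."""
        hit = line in self.lines
        if hit:
            self.lines.move_to_end(line)
        else:
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict the current victim
            self.lines[line] = True
        if stream:
            # Early-eviction hint: this line becomes the next victim.
            self.lines.move_to_end(line, last=False)
        return hit


def working_set_survives(hint):
    """Load 3 lines, stream 10 more through a 4-line cache, then re-read the 3."""
    cache = StreamingLRU(4)
    for line in (0, 1, 2):
        cache.access(line)
    for line in range(100, 110):
        cache.access(line, stream=hint)
    return [cache.access(line) for line in (0, 1, 2)]
```

With the hint, the three working-set lines still hit after the sweep. Without it, plain LRU evicts all three.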
Within a single core, synchronizing access to shared memory by multiple threads is inexpensive. The threads on a single core share the same local L1 cache, so a single atomic semaphore read within the L1 cache is sufficient. Synchronizing access between
Larrabee: A Many-Core x86 Architecture for Visual Computing • 18:3
ACM Transactions on Graphics, Vol. 27, No. 3, Article 18, Publication date: August 2008.
16-lane Vector Unit takes 2/3 of CPU core chip area.
Handles the graphics “heavy lifting”.
x86-ISA Scalar Unit based on the 1992 original Pentium.
In-order static pipeline, dual issue.
Vector Unit specializations ...
multiple cores is more expensive, since it requires inter-processor locks. This is a well known problem in multi-processor design.
Multi-issue CPU cores often lose performance due to the difficulty of finding instructions that can execute together. Larrabee's dual-issue decoder has a high multi-issue rate in code that we've tested. The pairing rules for the primary and secondary instruction pipes are deterministic, which allows compilers to perform offline analysis with a wider scope than a runtime out-of-order instruction picker can. All instructions can issue on the primary pipeline, which minimizes the combinatorial problems for a compiler. The secondary pipeline can execute a large subset of the scalar x86 instruction set, including loads, stores, simple ALU operations, branches, cache manipulation instructions, and vector stores. Because the secondary pipeline is relatively small and cheap, the area and power wasted by failing to dual-issue on every cycle is small. In our analysis, it is relatively easy for compilers to schedule dual-issue instructions.
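Static pairing is the kind of decision a compiler can make offline. The greedy Python sketch below assumes deliberately simplified rules. The secondary pipe accepts only the instruction classes listed in the text, and it pairs with the primary only when it does not read the primary's result. `SECONDARY_OK`, `pair_instructions`, and the tuple format are hypothetical. Larrabee's actual pairing rules are more detailed.

```python
# Instruction classes the secondary pipe accepts, taken from the list in the
# text (loads, stores, simple ALU ops, branches, cache ops, vector stores).
SECONDARY_OK = {"load", "store", "alu", "branch", "cache", "vstore"}


def pair_instructions(program):
    """Greedily pack an in-order program into dual-issue cycles.

    Each instruction is a tuple (kind, dest, sources). Any instruction may
    issue on the primary pipe; the next one joins it on the secondary pipe
    only if its kind is secondary-capable and it does not read the primary's
    destination. Returns one (primary, secondary_or_None) pair per cycle.
    """
    cycles, i = [], 0
    while i < len(program):
        primary = program[i]
        nxt = program[i + 1] if i + 1 < len(program) else None
        if nxt is not None and nxt[0] in SECONDARY_OK and primary[1] not in nxt[2]:
            cycles.append((primary, nxt))
            i += 2
        else:
            cycles.append((primary, None))
            i += 1
    return cycles
```

A vector multiply-add followed by an independent load fills both pipes in one cycle. Two dependent ALU operations must issue in separate cycles.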
Finally, Larrabee supports four threads of execution, with separate register sets per thread. Switching threads covers cases where the compiler is unable to schedule code without stalls. Switching threads also covers part of the latency to load from the L2 cache to the L1 cache, for those cases when data cannot be prefetched into the L1 cache in advance. Cache use is more effective when multiple threads running on the same core use the same dataset, e.g. rendering triangles to the same tile.
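The following is a first-order estimate of how thread switching hides stalls. It is an idealized model and is not taken from the paper. Each thread repeats a burst of useful cycles followed by a fixed stall, and switching costs nothing. The function name and parameters are hypothetical.

```python
def thread_utilization(threads, run_cycles, stall_cycles):
    """Idealized fraction of cycles a core keeps busy by switching threads.

    Assumes every thread alternates `run_cycles` of useful work with a
    `stall_cycles` wait (e.g. an L2-to-L1 fill) and that switching is free,
    so other threads fill the gap until the stall is fully covered.
    """
    per_thread = run_cycles / (run_cycles + stall_cycles)
    return min(1.0, threads * per_thread)
```

With four threads, this model fully covers any stall up to three times the run length. For example, 10 busy cycles and a 30-cycle stall give 25% utilization with one thread and 100% with four.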
3.3 Vector Processing Unit
Larrabee gains its computational density from the 16-wide vector processing unit (VPU), which executes integer, single-precision float, and double-precision float instructions. The VPU and its registers are approximately one third the area of the CPU core but provide most of the integer and floating point performance. Figure 4 shows a block diagram of the VPU with the L1 cache.
Figure 4: Vector unit block diagram. The VPU supports 3-operand instructions. It supports swizzling the register inputs and numeric conversion and replication on the memory input. Mask registers allow predicating the resulting vector writes.
We chose a 16-wide VPU as a tradeoff between increased computational density and the difficulty of obtaining high utilization for wider VPUs. Early analysis suggested 88% utilization for typical pixel shader workloads if 16 lanes process 16 separate pixels one component at a time, that is, with separate instructions to process red, green, etc., for 16 pixels at a time, instead of processing multiple color channels at once. The Nvidia GeForce 8 operates in a similar fashion, organizing its scalar SIMD processors in groups of 32 that execute the same instruction [Nickolls et al. 2008]. The main difference is that in Larrabee the loop control, cache management, and other such operations are code that runs in parallel with the VPU, instead of being implemented as fixed function logic.
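"16 separate pixels one component at a time" means a structure-of-arrays layout. One 16-lane vector holds the red values of 16 pixels, another holds the greens, and so on. In the minimal Python sketch below, lists stand in for vector registers, and `aos_to_soa` and `brighten` are hypothetical helper names.

```python
LANES = 16  # VPU width from the text


def aos_to_soa(pixels):
    """Regroup 16 (r, g, b) pixels into one 16-lane vector per color component."""
    if len(pixels) != LANES:
        raise ValueError(f"expected {LANES} pixels, got {len(pixels)}")
    return [list(component) for component in zip(*pixels)]


def brighten(soa, gain):
    """One 'vector instruction' per component, each covering all 16 pixels."""
    return [[min(255, v * gain) for v in component] for component in soa]
```

Three instructions (red, green, blue) process 16 pixels here. No lanes are wasted, whereas packing a pixel's three channels into lanes would leave lanes idle.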
Larrabee VPU instructions allow up to three source operands, one of which can come directly from the L1 cache. If the data has been prefetched into the cache, as described in Section 3.2, then the L1 cache is in effect an extended register file. 8-bit unorm, 8-bit uint, 16-bit sint and 16-bit float data can be read from the cache and converted to 32-bit floats or 32-bit integers with no loss of performance. This significantly increases the amount of data that can be stored in the caches and also reduces the need for separate data conversion instructions.
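Two of the listed conversions can be sketched in Python: 8-bit unorm to float, and 16-bit float to float. The sketch assumes little-endian storage, as on x86. Python floats are 64-bit, so this shows the arithmetic of the conversion, not 32-bit results. The function names are hypothetical.

```python
import struct


def load_unorm8(buf, offset=0, lanes=16):
    """Widen `lanes` 8-bit unorm bytes to floats in [0.0, 1.0] (value / 255)."""
    return [b / 255.0 for b in buf[offset:offset + lanes]]


def load_float16(buf, offset=0, lanes=16):
    """Widen `lanes` little-endian half-precision values to Python floats."""
    # struct's "e" format code is IEEE 754 half precision.
    return list(struct.unpack_from(f"<{lanes}e", buf, offset))
```

Storing 16 unorm8 values takes 16 bytes instead of 64 bytes for 16 32-bit floats, so a cache holds four times as many of them. 16-bit formats give a factor of two.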
The next stage is to align the data from registers and memory with the processing lanes in the VPU. Register data can be swizzled in a variety of ways, e.g. to support matrix multiplication. Data from memory can be replicated across the VPU lanes. This is a common operation in both graphics and non-graphics parallel data processing, which significantly increases the cache efficiency.
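Swizzle and replicate can be modeled as lane permutations over Python lists. The lane-index encoding for swizzles is an assumption, and `GROUP_BROADCAST_X` and `mat4_vec4` are hypothetical names. The matrix-vector product shows why broadcasting one value across lanes suits matrix multiplication: it sums each matrix column scaled by one replicated vector component.

```python
def swizzle(vec, pattern):
    """Lane i of the result takes vec[pattern[i]] (hypothetical encoding)."""
    return [vec[p] for p in pattern]


def replicate(value, lanes=16):
    """Broadcast one value loaded from memory into every lane."""
    return [value] * lanes


# Copy the first element of each group of four lanes across that group,
# e.g. when 16 lanes hold four 4-component vectors.
GROUP_BROADCAST_X = [4 * (lane // 4) for lane in range(16)]


def mat4_vec4(columns, v):
    """4x4 matrix (given as 4 columns) times a 4-vector, column by column."""
    acc = [0.0] * 4
    for j in range(4):
        # acc += column_j * broadcast(v[j]), a multiply-add per step.
        acc = [a + c * s for a, c, s in zip(acc, columns[j], replicate(v[j], 4))]
    return acc
```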
The VPU supports a wide variety of instructions on both integer and floating point data types. The instruction set provides the standard arithmetic operations, including fused multiply-add, and the standard logical operations, including instructions to extract non-byte-aligned fields from pixels. These are all load-op instructions, which read from registers or memory and write the result to a vector register. Additional load and store instructions support a wider variety of conversions between floating point values and the less common or more complex data formats found on most GPUs. Using separate instructions for these formats saves significant area and power at a small performance cost.
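The following Python sketch covers a lane-wise multiply-add and non-byte-aligned field extraction. It uses RGB565 pixels as an assumed example format, which the text does not name. Python's `x * y + z` rounds twice, while a hardware fused multiply-add rounds once, so the sketch only approximates fused semantics.

```python
def vfmadd(a, b, c):
    """Lane-wise a * b + c (two roundings here; a real FMA rounds once)."""
    return [x * y + z for x, y, z in zip(a, b, c)]


def extract_field(pixels, shift, width):
    """Pull a non-byte-aligned bit field out of every lane."""
    mask = (1 << width) - 1
    return [(p >> shift) & mask for p in pixels]


def unpack_rgb565(pixels):
    """RGB565: red in bits 15-11, green in bits 10-5, blue in bits 4-0."""
    return (extract_field(pixels, 11, 5),
            extract_field(pixels, 5, 6),
            extract_field(pixels, 0, 5))
```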
The VPU instruction set also includes gather and scatter support, that is, loads and stores from non-contiguous addresses. Instead of loading a 16-wide vector from a single address, 16 elements are loaded from or stored to up to 16 different addresses that are specified in another vector register. This allows 16 shader instances to be run in parallel, each of which appears to run serially, even when performing array accesses with computed indices. The speed of gather/scatter is limited by the cache, which typically only accesses one cache line per cycle. However, many workloads have highly coherent access patterns, and therefore take much less than 16 cycles to execute.
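This Python sketch models gather and scatter, estimating cycles with the text's one-cache-line-per-cycle rule of thumb. It assumes 64-byte cache lines, 4-byte elements, and memory starting on a line boundary; all of these are assumptions for illustration. In this code, when lanes write the same address, the later lane wins. Real hardware conflict ordering is not modeled.

```python
LINE_BYTES = 64  # assumed cache-line size


def gather(memory, indices, elem_bytes=4):
    """Lane i reads memory[indices[i]].

    Returns (values, estimated_cycles), where the estimate is the number of
    distinct cache lines touched (about one line per cycle).
    """
    values = [memory[i] for i in indices]
    lines = {(i * elem_bytes) // LINE_BYTES for i in indices}
    return values, len(lines)


def scatter(memory, indices, values):
    """Lane i writes values[i] to memory[indices[i]]; later lanes win conflicts."""
    for i, v in zip(indices, values):
        memory[i] = v
```

Sixteen consecutive 4-byte elements share one line, so the estimate is 1 cycle. A stride of 16 elements touches 16 lines, which is the worst case of 16 cycles.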
Finally, Larrabee VPU instructions can be predicated by a mask register, which has one bit per vector lane. The mask controls which parts of a vector register or memory location are written and which are left untouched. For example, a scalar if-then-else control structure can be mapped onto the VPU by using an instruction to set a mask register based on a comparison, and then executing both if and else clauses with opposite polarities of the mask register controlling whether to write results. Clauses can be skipped entirely if the mask register is all zeros or all ones. This reduces branch misprediction penalties for small clauses and gives the compiler's instruction scheduler greater freedom.
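The if-then-else mapping described above can be sketched in Python. Lists of booleans stand in for a mask register. `vcmp_lt`, `masked_write`, and `vector_abs` are hypothetical names, not Larrabee mnemonics. Each clause runs only if at least one lane needs it, mirroring the "skip if all zeros or all ones" rule.

```python
def vcmp_lt(a, b):
    """Lane-wise compare; the result is a mask with one bool per lane."""
    return [x < y for x, y in zip(a, b)]


def masked_write(dst, src, mask):
    """Write src only into lanes whose mask bit is set; others stay untouched."""
    return [s if m else d for d, s, m in zip(dst, src, mask)]


def vector_abs(x):
    """Vector form of: if v < 0: y = -v  else: y = v  (for every lane)."""
    mask = vcmp_lt(x, [0] * len(x))
    y = [0] * len(x)
    if any(mask):                 # "if" clause, skipped when the mask is all zeros
        y = masked_write(y, [-v for v in x], mask)
    else_mask = [not m for m in mask]
    if any(else_mask):            # "else" clause, skipped when the mask is all ones
        y = masked_write(y, x, else_mask)
    return y
```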
The VPU also uses these masks for packed load and store instructions, which access enabled elements from sequential locations in memory. This enables the programmer to bundle sparse strands of execution satisfying complex branch conditions into a format more efficient for vector computation.
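Packed stores and loads behave like compress and expand operations. This Python sketch uses hypothetical names and treats memory as a plain list. A packed store writes only the enabled lanes, back to back. A packed load fills only the enabled lanes from consecutive elements and leaves disabled lanes unchanged.

```python
def packed_store(values, mask):
    """Store the enabled lanes to consecutive memory locations (compress)."""
    return [v for v, m in zip(values, mask) if m]


def packed_load(dst, memory, start, mask):
    """Fill enabled lanes of dst from consecutive memory elements (expand).

    Disabled lanes keep their previous values, i.e. they are left untouched.
    """
    out, pos = list(dst), start
    for lane, m in enumerate(mask):
        if m:
            out[lane] = memory[pos]
            pos += 1
    return out
```

Compressing the lanes that took a rare branch packs those strands densely, so later vector work on them wastes no lanes.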
3.4 Inter-Processor Ring Network
Larrabee uses a bi-directional ring network to allow agents such as CPU cores, L2 caches and other logic blocks to communicate with each other within the chip. When scaling to more than 16 cores, we use multiple short linked rings.
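On a bi-directional ring, a sender can choose the shorter direction before injecting a message. The Python sketch below assumes that rule and counts hops between ring stops numbered 0 to n-1. The text does not say how Larrabee picks a direction, so `ring_route` is hypothetical.

```python
def ring_route(src, dst, n):
    """Choose a direction on a bi-directional ring of n stops.

    Returns (direction, hops): +1 for increasing stop numbers, -1 for
    decreasing. Ties go to +1. The decision is made before injection.
    """
    forward = (dst - src) % n
    backward = (src - dst) % n
    return (1, forward) if forward <= backward else (-1, backward)
```

On a 16-stop ring, no message travels more than 8 hops under this rule. Keeping worst-case distance down is one reason to use short linked rings beyond 16 cores.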
Each ring data-path is 512-bits wide per direction. All the routing decisions are made before injecting messages into the network. For example, each agent can accept a message from one direction
18:4 • L. Seiler et al.
Load/store instruction variants may specify type conversion.
This provides a toll-free way to store low-resolution data in packed form, while letting the CPU compute on the data in a high-resolution format.
Can two EECS students prototype a GPU in one semester?
- Xilinx Virtex-5 ML505-110T
- Vertex Shaders: Red
- Pixel Shaders: Yellow
- Rasterizer: Cyan
- DDR2 Interface: Green
60% slice usage
UC Regents Fall 2008 © UCB CS 194-6 L1: Virtex-5 Microarchitecture
Fall 2008 CS 194-6: Pure project course.
Project teams: 1-to-3 person groups. This Friday: setting project topics. 10-noon, 125 Cory (not 119 Cory!).
No exams. No homework.
Mondays: Lectures teach topics useful for the project.
Fridays: Project check-ups.
CS 194-6: Project red-letter days.
Grade is based on revised versions of presentation slides (email PDFs to Greg and John after presentation).
Specification Design Review: Mon, Sept 29
Implementation Design Review: Mon, Nov 3
Final Presentation: Fri, Dec 5
YONG-JIN KWON, CHEN SUN
PEGASUS: Pipeline Engineered Graphics Accelerator Supporting Unified Shaders
Top-Level Block Diagram
3D API
<Constants>
Transform Matrix (4x4)
Inverse Perspective Matrix (4x4)
Transposed Rotation Matrix (4x4)
</Constants>
<Lights>
0, Vertex Index1
1, Vertex Index2
</Lights>
<Vertex>
0: x y z r g b nx ny nz
1: x y z r g b nx ny nz
2: x y z r g b nx ny nz
</Vertex>
<Triangle>
0: V1 V2 V3
1: V3 V4 V5
</Triangle>
Bit-widths:
Matrix – 256b
  Element: 16 bits (4x4 = 16 elements)
Lights – 32b
  Light Index: 16 bits
  Vertex Index: 16 bits
Vertices – 115b
  Vertex: 16 bits per coordinate (x, y, z)
  Index: 16 bits
  Color: 8 bits per channel (r, g, b)
  Normal: 9 bits per component (nx, ny, nz)
Triangles – 64b
  Triangle Index: 16 bits
  Vertex Index: 16 bits each (V1, V2, V3)
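The vertex widths add up to the stated total: 3 × 16 (x, y, z) + 16 (index) + 3 × 8 (r, g, b) + 3 × 9 (normal) = 115 bits. The Python sketch below packs one vertex into a single 115-bit word. Several details are assumptions, not the project's actual encoding. The field order (index first, then the order of the <Vertex> lines) and the bit placement are assumed. Fields are also treated as raw unsigned bit patterns, since the slides do not give the fixed-point or signed formats.

```python
# Widths from the bit-width list; field order is an assumption.
VERTEX_FIELDS = [
    ("index", 16),
    ("x", 16), ("y", 16), ("z", 16),
    ("r", 8), ("g", 8), ("b", 8),
    ("nx", 9), ("ny", 9), ("nz", 9),
]
VERTEX_BITS = sum(width for _, width in VERTEX_FIELDS)  # 115


def pack_vertex(fields):
    """Pack a dict of raw unsigned field values into one VERTEX_BITS-wide int."""
    word = 0
    for name, width in VERTEX_FIELDS:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value  # earlier fields land in the high bits
    return word


def unpack_vertex(word):
    """Inverse of pack_vertex: peel fields off the low end, last field first."""
    fields = {}
    for name, width in reversed(VERTEX_FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields
```

The same approach checks the other records. For example, a triangle is 16 + 3 × 16 = 64 bits.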
Java Framework
Java vs. Hardware Comparison
- Java simulation on top
- Hardware render on bottom
- No lights yet
- Ignore the black vs. gray background
More Java vs. Hardware
- Java on the left, actual render on the right
- No hardware results yet for the pixel-shaded teapot
- Hardware fully functional (buggy shader code)
- One thing to note: 500+ vertices and 900+ triangles
Next Tuesday’s Lecture:
Highly recommended!