Upload
mikesfbay
View
224
Download
0
Embed Size (px)
Citation preview
8/13/2019 PS2Optimisations
1/39
George Bain - PS2 Programming Optimisations 1AGDC 20022002 Sony Computer Entertainment Europe
PS2 ProgrammingPS2 ProgrammingOptimisationsOptimisations
George BainSCEE Technology Group
8/13/2019 PS2Optimisations
2/39
George Bain - PS2 Programming Optimisations 2AGDC 20022002 Sony Computer Entertainment Europe
Topics CoveredTopics Covered
Performance Analyser
DMA Transfers Vector Units
Graphics Synthesizer EE Core: CPU
File loading
8/13/2019 PS2Optimisations
3/39
George Bain - PS2 Programming Optimisations 3AGDC 20022002 Sony Computer Entertainment Europe
Performance AnalyserPerformance Analyser
Capture snapshot of
EE (Core, Bus, Vu0, and Vu1) GIF and GS
7 frames of bus activity
Identify bottlenecks! Also used as a Dev Kit
8/13/2019 PS2Optimisations
4/39George Bain - PS2 Programming Optimisations 4AGDC 20022002 Sony Computer Entertainment Europe
PS2 MemoryPS2 Memory
CPU 32MBRDRAM
4MB EmbeddedGraphics Synthesizer
8K Data
8K Frame
16K Instruction
16K Scratchpad
8K Texture
4K DataVector Unit 04K Instruction
16K DataVector Unit 1
16K Instruction
N/A
8/13/2019 PS2Optimisations
5/39George Bain - PS2 Programming Optimisations 5AGDC 20022002 Sony Computer Entertainment Europe
DMA Bus BandwidthDMA Bus Bandwidth
EE RDRAM to Device = 2.4 GB/Sec
DMAC Transfers in 8QW slices Align DMA Reference data on 8QW Boundary Increase DMA transfer speed 30-40%
Limit DMA tags
Tag alignment
UNPACK STCYCL DMA Cnt tag
7 QW
NOP NOP DMA End tag
8 QW
boundary
8 QWboundary
8/13/2019 PS2Optimisations
6/39George Bain - PS2 Programming Optimisations 6AGDC 20022002 Sony Computer Entertainment Europe
Checking DMA completionChecking DMA completion
DMA.STR
Register polling
Main BUS
CPU BC0F
Polling
8/13/2019 PS2Optimisations
7/39George Bain - PS2 Programming Optimisations 7AGDC 20022002 Sony Computer Entertainment Europe
Cycle StealingCycle Stealing
Cycle Stealing ON or OFF?
release is time between two DMA slices allow more time for CPU to access the main bus
slows down overall DMA transfer
CPU
GIF andVIF1 DMA
8/13/2019 PS2Optimisations
8/39George Bain - PS2 Programming Optimisations 8AGDC 20022002 Sony Computer Entertainment Europe
Memory FIFOMemory FIFO
What are the advantages?
MFIFO can buffer DMA packets if stall occurs onDrain DMA channel
When VU1 or GS becomes the bottleneck
Avoid Data Cache and perform memory writes to16K scratchpad memory
Scratchpad DMA provides maximum DMA transfer
speed to Memory FIFO
8/13/2019 PS2Optimisations
9/39George Bain - PS2 Programming Optimisations 9AGDC 20022002 Sony Computer Entertainment Europe
GS FIFOGS FIFO
What can cause the GS FIFO to become full?
Large primitives such as a full screen sprite Multiple texture passes
VIF1 DMA
VU1 Run
GIF to GS FIFO
(GS FIFO full)
GS Pixel Engines Busy
GS FIFO
requests data
from GIF
8/13/2019 PS2Optimisations
10/39George Bain - PS2 Programming Optimisations 10AGDC 20022002 Sony Computer Entertainment Europe
Draining MFIFO with VIF1Draining MFIFO with VIF1
What can cause the MFIFO to become full?
1. If GS FIFO is full, GIF doesnt request any data2. XGKICK instruction will stall VU1
3. VIF1 stalls on sync related instructions such as
MSCNT and FLUSHA
SPR MFIFO VIF1 GSVU1 GIF
8/13/2019 PS2Optimisations
11/39George Bain - PS2 Programming Optimisations 11AGDC 20022002 Sony Computer Entertainment Europe
Geometry and Texture SyncingGeometry and Texture Syncing
1.2 GB/Sec Bandwidth to GS
PATH1 for Geometry and PATH3 for Textures
VU1 VU1 MEM
PATH 1
VIF1 FIFO
MAIN BUSMAIN RAM
PATH 2 PATH 3
GIF FIFO
GIF
GS
8/13/2019 PS2Optimisations
12/39George Bain - PS2 Programming Optimisations 12AGDC 20022002 Sony Computer Entertainment Europe
Texture Transfer PathsTexture Transfer Paths
PATH2 Advantages
Easy to transfer textures and set other GS registers
No geometry and texture data sync problems
Disadvantages PATH1 will stall if PATH2 is still in progress
PATH3 Advantages
Parallel DMA transfers through VIF1 and GIF channels
GIF can operate in 2 different modes when using IMAGE mode Avoids PATH1 stalls when operating GIF in IMT mode
Disadvantages Sometimes difficult to synchronize geometry and texture data
8/13/2019 PS2Optimisations
13/39
George Bain - PS2 Programming Optimisations 13AGDC 20022002 Sony Computer Entertainment Europe
GIF in Intermittent ModeGIF in Intermittent Mode
What are the benefits?
Allows texture transfers via the GIF while VIF1 andVU1 continue to process data
What are some things I should consider?
IMT Mode is good when loading large texture blocks If GIF is constantly being occupied by PATH1 then
texture transfer via PATH3 is reduced
Cant draw and transfer textures at same time!
Batch textures together to limit overhead!
8/13/2019 PS2Optimisations
14/39
George Bain - PS2 Programming Optimisations 14AGDC 20022002 Sony Computer Entertainment Europe
GIF IMT ModeGIF IMT Mode OFFOFF
GIF DMA
Complete
Geometry
VU1 Stalling
GIF DMA
Texture
VU1 Running
VIF1 DMA
8/13/2019 PS2Optimisations
15/39
George Bain - PS2 Programming Optimisations 15AGDC 20022002 Sony Computer Entertainment Europe
GIF IMT ModeGIF IMT Mode ONON
GIF DMA
VIF1 DMA
Texture
VU1
Running No XGKICK
Stall
Geometry
8/13/2019 PS2Optimisations
16/39
George Bain - PS2 Programming Optimisations 16AGDC 20022002 Sony Computer Entertainment Europe
Packing Texture DataPacking Texture Data
Pack 4-Bit and 8-Bit texture data
32-Bit textures provide maximum transfer speed 4/8-Bit textures must be converted by the GS
Consider the transfer speed and block layouts
16 and 32-Bit pixel modes have very similar speeds
Size W Size H PATH3 MB/SFormat
32-Bit16-Bit
8-Bit
256 256 1070256 256 1050
256 256 7854-Bit 256 256 380
PATH2 MB/S
10901075
800
385
8/13/2019 PS2Optimisations
17/39
George Bain - PS2 Programming Optimisations 17AGDC 20022002 Sony Computer Entertainment Europe
VCL ToolVCL Tool
Application that simplifies Vu1 Programming
Available for Linux and Windows Generates VSM source code
Handles many tasks Dual Pipeline processing Loop unrolling
Register allocation Instruction scheduling
8/13/2019 PS2Optimisations
18/39
George Bain - PS2 Programming Optimisations 18AGDC 20022002 Sony Computer Entertainment Europe
Vu0 UsageVu0 Usage
Transferring Data to Vu0
Cop2 connection you can transfer 1QW in 2Cycles DMA transfer you can transfer 1QW in 4Cycles
Processing Data with Vu0 Vu0 running Micro code
Triple Buffer Scratchpad memory
Transfer data to Block A
Process Block A and Transfer Block B
Drain Block A, Process B, Transfer C
8/13/2019 PS2Optimisations
19/39
George Bain - PS2 Programming Optimisations 19AGDC 20022002 Sony Computer Entertainment Europe
Geometry Data TransferGeometry Data Transfer
Reduce memory consumption and bandwidth
Remember Vector Unit register VF00.w = 1.0
4QW Per Vertex 3QW Per Vertex
1.0f Z Y X A B G R
1.0f 1.0f T S Ny Nx T S
A B G R Nz Z Y X
1.0f Nz Ny Nx
8/13/2019 PS2Optimisations
20/39
George Bain - PS2 Programming Optimisations 20AGDC 20022002 Sony Computer Entertainment Europe
Compress Geometry DataCompress Geometry Data
use the VIF to convert integer to float
use the VU to convert integer to float
Vector Unpack Mode
X,Y,Z 16 Bit
16 BitS,T
8 BitRGBA
VU Instruction
ITOF0
ITOF12
ITOF0
16 BitNx,Ny,Nz ITOF15
Compress 4 QW to 1.25 QW
8/13/2019 PS2Optimisations
21/39
George Bain - PS2 Programming Optimisations 21AGDC 20022002 Sony Computer Entertainment Europe
GS Frame BuffersGS Frame Buffers
Total of 4 MB of Embedded DRAM
Draw, Display, Z and Texture Buffers What are some recommended buffer sizes?
PAL (512 x 512), NTSC (512 x 448)
Progressive scan support with full height buffers
2-Circuits of the GS to reduce interlace flicker
alpha blend odd/even fields at no cost
8/13/2019 PS2Optimisations
22/39
George Bain - PS2 Programming Optimisations 22AGDC 20022002 Sony Computer Entertainment Europe
GS CapabilitiesGS Capabilities
Bandwidth
Massive total of 48 GB/Sec Frame Buffer 38.4 GB/Sec
Texture Buffer 9.6 GB/Sec
Drawing Speed 16 Pixel for non-textured (2.4 Gpixels/Sec)
75M Flat shaded Triangles/Sec
8 Pixel for textured (1.2 Gpixels/Sec)
37.5M Textured and Gouraud shaded Triangles/Sec
8/13/2019 PS2Optimisations
23/39
George Bain - PS2 Programming Optimisations 23AGDC 20022002 Sony Computer Entertainment Europe
GS PipelineGS Pipeline
Emotion
Engine
Host IF
Set-up and Rasterizing
Pixel Pipeline x 16
Memory IF
VRAM 4MB
Frame Buffer Texture Buffer
PCRTC
48 GB/Sec
Video Out
8/13/2019 PS2Optimisations
24/39
George Bain - PS2 Programming Optimisations 24AGDC 20022002 Sony Computer Entertainment Europe
GS Frame/Z CacheGS Frame/Z Cache
Quick Page refills!
8192bits per cycle
8K page buffer refilled in 8 GS cycles
4K 4K
Frame
32x32
Z
32x32
8/13/2019 PS2Optimisations
25/39
George Bain - PS2 Programming Optimisations 25AGDC 20022002 Sony Computer Entertainment Europe
Reducing Frame Page MissesReducing Frame Page Misses
Fill rate is roughly constant if varying height
Wide Primitives will cause page misses Use 32 Pixel wide strips to reduce page misses
Rarely drop below 1Gpixel/Sec if miss occurs
Primitives using textures greater than a pagesize are usually more of a problem
8Bit texture page is 128x64
8/13/2019 PS2Optimisations
26/39
George Bain - PS2 Programming Optimisations 26AGDC 20022002 Sony Computer Entertainment Europe
Texture Fill RatesTexture Fill Rates
Texture Page misses have biggest effect
Subdivide large texture co-ordinate ranges Keep mip-maps in the same page
Texture reduction reduces the fill rate
32 pixel wide strips wont increase performance
Texel read becomes bottleneck
Texture expansion doesnt affect fill rate
8/13/2019 PS2Optimisations
27/39
George Bain - PS2 Programming Optimisations 27AGDC 20022002 Sony Computer Entertainment Europe
Fill Rate VS Triangle SizeFill Rate VS Triangle Size
0
500
1000
1500
2x2
4x4
8x8
16x
16
32x
32
64x
64
128x12
8
256x25
6
*Texture is on cache without reducing size
Fillrate
Untextured
Textured*
8/13/2019 PS2Optimisations
28/39
George Bain - PS2 Programming Optimisations 28AGDC 20022002 Sony Computer Entertainment Europe
Level Of DetailLevel Of Detail
Make better use of LOD!
5000 polygon model may result in just 50 visiblepixels once projected onto the screen
theres also no point having detailed textures thatare going to be shrunk so much
Mip Mapping Improve visual quality
Mip maps in different pages can cause multipletexture cache reloads
8/13/2019 PS2Optimisations
29/39
George Bain - PS2 Programming Optimisations 29AGDC 20022002 Sony Computer Entertainment Europe
MultiMulti--Pass RenderingPass Rendering
GS Alpha Blend operation is free!
Maximum textured fill rate is 1.2G Pixels/Sec Limit number of passes (4 passes = 300M P/S)
Fur rendering
Reduce passes when object in distance
Bump-mapping is possible
Technique requires full screen passes Back face cull to reduce GS stalls
8/13/2019 PS2Optimisations
30/39
George Bain - PS2 Programming Optimisations 30AGDC 20022002 Sony Computer Entertainment Europe
0200400600800
10001200
8x8
16x
16
32x
32
64x
64
128x12
8
256x25
6
Fillrate
Textured*
Texture*+Fog
GS FoggingGS Fogging
4x4
2x2
Texture is on cache without reducing size
8/13/2019 PS2Optimisations
31/39
George Bain - PS2 Programming Optimisations 31AGDC 20022002 Sony Computer Entertainment Europe
Alternative FoggingAlternative Fogging
Technique 1
1st pass draw a textured polygon 2nd pass alpha blend gouraud shaded polygon
Technique 2 Post-process and perspective correct fogging
Move bits 8-15 of Z-Buffer into Alpha of Draw Buffer
Alpha blend full screen gouraud shaded polygononto Draw Buffer
8/13/2019 PS2Optimisations
32/39
George Bain - PS2 Programming Optimisations 32AGDC 20022002 Sony Computer Entertainment Europe
CPU OptimisationsCPU Optimisations
Emotion Engine Core
FPU (Coprocessor 1) Vu0 (Coprocessor 2)
16K Instruction Cache
8K Data Cache 16K Scratch-Pad Memory
Instruction Set
64Bit MIPS III and some MIPS IV
128Bit Multi-Media
8/13/2019 PS2Optimisations
33/39
George Bain - PS2 Programming Optimisations 33AGDC 20022002 Sony Computer Entertainment Europe
MultiMulti--Media InstructionsMedia Instructions
128-Bit Multi-Media Instructions
Parallel Processing 64 bits x2, 32 bits x4, 16 bits x8, 8 bits x16
Image format conversions
Sound decompressing
Pack DMA packets
Convert PACKED mode to REGLIST mode
Smaller data, faster DMA transfers!
8/13/2019 PS2Optimisations
34/39
George Bain - PS2 Programming Optimisations 34AGDC 20022002 Sony Computer Entertainment Europe
Use of Data CacheUse of Data Cache
Data Suitable for the Data Cache
Data that is frequently read or writtenrepeatedly
Data with a high degree of locality
Dont use Data Cache for
Data that gets used only once
Big chunks of data larger than 8K
8/13/2019 PS2Optimisations
35/39
George Bain - PS2 Programming Optimisations 35AGDC 20022002 Sony Computer Entertainment Europe
Reduce Cache MissesReduce Cache Misses
Prefetch instruction to load data beforehand
Reduce the size of your code for I$ Use Uncached memory for data r/w only once
Performance Counter Lib to measure misses
8/13/2019 PS2Optimisations
36/39
George Bain - PS2 Programming Optimisations 36AGDC 20022002 Sony Computer Entertainment Europe
Scratchpad MemoryScratchpad Memory
16K of high-speed memory (access directly)
2 dedicated DMA Channels (toSPR/fromSPR) SPR DMA provides best throughput
100% Occupy and 85% Send
Data Suitable for the SPR
Frequently used data where speed is a priority
Big chunks of data can be Double Buffered onSPR memory
8/13/2019 PS2Optimisations
37/39
George Bain - PS2 Programming Optimisations 37AGDC 20022002 Sony Computer Entertainment Europe
CD/DVD OptimisationsCD/DVD Optimisations
Align destination buffer on 64 Bytes
Increase performance by 25%! Combine files into a PAK file to reduce files
Avoid seeking when you could be reading
Load the most data you can per read
Combine IOP modules and load into EE
8/13/2019 PS2Optimisations
38/39
George Bain - PS2 Programming Optimisations 38AGDC 20022002 Sony Computer Entertainment Europe
SummarySummary
PA will push developers to the limit!
Parallel Texture and Geometry Transfer DMA is flexible and very powerful!
Take into consideration GS page sizes
Vector Unit 0 and Scratchpad memory
Check assembler output of generated code
8/13/2019 PS2Optimisations
39/39
George Bain - PS2 Programming Optimisations 39AGDC 20022002 Sony Computer Entertainment Europe
Further InformationFurther Information
Contact Information
SCEE Booth Exhibition Stand #9