PS2Optimisations

    George Bain - PS2 Programming Optimisations (AGDC 2002), 2002 Sony Computer Entertainment Europe

    PS2 Programming Optimisations

    George Bain, SCEE Technology Group

    Topics Covered

    Performance Analyser
    DMA Transfers
    Vector Units
    Graphics Synthesizer
    EE Core: CPU
    File loading

    Performance Analyser

    Capture snapshot of
        EE (Core, Bus, VU0, and VU1), GIF and GS
        7 frames of bus activity
    Identify bottlenecks!
    Also used as a Dev Kit

    PS2 Memory

    CPU: 32MB RDRAM
        16K Instruction
        8K Data
        16K Scratchpad
    Graphics Synthesizer: 4MB Embedded
        8K Frame
        8K Texture
    Vector Unit 0
        4K Instruction
        4K Data
    Vector Unit 1
        16K Instruction
        16K Data

    DMA Bus Bandwidth

    EE RDRAM to Device = 2.4 GB/Sec
    DMAC transfers in 8QW slices
        Align DMA reference data on an 8QW boundary
        Increases DMA transfer speed by 30-40%
    Limit DMA tags
    Tag alignment
        Diagram: a DMA Cnt tag (carrying the STCYCL and UNPACK VIF codes) plus 7 QW of data ends on an 8 QW boundary; the DMA End tag (carrying two NOP VIF codes) starts on that 8 QW boundary.
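The alignment rule above comes down to simple arithmetic on quadword offsets. The following is a minimal sketch, assuming a quadword is 16 bytes and a slice is 8 quadwords (128 bytes); the function names are hypothetical and are not part of any SCE library.

```c
#include <stdint.h>

enum {
    QW_BYTES    = 16,                   /* one quadword is 128 bits */
    SLICE_QW    = 8,                    /* the DMAC moves data in 8 QW slices */
    SLICE_BYTES = QW_BYTES * SLICE_QW   /* 128 bytes per slice */
};

/* Number of padding quadwords needed so that the next item in a DMA
 * chain starts on an 8 QW boundary. Returns 0 when already aligned. */
uint32_t pad_qw_to_slice(uint32_t qw_offset)
{
    return (SLICE_QW - qw_offset % SLICE_QW) % SLICE_QW;
}

/* Non-zero when the address sits on an 8 QW (128 byte) boundary. */
int is_slice_aligned(const void *address)
{
    return ((uintptr_t)address % SLICE_BYTES) == 0;
}
```

In the layout from the slide, one tag quadword plus 7 quadwords of data gives an offset of 8, so `pad_qw_to_slice(8)` returns 0 and the following tag needs no padding.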

    Checking DMA completion

    Diagram: two ways of polling for completion. Polling the DMA STR register goes over the Main BUS; polling with BC0F stays on the CPU.

    Cycle Stealing

    Cycle Stealing ON or OFF?
        Release is the time between two DMA slices
        Allows more time for the CPU to access the main bus
        Slows down the overall DMA transfer
    Diagram: main bus time shared between the CPU and the GIF and VIF1 DMA.

    Memory FIFO

    What are the advantages?
        MFIFO can buffer DMA packets if a stall occurs on the drain DMA channel
            When VU1 or GS becomes the bottleneck
        Avoid the Data Cache and perform memory writes to the 16K scratchpad memory
        Scratchpad DMA provides maximum DMA transfer speed to the Memory FIFO

    GS FIFO

    What can cause the GS FIFO to become full?
        Large primitives such as a full screen sprite
        Multiple texture passes
    Diagram: timeline of VIF1 DMA, VU1 Run and GIF to GS FIFO. While the GS pixel engines are busy the GS FIFO is full; otherwise the GS FIFO requests data from the GIF.

    Draining MFIFO with VIF1

    What can cause the MFIFO to become full?
        1. If the GS FIFO is full, the GIF doesn't request any data
        2. The XGKICK instruction will stall VU1
        3. VIF1 stalls on sync-related instructions such as MSCNT and FLUSHA
    Diagram: SPR -> MFIFO -> VIF1 -> VU1 -> GIF -> GS

    Geometry and Texture Syncing

    1.2 GB/Sec bandwidth to GS
    PATH1 for geometry and PATH3 for textures
    Diagram: three paths into the GIF, which feeds the GS.
        PATH 1: VU1 MEM to GIF
        PATH 2: VIF1 FIFO to GIF
        PATH 3: MAIN RAM over the MAIN BUS to the GIF FIFO

    Texture Transfer Paths

    PATH2
        Advantages
            Easy to transfer textures and set other GS registers
            No geometry and texture data sync problems
        Disadvantages
            PATH1 will stall if PATH2 is still in progress
    PATH3
        Advantages
            Parallel DMA transfers through VIF1 and GIF channels
            GIF can operate in 2 different modes when using IMAGE mode
            Avoids PATH1 stalls when operating GIF in IMT mode
        Disadvantages
            Sometimes difficult to synchronize geometry and texture data

    GIF in Intermittent Mode

    What are the benefits?
        Allows texture transfers via the GIF while VIF1 and VU1 continue to process data
    What are some things I should consider?
        IMT mode is good when loading large texture blocks
        If the GIF is constantly occupied by PATH1, then texture transfer via PATH3 is reduced
        Can't draw and transfer textures at the same time!
        Batch textures together to limit overhead!

    GIF IMT Mode OFF

    Timing diagram: GIF DMA (texture), VIF1 DMA (geometry) and VU1. VU1 is stalling until the GIF DMA is complete, then VU1 is running.

    GIF IMT Mode ON

    Timing diagram: GIF DMA (texture), VIF1 DMA (geometry) and VU1. VU1 is running with no XGKICK stall.

    Packing Texture Data

    Pack 4-Bit and 8-Bit texture data
        32-Bit textures provide maximum transfer speed
        4/8-Bit textures must be converted by the GS
    Consider the transfer speed and block layouts
        16 and 32-Bit pixel modes have very similar speeds

    Format   Size W   Size H   PATH3 MB/S   PATH2 MB/S
    32-Bit   256      256      1070         1090
    16-Bit   256      256      1050         1075
    8-Bit    256      256      785          800
    4-Bit    256      256      380          385
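The figures in the table can be turned into rough transfer times per texture. This is only a sketch under two assumptions that the slide does not state: that MB/S counts bytes of texture data in its stored format, and that 1 MB is 10^6 bytes. The function names are hypothetical.

```c
#include <stdint.h>

/* Bytes occupied by a width x height texture at the given bits per pixel. */
uint32_t texture_bytes(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    return width * height * bits_per_pixel / 8u;
}

/* Approximate transfer time in microseconds.
 * bytes / (MB/s) gives microseconds when 1 MB is taken as 10^6 bytes. */
double transfer_time_us(uint32_t width, uint32_t height,
                        uint32_t bits_per_pixel, double mb_per_sec)
{
    return (double)texture_bytes(width, height, bits_per_pixel) / mb_per_sec;
}
```

Under these assumptions a 256 x 256 texture over PATH3 takes about 245 microseconds at 32-Bit and about 83 microseconds at 8-Bit, so the lower MB/S of the packed formats can still mean a shorter transfer because there is less data to move.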

    VCL Tool

    Application that simplifies VU1 programming
        Available for Linux and Windows
        Generates VSM source code
    Handles many tasks
        Dual pipeline processing
        Loop unrolling
        Register allocation
        Instruction scheduling

    VU0 Usage

    Transferring data to VU0
        COP2 connection: you can transfer 1QW in 2 cycles
        DMA transfer: you can transfer 1QW in 4 cycles
    Processing data with VU0
        VU0 running micro code
    Triple buffer scratchpad memory
        Transfer data to Block A
        Process Block A and transfer Block B
        Drain Block A, process B, transfer C
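The triple-buffering steps above follow a rotating schedule. The sketch below only models which block is in which stage at each step; it does not touch real scratchpad memory or DMA channels, and the type and function names are hypothetical.

```c
#include <stdio.h>

enum { NUM_BLOCKS = 3, IDLE = -1 };

/* Block index (0 = A, 1 = B, 2 = C) used by each stage, or IDLE. */
typedef struct {
    int transfer;  /* block being filled with new data */
    int process;   /* block being processed */
    int drain;     /* block whose results are being sent out */
} Stages;

/* Stage assignment at a given step when num_chunks chunks flow through
 * the pipeline. Chunk n is transferred at step n, processed at step
 * n + 1 and drained at step n + 2, always in block n % 3. */
Stages stages_at(int step, int num_chunks)
{
    Stages s;
    s.transfer = (step < num_chunks) ? step % NUM_BLOCKS : IDLE;
    s.process  = (step >= 1 && step - 1 < num_chunks) ? (step - 1) % NUM_BLOCKS : IDLE;
    s.drain    = (step >= 2 && step - 2 < num_chunks) ? (step - 2) % NUM_BLOCKS : IDLE;
    return s;
}

/* Print the whole schedule, one line per step. */
void print_schedule(int num_chunks)
{
    static const char names[NUM_BLOCKS] = { 'A', 'B', 'C' };
    for (int step = 0; step < num_chunks + 2; step++) {
        Stages s = stages_at(step, num_chunks);
        printf("step %d: transfer %c, process %c, drain %c\n", step,
               s.transfer == IDLE ? '-' : names[s.transfer],
               s.process  == IDLE ? '-' : names[s.process],
               s.drain    == IDLE ? '-' : names[s.drain]);
    }
}
```

Because the three stages always use block indices that differ modulo 3, no block is ever read and written by two stages in the same step.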

    Geometry Data Transfer

    Reduce memory consumption and bandwidth
    Remember Vector Unit register VF00.w = 1.0

    4QW Per Vertex           3QW Per Vertex
    1.0f   Z      Y    X     A    B    G   R
    1.0f   1.0f   T    S     Ny   Nx   T   S
    A      B      G    R     Nz   Z    Y   X
    1.0f   Nz     Ny   Nx
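The two vertex layouts can be written as C structures to make the saving concrete. This is a sketch that assumes every component is stored as a 32-bit float and that the slide lists each quadword from its highest word (w) down to its lowest (x); the type and function names are hypothetical.

```c
#include <stdint.h>

/* One 128-bit quadword; x is the lowest word, w the highest. */
typedef struct { float x, y, z, w; } QWord;

/* 4 QW layout: every attribute padded to a full quadword. */
typedef struct {
    QWord position;  /* X, Y, Z, 1.0 */
    QWord texcoord;  /* S, T, 1.0, 1.0 */
    QWord colour;    /* R, G, B, A */
    QWord normal;    /* Nx, Ny, Nz, 1.0 */
} Vertex4QW;

/* 3 QW layout: the constant 1.0 slots are dropped and the normal is
 * spread over the free words. The 1.0 can come from VF00.w on the VU. */
typedef struct {
    QWord colour;    /* R, G, B, A */
    QWord st_nxny;   /* S, T, Nx, Ny */
    QWord xyz_nz;    /* X, Y, Z, Nz */
} Vertex3QW;

_Static_assert(sizeof(Vertex4QW) == 64, "4 QW vertex should be 64 bytes");
_Static_assert(sizeof(Vertex3QW) == 48, "3 QW vertex should be 48 bytes");

/* Repack one vertex from the 4 QW layout into the 3 QW layout. */
Vertex3QW pack_vertex(const Vertex4QW *v)
{
    Vertex3QW p;
    p.colour    = v->colour;
    p.st_nxny.x = v->texcoord.x;  /* S  */
    p.st_nxny.y = v->texcoord.y;  /* T  */
    p.st_nxny.z = v->normal.x;    /* Nx */
    p.st_nxny.w = v->normal.y;    /* Ny */
    p.xyz_nz.x  = v->position.x;  /* X  */
    p.xyz_nz.y  = v->position.y;  /* Y  */
    p.xyz_nz.z  = v->position.z;  /* Z  */
    p.xyz_nz.w  = v->normal.z;    /* Nz */
    return p;
}
```

Dropping one quadword per vertex cuts both memory use and DMA traffic for that vertex stream by 25%.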

    Compress Geometry Data

    Use the VIF to convert integer to float
    Use the VU to convert integer to float

    Vector     Unpack Mode   VU Instruction
    X,Y,Z      16 Bit        ITOF0
    S,T        16 Bit        ITOF12
    RGBA       8 Bit         ITOF0
    Nx,Ny,Nz   16 Bit        ITOF15

    Compress 4 QW to 1.25 QW
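The 1.25 QW figure follows from the bit widths in the table: three 16-bit positions, two 16-bit texture co-ordinates, four 8-bit colour components and three 16-bit normals add up to 20 bytes. The sketch below shows that layout in C together with a host-side stand-in for the fixed-point conversion, on the understanding that ITOF0, ITOF12 and ITOF15 treat the integer as having 0, 12 and 15 fractional bits. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Compressed vertex using the widths from the table. */
typedef struct {
    int16_t x, y, z;      /* position, converted with ITOF0  */
    int16_t s, t;         /* texture co-ordinates, ITOF12    */
    uint8_t r, g, b, a;   /* colour, ITOF0                   */
    int16_t nx, ny, nz;   /* normal, ITOF15                  */
} PackedVertex;

/* 20 bytes out of a 16 byte quadword is 1.25 QW. */
_Static_assert(sizeof(PackedVertex) == 20, "packed vertex should be 20 bytes");

/* Host-side model of an integer to float conversion with the given
 * number of fractional bits (0, 12 or 15 in the table). */
float fixed_to_float(int32_t value, int fractional_bits)
{
    return (float)value / (float)(1 << fractional_bits);
}
```

With 12 fractional bits a 16-bit texture co-ordinate covers roughly -8.0 to 8.0, and with 15 fractional bits a normal component covers roughly -1.0 to 1.0.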

    GS Frame Buffers

    Total of 4 MB of embedded DRAM
        Draw, Display, Z and Texture buffers
    What are some recommended buffer sizes?
        PAL (512 x 512), NTSC (512 x 448)
    Progressive scan support with full height buffers
    2-Circuits of the GS to reduce interlace flicker
        Alpha blend odd/even fields at no cost
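A quick budget calculation shows how much of the 4 MB is left for textures at the recommended sizes. This is a sketch that assumes the draw and display buffers share one pixel depth and that all three buffers have the same dimensions; it ignores any page or block rounding the GS applies. The function names are hypothetical.

```c
#include <stdint.h>

enum { GS_VRAM_BYTES = 4 * 1024 * 1024 };

/* Size in bytes of one buffer. */
uint32_t buffer_bytes(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    return width * height * bits_per_pixel / 8u;
}

/* Bytes left for textures after a draw buffer, a display buffer and a
 * Z buffer of the same dimensions. Negative if they do not fit. */
int32_t texture_bytes_left(uint32_t width, uint32_t height,
                           uint32_t colour_bpp, uint32_t z_bpp)
{
    uint32_t used = 2u * buffer_bytes(width, height, colour_bpp)
                  + buffer_bytes(width, height, z_bpp);
    return (int32_t)GS_VRAM_BYTES - (int32_t)used;
}
```

With everything at 32 bits, PAL at 512 x 512 leaves 1 MB for textures and NTSC at 512 x 448 leaves 1408 KB; dropping the Z buffer to 16 bits in the PAL case raises the remainder to 1.5 MB.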

    GS Capabilities

    Bandwidth
        Massive total of 48 GB/Sec
        Frame Buffer 38.4 GB/Sec
        Texture Buffer 9.6 GB/Sec
    Drawing speed
        16 pixels for non-textured (2.4 Gpixels/Sec)
            75M flat shaded triangles/Sec
        8 pixels for textured (1.2 Gpixels/Sec)
            37.5M textured and Gouraud shaded triangles/Sec

    GS Pipeline

    Diagram: Emotion Engine -> Host IF -> Set-up and Rasterizing -> Pixel Pipeline x 16 -> Memory IF -> VRAM 4MB (Frame Buffer, Texture Buffer), with the 48 GB/Sec figure marked at the VRAM; PCRTC -> Video Out.

    GS Frame/Z Cache

    Quick page refills!
        8192 bits per cycle
        8K page buffer refilled in 8 GS cycles
    Diagram: the 8K page buffer split into 4K for Frame (32x32) and 4K for Z (32x32).

    Reducing Frame Page Misses

    Fill rate is roughly constant when the primitive height varies
    Wide primitives will cause page misses
        Use 32 pixel wide strips to reduce page misses
    Fill rate rarely drops below 1 Gpixel/Sec if a miss occurs
    Primitives using textures greater than a page size are usually more of a problem
        8-Bit texture page is 128x64
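Cutting a wide primitive into 32 pixel strips can be done before the draw packets are built. The following sketch splits a horizontal span into strips whose edges fall on multiples of 32; aligning the strips to 32 pixel columns is an assumption made here for illustration, as are the names. It expects non-negative screen co-ordinates.

```c
#include <stddef.h>

enum { STRIP_WIDTH = 32 };

/* Half-open horizontal range [x0, x1) in pixels. */
typedef struct { int x0, x1; } Strip;

/* Split [x0, x1) into strips that never cross a 32 pixel column
 * boundary. Writes at most max_out strips and returns the number
 * written. Assumes 0 <= x0. */
size_t split_into_strips(int x0, int x1, Strip *out, size_t max_out)
{
    size_t count = 0;
    int x = x0;
    while (x < x1 && count < max_out) {
        /* First column boundary strictly to the right of x. */
        int next = (x / STRIP_WIDTH + 1) * STRIP_WIDTH;
        if (next > x1) {
            next = x1;
        }
        out[count].x0 = x;
        out[count].x1 = next;
        count++;
        x = next;
    }
    return count;
}
```

A 640 pixel wide full screen sprite becomes 20 strips, each of which would then be drawn top to bottom as its own sprite.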

    Texture Fill Rates

    Texture page misses have the biggest effect
        Subdivide large texture co-ordinate ranges
        Keep mip-maps in the same page
    Texture reduction reduces the fill rate
        32 pixel wide strips won't increase performance
        Texel read becomes the bottleneck
    Texture expansion doesn't affect fill rate

    Fill Rate VS Triangle Size

    Chart: fill rate (axis from 0 to 1500) against triangle size (2x2, 4x4, 8x8, 16x16, 32x32, 64x64, 128x128, 256x256), with one series for Untextured and one for Textured*.
    *Texture is on cache without reducing size

    Level Of Detail

    Make better use of LOD!
        A 5000 polygon model may result in just 50 visible pixels once projected onto the screen
        There's also no point having detailed textures that are going to be shrunk so much
    Mip mapping
        Improves visual quality
        Mip maps in different pages can cause multiple texture cache reloads

    Multi-Pass Rendering

    GS alpha blend operation is free!
        Maximum textured fill rate is 1.2 Gpixels/Sec
        Limit the number of passes (4 passes = 300 Mpixels/Sec)
    Fur rendering
        Reduce passes when the object is in the distance
    Bump-mapping is possible
        Technique requires full screen passes
        Back face cull to reduce GS stalls

    GS Fogging

    Chart: fill rate (axis from 0 to 1200) against triangle size (2x2, 4x4, 8x8, 16x16, 32x32, 64x64, 128x128, 256x256), with one series for Textured* and one for Texture*+Fog.
    *Texture is on cache without reducing size

    Alternative Fogging

    Technique 1
        1st pass: draw a textured polygon
        2nd pass: alpha blend a Gouraud shaded polygon
    Technique 2
        Post-process and perspective correct fogging
        Move bits 8-15 of the Z-Buffer into the alpha of the Draw Buffer
        Alpha blend a full screen Gouraud shaded polygon onto the Draw Buffer
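The bit move in Technique 2 can be modelled on the host to check the masks. This sketch assumes 32-bit draw buffer pixels with alpha in bits 24-31 and a linear array layout; on the hardware the move would be done by the GS itself, and the function names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Replace the alpha byte (bits 24-31) of one draw buffer pixel with
 * bits 8-15 of the matching Z value. */
uint32_t z_to_alpha(uint32_t pixel, uint32_t z)
{
    uint32_t alpha = (z >> 8) & 0xFFu;
    return (pixel & 0x00FFFFFFu) | (alpha << 24);
}

/* Apply the move to a whole buffer of count pixels. */
void copy_z_to_alpha(uint32_t *draw, const uint32_t *zbuf, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        draw[i] = z_to_alpha(draw[i], zbuf[i]);
    }
}
```

After this step each pixel's alpha carries a depth-derived value, which the full screen fog-coloured polygon can use as its blend factor.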

    CPU Optimisations

    Emotion Engine Core
        FPU (Coprocessor 1)
        VU0 (Coprocessor 2)
        16K Instruction Cache
        8K Data Cache
        16K Scratch-Pad Memory
    Instruction set
        64-Bit MIPS III and some MIPS IV
        128-Bit Multi-Media

    Multi-Media Instructions

    128-Bit Multi-Media Instructions
        Parallel processing: 64 bits x2, 32 bits x4, 16 bits x8, 8 bits x16
    Image format conversions
    Sound decompressing
    Pack DMA packets
        Convert PACKED mode to REGLIST mode
        Smaller data, faster DMA transfers!
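Portable C has no 128-bit lane type, so the idea of processing several small values in one register is sketched here on a 64-bit word with eight 8-bit lanes. It is a stand-in for what a packed byte add does across sixteen lanes on the EE, not EE code; the function name is hypothetical.

```c
#include <stdint.h>

/* Add eight 8-bit lanes packed into 64-bit words. Each lane wraps
 * around independently; no carry crosses into the next lane. */
uint64_t add_bytes_x8(uint64_t a, uint64_t b)
{
    const uint64_t low7 = 0x7F7F7F7F7F7F7F7FULL;  /* low 7 bits of each lane */
    const uint64_t high = 0x8080808080808080ULL;  /* top bit of each lane */

    /* Add the low 7 bits; the result fits in each lane, so nothing
     * carries out of a lane. */
    uint64_t sum = (a & low7) + (b & low7);

    /* Fold the top bits back in with XOR, which adds them without
     * producing a carry out of the lane. */
    return sum ^ ((a ^ b) & high);
}
```

One call replaces eight separate byte additions, which is the kind of saving that makes image format conversion and packet packing good candidates for the multi-media instructions.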

    Use of Data Cache

    Data suitable for the Data Cache
        Data that is frequently read or written repeatedly
        Data with a high degree of locality
    Don't use the Data Cache for
        Data that gets used only once
        Big chunks of data larger than 8K

    Reduce Cache Misses

    Prefetch instruction to load data beforehand
    Reduce the size of your code for I$
    Use uncached memory for data read or written only once
    Performance Counter Lib to measure misses
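Prefetching ahead of a linear read can be sketched with the GCC and Clang builtin `__builtin_prefetch`, which compilers may lower to the target's prefetch instruction. The 64-byte line size and the prefetch distance of four lines are assumptions chosen for illustration, not measured values, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    CACHE_LINE_BYTES = 64,  /* assumed cache line size */
    PREFETCH_LINES   = 4    /* assumed prefetch distance, in lines */
};

/* Sum an array while asking for data a few cache lines ahead, so the
 * load has time to complete before the loop reaches it. */
uint32_t sum_with_prefetch(const uint32_t *data, size_t count)
{
    const size_t per_line = CACHE_LINE_BYTES / sizeof(uint32_t);
    const size_t ahead = PREFETCH_LINES * per_line;
    uint32_t total = 0;

    for (size_t i = 0; i < count; i++) {
        /* Issue one prefetch per line, and only for addresses that are
         * still inside the array. */
        if (i % per_line == 0 && i + ahead < count) {
            __builtin_prefetch(&data[i + ahead]);
        }
        total += data[i];
    }
    return total;
}
```

Whether this helps depends on the access pattern, so the effect is best confirmed by measuring cache misses before and after.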

    Scratchpad Memory

    16K of high-speed memory (access directly)
    2 dedicated DMA channels (toSPR/fromSPR)
    SPR DMA provides best throughput
        100% Occupy and 85% Send
    Data suitable for the SPR
        Frequently used data where speed is a priority
        Big chunks of data can be double buffered on SPR memory

    CD/DVD Optimisations

    Align destination buffer on 64 bytes
        Increases performance by 25%!
    Combine files into a PAK file to reduce the number of files
    Avoid seeking when you could be reading
    Load the most data you can per read
    Combine IOP modules and load into EE
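A 64-byte aligned destination buffer can be declared directly in C11. This is a minimal sketch; the buffer name and its 64 KB size are hypothetical, and the actual read call is left out because it depends on the file library in use.

```c
#include <stddef.h>
#include <stdint.h>

enum { READ_ALIGNMENT = 64, READ_BUFFER_BYTES = 64 * 1024 };

/* Destination buffer for file reads, aligned on 64 bytes. */
_Alignas(64) unsigned char read_buffer[READ_BUFFER_BYTES];

/* Round a size up to the next multiple of 64 so that consecutive
 * reads into the same buffer also land on aligned addresses. */
size_t round_up_64(size_t bytes)
{
    return (bytes + (READ_ALIGNMENT - 1)) & ~(size_t)(READ_ALIGNMENT - 1);
}

/* Non-zero when the pointer is aligned on 64 bytes. */
int is_aligned_64(const void *pointer)
{
    return ((uintptr_t)pointer & (READ_ALIGNMENT - 1)) == 0;
}
```

When files are combined into one archive, rounding each entry's size with `round_up_64` keeps every entry on an aligned offset as well.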

    Summary

    PA will push developers to the limit!
    Parallel texture and geometry transfer
    DMA is flexible and very powerful!
    Take into consideration GS page sizes
    Vector Unit 0 and scratchpad memory
    Check assembler output of generated code

    Further Information

    Contact Information
        SCEE Booth Exhibition Stand #9