Advanced Mobile Optimizations: How to go to 60 fps after you have removed all Sleep calls ;-)

1. Advanced Mobile Optimizations

  • How to go to 60 fps after you have removed all Sleep calls ;-)

2. Disclaimer

  • The views expressed here are my personal views and do not necessarily reflect the thoughts, opinions, intentions, plans or strategies of Unity

3. Optimization Mindset

  • you can't just make your game faster
    • there is no magic bullet
    • very specific stuff
      • not the same as scripting a character

4. Optimization Mindset

  • in no specific order
  • know
  • think
  • measure

5. Optimization Mindset

  • You can't avoid any of that
    • no, really

6. Optimization Mindset

  • know + think = shoot in the dark
    • you just write code hoping for the best
  • know + measure = shoot in the dark
    • you are missing the "understand" part
  • think + measure = shoot in the dark
    • you solve an abstract problem, not the real one

7. Optimization Mindset: know + think

  • hardware is more complex than you think
    • highly parallel
    • deep pipelining
    • even when you write asm, it is already high-level

8. Optimization Mindset: know + measure

  • knowledge is static
  • knowledge comes from the past
  • knowledge is general

9. Optimization Mindset: know + measure

  • qsort vs bubble sort
    • sure, qsort is faster
  • but you are missing the point
    • maybe radix?
    • maybe no need to sort?
    • maybe insertion?
    • parallel sorting network?
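
As one illustration of the "maybe radix?" bullet, here is a minimal sketch of an LSD radix sort. It assumes the sort keys are plain 32-bit unsigned integers, which is an assumption of this example and not something the slide states; the function name `radix_sort_u32` is hypothetical. Whether it beats a comparison sort on a given device still has to be measured.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort for 32-bit unsigned keys: four passes, one byte per pass.
// Each pass is a stable counting sort, so the total work is linear in n.
void radix_sort_u32(std::vector<std::uint32_t>& keys) {
    std::vector<std::uint32_t> scratch(keys.size());
    for (int shift = 0; shift < 32; shift += 8) {
        // Histogram of the current byte.
        std::array<std::size_t, 256> count{};
        for (std::uint32_t k : keys) ++count[(k >> shift) & 0xFFu];

        // Exclusive prefix sum turns counts into start offsets.
        std::size_t offset = 0;
        for (std::size_t& c : count) {
            const std::size_t n = c;
            c = offset;
            offset += n;
        }

        // Stable scatter into the scratch buffer, then swap buffers.
        for (std::uint32_t k : keys) scratch[count[(k >> shift) & 0xFFu]++] = k;
        keys.swap(scratch);
    }
    // Debug-only sanity check.
    assert(std::is_sorted(keys.begin(), keys.end()));
}
```

The access pattern is linear reads plus a scatter over 256 buckets, which tends to be friendlier to caches than pointer-chasing, but the extra scratch buffer costs memory.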

10. Optimization Mindset: think + measure

  • solving an abstract problem
    • example: GPU
      • optimizing for a RIVA TNT and for a GTX are different problems

11. Optimization Mindset

  • well, if you are missing two of the three
    • no comments

12. Know

  • your hardware
  • your data
    • knowing your data is interleaved with thinking
    • we will talk more about it in "think"

13. Know your hardware

  • GPU
  • CPU
  • whatever
    • e.g. disk load speed

14. Know your hardware: GPU

  • Pipeline
    • meaning - slow step = slow everything
    • you are as slow as your bottleneck
  • Know your pipeline
  • Won't go into full pipeline spec
    • Resources section
  • Just common/biggest problems

15. Know your hardware: GPU Geometry

  • pre/post-T&L cache
    • whether to use indexed geometry or not
  • cache hit rate
    • strips vs tri list
  • memory throughput
    • vertex size
  • fetch cost (memory)
    • pack attributes or not
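
To make the "vertex size" and "pack attributes or not" bullets concrete, here is a small sketch comparing an all-float vertex with a packed one. Both layouts and all names (`FatVertex`, `PackedVertex`, `pack_snorm8`, `pack_unorm16`) are hypothetical, and the sketch assumes the target GPU can fetch normalized byte and 16-bit attributes without an expensive conversion, which is exactly the thing to check per device.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Unpacked layout: every attribute is a 32-bit float.
struct FatVertex {
    float position[3];
    float normal[3];
    float uv[2];
};

// Packed layout (hypothetical): normal as signed normalized bytes,
// UVs as unsigned normalized 16-bit values.
struct PackedVertex {
    float position[3];
    std::int8_t normal[4];   // x, y, z and one padding byte
    std::uint16_t uv[2];
};

static_assert(sizeof(FatVertex) == 32, "8 floats");
static_assert(sizeof(PackedVertex) == 20, "3 floats + 4 bytes + 2 shorts");

// Map [-1, 1] to [-127, 127].
std::int8_t pack_snorm8(float v) {
    assert(v >= -1.0f && v <= 1.0f);
    return static_cast<std::int8_t>(std::lround(v * 127.0f));
}

// Map [0, 1] to [0, 65535].
std::uint16_t pack_unorm16(float v) {
    assert(v >= 0.0f && v <= 1.0f);
    return static_cast<std::uint16_t>(std::lround(v * 65535.0f));
}

PackedVertex pack(const FatVertex& in) {
    PackedVertex out{};
    for (int i = 0; i < 3; ++i) {
        out.position[i] = in.position[i];
        out.normal[i] = pack_snorm8(in.normal[i]);
    }
    out.uv[0] = pack_unorm16(in.uv[0]);
    out.uv[1] = pack_unorm16(in.uv[1]);
    return out;
}
```

In this sketch the packed vertex is 20 bytes instead of 32, so the same vertex buffer needs less memory throughput; the price is reduced precision and UVs restricted to the [0, 1] range.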

16. Know your hardware: GPU Textures

  • Texture Cache
    • swizzle
    • compression
    • mip-maps
  • Biggest memory hog

17. Know your hardware: GPU Shaders

  • VertexProgram vs FragmentShader
    • balancing
    • attributes
  • Unified Shaders
    • load balancing
  • Precision
    • GLES: highp/mediump/lowp
    • Cg: float/half/fixed (iirc)

18. Know your hardware: GPU Rasterization

  • Fillrate (memory speed)
    • alpha
  • 2x2 samples (or more)
    • why geometry LOD matters

19. Know your hardware: CPU

  • Mobile = in-order RISC
    • for stupid code far worse than CISC
  • 2 main issues:
    • Memory speed
    • Computation speed

20. Know your hardware: CPU Memory

  • This is the single most important factor
    • memory access is far slower than computation
  • Latency vs Throughput
  • Caches
    • fast memory
    • your best friend
    • L1/L2/whatever
  • LHS (load-hit-store)

21. Know your hardware: CPU Computations

  • SIMD
    • better memory usage
    • better arithmetic usage (4 vals instead of 1)

22. Know your target hardware

  • Those were general rules
  • But you are running on that particular piece of sh... hardware

23. Know your target hardware: PowerVR

  • TBDR
    • perfect hidden surface removal
    • Alpha-Test/discard
  • shader precision
  • unified shaders
  • Tegra / ATI-AMD / Adreno more common

24. Know your target hardware: ARM

  • VFP = FPU on steroids (not real SIMD)
    • scalar instructions at same speed as vectorized
  • NEON = SIMD
    • more registers
    • awesome load/store instructions
    • not as cool as Altivec but cool enough for mobiles
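
A minimal sketch of what "NEON = SIMD" looks like from C++, processing four floats per iteration. The function name `scale_add` and the requirement that `count` be a multiple of 4 are assumptions of this example; the scalar branch is only there so the same code also builds on non-ARM machines.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// out[i] = a[i] * scale + b[i]; count must be a multiple of 4 in this sketch.
void scale_add(const float* a, const float* b, float scale,
               float* out, std::size_t count) {
    assert(count % 4 == 0);
#if defined(__ARM_NEON)
    const float32x4_t s = vdupq_n_f32(scale);        // broadcast scale to 4 lanes
    for (std::size_t i = 0; i < count; i += 4) {
        const float32x4_t va = vld1q_f32(a + i);     // load 4 floats
        const float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vmlaq_f32(vb, va, s));    // vb + va * s, 4 lanes at once
    }
#else
    // Scalar fallback for targets without NEON.
    for (std::size_t i = 0; i < count; ++i) out[i] = a[i] * scale + b[i];
#endif
}
```

Intrinsics like these leave register allocation and scheduling to the compiler; hand-written assembly gives more control but is harder to maintain, so measure before going further.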

25. Know your target hardware: ARM

  • Conditional execution of most instructions
  • Fold shifts and rotates into the "data processing" instructions
    • load structure from array by index
  • Thumb + float = disaster
    • switch back and forth between Thumb mode and regular 32-bit mode

26. Know your hardware: Resources

  • RTR (Real-Time Rendering)
  • lots of whitepapers:
    • PowerVR (ImgTec), Tegra (NVIDIA), Adreno (Qualcomm)
    • AMD/ATI - basically the same as X360, but much smaller tiles
  • ARM dev center

27. Think

  • Think about your data
  • Think about your algorithms
  • Think about your constraints
  • Think about your hardware

28. Think Basics

  • CPU vs GPU
    • e.g. draw calls
      • pure CPU cost
  • CPU:
    • memory vs arithmetic
      • memory slower
  • GPU:
    • vprog vs fshader
    • memory vs arithmetic

29. Think Memory

  • fragmentation
  • data organization
    • AOS vs SOA
    • hot/cold split
  • data structures
    • linear vs random
    • array vs list
    • map vs hashtable
    • allocators
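
The "AOS vs SOA" and "hot/cold split" bullets can be sketched as follows. The actor fields and the names `ActorAoS`, `ActorsSoA` and `integrate` are hypothetical, chosen only to show the layout difference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// AoS: hot fields (position, velocity) share cache lines with cold ones.
struct ActorAoS {
    float x, y, z;
    float vx, vy, vz;
    std::string name;  // cold: rarely touched per frame
    int score;         // cold
};

// SoA with a hot/cold split: each field lives in its own contiguous array,
// so the per-frame loop streams only the data it actually needs.
struct ActorsSoA {
    std::vector<float> x, y, z;     // hot
    std::vector<float> vx, vy, vz;  // hot
    std::vector<std::string> name;  // cold
    std::vector<int> score;         // cold
};

// Per-frame update that touches only the hot arrays, linearly.
void integrate(ActorsSoA& actors, float dt) {
    const std::size_t n = actors.x.size();
    assert(actors.y.size() == n && actors.z.size() == n);
    assert(actors.vx.size() == n && actors.vy.size() == n && actors.vz.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        actors.x[i] += actors.vx[i] * dt;
        actors.y[i] += actors.vy[i] * dt;
        actors.z[i] += actors.vz[i] * dt;
    }
}
```

With the AoS layout the same loop would drag `name` and `score` through the cache for every actor; the SoA loop is also a natural fit for SIMD because each array is contiguous.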

30. Think Constraints

  • GPU: will you see the difference?
    • really?
    • on mobile screen?
    • on that one small thingy in the corner?
  • CPU: will you need that?
    • e.g. physics in casual game?
  • Memory: will you need that?
    • will you need more than XXX actors?

31. Measure

  • you didn't optimize anything if you didn't measure the difference
  • you can't optimize if you don't know what needs to be optimized
    • if you can't measure what takes time

32. Measure Tools

  • there are lots of tools
    • Instruments (iOS)
    • PerfHUD (Tegra)
    • Adreno Profiler (Qualcomm)
    • some more probably
  • Poor-man profiler
    • timers

33. Unity use case: random bits

  • Mobile shaders
    • specialized versions of the usual built-ins
  • Skinning
    • full NEON/VFP impl
      • usually takes 10-15% of the C code's time
        • and we are not done optimizing it ;-)
  • Rej's material-to-texture baking and, coming soon, BRDF-to-texture baking

34. Unity use case: random bits

  • Remote Profiler
    • run on target hw, data is transferred over wifi
    • collect in Editor and show pretty graphs ;-)
  • Sort alpha-test *after* opaque
  • check *lots* of extensions
  • LODs - almost done
  • Vertex Cache optimization - after LODs ;-)
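
The "Sort alpha-test *after* opaque" bullet can be sketched as a stable partition of the draw list. The `DrawItem` struct, the `Pass` enum and the function name are hypothetical and say nothing about how Unity implements it; the presumed motivation is the TBDR behaviour from the PowerVR slide, where opaque geometry submitted first can hide alpha-tested fragments before they are shaded.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class Pass { Opaque, AlphaTest };

struct DrawItem {
    int id;     // hypothetical handle to a mesh/material pair
    Pass pass;
};

// Moves all opaque items in front of the alpha-tested ones,
// keeping the existing relative order inside each group.
void sort_alpha_test_after_opaque(std::vector<DrawItem>& items) {
    const auto is_opaque = [](const DrawItem& d) { return d.pass == Pass::Opaque; };
    std::stable_partition(items.begin(), items.end(), is_opaque);
    // Debug-only sanity check.
    assert(std::is_partitioned(items.begin(), items.end(), is_opaque));
}
```

A stable partition is used here so that any earlier ordering, such as sorting by material to reduce state changes, survives inside each group.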

35. Closing Words

  • Know hardware
  • Know data
  • Think data
  • Think constraints
  • Measure always
    • better to know earlier
  • You should always be optimizing

36. Questions