Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group



About the Data
• ATG reviews code to find bottlenecks and make perf recommendations
  – 50 titles per year
  – 96% use C++
  – 1 in 3 use “advanced” features like templates or generics


Why This Talk Is Important
• The majority of Xbox games are CPU bound
• The CPU bottleneck is often a language or C++ library issue
• These issues are not usually specific to the platform


Format
• Definition of the problem
• Examples
• Recommendation
• For reference
  – A frame is 17 or 33 ms (60 fps / 30 fps)
  – Bottlenecks given in ms per frame


Issue: STL
• Game using std::list
  – Adding ~20,000 objects every frame
  – Rebuilding the list every frame
  – Time spent: 6.5 ms/frame!
  – ~156 KB overhead (2 pointers per node)
  – Objects spread all over the heap


std::set and map
• Many games use set/map as sorted lists
• Inserts are slow: O(log N) each, plus a node allocation
• Memory overhead: 3 pointers + a color flag per node
• Worst case in game: 3.8 ms/frame


std::vector
• Hundreds of push_back()s per frame
• VS7.1 expands vector by 50%
• Question: How many reallocations for 100 push_back()s?
• Answer: 13! (1, 2, 3, 4, 5, 7, 10, 14, 20, 29, 43, 64, 95)
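A quick way to see this on your own toolchain is to count capacity changes directly. This is a minimal sketch: `CountReallocations` is a hypothetical helper, and the count without `reserve()` depends on each library's growth policy, so it will not necessarily reproduce the VS7.1 sequence above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many times push_back() changes the vector's capacity, i.e. how
// many times the buffer is allocated or reallocated (hypothetical helper).
std::size_t CountReallocations( std::size_t pushes, std::size_t reserveFirst )
{
    std::vector<int> v;
    if( reserveFirst > 0 )
    {
        v.reserve( reserveFirst );             // one allocation, up front
        assert( v.capacity() >= reserveFirst );
    }

    std::size_t reallocations = 0;
    std::size_t lastCapacity = v.capacity();
    for( std::size_t i = 0; i < pushes; ++i )
    {
        v.push_back( static_cast<int>( i ) );
        if( v.capacity() != lastCapacity )     // the buffer was replaced
        {
            ++reallocations;
            lastCapacity = v.capacity();
        }
    }
    return reallocations;
}
```

`CountReallocations( 100, 100 )` returns 0, while `CountReallocations( 100, 0 )` returns a library-dependent nonzero count.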


Clearly, the STL is Evil


Use the Right Tool for the Job
• The STL is powerful, but it’s not free
• Filling any container is expensive
• Be aware of container overhead
• Be aware of heap fragmentation and cache coherency
• Prefer vector, vector::reserve()


The STL is Evil, Sometimes
• The STL doesn’t solve every problem
• The STL solves some problems poorly
• Sometimes good old C-arrays are the perfect container
• Mike Abrash puts it well:
  – “The best optimizer is between your ears”


Issue: NIH (Not Invented Here) Syndrome
• Example: Custom binary tree
  – Sorted list of transparent objects
  – Badly unbalanced
  – 1 ms/frame to add only 400 items
• Example: Custom dynamic array class
  – Poorer performance than std::vector
  – Fewer features


Optimizations that Aren’t

void appMemcpy( void* d, const void* s, size_t b )
{
    // lots of assembly code here ...
}

appMemcpy( pDest, pSrc, 100 ); // bottleneck

• appMemcpy was slower than memcpy for anything under 64 KB


Invent Only What You Need
• std::set/map was 10X more efficient than the custom tree
  – Tested and proven
  – Still high overhead
• An even better solution
  – Unsorted vector or array
  – Sort once
  – 20X improvement
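The “unsorted vector, sort once” approach can be sketched as follows, assuming transparent objects are drawn back to front by view-space depth. `TransparentItem` and `TransparentList` are hypothetical names, not taken from the game in question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-frame record: only the depth matters for sorting.
struct TransparentItem
{
    float depth;   // distance from the camera
    int   id;      // index of the object to draw
};

// Collects items unsorted during the frame, then sorts once before drawing.
class TransparentList
{
public:
    explicit TransparentList( std::size_t expected ) { items_.reserve( expected ); }

    void BeginFrame() { items_.resize( 0 ); }   // empties without releasing capacity
    void Add( float depth, int id ) { items_.push_back( { depth, id } ); }

    // One O(N log N) sort per frame instead of N tree insertions.
    void SortBackToFront()
    {
        std::sort( items_.begin(), items_.end(),
                   []( const TransparentItem& a, const TransparentItem& b )
                   { return a.depth > b.depth; } );   // farthest first
    }

    const std::vector<TransparentItem>& Items() const { return items_; }

private:
    std::vector<TransparentItem> items_;
};
```

Adding is a plain append into contiguous, pre-reserved memory; the only sorting work happens once, right before the items are drawn.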


Profile
• Run your profiler
  – Rinse. Repeat.
  – Prove the improvement.
• Don’t rewrite the C runtime or STL just because you can. There are more interesting places to spend your time.
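For quick before/after checks between profiler runs, a scoped timer can help prove an improvement. This is a minimal sketch using `std::chrono::steady_clock`; `ScopedTimer` is a hypothetical helper and no substitute for a real profiler.

```cpp
#include <cassert>
#include <chrono>

// Adds the wall-clock time spent in a scope to a caller-owned counter (ms).
class ScopedTimer
{
public:
    explicit ScopedTimer( double& accumulatedMs )
        : accumulatedMs_( accumulatedMs ),
          start_( std::chrono::steady_clock::now() ) { }

    ~ScopedTimer()
    {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        accumulatedMs_ +=
            std::chrono::duration<double, std::milli>( elapsed ).count();
    }

    ScopedTimer( const ScopedTimer& ) = delete;
    ScopedTimer& operator=( const ScopedTimer& ) = delete;

private:
    double& accumulatedMs_;
    std::chrono::steady_clock::time_point start_;
};
```

Wrap the suspect code in a block with a `ScopedTimer`, then compare the accumulated milliseconds before and after a change over many frames rather than trusting a single measurement.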


Issue: Tool Knowledge
• If you’re a programmer, you use C/C++ every day
• C++ is complex
• CRT and STL libraries are complex
• The complexities matter
• Sometimes they really matter


vector::clear
• Game reused a global vector in the frame loop
• clear() was called every frame to empty the vector
• C++ Standard
  – clear() erases all elements (size() goes to 0)
  – No mention of what happens to vector capacity
• On VS7.1/Dinkumware, clear() frees the memory
• So every frame reallocated the memory
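The reuse pattern can be sketched as below, assuming a scratch vector that survives across frames. `RunFrames` is a hypothetical stand-in for the frame loop; it empties the vector with `resize(0)`, which erases the elements without shrinking the capacity, so after the first frame `push_back()` stops allocating.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical frame loop reusing one vector; returns the final capacity.
std::size_t RunFrames( std::vector<int>& scratch, int frames, int itemsPerFrame )
{
    for( int f = 0; f < frames; ++f )
    {
        scratch.resize( 0 );                 // size() == 0, capacity unchanged
        for( int i = 0; i < itemsPerFrame; ++i )
            scratch.push_back( i );          // no allocation once capacity suffices
    }
    return scratch.capacity();
}
```

With `reserve()` called once up front, the capacity returned after many frames is the same one reserved at startup.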


Zero-Initialization

struct Array { int x[1000]; };

struct Container {
    Array arr;
    Container() : arr() { }
};

Container x; // bottleneck

• Costing 3.5 ms/frame
• Removing the : arr() initializer makes this 20X faster


Know Thine Holy Standard
• Use resize(0) to reduce container size without affecting capacity
• T() means zero-initialize PODs. Don’t use T() unless you mean it.
• Get a copy of the C++ Standard. Really.
  – www.techstreet.com; search on 14882
  – Only $18 for the PDF
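The two forms of the slide’s `Container` constructor can be contrasted as follows. `ZeroedContainer` and `RawContainer` are hypothetical names for the before and after versions.

```cpp
#include <cassert>

struct Array { int x[1000]; };

// Value-initialization: arr() zero-fills all 1000 ints on every construction.
struct ZeroedContainer
{
    Array arr;
    ZeroedContainer() : arr() { }
};

// Default-initialization: arr is left indeterminate, so construction is
// essentially free. Only safe if every element is written before it is read.
struct RawContainer
{
    Array arr;
    RawContainer() { }
};
```

The trade-off is correctness for speed: `RawContainer` is only appropriate when the code that follows fills the array anyway.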


Issue: C Runtime

void BuildScore( char* s, int n )
{
    if( n > 0 )
        sprintf( s, "%d", n );
    else
        sprintf( s, "" );
}

• n was often zero
• sprintf was a hotspot
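One way to take `sprintf` out of `BuildScore` entirely is sketched below. It assumes a modern C++17 compiler, whose `std::to_chars` stands in for an `itoa()`-style conversion (the 2003-era toolchain did not have it), a 32-bit `int`, and a caller buffer of at least 12 chars. None of these assumptions comes from the original code.

```cpp
#include <cassert>
#include <charconv>
#include <system_error>

// sprintf-free BuildScore (sketch). Assumes int is 32 bits and that s points
// to a buffer of at least 12 chars.
void BuildScore( char* s, int n )
{
    if( n <= 0 )                   // the common case: no formatting at all
    {
        *s = '\0';
        return;
    }
    // std::to_chars does no locale handling and parses no format string.
    const std::to_chars_result r = std::to_chars( s, s + 11, n );
    assert( r.ec == std::errc() ); // a positive 32-bit int needs at most 10 digits
    *r.ptr = '\0';
}
```

The zero case, which the slide says was the frequent one, now costs a single store.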


qsort
• Sorting is important in games
• qsort is not an ideal sorting function
  – No type safety
  – Comparison function call overhead
  – No opportunity for compiler inlining
• There are faster options


Clearly, the CRT is Evil


Understand Your Options
• itoa() (non-standard, but provided by the Microsoft CRT) can replace sprintf( s, "%d", n )
• *s = '\0' can replace sprintf( s, "" )
• std::sort can replace qsort
  – Type safe
  – Comparison can be inlined
• Other sorting options can be even faster: partial_sort, partition
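The `qsort` versus `std::sort` contrast, plus `partial_sort` for the case where only the first k results matter (for example the k nearest objects), can be sketched like this. The float data and function names are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// qsort: untyped void* comparator, always called through a function pointer.
int CompareFloats( const void* a, const void* b )
{
    const float fa = *static_cast<const float*>( a );
    const float fb = *static_cast<const float*>( b );
    return ( fa > fb ) - ( fa < fb );
}

void SortWithQsort( std::vector<float>& v )
{
    std::qsort( v.data(), v.size(), sizeof( float ), CompareFloats );
}

// std::sort: type-safe, and the comparison lambda can be inlined.
void SortWithStdSort( std::vector<float>& v )
{
    std::sort( v.begin(), v.end(), []( float a, float b ) { return a < b; } );
}

// partial_sort: only the k smallest values end up sorted at the front;
// the rest of the range is left in unspecified order.
void KeepNearest( std::vector<float>& distances, std::size_t k )
{
    k = std::min( k, distances.size() );
    std::partial_sort( distances.begin(), distances.begin() + k, distances.end() );
}
```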


Issue: Function Calls
• 50,000-100,000 calls/frame is normal
• At 60 Hz, Xbox has 12.2M cycles/frame
• Function call/return averages 20 cycles
• A game calling 61,000 functions/frame spends 10% CPU (1.7 ms/frame) in function call overhead
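The slide’s arithmetic can be checked as follows, assuming the original Xbox’s 733 MHz CPU clock (not stated on the slide) and a 16.7 ms frame at 60 Hz:

```latex
\begin{aligned}
\frac{733 \times 10^{6}\ \text{cycles/s}}{60\ \text{frames/s}} &\approx 12.2 \times 10^{6}\ \text{cycles/frame} \\
61{,}000\ \text{calls} \times 20\ \text{cycles/call} &= 1.22 \times 10^{6}\ \text{cycles} \approx 10\%\ \text{of the frame} \\
0.10 \times 16.7\ \text{ms} &\approx 1.7\ \text{ms/frame}
\end{aligned}
```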


Extreme Function-ality
• 120,000 function calls/frame
• 140,000 function calls/frame
• 130,000 calls/frame to a single function (ColumnVec<3,float>::operator[])
• And the winner:
  – 340,000 calls per frame!
  – 9 ms/frame of call overhead


Beware Elegance
• Elegance → levels of indirection → more functions → perf impact
• Use algorithmic solutions first
  – One pass through the world
  – Better object rejection
  – Do AI/physics/networking less often than once/frame
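One way to run AI less often than once per frame is to stagger the work so each agent thinks only every few frames, spreading the cost evenly. This is a minimal sketch; `Agent`, `Think()` and `UpdateAI` are hypothetical, and the right interval depends on the game.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical AI agent; Think() stands in for an expensive update.
struct Agent
{
    int thinks = 0;
    void Think() { ++thinks; }
};

// Updates only 1/interval of the agents each frame, so every agent
// thinks once per `interval` frames.
void UpdateAI( std::vector<Agent>& agents, unsigned frame, unsigned interval )
{
    assert( interval > 0 );
    for( std::size_t i = frame % interval; i < agents.size(); i += interval )
        agents[i].Think();
}
```

With an interval of 4, each frame pays for roughly a quarter of the agents instead of all of them.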


Inline Judiciously
• Remember: inline is a suggestion
• Try the “inline any suitable” compiler option (/Ob2 in Visual C++)
  – Frame rate rose from 15 to 20 fps
  – 68,000 calls down to 47,000
• Try __forceinline or a similar keyword
  – Adding it to 5 functions shaved 1.5 ms/frame
• Don’t over-inline
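A portable spelling of “force inline” can be sketched with a macro: `__forceinline` on MSVC, `always_inline` on GCC/Clang, plain `inline` elsewhere. `FORCE_INLINE` is a hypothetical macro name, and the `ColumnVec` below is only loosely modeled on the `ColumnVec<3,float>` from the slides; its tiny, hot `operator[]` is the kind of function worth forcing inline. Compilers can still decline in some cases (recursion, for example) and warn or error when they do.

```cpp
#include <cassert>
#include <cstddef>

#if defined( _MSC_VER )
    #define FORCE_INLINE __forceinline
#elif defined( __GNUC__ )
    #define FORCE_INLINE inline __attribute__( ( always_inline ) )
#else
    #define FORCE_INLINE inline
#endif

// Hypothetical small column vector with a forced-inline accessor.
template< std::size_t N, typename T >
struct ColumnVec
{
    T v[N];

    FORCE_INLINE T& operator[]( std::size_t i )
    {
        assert( i < N );
        return v[i];
    }

    FORCE_INLINE const T& operator[]( std::size_t i ) const
    {
        assert( i < N );
        return v[i];
    }
};
```

Reserve the macro for a handful of small, heavily called functions identified by the profiler; applying it broadly is the over-inlining the slide warns against.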


Issue: for loops

// Example 1: Copy indices to push buffer
for( DWORD i = 0; i < dwIndexCnt; ++i )
    *pPushBuffer++ = arrIndices[ i ];

// Example 2: Initialize vector array
for( DWORD i = 0; i < dwMax; ++i )
    mVectorArr[i] = XGVECTOR4(0,0,0,0);

// Example 3: Process items in world
for( itr i = c.begin(); i < c.end(); ++i )
    Process( *i );


Watch Out For For
• Never copy/clear a POD with a for loop
• std::algorithms are optimized; use them

memcpy( pPushBuffer, arrIndices, dwIndexCnt * sizeof(DWORD) );
pPushBuffer += dwIndexCnt; // Example 1's loop also advanced the pointer
memset( mVectorArr, 0, dwMax * sizeof(XGVECTOR4) );
for_each( c.begin(), c.end(), Process );
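The same three loops can also be written with typed standard algorithms instead of raw `memcpy`/`memset`. A sketch with hypothetical names: `Vec4` stands in for `XGVECTOR4`, `std::uint32_t` for `DWORD`, and `Process` for the per-item work. Whether a library lowers `std::copy` and `std::fill` to `memmove`/`memset` is implementation-dependent, so check the generated code or profile.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for XGVECTOR4: four floats, trivially copyable.
struct Vec4 { float x, y, z, w; };

// Example 1: returns the end of the written range, like the advanced pointer.
std::uint32_t* CopyIndices( std::uint32_t* pushBuffer,
                            const std::uint32_t* indices, std::size_t count )
{
    return std::copy( indices, indices + count, pushBuffer );
}

// Example 2: clear an array of vectors.
void ClearVectors( Vec4* vectors, std::size_t count )
{
    std::fill( vectors, vectors + count, Vec4{ 0.f, 0.f, 0.f, 0.f } );
}

// Example 3: hypothetical per-item work applied with for_each.
void Process( int& item ) { item *= 2; }

void ProcessAll( std::vector<int>& items )
{
    std::for_each( items.begin(), items.end(), Process );
}
```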


Issue: Exception Handling
• Most games never throw
• Most games never catch
• Yet, most games enable EH
• EH adds code to do stack unwinding
  – A little bit of overhead to a lot of code
  – 10% size increase is common
  – 2 ms/frame in worst case


Disable Exception Handling
• Don’t throw or catch exceptions
• Turn off the C++ EH compiler option
• For Dinkumware STL
  – Define “_HAS_EXCEPTIONS=0”
  – Write empty _Throw and _Raise_handler; see stdthrow.cpp and raisehan.cpp in the crt folder
  – Add #pragma warning(disable: 4530)


Issue: Strings
• Programmers love strings
• Love hurts
• ~7000 calls to stricmp in frame loop
  – 1.5 ms/frame
• Binary search of a string table
  – 2 ms/frame


Avoid Strings
• String comparisons don’t belong in the frame loop
• Put strings in a table and compare indices
• At least optimize the comparison
  – Compare pointers only
  – Prefer strcmp to stricmp
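Putting strings in a table and comparing indices can be sketched as a small interning class: each distinct name is stored once at load time and identified afterwards by an integer handle, so the frame loop does one integer compare instead of a `stricmp`. `StringTable` is a hypothetical name. If names must match case-insensitively, normalize their case once inside `Intern()` rather than in the loop.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

class StringTable
{
public:
    using Id = std::uint32_t;

    // Call at load time, not per frame: may allocate.
    Id Intern( const std::string& s )
    {
        const auto it = ids_.find( s );
        if( it != ids_.end() )
            return it->second;               // already in the table
        const Id id = static_cast<Id>( strings_.size() );
        strings_.push_back( s );
        ids_.emplace( s, id );
        return id;
    }

    // For debugging and display only; gameplay code compares Ids.
    const std::string& Name( Id id ) const
    {
        assert( id < strings_.size() );
        return strings_[id];
    }

private:
    std::vector<std::string> strings_;
    std::unordered_map<std::string, Id> ids_;
};
```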


Issue: Memory Allocation
• Memory overhead
  – Xbox heap: 16-byte granularity and 16 bytes of overhead per allocation
  – Overhead alone is often 1+ MB
• Too many allocations
  – Games commonly do thousands of allocations per frame
  – Cost: 1-5 ms/frame


Hidden Allocations
• push_back(), insert() and friends typically allocate memory
• String constructors allocate
• Init-style calls often allocate
• Temporary objects, particularly string constants that convert to string objects
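Marking a converting constructor `explicit` stops a string literal from silently becoming a temporary object (and possibly a heap allocation) at every call site. A minimal sketch; `ResourceName` and `IsShip` are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical resource name type wrapping a std::string.
class ResourceName
{
public:
    explicit ResourceName( const char* s ) : name_( s ) { }
    const std::string& Str() const { return name_; }

private:
    std::string name_;
};

// Hypothetical lookup taking the name by const reference.
bool IsShip( const ResourceName& name ) { return name.Str() == "ship"; }

// IsShip( "ship" );  // error: no implicit const char* -> ResourceName
static_assert( !std::is_convertible<const char*, ResourceName>::value,
               "explicit blocks the hidden temporary" );
```

Callers now have to construct the name visibly, which makes it natural to build it once, outside the frame loop, and reuse it.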


Minimize Per-Frame Allocations

• Use memory-friendly data structures, e.g. arrays, vectors

• Reserve memory in advance
• Use custom allocators
  – Pool same-size allocations in a single block of memory to avoid overhead

• Use the explicit keyword to avoid hidden temporaries

• Avoid strings
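Pooling same-size allocations can be sketched as a fixed-size pool: one up-front block carved into equal slots, with a free list threaded through the unused slots. `FixedPool` is a hypothetical class, not an Xbox or CRT API. `Allocate` and `Free` are O(1) and add no per-allocation header. The sketch is not thread-safe and does not check that freed pointers came from this pool.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

class FixedPool
{
public:
    FixedPool( std::size_t slotSize, std::size_t slotCount )
        : slotSize_( RoundUp( slotSize ) ), storage_( slotSize_ * slotCount )
    {
        for( std::size_t i = 0; i < slotCount; ++i )   // link every slot
            Push( storage_.data() + i * slotSize_ );
    }

    FixedPool( const FixedPool& ) = delete;            // free list points into storage_
    FixedPool& operator=( const FixedPool& ) = delete;

    void* Allocate()
    {
        if( !head_ )
            return nullptr;                            // pool exhausted
        Node* n = head_;
        head_ = n->next;
        return n;
    }

    void Free( void* p )
    {
        assert( p != nullptr );
        Push( static_cast<unsigned char*>( p ) );
    }

private:
    struct Node { Node* next; };

    // Each slot must hold a Node and keep every slot suitably aligned.
    static std::size_t RoundUp( std::size_t n )
    {
        const std::size_t a = alignof( std::max_align_t );
        n = n < sizeof( Node ) ? sizeof( Node ) : n;
        return ( n + a - 1 ) / a * a;
    }

    void Push( unsigned char* p )
    {
        head_ = new ( p ) Node{ head_ };
    }

    std::size_t slotSize_;
    std::vector<unsigned char> storage_;
    Node* head_ = nullptr;
};
```

All memory is obtained once when the pool is built, so per-frame allocations from it never touch the general heap.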


Other Tidbits
• Compiler settings: experiment
• dynamic_cast: just say no
• Constructors: performance killers
• Unused static array space: track this
• Loop unrolling: huge wins, sometimes
• Suspicious comments: watch out
  – “Immensely slow matrix multiplication”


Wrap Up
• Use the Right Tool for the Job
• The STL is Evil, Sometimes
• Invent Only What You Need
• Profile
• Know Thine Holy Standard
• Understand Your Options
• Beware Elegance
• Inline Judiciously
• Watch Out For For
• Disable Exception Handling
• Avoid Strings
• Minimize Per-frame Allocations


Call to Action: Evolve!
• Pass the rubber chicken
  – Share your C++ performance mistakes with your team
• Mentor junior programmers
  – So they only make new mistakes
• Don’t stop learning
  – You can never know enough C++


Questions
• Fill out your feedback forms
• Email: [email protected]
• This presentation: www.tantalon.com/pete.htm