Performance Tuning Panotools - PTMender


Layout

• Project Goal

• About Panotools

• Multi-threading

• SIMD, micro-architectural pitfalls

• Results


Project Goal

• Improve the performance of Panotools.

• This goal will be achieved through:

1. Multi-threading the application – exploiting new multi-core machines, which promise the most significant performance gain.

2. Using SSE code.

3. Finding micro-architectural pitfalls and fixing them – using the VTune Tuning Assistant.


About Panotools

• Panotools is the cross-platform library behind Panorama Tools and many other GUI photo stitchers.

• It is gaining popularity as the back-end engine for many panoramic applications.

• Selected to participate in the “Google Summer Of Code 2007”.

• We focused on the PTMender module of the library.

More details on Panotools: http://panotools.sourceforge.net/


Multi-threading

• Two major approaches in multi-threading an existing single-threaded application:

1. Data decomposition – Dividing the data into smaller parts and working on each part in parallel. This is not always possible due to algorithmic dependencies between the divided parts.

2. Functional decomposition – Dividing the work according to functional tasks. Each thread performs a unique predefined task. This is harder to do and requires a deep understanding of the original algorithm.


Multi-threading – contd.

• Naturally, we started by looking for a data decomposition.

• In theory, because PTMender works on several files we could have processed a number of files simultaneously.

• Alternatively, we could have divided a single file and processed its parts simultaneously.

• In practice, using the Call Graph function in VTune, we noticed a native division of each file into independent parts on which the algorithm runs.

• Clearly, we chose the latter method (dividing a single file) because it provides better scalability.
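The idea of dividing a single image into independent parts can be sketched as follows. This is a minimal illustration, not PTMender's actual code: `band_t`, `process_band` and `process_image_parallel` are hypothetical names, the pixel inversion is only a placeholder for the real remapping work, and POSIX threads are swapped in for portability (the slides' own code uses Win32 calls).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 2   /* assumption: two workers, as in the thread0/thread1 model */

/* Hypothetical work item: one horizontal band of an 8-bit image. */
typedef struct {
    unsigned char *pixels;   /* start of the whole image buffer */
    size_t width;            /* pixels per row */
    size_t row_begin;        /* first row of this band (inclusive) */
    size_t row_end;          /* last row of this band (exclusive) */
} band_t;

/* Placeholder for the real per-pixel work: inverts every pixel in the band. */
static void *process_band(void *arg)
{
    band_t *band = (band_t *)arg;
    for (size_t y = band->row_begin; y < band->row_end; ++y) {
        for (size_t x = 0; x < band->width; ++x) {
            unsigned char *p = &band->pixels[y * band->width + x];
            *p = (unsigned char)(255 - *p);
        }
    }
    return NULL;
}

/* Splits the rows into NUM_THREADS contiguous bands and processes them in
   parallel. Returns 0 on success, -1 if a thread could not be created. */
static int process_image_parallel(unsigned char *pixels, size_t width, size_t height)
{
    pthread_t threads[NUM_THREADS];
    band_t bands[NUM_THREADS];
    int started = 0;
    int status = 0;

    assert(pixels != NULL);
    for (int t = 0; t < NUM_THREADS; ++t) {
        bands[t].pixels = pixels;
        bands[t].width = width;
        bands[t].row_begin = height * (size_t)t / NUM_THREADS;
        bands[t].row_end = height * (size_t)(t + 1) / NUM_THREADS;
        if (pthread_create(&threads[t], NULL, process_band, &bands[t]) != 0) {
            status = -1;
            break;
        }
        ++started;
    }
    /* Wait for every band that was actually started. */
    for (int t = 0; t < started; ++t)
        pthread_join(threads[t], NULL);
    return status;
}
```

Because the bands do not overlap, the workers never write to the same pixel and need no locking for the image data itself. Calling `process_image_parallel(buffer, width, height)` on a row-major 8-bit buffer processes the whole image.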


VTune - Call graph


Serial task


The Parallel model

[Diagram; labels: thread0, thread1]


Multi-threading – contd.

• Data sharing – We created arrays of thread-specific data structures:

    typedef struct thread_vars {
        Image result;
        TrformStr transform;
        int pad[16];    /* padding: 64 bytes, assuming a 4-byte int */
    } thread_vars_t;

    thread_vars_t thread_private[NUM_THREADS];

And not:

    Image result[NUM_THREADS];
    TrformStr transform[NUM_THREADS];

• Padding is used to create full cache-line separation between array entries and prevent “false sharing”.
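The effect of the padding can be checked with a small sketch. It assumes 64-byte cache lines and a 4-byte `int`; the `Image` and `TrformStr` definitions below are simplified, hypothetical stand-ins for the library's real types, and `gap_between_entries` is an illustrative helper that is not part of Panotools.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_THREADS 2
#define CACHE_LINE_BYTES 64   /* assumption: 64-byte cache lines */

/* Simplified, hypothetical stand-ins for the library's real types. */
typedef struct { int width, height; unsigned char *data; } Image;
typedef struct { Image *src, *dest; int interpolator; } TrformStr;

typedef struct thread_vars {
    Image result;
    TrformStr transform;
    int pad[16];              /* 64 bytes when int is 4 bytes wide */
} thread_vars_t;

static thread_vars_t thread_private[NUM_THREADS];

/* The padding is only useful if it spans at least one full cache line. */
_Static_assert(sizeof(int) * 16 >= CACHE_LINE_BYTES,
               "pad must cover a full cache line");

/* Bytes between the end of entry 0's live fields and the start of entry 1.
   If this is at least CACHE_LINE_BYTES, the live fields of two neighbouring
   entries can never fall into the same cache line. */
static size_t gap_between_entries(void)
{
    const char *end_of_first =
        (const char *)&thread_private[0] + offsetof(thread_vars_t, pad);
    const char *start_of_second = (const char *)&thread_private[1];
    return (size_t)(start_of_second - end_of_first);
}
```

With two separate arrays (`result[]` and `transform[]`), neighbouring entries owned by different threads sit back to back and may share a cache line, so a write by one thread can invalidate the line in the other thread's cache even though no data is logically shared.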


Thread Checker


Thread Checker - Debug


Noise

• The effects of the data races later became obvious when we inspected the output: they showed up as noise.


Thread Checker - Debug – contd.

• Adding synchronization around critical sections

    #ifdef PROTECT_WRITE
        // Request ownership of mutex.
        dwWaitResult = WaitForSingleObject(
            hTiffWriteMutex,    // handle to mutex
            5000L);             // five-second time-out interval

        if (dwWaitResult == WAIT_OBJECT_0)
        {
            __try {
                // Write to the database.
    #endif
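The fragment above shows only the acquire side of the lock. A complete acquire/release cycle can be sketched as below, with a POSIX mutex swapped in for the Win32 one so the example is portable. `tiff_write_mutex`, `rows_written`, `writer` and `run_writers` are hypothetical names, and the shared counter merely stands in for the protected write to the output file.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 2
#define WRITES_PER_THREAD 10000

/* Hypothetical analogue of hTiffWriteMutex. */
static pthread_mutex_t tiff_write_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Shared state standing in for the output file. */
static long rows_written = 0;

static void *writer(void *arg)
{
    (void)arg;
    for (int i = 0; i < WRITES_PER_THREAD; ++i) {
        pthread_mutex_lock(&tiff_write_mutex);    /* enter critical section */
        ++rows_written;                           /* the protected "write" */
        pthread_mutex_unlock(&tiff_write_mutex);  /* always release the lock */
    }
    return NULL;
}

/* Runs NUM_THREADS writers to completion and returns the final count. */
static long run_writers(void)
{
    pthread_t threads[NUM_THREADS];

    rows_written = 0;
    for (int t = 0; t < NUM_THREADS; ++t) {
        int rc = pthread_create(&threads[t], NULL, writer, NULL);
        assert(rc == 0);
        (void)rc;   /* silences the unused warning when NDEBUG is set */
    }
    for (int t = 0; t < NUM_THREADS; ++t)
        pthread_join(threads[t], NULL);
    return rows_written;
}
```

Without the lock, the two increments can interleave and updates can be lost. With it, `run_writers()` returns exactly `NUM_THREADS * WRITES_PER_THREAD`.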


Thread Profiler


Thread Profiler – contd.


Image comparison


SIMD & uArchitecture

• Unfortunately, we did not find good opportunities for vectorization.

• The main micro-architectural issue is mispredicted indirect calls. This cannot be solved, since the Panotools mechanism relies heavily on function pointers for flexibility.

• FP activity is significant. We changed the compiler's floating-point model from “precise” to “fast”, which reduced the benchmark's instruction count to under 90% of that of the original code generation.
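To illustrate why function-pointer dispatch produces indirect calls in the hot path, here is a minimal sketch of a chain of coordinate transforms invoked through pointers. It is not the Panotools implementation: `transform_fn`, `transform_step`, `scale`, `shift` and `run_chain` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transform signature; returns nonzero on success. */
typedef int (*transform_fn)(double x_in, double y_in,
                            double *x_out, double *y_out,
                            const void *params);

typedef struct {
    transform_fn func;     /* NULL terminates the chain */
    const void *params;    /* opaque per-step parameters */
} transform_step;

/* params points to a single double: the scale factor. */
static int scale(double x, double y, double *xo, double *yo, const void *params)
{
    double s = *(const double *)params;
    *xo = x * s;
    *yo = y * s;
    return 1;
}

/* params points to two doubles: the x and y offsets. */
static int shift(double x, double y, double *xo, double *yo, const void *params)
{
    const double *d = (const double *)params;
    *xo = x + d[0];
    *yo = y + d[1];
    return 1;
}

/* Applies every step in order. Each iteration is an indirect call whose
   target the CPU has to predict; in a stitcher this would run per pixel. */
static int run_chain(const transform_step *chain, double x, double y,
                     double *xo, double *yo)
{
    for (; chain->func != NULL; ++chain) {
        if (!chain->func(x, y, &x, &y, chain->params))
            return 0;
    }
    *xo = x;
    *yo = y;
    return 1;
}
```

The chain can be reconfigured at run time without touching `run_chain`, which is the flexibility such a design buys. The cost is that the call targets are only known at run time, so the calls cannot be turned into direct calls without giving that flexibility up.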


Results


Thank you