
10 intel v_tune_session_14



Slide 1 of 13

Code Optimization and Performance Tuning Using Intel VTune

In this session, you will learn to:
► Identify methods for improving the performance of multithreaded applications

Objectives

Slide 2 of 13

There are several ways to assess the performance of a multithreaded application. They are as follows:

► Computing the speedup
► Determining the parallel efficiency
► Determining granularity
► Balancing load among multiple threads

Identifying Methods for Improving the Performance of Multithreaded Applications

Slide 3 of 13

Speedup compares the time a parallel program takes to execute with the time the best serial code takes to accomplish the same task. The speedup of an application is calculated as follows:

Speedup = Serial Time / Parallel Time
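The speedup formula can be sketched in a few lines of Python. The function name and the timings below are hypothetical and chosen only for illustration; in practice both times would come from measuring the best serial version and the threaded version on the same input.

```python
def speedup(serial_time: float, parallel_time: float) -> float:
    """Return Serial Time / Parallel Time (both in the same unit)."""
    if serial_time <= 0 or parallel_time <= 0:
        raise ValueError("times must be positive")
    return serial_time / parallel_time


# Hypothetical measurements: best serial code 12.0 s, threaded code 4.0 s.
print(speedup(12.0, 4.0))  # 3.0
```

A value above 1 means the threaded version is faster than the serial one; a value below 1 means threading made the program slower.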

Identifying Methods for Improving the Performance of Multithreaded Applications (Contd.)

Slide 4 of 13

Parallel efficiency is a measure of how efficiently core resources are used during parallel computations. The parallel efficiency of an application is calculated as follows:

Efficiency = (Speedup / Number of Threads) * 100%
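As a minimal sketch of the efficiency formula, the Python function below takes a speedup and a thread count and returns a percentage. The name `parallel_efficiency` and the sample numbers are assumptions made for this example.

```python
def parallel_efficiency(speedup: float, num_threads: int) -> float:
    """Return (Speedup / Number of Threads) * 100, as a percentage."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    return speedup / num_threads * 100.0


# Hypothetical case: a speedup of 3.0 obtained with 4 threads.
print(parallel_efficiency(3.0, 4))  # 75.0
```

An efficiency of 100% would mean every added thread contributed a full thread's worth of speedup; the 75% here indicates that a quarter of the available thread time was lost to overhead or idling.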

Identifying Methods for Improving the Performance of Multithreaded Applications (Contd.)

Slide 5 of 13

Granularity is often related to how the workload is balanced among multiple threads. It refers to the amount of work done in each parallel task. To attain good performance in a threaded application, it is important to select the right granularity for your application.

If granularity is too fine, performance can suffer from communication overhead. If granularity is too coarse, performance can suffer from load imbalance.

Our aim should be to achieve the coarsest granularity possible, without creating imbalance between the threads.
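The trade-off can be illustrated with a toy cost model in Python. This is a sketch under stated assumptions, not a measurement: every item costs the same, each chunk adds a fixed overhead (standing in for communication or scheduling cost), and chunks are dealt to threads round-robin. The function name and all numbers are hypothetical.

```python
def estimated_time(num_items, item_cost, chunk_size, per_chunk_overhead, num_threads):
    """Estimate the finish time of the busiest thread in a toy model.

    Chunks are assigned to threads round-robin; each chunk costs
    (items in chunk * item_cost) + per_chunk_overhead.
    """
    busy = [0.0] * num_threads
    for index, start in enumerate(range(0, num_items, chunk_size)):
        size = min(chunk_size, num_items - start)
        busy[index % num_threads] += size * item_cost + per_chunk_overhead
    return max(busy)


# 1000 items, 1.0 time unit each, 5.0 units of overhead per chunk, 4 threads.
for chunk_size in (1, 250, 400, 1000):
    print(chunk_size, estimated_time(1000, 1.0, chunk_size, 5.0, 4))
# 1 1500.0     too fine: overhead dominates
# 250 255.0    coarsest size that still keeps all four threads busy
# 400 405.0    too coarse: one thread idle, one with half a chunk
# 1000 1005.0  one chunk: three threads idle
```

In this model the best result comes from the largest chunk size that still gives every thread an equal share, which matches the aim stated above.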

Identifying Methods for Improving the Performance of Multithreaded Applications (Contd.)

Slide 6 of 13

Load imbalance refers to the situation in which unequal amounts of work are distributed among multiple threads. The key objective of load balancing is to minimize the idle time of the threads.
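Idle time can be quantified once the busy time of each thread is known. The sketch below assumes that all threads start together and that the parallel region ends when the slowest thread finishes; the function names and the sample busy times are hypothetical.

```python
def idle_times(busy_times):
    """Return how long each thread waits for the slowest thread to finish."""
    finish = max(busy_times)
    return [finish - busy for busy in busy_times]


def idle_percent(busy_times):
    """Return the share of total thread time spent idle, as a percentage."""
    total_thread_time = max(busy_times) * len(busy_times)
    return sum(idle_times(busy_times)) / total_thread_time * 100.0


# Hypothetical busy times (seconds) for four threads.
busy = [8.0, 4.0, 6.0, 2.0]
print(idle_times(busy))    # [0.0, 4.0, 2.0, 6.0]
print(idle_percent(busy))  # 37.5
```

A perfectly balanced workload would give every thread the same busy time and an idle percentage of zero.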

Identifying Methods for Improving the Performance of Multithreaded Applications (Contd.)

Slide 7 of 13

There are three main strategies to improve the performance of an application. They are as follows:

► Balance I/O and computation
► Improve the threading model
► Improve the computation efficiency

Using Intel VTune Performance Analyzer for Threaded Applications

Slide 8 of 13

You can use Intel VTune Performance Analyzer on serial applications to analyze whether the serial code can be parallelized. Performance bottlenecks typically arise in the following areas:

► CPU-bound processes
► Memory-bound processes
► I/O-bound processes

Using Intel VTune Performance Analyzer for Threaded Applications (Contd.)

► CPU-bound processes: These processes are slow because they are limited by computation. Threading a CPU-bound application can increase its speed to a great extent.

► Memory-bound processes: These processes may use memory inefficiently or may incur a large number of cache misses. If tuned properly, memory-bound applications may run much faster.

► I/O-bound processes: These processes spend their time waiting on synchronous I/O, formatted I/O, or library-level or system-level buffering.
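Outside a profiler, a rough first check of whether a region is CPU-bound or spends its time waiting is to compare CPU time with wall-clock time. The Python sketch below is an illustration only and is not part of VTune; `cpu_fraction`, `busy_loop`, and `waiting` are hypothetical names, and `time.sleep` stands in for a blocking I/O call. The check does not separate memory-bound from CPU-bound code, since stalls on cache misses still count as CPU time.

```python
import time


def cpu_fraction(func):
    """Run func once and return CPU time divided by wall-clock time."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


def busy_loop():
    """Stand-in for a CPU-bound region: pure arithmetic."""
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def waiting():
    """Stand-in for an I/O-bound region: the process sleeps instead of computing."""
    time.sleep(0.2)


print("busy_loop:", round(cpu_fraction(busy_loop), 2))  # close to 1 on an idle machine
print("waiting:  ", round(cpu_fraction(waiting), 2))    # close to 0
```

A fraction near 1 suggests the region is limited by computation and may benefit from threading; a fraction near 0 suggests it is mostly waiting, so overlapping I/O with computation is the more promising change.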

Slide 9 of 13

VTune Analyzer enables you to determine whether the threading model of your multithreaded application is balanced. You can detect load imbalance in two ways:

► View the amount of time taken by each thread
► View the CPU information
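The first approach, viewing the time taken by each thread, can be approximated by hand when a profiler is not available. The Python sketch below is a stand-in for that view and does not use any VTune interface; `run_and_time` and the thread names are hypothetical, and `time.sleep` simulates unequal work.

```python
import threading
import time


def run_and_time(workloads):
    """Run each callable in its own thread and return elapsed seconds per name."""
    elapsed = {}
    lock = threading.Lock()

    def worker(name, task):
        start = time.perf_counter()
        task()
        duration = time.perf_counter() - start
        with lock:  # protect the shared dictionary
            elapsed[name] = duration

    threads = [
        threading.Thread(target=worker, args=(name, task))
        for name, task in workloads.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return elapsed


# Simulated unequal workloads: thread-B has four times the work of thread-A.
times = run_and_time({
    "thread-A": lambda: time.sleep(0.05),
    "thread-B": lambda: time.sleep(0.20),
})
for name, seconds in sorted(times.items()):
    print(f"{name}: {seconds:.2f} s")
```

A large gap between the fastest and slowest thread, as between thread-A and thread-B here, is the sign of load imbalance that the per-thread view is meant to reveal.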

Using Intel VTune Performance Analyzer for Threaded Applications (Contd.)

Slide 10 of 13

Problem Statement: Jim has developed a threaded application in C#, which generates a list of natural numbers and displays the prime, even, and odd numbers occurring within that range. He wants to analyze the performance of his application using the Counter Monitor Wizard of VTune Performance Analyzer. He also wants to get some advice for optimizing his application. Help Jim accomplish his task.

Activity: Analyzing the Processor Utilization

Slide 11 of 13

Solution: To analyze the performance of the application using the counter monitor, Jim needs to perform the following tasks:

1. Configure counter monitor using the Counter Monitor Configuration wizard.

2. Analyze the results of counter monitor.

Activity: Analyzing the Processor Utilization (Contd.)

Slide 12 of 13

In this session, you learned that:

There are several ways to assess the performance of a multithreaded application. They are as follows:

► Computing speedup
► Determining parallel efficiency
► Determining granularity
► Determining load balance of an application

Intel VTune Performance Analyzer can be used to improve the performance of a threaded application. Performance bottlenecks in your application typically arise in the following areas:

► CPU-bound processes
► Memory-bound processes
► I/O-bound processes

Summary

Slide 13 of 13

You can detect load imbalance in your application in two ways:

► View the amount of time taken by each thread
► View the CPU information

Summary