vivek-chan
Slide 1 of 13
Code Optimization and Performance Tuning Using Intel VTune
In this session, you will learn to:
► Identify methods for improving the performance of multithreaded applications
Objectives
Slide 2 of 13
Several metrics and methods enable you to determine the performance of a multithreaded application. They are as follows:
► Computing the speedup
► Determining the parallel efficiency
► Determining granularity
► Balancing load among multiple threads
Identifying Methods for Improving the Performance of Multithreaded Applications
Slide 3 of 13
Speedup compares the time a parallel program takes to execute with the time the best serial code requires to accomplish the same task.
The speedup of an application is calculated as follows:
Speedup = Serial Time / Parallel Time
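The speedup formula can be sketched in a few lines of Python. This is only an illustration: the function name `speedup` and the timings are hypothetical, chosen so the arithmetic is easy to follow, and are not measurements from any real application.

```python
def speedup(serial_time: float, parallel_time: float) -> float:
    """Return Serial Time / Parallel Time for two timings in the same unit."""
    if serial_time <= 0 or parallel_time <= 0:
        raise ValueError("timings must be positive")
    return serial_time / parallel_time


# Hypothetical timings in seconds, for illustration only.
serial_seconds = 12.0    # best serial version of the task
parallel_seconds = 4.0   # threaded version of the same task

print(f"Speedup: {speedup(serial_seconds, parallel_seconds):.2f}x")
```

A value above 1 means the parallel version is faster than the serial one; a value of 1 or below means threading brought no gain for that run.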
Identifying Methods for Improving the Performance of Multithreaded Applications (Contd.)
Slide 4 of 13
Parallel efficiency is a measure of how efficiently core resources are used during parallel computations.
The parallel efficiency of an application is calculated as follows:
Efficiency = (Speedup / Number of Threads) * 100%
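As a minimal sketch of the efficiency formula, the Python snippet below turns a speedup value and a thread count into a percentage. The function name `parallel_efficiency` and the sample numbers are assumptions made for illustration.

```python
def parallel_efficiency(speedup: float, num_threads: int) -> float:
    """Return (Speedup / Number of Threads) * 100, as a percentage."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    return speedup / num_threads * 100.0


# Hypothetical example: a speedup of 3.0 measured with 4 threads.
efficiency = parallel_efficiency(3.0, 4)
print(f"Parallel efficiency: {efficiency:.1f}%")
```

An efficiency close to 100% suggests the threads are doing useful work most of the time; a much lower figure points to overhead or idle threads.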
Identifying Methods for Improving the Performance of Multithreaded Applications (Contd.)
Slide 5 of 13
Granularity refers to the amount of work done in each parallel task, and it is closely related to how the workload is balanced among multiple threads. To attain good performance in a threaded application, it is important to select the right granularity.
If the granularity is too fine, performance can suffer from communication overhead. If the granularity is too coarse, performance can suffer from load imbalance.
The aim should be to achieve the coarsest granularity possible without creating an imbalance between the threads.
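The trade-off can be illustrated with a small Python sketch that splits a fixed workload into tasks of different sizes. The helper `split_into_chunks`, the workload of 1000 items, and the chunk sizes are all hypothetical; the snippet only counts tasks and does not run any threads.

```python
from typing import List


def split_into_chunks(total_items: int, chunk_size: int) -> List[range]:
    """Split the indices 0..total_items-1 into consecutive chunks of at most chunk_size."""
    if total_items < 0 or chunk_size < 1:
        raise ValueError("total_items must be >= 0 and chunk_size >= 1")
    return [
        range(start, min(start + chunk_size, total_items))
        for start in range(0, total_items, chunk_size)
    ]


total = 1000  # hypothetical number of work items
for chunk_size in (10, 250, 900):
    sizes = [len(chunk) for chunk in split_into_chunks(total, chunk_size)]
    print(f"chunk_size={chunk_size}: {len(sizes)} tasks, "
          f"largest={max(sizes)}, smallest={min(sizes)}")
```

With a chunk size of 10 there are 100 small tasks, so scheduling and communication costs are paid many times. With a chunk size of 900 there are only two tasks of very different sizes, so one thread would finish long before the other. A chunk size of 250 gives four equal tasks, which would suit four threads if every item costs about the same.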
Identifying Methods for Improving the Performance of Multithreaded Applications (Contd.)
Slide 6 of 13
Load imbalance occurs when unequal amounts of work are distributed among multiple threads.
The key objective of load balancing is to minimize the idle time of the threads.
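One way to put a number on idle time is sketched below in Python. It assumes that every thread starts together and that a thread which finishes early waits for the slowest one. The function names, the max-to-mean ratio used as an imbalance indicator, and the per-thread timings are illustrative assumptions, not something defined by the slides or reported by VTune.

```python
from typing import List


def idle_times(busy_seconds: List[float]) -> List[float]:
    """Idle time per thread, assuming each thread waits for the slowest one."""
    longest = max(busy_seconds)
    return [longest - busy for busy in busy_seconds]


def imbalance_ratio(busy_seconds: List[float]) -> float:
    """Ratio of the longest busy time to the mean busy time (1.0 means balanced)."""
    mean = sum(busy_seconds) / len(busy_seconds)
    return max(busy_seconds) / mean


# Hypothetical busy times in seconds for four threads.
busy = [4.0, 4.0, 4.0, 8.0]
print("Idle time per thread:", idle_times(busy))
print(f"Imbalance ratio: {imbalance_ratio(busy):.2f}")
```

Here three threads sit idle for 4 seconds each while the fourth finishes, so redistributing part of the fourth thread's work would reduce the total idle time.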
Identifying Methods for Improving the Performance of Multithreaded Applications (Contd.)
Slide 7 of 13
There are three main strategies for improving the performance of an application. They are as follows:
► Balancing the I/O and computation
► Improving the threading model
► Improving the computation efficiency
Using Intel VTune Performance Analyzer for Threaded Applications
Slide 8 of 13
You can run Intel VTune Performance Analyzer on a serial application to analyze whether the serial code can be parallelized.
The following areas may cause performance bottlenecks:
► CPU-bound processes
► Memory-bound processes
► I/O-bound processes
Using Intel VTune Performance Analyzer for Threaded Applications (Contd.)
► CPU-bound processes: These processes are slow in operation. Threading a CPU-bound application can increase its speed to a great extent.
► Memory-bound processes: These processes may use memory inefficiently or incur a large number of cache misses. If tuned properly, memory-bound applications may run much faster.
► I/O-bound processes: These processes wait on synchronous I/O, formatted I/O, or library-level or system-level buffering.
Slide 9 of 13
VTune Analyzer enables you to determine whether the threading model of your multithreaded application is balanced.
You can detect load imbalance in two ways:
► View the amount of time taken by each thread
► View the CPU information
Using Intel VTune Performance Analyzer for Threaded Applications (Contd.)
Slide 10 of 13
Problem Statement: Jim has developed a threaded application in C# that generates a list of natural numbers and displays the prime, even, and odd numbers occurring within that range. He wants to analyze the performance of his application using the Counter Monitor Wizard of VTune Performance Analyzer. He also wants advice on optimizing his application. Help Jim accomplish his task.
Activity: Analyzing the Processor Utilization
Slide 11 of 13
Solution: To analyze the performance of the application using counter monitor, Jim needs to perform the following tasks:
1. Configure counter monitor using the Counter Monitor Configuration wizard.
2. Analyze the results of counter monitor.
Activity: Analyzing the Processor Utilization (Contd.)
Slide 12 of 13
In this session, you learned that:
Several metrics and methods enable you to determine the performance of a multithreaded application. They are as follows:
► Computing speedup
► Determining parallel efficiency
► Determining granularity
► Determining the load balance of an application
Intel VTune Performance Analyzer can be used to improve the performance of a threaded application. The following areas may cause performance bottlenecks in your application:
► CPU-bound processes
► Memory-bound processes
► I/O-bound processes
Summary
Slide 13 of 13
You can detect load imbalance in your application in two ways:
► View the amount of time taken by each thread
► View the CPU information
Summary