Software & Services Group, Developer Products Division. Copyright © 2011, Intel Corporation. All rights reserved.
*Other brands and names are the property of their respective owners.
Essential Performance
Advanced Performance
Distributed Performance
Efficient Performance
Building Parallel Applications Using Guided Auto-Parallelization
Om P Sachan
Intel Compiler and Languages
Optimization Notice
Intel compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the “Intel Compiler User and Reference Guides” under “Compiler Options." Many library routines that are part of Intel compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors.
Intel compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.
While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not.
Notice revision #20110307
Agenda
• Introduction to Guided Auto-parallelization
• Run Guided Auto-parallelization
• Analyze Guided Auto-parallelization reports
• Implement Guided Auto-parallelization recommendations
Intel Confidential
4/6/2010
Parallelization in Mainstream
• Performance gains coming from more cores per die
  – Increasing clock frequencies play a smaller role
• Exposes parallelism to the programmer
• Every computer is a parallel computer
  – Implies most programs must execute in parallel
• Parallelism successful in HPC, servers, graphics, ...
  – Not widespread in the client domain
• Client apps focused on
  – Quality user experience
  – Scalability
  – Programmer productivity (critical for time-to-market)
Development of multi-threaded apps is hard
Need for a low-cost and effective way of threading apps
Parallelization in Mainstream
• Requires multi-pronged approach:
  – Simpler parallel programming models and abstractions
  – Domain-specific parallel libraries
  – Compiler auto-parallelization, auto-vectorization, and data-transformation
  – Advise user on how to parallelize
  – Good debugging tools
  – Easy-to-use tools for performance analysis
• Tradeoffs between scalability and productivity
Compiler can play an important role in enabling parallelism
Workflow with Compiler as a Tool
[Figure: workflow diagram. Application Source (C/C++/Fortran) → Compiler → Application Binary + Opt Reports → Performance Tools (identify hotspots, problems) → Application Source + Hotspots → Compiler in advice-mode → Advice messages → Modified Application Source → Compiler (extra options) → Improved Application Binary]
Simplifies programmer effort in application tuning
Compiler as a Tool
• Use compiler as a tool to give selective advice
• Initially targets:
  – Automatic parallelization of loop-nests
  – Automatic vectorization of inner-loops
  – Data transformation suggestions
• Programmer writes serial code, then follows the compiler advice to assert new properties
  – Does not require a lot of extra time and effort from user
• Code remains performance-portable
• Programmer reasons about application properties
• Tool based on expertise of “common pitfalls”
  – Conservative disambiguation assumptions
  – Compiler assumes upper-bound is changing inside loop
  – ...
How it Works
• Targeted for Mainstream and HPC Users
• Advice may involve
  – suggestions for source-change
  – adding pragmas
  – adding new options
• Simple source changes that assert new properties
  – Add a new pragma for loop if semantics are satisfied
  – Use a local-variable for the upper-bound of a loop
  – Initialize scalar variable unconditionally at top of loop
  – Reorder fields of a structure (or split into two)
• Desired behavior
  – Each advice is specific using source-level variable names
  – User does semantic analysis, then applies or rejects each advice
  – Advice should be as localized as possible
  – Following the advice should result in better optimizations
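Two of the source changes listed above (a local variable for the loop upper bound, and unconditional initialization of a scalar at the top of the loop) can be sketched in C as follows. This is a minimal illustration, not actual GAP output: the Buffer type and all function names are hypothetical, and whether a given compiler parallelizes or vectorizes the "after" versions depends on the compiler and options used.

```c
/* Hypothetical container, used only for illustration. */
typedef struct {
    int  count;   /* number of valid elements in data */
    int *data;
} Buffer;

/* Before: buf->count is re-read on every iteration. out[] and buf->count
   are both int, so the compiler cannot rule out that a store to out[i]
   changes the bound, and it may treat the trip count as unknown. */
void scale_before(const Buffer *buf, int *out, int k)
{
    for (int i = 0; i < buf->count; i++) {
        out[i] = buf->data[i] * k;
    }
}

/* After: the bound is copied into a local before the loop.
   VERIFY: equivalent only if nothing in the loop modifies buf->count. */
void scale_after(const Buffer *buf, int *out, int k)
{
    const int n = buf->count;
    for (int i = 0; i < n; i++) {
        out[i] = buf->data[i] * k;
    }
}

/* Before: t is assigned on only one path, so its value can carry over
   from an earlier iteration (a cross-iteration dependence). */
void excess_before(const int *in, int *out, int n, int limit)
{
    int t = 0;
    for (int i = 0; i < n; i++) {
        if (in[i] > limit) {
            t = in[i] - limit;
        }
        out[i] = t;
    }
}

/* After: t is initialized unconditionally at the top of each iteration,
   so iterations are independent.
   VERIFY: the result changes wherever the original relied on the
   carried-over value of t. */
void excess_after(const int *in, int *out, int n, int limit)
{
    for (int i = 0; i < n; i++) {
        int t = 0;
        if (in[i] > limit) {
            t = in[i] - limit;
        }
        out[i] = t;
    }
}
```

For the input {5, 1, 7, 2} with limit 3, excess_before produces {2, 2, 4, 4} while excess_after produces {2, 0, 4, 0}. The difference is exactly what the user has to check before accepting such advice: the change is correct only if the per-iteration value was the intended one.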
Activity 1
Prepare and run Sample code
Use lab document
Usage Model
• Two main usage models:
  – Users compiling with auto-parallelization enabled
  – Users compiling with no auto-parallelization, who can still gain from improved vectorization
• User can specify regions of a file or routine that are considered “hot”
  – Advice will be restricted to the hot region
  – Default is to provide advice on entire compilation-unit
• Under guide-mode, no executable-code generated
  – Only output is a set of advice messages
• User not required to use advanced options (IPO, PGO), but advice may change based on options
• User may apply all (or a subset) of the advice
  – Recompile in normal-mode enables better optimizations
Usage Model (contd.)
• Advice targeted only at improving application performance
  – Use the tool during the performance-tuning part of the software development cycle
• Each advice has a “VERIFY” part
  – User is responsible for checking whether it is “safe” to apply each suggestion
• User not required to use advanced options (IPO, PGO)
  – When IPO is ON in guide-mode, advice is emitted as part of the link-step
• There may be multiple messages targeting the same loop
  – User has to apply ALL of them to get the desired optimization
• Default debug mode generates no GAP messages
  – /Zi implies /Od; override by adding /O2 explicitly
Limitations
• User may have to deal with lots of messages
  – Duplicate messages
  – If no hot region is specified
• User is responsible for semantic verification, so bugs are possible
  – Adding an ivdep pragma in a loop is an assertion by the user
  – May lead to errors if user is not diligent with the verification
  – Good documentation with examples can help mitigate this
• More vector/par-loops does not always guarantee perf gains
• Tool does not guide the user on how to write parallel code
• Not a general purpose mechanism to achieve maximum perf
  – Turning on GAP will not vectorize EVERY loop
  – Only a subset where compiler can do an intelligent workaround
Not a panacea for all problems related to parallelization
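The point that an ivdep pragma is an assertion by the user can be illustrated with a short C sketch. The function names and the guard below are hypothetical, chosen only for this example. The pragma tells the compiler it may ignore the dependence it would otherwise have to assume between a[i] and a[i + k]; that claim holds only when k >= 0, and nothing in the compiler checks it.

```c
/* Stores a[i + k] * c into a[i] for i in [0, n).
   The pragma asserts that the assumed dependence between a[i] and
   a[i + k] can be ignored. This is true only for k >= 0: with k < 0,
   each iteration reads a value written by an earlier one, and a
   vectorized loop could produce wrong results.
   Compilers that do not recognize the pragma typically ignore it,
   possibly with a warning. */
void scale_shifted(int *a, int n, int k, int c)
{
#pragma ivdep
    for (int i = 0; i < n; i++) {
        a[i] = a[i + k] * c;
    }
}

/* Hypothetical guard that makes the user's "VERIFY" step explicit:
   it refuses the call when the asserted property does not hold.
   The caller must still ensure a[] has at least n + k elements.
   Returns 0 on success, -1 if k is negative. */
int scale_shifted_checked(int *a, int n, int k, int c)
{
    if (k < 0) {
        return -1;
    }
    scale_shifted(a, n, k, c);
    return 0;
}
```

In real code the verification is usually done by reasoning about the call sites rather than by a runtime check; the guard here only shows what property the pragma is silently relying on.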
How to Use GAP
• Targeting Windows and Linux (IA32 & Intel64)
• With normal options for the app (-O2 and above), add:
  – -Qguide:3 (Mainstream)
  – -Qguide:4 (HPC)
• No code generation in gap-mode (no executable generated)
• Can be used with and without -Qparallel option
Activity 2
Implementing Guided Auto-parallelization Recommendations, use sample code
Use lab document
Summary
• Learned about Guided Auto-parallelization.
• Analyzed Guided Auto-parallelization reports.
• Implemented Guided Auto-parallelization recommendations.
Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products.
BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Atom, Centrino Atom Inside, Centrino Inside, Centrino logo, Cilk, Core Inside, FlashFile, i960, InstantIP, Intel, the Intel logo, Intel386, Intel486, IntelDX2, IntelDX4, IntelSX2, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Inside logo, Intel. Leap ahead., Intel. Leap ahead. logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel Viiv, Intel vPro, Intel XScale, Itanium, Itanium Inside, MCS, MMX, Oplus, OverDrive, PDCharm, Pentium, Pentium Inside, skoool, Sound Mark, The Journey Inside, Viiv Inside, vPro Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2011. Intel Corporation.
http://intel.com/software/products