Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
Visualizing and Finding Optimization Opportunities with Intel® Advisor Roofline featureIntel Software Developer Conference – Frankfurt, 2017
Klaus-Dieter Oertel, Intel
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Agenda
• Intel® Advisor for vectorization optimization
• What is the theoretical roofline model ?
• How is it implemented in Advisor ?
• Some examples
• Resources
Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Optimization Notice
Intel® Xeon® Processor
64-bit5100 series
5500 series
5600 series
E5-2600E5-2600
V2E5-2600
V3E5-2600
V4Platinum
8180
Core(s) 1 2 4 6 8 12 18 22 28
Threads 2 2 8 12 16 24 36 44 56
SIMD Width
128 128 128 128 256 256 256 256 512
Intel® Xeon Phi™ processor
Knights Landing
72
288
512
*Product specification for launched and shipped products available on ark.intel.com.
High performance software must be both Parallel (multi-thread, multi-process)
Vectorized
Changing Hardware Impacts SoftwareMore Cores More Threads Wider Vectors
3
Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Optimization Notice
Vectorize and Thread for Dramatic Performance GainsTogether they are more effective than either one alone
4
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance Configurations at the end of this presentation.
2012E5-2600
Sandy Bridge
2013E5-2600 v2
Ivy Bridge
2010X5680
Westmere
2007X5472
Harpertown
2009X5570
Nehalem
2014E5-2600 v3
Haswell
2016E5-2600 v4 Broadwell
Vectorized & Threaded
Threaded
VectorizedSerial
187x
Intel® Xeon™
Processor:codenamed:
The Difference Is Growing With
Each New Generation of
Hardware
“Automatic” Vectorization Not EnoughExplicit pragmas and optimization often required
Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Optimization Notice5
Faster Vectorization Optimization: Vectorize where it will pay off most
Quickly ID what is blocking vectorization
Tips for effective vectorization
Safely force compiler vectorization
Optimize memory stride
The data and guidance you need: Compiler diagnostics +
Performance Data + SIMD efficiency
Detect problems & recommend fixes
Loop-Carried Dependency Analysis
Memory Access Patterns Analysis
Intel® Advisor – Vectorization AdvisorGet breakthrough vectorization performance
Optimize for AVX-512 with
or without access to AVX-512 hardware
http://intel.ly/advisor-xePart of Intel® Parallel Studio XE
Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Optimization Notice
The Right Data At Your FingertipsGet all the data you need for high impact vectorization
Filter by which loops are vectorized!
Focus on hot loops
What vectorization issues do I have?
How efficient is the code?
What prevents vectorization?
Which Vector instructions are being used?
Trip Counts
Get Faster Code Faster!
6
Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Optimization Notice7
Find Effective Optimization StrategiesIntel Advisor: Cache-aware roofline analysis
Roofs Show Platform Limits
Memory, cache & compute limits
Dots Are Loops
Bigger, red dots take more time so optimization has a bigger impact
Dots farther from a roof have more room for improvement
Higher Dot = Higher GFLOPs/sec
Optimization moves dots up
Algorithmic changes move dots horizontally
Which loops should we optimize? A and G are the best candidates B has room to improve, but will have less impact E, C, D, and H are poor candidates
Roofs
Roofline tutorial video
New!
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Agenda
• Intel® Advisor for vectorization optimization
• What is the theoretical roofline model ?
• How is it implemented in Advisor ?
• Some examples
• Resources
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
What is the roofline model ?Do you know how fast you should run ?
• Comes from Berkeley
• Performance is limited by equations/implementation & code generation/hardware
• 2 hardware limitations
• PEAK Flops
• PEAK Bandwidth
• The application performance is bounded by hardware specifications
9
Gflop/s= 𝒎𝒊𝒏 𝑷𝒍𝒂𝒕𝒇𝒐𝒓𝒎 𝑷𝑬𝑨𝑲𝑷𝒍𝒂𝒕𝒇𝒐𝒓𝒎 𝑩𝑾 ∗ 𝑨𝑰
Arithmetic Intensity (Flops/Bytes)
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Platform PEAK FlopSHow many floating point operations per second
• Theoretical value can be computed by specificationExample with 2 sockets Intel® Xeon® Processor E5-2697 v2PEAK FLOP = 2 x 2.7 x 12 x 8 x 2 = 1036.8 Gflop/s
• More realistic value can be obtained by running Linpack=~ 930 Gflop/s on a 2 sockets Intel® Xeon® Processor E5-2697 v2
10
Number of sockets
Core Frequency
Number of cores
Number of single precisionelement in a SIMD register
1 port for addition, 1 for multiplication
Gflop/s= 𝒎𝒊𝒏 𝑷𝒍𝒂𝒕𝒇𝒐𝒓𝒎 𝑷𝑬𝑨𝑲𝑷𝒍𝒂𝒕𝒇𝒐𝒓𝒎 𝑩𝑾 ∗ 𝑨𝑰
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Platform PEAK bandwidthHow many bytes can be transferred per second
• Theoretical value can be computed by specificationExample with 2 sockets Intel® Xeon® Processor E5-2697 v2PEAK BW = 2 x 1.866 x 8 x 4 = 119 GB/s
• More realistic value can be obtained by running Stream=~ 100 GB/s on a 2 sockets Intel® Xeon® Processor E5-2697 v2
11
Number of socketsMemory Frequency
Byte per channel
Number of mem channels
Gflop/s= 𝒎𝒊𝒏 𝑷𝒍𝒂𝒕𝒇𝒐𝒓𝒎 𝑷𝑬𝑨𝑲𝑷𝒍𝒂𝒕𝒇𝒐𝒓𝒎 𝑩𝑾 ∗ 𝑨𝑰
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Drawing the RooflineDefining the speed of light
12
Gflops/s
AI [Flop/B]
1036
2 sockets Intel® Xeon® Processor E5-2697 v2Peak Flop = 1036 Gflop/sPeak BW = 119 GB/s
Gflop/s= 𝒎𝒊𝒏 𝑷𝒍𝒂𝒕𝒇𝒐𝒓𝒎 𝑷𝑬𝑨𝑲𝑷𝒍𝒂𝒕𝒇𝒐𝒓𝒎 𝑩𝑾 ∗ 𝑨𝑰
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Drawing the RooflineDefining the speed of light
13
Gflops/s
AI [Flop/B]
1036
2 sockets Intel® Xeon® Processor E5-2697 v2Peak Flop = 1036 Gflop/sPeak BW = 119 GB/s
Gflop/s= 𝒎𝒊𝒏 𝑷𝒍𝒂𝒕𝒇𝒐𝒓𝒎 𝑷𝑬𝑨𝑲𝑷𝒍𝒂𝒕𝒇𝒐𝒓𝒎 𝑩𝑾 ∗ 𝑨𝑰
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Drawing the RooflineDefining the speed of light
14
Gflops/s
AI [Flop/B]8.7
1036
2 sockets Intel® Xeon® Processor E5-2697 v2Peak Flop = 1036 Gflop/sPeak BW = 119 GB/s
Gflop/s= 𝒎𝒊𝒏 𝑷𝒍𝒂𝒕𝒇𝒐𝒓𝒎 𝑷𝑬𝑨𝑲𝑷𝒍𝒂𝒕𝒇𝒐𝒓𝒎 𝑩𝑾 ∗ 𝑨𝑰
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
What is the performance boundary?Manual way to do it
• Manual counting on matrix/matrix multiplication
• # add = N * N * N #Read = 3 * N * N * 4 bytes
# mul = N * N * N #Write = N * N * 4 bytes
• 𝐴𝐼 =2𝑁3
16𝑁2 =1
8𝑁
15
for(i=0; i<N; i++)for(j=0; j<N; j++)
for(k=0; k<N; k++)c[i][j] = c[i][j] + a[i][k] * b[k][j]
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Compute the maximum performanceBW * Arithmetic Intensity
16
2 sockets Intel® Xeon® Processor E5-2697 v2Peak Flop = 1036 Gflop/sPeak BW = 119 GB/s
Gflops/s
AI [Flop/B]8.7
1036
For sgemmAI = 1/8 NIf N = 8, AI = 1
1
119
Gflop/s= 𝒎𝒊𝒏 𝑷𝒍𝒂𝒕𝒇𝒐𝒓𝒎 𝑷𝑬𝑨𝑲𝑷𝒍𝒂𝒕𝒇𝒐𝒓𝒎 𝑩𝑾 ∗ 𝑨𝑰
If N = 8, sgemm should not be able to perform better than 119 GFlop/s
on a 2 sockets Ivy Bridge
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
And NOW?How to get better performance?
17
Gflops/s
8.7
1036
1
119
Optimize memory access
Vectorization + threading
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Agenda
• Intel® Advisor for vectorization optimization
• What is the theoretical roofline model ?
• How is it implemented in Advisor ?
• Some examples
• Resources
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice. 19
Intel® Advisor implements a Cache Aware Roofline Model (CARM)
- “Algorithmic”, “Cumulative (L1+L2+LLC+DRAM)” traffic-based
- Invariant for the given code / platform combination
How does it work ?
- Counts every memory movement
- Bytes and Flops -> Instrumentation
- Time -> Sampling
CARM: Cache aware Roofline ModelDRAM: DRAM aware Roofline ModelTRAM: Theoretical Roofline Model Typically AI_CARM < AI_DRAM < AI_TRAM
Roofline in Intel® AdvisorThe cache aware roofline model
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice. 20
Purely Cache/DRAM-bound
Purely compute bound
Understanding the roofline in Intel® Advisor
Intel® Advisor for vectorization optimization
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Agenda
• Intel® Advisor for vectorization optimization
• What is the theoretical roofline model ?
• How is it implemented in Advisor ?
• Some examples
• Resources
Roofline model and compiler optimizations
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Roofline model and optimizations
• Matrix/matrix addition
• Let’s have a look at the roofline model
void addition(float* a, float* b, float* c, int size){
int i, j;
for(j=0; j<size; j++){
for(i=0; i<size; i++){
c[i*size + j] = a[i*size + j]+b[i*size + j];
}
}
}
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Roofline model and optimizations
• Compilation with –O1
Very poor performance, far from the DRAM roofline !
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Roofline model and optimizations
• Lets look at the Memory Access Pattern Analysis
Constant stride found !!! Looks like loops should be reversed
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Roofline model and optimizations
• Compilation with –O3
Vectorization of Loop carried dependency
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Vectorization of loop carried dependency
• Loop carried dependency
void addition(float* a, float* b, float* c, int size){
int i, j;
for(i=0; i<size; i++){
for(j=pad; j<size; j++){
c[i*size + j] = a[i*size + j]+c[i*size + j-pad];
}
}
}
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Vectorization of loop carried dependency
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Vectorization of loop carried dependency
• Loop carried dependency
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Vectorization of loop carried dependency
• Loop carried dependency
void addition(float* a, float* b, float* c, int size){
int i, j;
for(i=0; i<size; i++){
#pragma omp simd safelen(4)
for(j=pad; j<size; j++){
c[i*size + j] = a[i*size + j]+c[i*size + j-pad];
}
}
}
In this case, we assume that pad >=4
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Vectorization of loop carried dependency
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Vectorization of loop carried dependency
Safelen was 4
Vectorization of function call
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Vectorization of a function call with OMP
• Function call inside of a loop can prevent the vectorization
for(int i=0; i<SIZE; i++){
for(int j=0; j<SIZE; j++){
single_line_addition(a, c, i*SIZE + j);
}
}
//function is defined in another compilation unit
void single_line_addition(float* a, float* c, int ind){
c[ind] = a[ind]+c[ind];
}
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Vectorization of a function call with OMP
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Vectorization of a function call with OMP
Advisor tells you that this pattern can be a problemand proposes a solution
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Vectorization of a function call with OMP
• Omp declare simd
for(int i=0; i<SIZE; i++){
#pragma omp simd
for(int j=0; j<SIZE; j++){
single_line_addition(a, c, i*SIZE + j);
}
}
#pragma omp declare simd uniform(a, c) linear(ind)
void single_line_addition(float* a, float* c, int ind);
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Vectorization of a function call with OMP
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Vectorization of a function call with OMP
Before
After
Intel® Advisor for vectorization optimization
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Agenda
• Intel® Advisor for vectorization optimization
• What is the theoretical roofline model ?
• How is it implemented in Advisor ?
• Some examples
• Resources
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
References
Roofline model proposed by Williams, Waterman, Patterson: http://www.eecs.berkeley.edu/~waterman/papers/roofline.pdf
“Cache-aware Roofline model: Upgrading the loft” (Ilic, Pratas, Sousa, INESC-ID/IST, Thec Uni of Lisbon) http://www.inesc-id.pt/ficheiros/publicacoes/9068.pdf
Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Optimization Notice
Resources
Intel® Advisor – Threading Design & Prototyping:
Product page – overview, features, FAQs, support…
Training materials – movies, tech briefs, documentation…
Evaluation guides – step by step walk through
Reviews
Additional Analysis Tools:
Intel® VTune Amplifier – performance profiler
Intel® Inspector - memory and thread checker / debugger
Additional Development Products:
Intel® Software Development Products
Intel® Distribution for Python* – accelerated Python distribution
43
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Download a free, 30-day trial of
Intel® Parallel Studio XE 2018 today
software.intel.com/en-us/intel-parallel-studio-xe
And Don’t Forget…
Code that performs and outperforms
To fill out the evaluation survey via a URL that will be provided at the end of the day
OR
Watch your email for a link to the survey
P.S.Everyone who fills out the survey will receive a personalized certificate indicating completion of the training!
Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Optimization Notice
Legal Disclaimer & Optimization Notice
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance ofthat product when combined with other products.
Copyright © 2016, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
4545
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more complete information about compiler optimizations, see our Optimization Notice.
Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Optimization Notice47
Advisor usingGCC, Microsoft or Intel Compiler:
Finds un-vectorized loops
Analyze SIMD, AVX, AVX2, AVX-512
Dependency Analysis – safely force vectorization with a pragma
Memory Access Pattern Analysis -optimize stride and caching
Trip Counts
FLOPS metrics with masking
Roofline Analysis – balance memory vs. compute optimization
Intel Compiler Adds:
Usually better optimized vectorization
Better compiler optimization messages
Intel Advisor with Intel Compiler Adds:
Finds inefficiently vectorized loops and estimates performance gain
Compiler optimization report messages displayed on the source
More tips for improving vectorization
Optimize for AVX-512 even without AVX-512 hardware
Advisor works with GCC and Microsoft CompilersAdds bonus capabilities with the Intel Compiler
Copyright © 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Optimization Notice
Configurations for 2007-2016 Benchmarks
Platform
Unscaled Core
FrequencyCores/S
ocket Num
SocketsL1 Data Cache
L2 Cache
L3 Cache Memory
Memory Frequency
Memory Access
H/W Prefetchers
EnabledHT
EnabledTurbo
Enabled C StatesO/S
NameOperating
SystemCompiler Version
Intel® Xeon™5472 Processor
3.0 GHZ 4 2 32K 6 MB None 32 GB 800 MHz UMA Y N N DisabledFedora
203.11.10-301.fc20
icc version 14.0.1
Intel® Xeon™ X5570 Processor
2.9 GHZ 4 2 32K 256K 8 MB 48 GB 1333 MHz NUMA Y Y Y DisabledFedora
203.11.10-301.fc20
icc version 14.0.1
Intel® Xeon™ X5680 Processor
3.33 GHZ 6 2 32K 256K 12 MB 48 MB 1333 MHz NUMA Y Y Y DisabledFedora
203.11.10-301.fc20
icc version 14.0.1
Intel® Xeon™ E52690 Processor
2.9 GHZ 8 2 32K 256K 20 MB 64 GB 1600 MHz NUMA Y Y Y DisabledFedora
203.11.10-301.fc20
icc version 14.0.1
Intel® Xeon™ E5 2697v2 Processor
2.7 GHZ 12 2 32K 256K 30 MB 64 GB 1867 MHz NUMA Y Y Y DisabledRHEL
7.13.10.0-
229.el7.x86_64icc version
14.0.1
Intel® Xeon™ E52600v3 Processor
2.2 GHz 18 2 32K 256K 46 MB 128 GB 2133 MHz NUMA Y Y Y DisabledFedora
203.13.5-202.fc20
icc version 14.0.1
Intel® Xeon™ E52600v4 Processor
2.3 GHz 18 2 32K 256K 46 MB 256 GB 2400 MHz NUMA Y Y Y DisabledRHEL
7.03.10.0-123. el7.x86_64
icc version14.0.1
Intel® Xeon™ E52600v4 Processor
2.2 GHz 22 2 32K 256K 56 MB 128 GB 2133 MHz NUMA Y Y Y DisabledCentOS
7.23.10.0-327. el7.x86_64
icc version14.0.1
Platform Hardware and Software Configuration
48