© 2013, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.
Teaching “Think Parallel”
Four positive trends toward parallel programming, including advances in teaching/learning
James Reinders, Intel, April 2013
Better Tools for Parallel Programming
Better Parallel Models
Wildly more Hardware Parallelism
Better Educated Programmers
Parallel Programming is IMPORTANT. These FOUR factors combine to help parallel programming rise more quickly.
• Industry-leading performance from advanced compilers
• Comprehensive libraries
• Parallel programming models
• Insightful analysis tools
Intel® Advisor XE
Intel® Composer XE
• Intel® C/C++ Compiler, Intel® Fortran Compiler
• Intel® Math Kernel Library, Intel® Integrated Performance Primitives
Intel® Inspector XE
Intel® VTune™ Amplifier XE
Parallel Programming is IMPORTANT. Programming models are improving to be more productive.
OpenMP* (Open Multi-Processing)
• Standard used by many parallel applications; supported by every major compiler for Fortran and C
• OpenMP 4.0 standard: new mid-2013

!$omp parallel do
do i=1,10
   A(i) = B(i) * C(i)
enddo
!$omp end parallel do
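The same loop can also be written in C++; the sketch below is illustrative (not from the deck), and the `multiply` helper is an assumption. Without OpenMP enabled at compile time the pragma is simply ignored and the loop runs serially with identical results.

```cpp
#include <vector>

// Element-wise multiply, mirroring the Fortran "!$omp parallel do"
// example above. With OpenMP enabled, iterations are split across
// threads; each iteration is independent, so the result is the same
// either way.
std::vector<double> multiply(const std::vector<double>& B,
                             const std::vector<double>& C) {
    std::vector<double> A(B.size());
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(A.size()); ++i)
        A[i] = B[i] * C[i];
    return A;
}
```

Compiling with OpenMP support (e.g. `-fopenmp` on gcc) activates the pragma; compilers without OpenMP support ignore it and produce the same answers serially.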
SIMD directives: Intel innovation
• Support in Intel Compilers since 2011
• OpenMP 4.0 standard, new mid-2013: expect to see it in all OpenMP-compliant compilers!

#pragma omp simd reduction(+:val) reduction(+:val2)
for(int pos = 0; pos < RAND_N; pos++) {
    float callValue = expectedCall(Sval, Xval, MuByT, VBySqrtT, l_Random[pos]);
    val  += callValue;
    val2 += callValue * callValue;
}
SIMD vs. Prior methods (you can now choose!)
• OLDER methods (like IVDEP directives, restrict keyword, etc.):
– Keep adding directives, keywords, compile-time switches, hints, etc., hoping your code will vectorize
– Pro: you start with working code, and it continues to work at each step along the way
– Pro: if your algorithm, as written, cannot safely vectorize, the compiler will never vectorize it (hard for most programmers to distinguish from the compiler simply being too conservative)
– Con: you are trying to guess what additional information the compiler needs in order to be comfortable vectorizing
• NEW (SIMD directives) method:
– Pro: add the directive, and the compiler WILL vectorize the code. No time is spent fussing with the compiler or worrying that it is too conservative.
– Con: right or wrong, it is vectorized. If wrong, you have debugging work to do to discover how to restructure your algorithm so it vectorizes without changing results. There is no more “help” from the compiler in noticing real problems with changing your algorithm to use vector instructions.
OpenMP 4.0 offers industry convergence for a true standard; Intel first to support!

Feature comparison (columns: OpenACC | LEO | Desired Standard):
- Support for C and C++, Fortran: ✔ ✔ ✔
- Support single code base for hetero machines: ✔ ✔ ✔
- Overlap communication and computation: ✔ ✔ ✔
- Interoperate with MPI: ✔ ✔ ✔
- Interoperate with OpenMP: ✔ ✔
- Offload to GPU: ✔ ✔
- Offload to MIC Co-processor: ✔ ✔
- Ability to support all accelerators: ✔
- Ability to support all GPUs: ✔
- Ability to support all co-processors: ✔
- Proof of performance portability: ✔
- Support for nested parallelism: ✔ ✔
- User-managed memory consistency: ✔ ✔ ✔
- Multiple vendor support: ✔ ✔
- Restrict clause support: ✔
- Support for dynamic dispatch: ✔ ✔
- Parallel on/off separate from offload: ✔ ✔
- PGI, CAPS compiler support 2012: ✔
- Cray compiler support soon: ✔
- Intel compiler support 2010*: ✔
- Broad standards body approval: ✔ OpenMP 4.0 (late 2012) maybe
* public product availability was 2012
threadingbuildingblocks.org
TBB for C++ scaling: the most popular solution for C++ parallel programming
cilkplus.org
TBB has a “sister”, Cilk™ Plus:
• Help for C programmers
• Involves the compiler
• Vectorization support
Intel® Cilk™ Plus (cilkplus.org)
• Tasking: Cilk Keywords, Hyperobjects
• Vectorization: Array Notation, SIMD Annotation, Elemental Functions
Intel® Cilk™ Plus (cilkplus.org)
Intel products, plus gcc and LLVM branches available.
• Tasking: Cilk Keywords, Hyperobjects
• Vectorization: Array Notation, SIMD Annotation (adopted by OpenMP 4.0, mid-2013), Elemental Functions (adopted by OpenMP 4.0, mid-2013)
Parallel Programming Model abstractions that yield portability, performance, productivity, usability, and maintainability.
Parallel Programming is IMPORTANT. These FOUR factors combine to help parallel programming rise more quickly.
Intel® Xeon Phi™ Coprocessors: Highly-parallel Processing for Unparalleled Discovery
Groundbreaking: differences
• Up to 61 IA cores / 1.1 GHz / 244 threads
• Up to 8 GB memory with up to 352 GB/s bandwidth
• 512-bit SIMD instructions
• Linux operating system, IP addressable
• Standard programming languages and tools

Leading to groundbreaking results
• Up to 1 TeraFlop/s double precision peak performance (1)
• Up to 2.2x higher memory bandwidth than on an Intel® Xeon® processor E5 family-based server (2)
• Up to 4x more performance per watt than with an Intel® Xeon® processor E5 family-based server (3)
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance Notes 1, 2 & 3, see backup for system configuration details.
Intel® Xeon Phi™ Coprocessors: They’re So Much More
General purpose IA hardware leads to less idle time for your investment.

Intel® Xeon Phi™ Coprocessor: it’s a supercomputer on a chip
• Operate as a compute node
• Run a full OS
• Program to MPI
• Run x86 code
• Run offloaded code

Custom HW acceleration (GPU, ASIC, FPGA): restrictive architectures
• Restrictive architectures limit the ability of applications to use arbitrary nested parallelism, function calls, and threading models
• Run restricted code; run offloaded code

Source: Intel Estimates
Our vision: span from few cores to many cores with consistent models, languages, tools, and techniques.
[Diagram: one source base, built with common compilers, libraries, and parallel models, targets both multicore CPUs and the Intel® MIC architecture coprocessor]
Game Changer
“Unparalleled productivity… most of this software does not run on a GPU” – Robert Harrison, NICS, ORNL
(R. Harrison, “Opportunities and Challenges Posed by Exascale Computing: ORNL’s Plans and Perspectives”, National Institute for Computational Sciences, Nov 2011)
Intel® Parallel Studio XE
• Intel® C/C++ and Fortran Compilers w/OpenMP
• Intel® MKL, Intel® Cilk™ Plus, Intel® TBB, and Intel® IPP
• Intel® Inspector XE, Intel® VTune™ Amplifier XE, Intel® Advisor
plus:
• Intel® Trace Analyzer and Collector
• Intel® MPI Library
SMP on a chip…
Intel® Xeon Phi™ Coprocessor: Increases Application Performance up to 10x
Application Performance Examples
* Xeon = Intel® Xeon® processor; * Xeon Phi = Intel® Xeon Phi™ coprocessor
Customer | Application | Performance Increase (1) vs. 2S Xeon*
Los Alamos | Molecular Dynamics | Up to 2.52x
Acceleware | 8th order isotropic variable velocity | Up to 2.05x
Jefferson Labs | Lattice QCD | Up to 2.27x
Financial Services | BlackScholes SP | Up to 7x
Financial Services | Monte Carlo SP | Up to 10.75x
Sinopec | Seismic Imaging | Up to 2.53x (2)
Sandia Labs | miniFE (Finite Element Solver) | Up to 2x (3)
Intel Labs | Ray Tracing (incoherent rays) | Up to 1.88x (4)
Source: Customer Measured results as of October 22, 2012. Configuration Details: Please reference slide speaker notes. For more information go to http://www.intel.com/performance
Notes:
1. 2S Xeon* vs. 1 Xeon Phi* (preproduction HW/SW & application running 100% on coprocessor unless otherwise noted)
2. 2S Xeon* vs. 2S Xeon* + 2 Xeon Phi* (offload)
3. 8-node cluster, each node with 2S Xeon* (comparison is cluster performance with and without 1 Xeon Phi* per node) (Hetero)
4. Intel Measured Oct. 2012
• Intel® Xeon Phi™ coprocessor accelerates highly parallel & vectorizable applications (graph above)
• Table provides examples of such applications
Synthetic Benchmark Summary (Intel® MKL) (5110P)
[Bar charts: 2S Intel® Xeon® processor vs. 1 Intel® Xeon Phi™ coprocessor; higher is better in all cases]
• SGEMM (GF/s): 640 vs. 1,729 (up to 2.7x, 85% efficient)
• DGEMM (GF/s): 309 vs. 833 (up to 2.7x, 82% efficient)
• SMP Linpack (GF/s): 303 vs. 722 (up to 2.3x, 71% efficient)
• STREAM Triad (GB/s): 78 vs. 159 (ECC on) / 171 (ECC off) (up to 2.1x)
Coprocessor results: benchmark run 100% on coprocessor, no help from Intel® Xeon® processor host (aka native)
Source: Intel Measured results as of October 26, 2012. Configuration Details: Please reference slide speaker notes. For more information go to http://www.intel.com/performance
Notes:
1. Intel® Xeon® Processor E5-2670 used for all. SGEMM Matrix = 13824 x 13824, DGEMM Matrix 7936 x 7936, SMP Linpack Matrix 30720 x 30720
2. Intel® Xeon Phi™ coprocessor 5110P (ECC on) with “Gold Release Candidate” SW stack. SGEMM Matrix = 11264 x 11264, DGEMM Matrix 7680 x 7680, SMP Linpack Matrix 26872 x 28672
Lots of Parallelism is a big deal
http://tinyurl.com/inteljames twitter @jamesreinders http://intel.com/software/mic
Books to Help “Think Parallel”
Intel® Xeon Phi™ Coprocessor High Performance Programming, Jim Jeffers, James Reinders, (c) 2013, publisher: Morgan Kaufmann
It all comes down to PARALLEL PROGRAMMING! (applicable both to processors and to Intel® Xeon Phi™ coprocessors)
Foreword, Preface
Chapters:
1. Introduction
2. High Performance Closed Track Test Drive!
3. A Friendly Country Road Race
4. Driving Around Town: Optimizing A Real-World Code Example
5. Lots of Data (Vectors)
6. Lots of Tasks (not Threads)
7. Offload
8. Coprocessor Architecture
9. Coprocessor System Software
10. Linux on the Coprocessor
11. Math Library
12. MPI
13. Profiling and Timing
14. Summary
Glossary, Index
Available since February 2013.
This book belongs on the bookshelf of every HPC professional. Not only does it successfully and accessibly teach us how to use and obtain high performance on the Intel MIC architecture, it is about much more than that. It takes us back to the universal fundamentals of high-performance computing, including how to think and reason about the performance of algorithms mapped to modern architectures, and it puts into your hands powerful tools that will be useful for years to come.
—Robert J. Harrison, Institute for Advanced Computational Science, Stony Brook University
Learn more about this book:
lotsofcores.com
© 2013, James Reinders & Jim Jeffers, book image used with permission
This is a really great book… I’ve been dreaming for a while of a modern, accessible book that I could recommend to my threading-deprived colleagues and assorted enquirers to get them up to speed with the core concepts of multithreading, as well as something that covers all the major current interesting implementations. Finally I have that book.
—Martin Watt, Principal Engineer, DreamWorks Animation
Structured Parallel Programming, Michael McCool, Arch Robison, James Reinders (c) 2012, publisher: Morgan Kaufmann
Teaches parallel programming using a new pattern-based approach. Extensive examples in Cilk Plus and TBB. Not about any specific hardware, but relevant to all. It’s about effective parallel programming. Great for teaching!
Learn more about this book:
parallelbook.com
Available since July 2012.
© 2012, Michael McCool, Arch Robison, James Reinders, book image used with permission
Structured Parallel Programming (parallelbook.com): available since July 2012 in English, and since February 2013 in Japanese.
Teaching Parallelism
• Patterns & our parallel programming tools
• Map, Reduce: Dot product, Cilk Plus
• Stencil, Recurrence: Forward seismic simulation, Cilk Plus
• Pipeline: Compression, Cilk Plus and TBB
Structured Programming with Patterns
• Patterns are “best practices” for solving specific problems.
• Patterns can be used to organize your code, leading to algorithms that are more scalable and maintainable.
• A pattern supports a particular algorithmic structure with an efficient implementation.
• Intel’s tools support a set of useful parallel patterns with low-overhead implementations.
Structured Serial Patterns
The following patterns are the basis of “structured programming” for serial computation:
• Sequence • Selection • Iteration • Nesting • Functions • Recursion
• Random read • Random write • Stack allocation • Heap allocation • Objects • Closures
Compositions of structured serial control flow patterns can be used in place of unstructured mechanisms such as “goto.” Using these patterns, “goto” can (mostly) be eliminated and the maintainability of software improved.
Structured Parallel Patterns
The following additional parallel patterns can be used for “structured parallel programming”:
• Superscalar sequence • Speculative selection • Map • Recurrence • Scan • Reduce • Pack/expand • Fork/join • Pipeline
• Partition • Segmentation • Stencil • Search/match • Gather • Merge scatter • Priority scatter • *Permutation scatter • !Atomic scatter
Using these patterns, threads and vector intrinsics can (mostly) be eliminated and the maintainability of software improved.
Map
• Map invokes a function on every element of an index set.
• The index set may be abstract or associated with the elements of an array.
• Corresponds to a “parallel loop” where iterations are independent.
Examples: gamma correction and thresholding in images; color space conversions; Monte Carlo sampling; ray tracing.
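The map pattern can be sketched in plain C++ using one of the slide’s examples, gamma correction. This is an illustrative sketch, not the deck’s code: the `gamma_correct` name and the serial `std::transform` form are assumptions. Because each output depends on exactly one input, the same body could be run by a parallel map such as `cilk_for` or `tbb::parallel_for`.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Map: invoke a pure function on every element of an index set.
// Gamma correction of pixel intensities; iterations are independent,
// so a parallel runtime may execute them in any order or in parallel.
std::vector<double> gamma_correct(const std::vector<double>& in, double g) {
    std::vector<double> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(),
                   [g](double v) { return std::pow(v, g); });
    return out;
}
```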
Reduce
• Reduce combines every element in a collection into one using an associative operator: x+(y+z) = (x+y)+z
• For example, reduce can be used to find the sum or maximum of an array.
• Vectorization may require that the operator also be commutative: x+y = y+x
Examples: averaging of Monte Carlo samples; convergence testing; image comparison metrics; matrix operations.
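A serial sketch of the reduce pattern (illustrative; the `sum` and `maximum` names are assumptions): because `+` and `max` are associative, a parallel or vectorized implementation is free to regroup the combination tree without changing the result.

```cpp
#include <numeric>
#include <vector>

// Reduce: combine all elements with an associative operator.
// Shown serially; associativity is what licenses a parallel
// implementation to combine partial results in a tree.
double sum(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
}

// max is associative too, so "find the maximum" is also a reduce.
double maximum(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), v.front(),
                           [](double a, double b) { return a > b ? a : b; });
}
```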
Stencil
• Stencil applies a function to neighbourhoods of an array.
• Neighbourhoods are given by a set of relative offsets.
• Boundary conditions need to be considered.
Examples: image filtering including convolution, median, anisotropic diffusion.
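A minimal stencil sketch (illustrative; the 3-point averaging kernel and the clamped boundary are assumptions, not from the deck): each output reads a fixed neighbourhood of relative offsets {-1, 0, +1}, and the boundary condition the slide mentions is handled here by clamping indices at the edges.

```cpp
#include <cstddef>
#include <vector>

// Stencil: each output element is a function of a neighbourhood
// of input elements given by relative offsets {-1, 0, +1}.
// Boundary condition: indices are clamped at the array edges.
std::vector<double> smooth3(const std::vector<double>& in) {
    const std::size_t n = in.size();
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t lo = (i == 0) ? 0 : i - 1;      // clamp left
        const std::size_t hi = (i + 1 == n) ? i : i + 1;  // clamp right
        out[i] = (in[lo] + in[i] + in[hi]) / 3.0;
    }
    return out;
}
```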
Recurrence
• Recurrence results from loop nests with both input and output dependencies between iterations.
• Can also result from iterated stencils.
Examples: simulation including fluid flow, electromagnetic, and financial PDE solvers; lattice QCD; sequence alignment and pattern matching.
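A one-dimensional recurrence sketch (illustrative; the linear recurrence x[i] = a·x[i-1] + b[i] is an assumption chosen for brevity): the loop-carried dependency is what distinguishes a recurrence from a map, and why techniques such as scan (parallel prefix) or wavefront scheduling, rather than a plain parallel loop, are needed.

```cpp
#include <cstddef>
#include <vector>

// Recurrence: iteration i reads the output of iteration i-1,
// a loop-carried dependency. A plain parallel loop would race;
// parallelizing needs a scan (parallel prefix) or wavefront order.
std::vector<double> linear_recurrence(double a, const std::vector<double>& b) {
    std::vector<double> x(b.size());
    double prev = 0.0;
    for (std::size_t i = 0; i < b.size(); ++i) {
        x[i] = a * prev + b[i];  // depends on the previous output
        prev = x[i];
    }
    return x;
}
```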
Pipeline
• Pipeline uses a sequence of stages that transform a flow of data.
• Some stages may retain state.
• Data can be consumed and produced incrementally: “online”.
Examples: image filtering, data compression and decompression, signal processing.
Pipeline: Cilk Plus and TBB

Intel® TBB:
parallel_pipeline( ntoken,
    make_filter<void,T>(
        filter::serial_in_order,
        [&]( flow_control& fc ) -> T {
            T item = f();
            if( !item ) fc.stop();
            return item;
        } ) &
    make_filter<T,U>( filter::parallel, g ) &
    make_filter<U,void>( filter::serial_in_order, h ) );

Intel® Cilk™ Plus (special case):
S s;
reducer_consume<S,U> sink( &s, h );
...
void Stage2( T x ) { sink.consume(g(x)); }
...
while( T x = f() )
    cilk_spawn Stage2(x);
cilk_sync;
http://intel.com/software/mic
parallel programming from few to many cores with consistent models, languages, tools, and techniques
http://intel.com/software/mic
http://tinyurl.com/inteljames twitter @jamesreinders
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright © 2013, Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
Legal Disclaimer & Optimization Notice
Copyright© 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.