Parallel Programming
김명신, Technical Evangelist, Microsoft


First


Phase/Trend | Major Constraints | 2x Efficient App Runs…
(1950-90s) Compute-constrained | Processor | 2x compute speed; 2x users
(1995ish-2007ish) Surplus local compute + low UI innovation (e.g., 2nd party LOB client WIMP apps) | Programmer time | n/a
(200x-) Mobile + bigger experiences (e.g., tablet, ‘smartphone’) | Power (battery life); Processor | 2x battery life; 2x compute speed
(2009-) Cloud / datacenter (e.g., Office 365, Shazam, Siri) | Server HW (57%); Power (31%) * | 0.56x nodes; 0.56x power
(2009-) Heterogeneous cores (e.g., Cell, GPGPU) | Power (dark silicon); Processor | 0.5x power envelope; 2x compute speed
(2020ish-) Moore’s End | Processor | 2x compute speed forever

* http://perspectives.mvdirona.com/2010/09/18/OverallDataCenterCosts.aspx

WIMP: Windows, Icon, Menu, Pointing Device

Note: The final four are going to dominate for the rest of our careers.


Distributed: Telco network, Internet, DFS, Cluster computing, Grid computing

Parallel: Multi Processor, Multi Core, NUMA


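Amdahl's law:

$$\text{Speedup} = \frac{1}{(1 - P) + \frac{P}{S}}$$

P : Parallel portion (fraction of the run time that is sped up)

S : Speedup of that portion

Example: when a 50% portion is made 2x faster,

$$\frac{1}{(1 - 0.5) + \frac{0.5}{2}} = 1.333\ldots$$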
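The formula above, speedup = 1 / ((1 − P) + P/S), can be checked with a few lines of C++. This is only a minimal sketch; the function names `amdahl_speedup` and `amdahl_limit` are made up for illustration and do not come from the slides.

```cpp
#include <cassert>

// Overall speedup when a fraction p of the run time is made s times
// faster and the remaining (1 - p) is left unchanged.
double amdahl_speedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0);  // p is a fraction of the total run time
    assert(s > 0.0);               // s is the speedup applied to that fraction
    return 1.0 / ((1.0 - p) + p / s);
}

// Upper bound on the overall speedup as s grows without limit.
double amdahl_limit(double p) {
    assert(p >= 0.0 && p < 1.0);   // p == 1 would mean an unbounded speedup
    return 1.0 / (1.0 - p);
}
```

Calling `amdahl_speedup(0.5, 2.0)` reproduces the 1.333… from the slide, and `amdahl_limit(0.5)` shows that the same program can never run more than 2x faster, however fast the parallel half becomes.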


Performance Wizard

Concurrency Visualizer


How (Old Features)

Multithread Programming

OpenMP

PPL / TPL


How (VS2012 New Features)

Auto-Vectorization

Auto-Parallelization

C++ AMP


const int N = 1000;
float a[N], b[N];
// initialize a[i] = i, b[i] = 100 + i
for (int i = 0; i < N; ++i)
    a[i] += b[i];


Auto-vectorization is ON by default

Uses SSE instructions on Intel and NEON instructions on ARM

SSE vector registers are named XMM0 through XMM15 (on x64; 32-bit x86 has XMM0 through XMM7)

Uses the SSE 4.2 instruction set if available

To disable vectorization

#pragma loop(no_vector)
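As a rough sketch of how the pragma is placed, the two functions below contain the same element-wise loop, once left to the auto-vectorizer and once marked with `#pragma loop(no_vector)`. The function names are hypothetical, and the `_MSC_VER` guard is an addition so that other compilers simply skip the MSVC-specific pragma. Whether a given loop is actually vectorized depends on the compiler and options; MSVC can report its decision with `/Qvec-report:1` or `/Qvec-report:2`.

```cpp
#include <cassert>

// Element-wise add with no dependency between iterations:
// a typical candidate for the auto-vectorizer.
void add_vectorizable(float* a, const float* b, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        a[i] += b[i];
}

// Same loop, but asking MSVC not to vectorize it.
// The pragma must appear immediately before the loop it applies to.
void add_scalar(float* a, const float* b, int n) {
    assert(n >= 0);
#ifdef _MSC_VER
#pragma loop(no_vector)
#endif
    for (int i = 0; i < n; ++i)
        a[i] += b[i];
}
```

Both functions compute the same result; the pragma only changes which instructions the compiler is allowed to generate, which makes the pair convenient for timing the effect of vectorization.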


The compiler evaluates the code to find loops that might benefit from parallelization

Enable it with the /Qpar compiler option

To request auto-parallelization of a specific loop manually

#pragma loop(hint_parallel(n))
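A minimal sketch of the hint in context follows. The function name `scale_in_place` and the thread count of 4 are illustrative assumptions, and the `_MSC_VER` guard is added so that non-MSVC compilers skip the pragma. The hint only takes effect when the code is built with `/Qpar`, and the compiler may still decline to parallelize the loop; `/Qpar-report:1` or `/Qpar-report:2` shows what it decided.

```cpp
#include <cassert>

// Scales every element independently; iterations do not depend on
// one another, so the loop is a candidate for auto-parallelization.
void scale_in_place(float* data, int n, float factor) {
    assert(n >= 0);
#ifdef _MSC_VER
    // Hint: parallelize across 4 threads (0 would mean "as many as available").
#pragma loop(hint_parallel(4))
#endif
    for (int i = 0; i < n; ++i)
        data[i] *= factor;
}
```

For a loop this small the threading overhead would likely outweigh the gain; the hint is more plausible for loops with a large trip count or expensive bodies.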


C++ AMP (Accelerated Massive Parallelism)

C++, not C

Just one general language extension

Portable, mix & match hardware from any vendor, one exe

General and future-proof

Open specification
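To give a feel for the "one general language extension" (the `restrict(amp)` qualifier), here is a hedged sketch of the earlier array addition written with C++ AMP. The function name `add_arrays` and the `HAVE_CPP_AMP` macro are hypothetical. The guard and the plain CPU fallback are assumptions added so the code still builds where `<amp.h>` is unavailable, and recent Visual Studio versions mark C++ AMP as deprecated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(_MSC_VER) && !defined(__clang__)
#define _SILENCE_AMP_DEPRECATION_WARNINGS  // newer Visual Studio deprecates C++ AMP
#include <amp.h>
#define HAVE_CPP_AMP 1                     // hypothetical macro for this sketch
#endif

// Adds b into a element by element, on an accelerator when C++ AMP is available.
void add_arrays(std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    if (a.empty()) return;                 // nothing to do, avoids a zero extent
#ifdef HAVE_CPP_AMP
    using namespace concurrency;
    const int n = static_cast<int>(a.size());
    array_view<float, 1> av(n, a);         // wraps a; data is copied on demand
    array_view<const float, 1> bv(n, b);   // read-only view of b
    // restrict(amp) marks the lambda as code that may run on the accelerator.
    parallel_for_each(av.extent, [=](index<1> idx) restrict(amp) {
        av[idx] += bv[idx];
    });
    av.synchronize();                      // copy results back into a
#else
    // Fallback (assumption): plain CPU loop when C++ AMP is not available.
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] += b[i];
#endif
}
```

The kernel body is ordinary C++; only the `restrict(amp)` qualifier and the `array_view` wrappers distinguish it from the CPU loop.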
