04/10/25 Parallel and Distributed Programming 1
Shared-memory Parallel Programming
Taura Lab, M1, Yuuki Horita
Agenda
- Introduction
- Sample Sequential Program
- Multi-thread Programming
- OpenMP
- Summary
Parallel Programming Model
Message Passing Model
- covered in the previous talk by Imatake-kun

Shared Memory Model
- Memory is shared by all processing elements
  - Multiprocessor (SMP, SunFire, …)
  - DSM (Distributed Shared Memory)
- Processing elements can communicate with each other through the shared memory
Shared Memory Model
[Diagram: several PEs all connected to one shared Memory]
Shared Memory Model
- Simplicity: there is no need to think about where the computation data is located
- Fast communication (on a multiprocessor): no network is needed for inter-process communication
- Dynamic load sharing: easy, for the same reason as simplicity
Shared-Memory Parallel Programming
- Multi-thread programming
  - Pthreads
  - OpenMP: a parallel programming model for shared-memory multiprocessors
Sample Sequential Program
…loop {
    for (i = 1; i < N-1; i++) {
        for (j = 1; j < N-1; j++) {
            /* five-point stencil; interior points only, so no index
               falls outside the array */
            a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                           + a[i-1][j] + a[i+1][j] + a[i][j]);
        }
    }
}…
FDM (Finite Difference Method)
Parallelization Procedure

Sequential Computation
    | Decomposition
    v
Tasks
    | Assignment
    v
Process Elements
    | Orchestration
    | Mapping
    v
Processors
Parallelize the Sequential Program: Decomposition

Each iteration of the outer loop is a task:

…loop {
    for (i = 1; i < N-1; i++) {        /* one i-iteration = a task */
        for (j = 1; j < N-1; j++) {
            a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                           + a[i-1][j] + a[i+1][j] + a[i][j]);
        }
    }
}…
Parallelize the Sequential Program: Assignment

[Diagram: the tasks divided among four PEs]

Divide the tasks equally among the process elements.
Parallelize the Sequential Program: Orchestration

[Diagram: four PEs with communication between them]

The process elements need to communicate and synchronize with each other.
Parallelize the Sequential Program: Mapping

[Diagram: four PEs mapped onto a multiprocessor]
Multi-thread Programming
- A process element is a thread (cf. a process)
- Memory is shared among all threads created by the same process
- Threads can communicate with each other through the shared memory
Fork-Join Model

Serialized section:    the program starts (Main Thread only)
Fork:                  the Main Thread creates new threads
Parallelized section:  all threads run in parallel
Join:                  the other threads join the Main Thread
Serialized section:    the Main Thread continues processing
Libraries for Thread Programming
- Pthreads (C/C++): pthread_create(), pthread_join()
- Java Thread: Thread class / Runnable interface
Pthreads API (fork/join)

pthread_t                      // thread variable

pthread_create(
    pthread_t *thread,         // thread variable
    pthread_attr_t *attr,      // thread attributes (NULL for defaults)
    void *(*func)(void *),     // start function
    void *arg                  // argument passed to the start function
)

pthread_join(
    pthread_t thread,          // thread variable
    void **thread_return       // the return value of the thread
)
Pthreads Parallel Programming

#include …

void do_sequentially(void) {
    /* sequential execution */
}

main() {
    …
    do_sequentially();   // we want to parallelize this call
    …
}
Pthreads Parallel Programming

#include …
#include <pthread.h>

void *do_in_parallel(void *arg) {
    /* parallel execution */
    return NULL;
}

main() {
    pthread_t tid;
    …
    pthread_create(&tid, NULL, do_in_parallel, NULL);
    do_in_parallel(NULL);      // the main thread does its share too
    pthread_join(tid, NULL);   // join takes the thread and a return-value slot
    …
}
Exclusive Access Control

int sum = 0;

thread_A() { sum++; }        thread_B() { sum++; }

sum++ is really three steps (read, add, write), so the two threads can interleave:

ThreadA               ThreadB               sum
a ← read sum                                0
                      a ← read sum          0
a = a + 1
                      a = a + 1
write a → sum                               1
                      write a → sum         1

Starting from sum = 0, both threads incremented, yet sum = 1 at the end: one update is lost.
Pthreads API (Exclusive Access Control)

Variable:
    pthread_mutex_t

Initialization:
    pthread_mutex_init(
        pthread_mutex_t *mutex,
        pthread_mutexattr_t *mutexattr
    )

Lock / Unlock:
    pthread_mutex_lock(pthread_mutex_t *mutex)
    pthread_mutex_unlock(pthread_mutex_t *mutex)
Exclusive Access Control

int sum = 0;
pthread_mutex_t mutex;
pthread_mutex_init(&mutex, 0);

thread_A() {
    pthread_mutex_lock(&mutex);
    sum++;
    pthread_mutex_unlock(&mutex);
}

thread_B() {
    pthread_mutex_lock(&mutex);
    sum++;
    pthread_mutex_unlock(&mutex);
}

ThreadA               ThreadB
acquire lock
sum++                 tries to acquire lock, blocks
release lock
                      acquire lock
                      sum++
                      release lock

The lock serializes the two increments, so no update is lost.
Pthreads API (Condition Variables)

Variable:
    pthread_cond_t

Initialization:
    pthread_cond_init(
        pthread_cond_t *cond,
        pthread_condattr_t *condattr
    )

Operations:
    pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex)
    pthread_cond_broadcast(pthread_cond_t *cond)
    pthread_cond_signal(pthread_cond_t *cond)
Condition Wait

ThreadA (waits for the condition):

    pthread_mutex_lock(&mutex);
    while (/* condition is not satisfied */) {
        pthread_cond_wait(&cond, &mutex);    /* releases the lock and sleeps;
                                                reacquires the lock on wakeup */
    }
    pthread_mutex_unlock(&mutex);

ThreadB (makes the condition true and wakes the waiters):

    pthread_mutex_lock(&mutex);
    update_condition();
    pthread_cond_broadcast(&cond);    /* or pthread_cond_signal(&cond) */
    pthread_mutex_unlock(&mutex);
Synchronization

Synchronization in the sample program (a barrier: every thread waits until all nthreads have arrived):

n = 0;
…
pthread_mutex_lock(&mutex);
n++;
while (n < nthreads) {
    pthread_cond_wait(&cond, &mutex);
}
pthread_cond_broadcast(&cond);
pthread_mutex_unlock(&mutex);
Characteristics of Pthreads
- Troublesome to describe exclusive access control and synchronization
- Easy to introduce deadlocks
- It is still hard to parallelize a given sequential program
What’s OpenMP?
- A specification for a set of compiler directives, library routines, and environment variables that can be used to specify shared-memory parallelism in Fortran and C/C++ programs
- Fortran ver. 1.0 API: Oct. 1997
- C/C++ ver. 1.0 API: Oct. 1998
Background of OpenMP
- The spread of shared-memory multiprocessors
- Need for common directives for shared-memory multiprocessors: each vendor had been providing its own, different set of directives
- Need for a simpler and more flexible interface for developing parallel applications: Pthreads makes it hard for developers to describe parallel applications
OpenMP API
- Directives
- Libraries
- Environment Variables
Directives

C/C++:    #pragma omp directive_name …
Fortran:  !$OMP directive_name …

If the user's compiler doesn't support OpenMP, the directives are simply ignored, so the same program can still be executed as a sequential program.
Parallel Region
- The part of the program executed in parallel by multiple threads
- Threads are created at the beginning of the parallel region
- They join at the end of the parallel region

#pragma omp parallel
{
    /* parallel region */
}
Parallel Region (threads)

Number of threads:
- omp_get_num_threads(): get the current number of threads
- omp_set_num_threads(int nthreads): set the number of threads to nthreads
- the $OMP_NUM_THREADS environment variable

Thread ID (0 … number of threads - 1):
- omp_get_thread_num(): get the calling thread's ID
Work-Sharing Constructs

Specify how tasks are assigned inside a parallel region:
- for: share the iterations of a loop among the threads
- sections: share independent sections among the threads
- single: execute a block by only one thread
Example of Work Sharing

Sequential:

for (i = 1; i < N-1; i++) {
    for (j = 1; j < N-1; j++) {
        a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                       + a[i-1][j] + a[i+1][j] + a[i][j]);
    }
}

With a parallel region and a work-sharing for:

omp_set_num_threads(4);
#pragma omp parallel
#pragma omp for
for (i = 1; i < N-1; i++) {
    for (j = 1; j < N-1; j++) {
        a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                       + a[i-1][j] + a[i+1][j] + a[i][j]);
    }
}

With the combined parallel for directive:

omp_set_num_threads(4);
#pragma omp parallel for
for (i = 1; i < N-1; i++) {
    for (j = 1; j < N-1; j++) {
        a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                       + a[i-1][j] + a[i+1][j] + a[i][j]);
    }
}

Note: as written, i and j are shared among the threads, and the conflicting accesses to them make the computation slow (and unsafe); the data-scoping attributes on the next slide address this.
Data-Scoping Attributes

Specify the data scoping at a parallel construct or a work-sharing construct:
- shared(var_list): var_list is shared among the threads
- private(var_list): each thread gets its own private copy of var_list
- reduction(operator : var_list): var_list is private within the construct, and the private copies are combined and reflected after the construct
  ex) #pragma omp for reduction (+: sum)
Example of Data-Scoping Attributes

omp_set_num_threads(4);
#pragma omp parallel for private(i, j)
for (i = 1; i < N-1; i++) {
    for (j = 1; j < N-1; j++) {
        a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                       + a[i-1][j] + a[i+1][j] + a[i][j]);
    }
}
Synchronization
- barrier: wait until all threads reach this line
      #pragma omp barrier
- critical: execute a block exclusively
      #pragma omp critical [(name)] { … }
- atomic: update a scalar variable atomically
      #pragma omp atomic
- …
Synchronization (Pthreads / OpenMP)

Synchronization in the sample program:

<Pthreads>

pthread_mutex_lock(&mutex);
n++;
while (n < nthreads) {
    pthread_cond_wait(&cond, &mutex);
}
pthread_cond_broadcast(&cond);
pthread_mutex_unlock(&mutex);

<OpenMP>

#pragma omp barrier
Summary of OpenMP
- Incremental parallelization of sequential programs
- Portability
- Easier to implement parallel applications than with Pthreads or MPI
Message Passing Model / Shared Memory Model

               Message Passing   Shared Memory
Architecture   any               SMP or DSM
Programming    difficult         easier
Performance    good              better (SMP) / worse (DSM)
Cost           less expensive    very expensive (SunFire 15K: $4,140,830)
Thank you!