SHARED-MEMORY PROGRAMMING: 6th week


Khoa Công Nghệ Thông Tin (Faculty of Information Technology) – Đại Học Bách Khoa Tp.HCM (Ho Chi Minh City University of Technology)

SHARED-MEMORY PROGRAMMING

6th week
– References
– Introduction
– The ANSI X3H5 Shared-Memory Model
– The POSIX Threads Model
– The OpenMP Standard


REFERENCES

– Scalable Parallel Computing: Technology, Architecture and Programming, Kai Hwang and Zhiwei Xu, ch. 12
– Parallel Processing Course, Yang-Suk Kee, School of EECS, Seoul National University


Introduction to Shared-Memory Programming Model

[Figure: Shared-Memory Model / Shared Address Space (SAS) Model. Several threads (processes) issue read(X) and write(X) operations on a shared variable X held in the system's memory.]


Introduction… (cont’d)

Naming
– Any process can name any variable in the shared space
Operations
– Loads and stores, plus those needed for ordering
Simplest Ordering Model
– Within a process/thread: sequential program order
– Across threads: some interleaving (as in time-sharing)
– Additional orders through synchronization
– Again, compilers and hardware can violate these orders as long as the program cannot observe the violation


SYNCHRONIZATION

Mutual exclusion (locks)
– Ensures that certain operations on certain data can be performed by only one process at a time
– Like a room that only one person can enter at a time
– No ordering guarantees
Event synchronization
– Ordering of events to preserve dependences
– e.g. producer -> consumer of data
– 3 main types: point-to-point, global, group


NAMING AND OPERATIONS

Naming and operations in the programming model can be directly supported by lower levels, or translated by the compiler, libraries or OS
Example
– Shared virtual address space in the programming model
Hardware interface supports shared physical address space
– Direct support by hardware through virtual-to-physical (v-to-p) mappings, no software layers


NAMING AND OPERATIONS (cont’d)

Hardware supports independent physical address spaces
– Can provide SAS through the OS, so in the system/user interface
    – v-to-p mappings only for data that are local
    – remote data accesses incur page faults; data are brought in via page fault handlers
    – same programming model, different hardware requirements and cost model
– Or through compilers or runtime, so above the system/user interface
    – shared objects, instrumentation of shared accesses, compiler support


SHARED-MEMORY STANDARDS

No widely accepted standard
Three popular platform-independent shared-memory standards are
– X3H5
– OpenMP
– POSIX Pthreads


THE ANSI X3H5 MODEL

Established in 1993
Has greatly influenced many commercial shared-memory systems
Defines one conceptual standard programming model and 3 bindings, for C, Fortran 77 and Fortran 90


THE ANSI X3H5 MODEL (cont’d)

Main features
– Parallelism Constructs
– Parallel Blocks
– Parallel Loop
– Implicit Barrier
– Support for thread interaction and synchronization


PARALLELISM CONSTRUCTS

A parallelism construct is a pair of parallel and end parallel statements together with the code they enclose
The program starts in sequential mode with one initial thread (the base thread, or master thread)
When the program encounters a parallel, it switches to parallel mode by creating a number of child threads
The team of master thread and child threads executes in parallel until an end parallel
After the end parallel, the program switches back to sequential mode (only the base thread continues execution)


PARALLEL CONSTRUCTS IN AN X3H5 PROGRAM

program main
  A                        <- executed by only the base thread
  parallel
    B                      <- executed by every thread in the team (parallel mode)
    psections
      section
        C                  <- executed by one team member
      section
        D                  <- executed by another thread
    end psections
    psingle
      E                    <- executed by only one thread (sequential mode)
    end psingle
    pdo i=1,6
      F(i)                 <- all threads share the 6 iterations of the loop
    end pdo no wait
    G
  end parallel
  H
end


PARALLEL CONSTRUCTS IN AN X3H5 PROGRAM: ILLUSTRATION

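[Figure: execution timeline of the program on a team of three threads P, Q and R. A runs on the base thread; B runs on all three threads; C and D run on two team members; E runs on one thread; the loop is split as F(1:2), F(3:4) and F(5:6); G runs on all three threads; H runs on the base thread. Four phase boundaries are marked "implicit barrier"; the boundary after the loop is marked "no implicit barrier".]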


OTHER CONSTRUCTS

Inside a parallel construct, there are
– Work-sharing constructs
    – Parallel block
    – Parallel loop (pdo…end pdo)
    – A single process (psingle…end psingle)
– Other code, executed redundantly by every thread in the team
Parallel Block
– Consists of many sections (psections…end psections)
– Used to specify MPMD parallelism
– Each section is to be executed by a team member


OTHER CONSTRUCTS (cont’d)

Parallel Loop (pdo…end pdo)
– Used to specify SPMD parallelism
– The same code is to be executed by all team members


OTHER FEATURES OF X3H5

Implicit Barrier
– At parallel, end parallel, end psections, end pdo and end psingle (use no wait to avoid this)
– A fence operation forces all memory accesses up to the barrier point to be consistent
Parallel and work-sharing constructs can be nested
Support for thread interaction
– shared/private variables in a parallel construct
– implicit and explicit barriers
– 4 types of synchronization objects: latch, lock, event and ordinal
Support for thread synchronization
– Lock/event synchronization
– Critical region and ordinal objects


THE POSIX THREADS (Pthreads) MODEL

Established by IEEE in 1995
Functionality and interface are similar to those of Solaris Threads
Defines a set of primitive routines to manage and synchronize threads
Uses mutex objects and condition variables for thread synchronization


THE Pthreads MODEL (cont’d)

Thread management
– pthread_create
– pthread_exit
– pthread_join
– pthread_self
Thread synchronization primitives
– pthread_mutex_init
– pthread_mutex_destroy
– pthread_mutex_lock
– pthread_mutex_trylock
– pthread_mutex_unlock
– pthread_cond_init
– pthread_cond_destroy
– pthread_cond_wait
– pthread_cond_timedwait
– pthread_cond_signal
– pthread_cond_broadcast


HELLO WORLD PROGRAM: PTHREAD VERSION

#include <pthread.h>
#include <stdio.h>

/* Thread body: print the integer argument passed by main. */
void *thrfunc(void *arg)
{
    printf("hello from thread %d\n", *(int *)arg);
    return NULL;
} /* end thrfunc */

int main(void)
{
    pthread_t thread[4];
    pthread_attr_t attr;
    int arg[4] = {0, 1, 2, 3};
    int i;

    /* set up joinable threads with system scope */
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);

    /* create the 4 threads */
    for (i = 0; i < 4; i++)
        pthread_create(&thread[i], &attr, thrfunc, (void *)&arg[i]);

    /* wait for the 4 threads to finish */
    for (i = 0; i < 4; i++)
        pthread_join(thread[i], NULL);

    pthread_attr_destroy(&attr);
    return 0;
} /* end main */


THE OpenMP STANDARD

An Application Program Interface (API) used to explicitly direct multi-threaded, shared-memory parallelism
Inherits many concepts from the ANSI X3H5 model
Three API components
– Compiler Directives
– Runtime Library Routines
– Environment Variables
Portable
– APIs for C/C++ and Fortran
– Multiple platforms: most Unix platforms and Windows NT


THE OpenMP STANDARD (cont’d)

Standardized
– Jointly proposed by a group of major computer hardware and software vendors
– Expected to become an ANSI standard
What does OpenMP stand for?
– Open specifications for Multi Processing
– Developed through collaborative work with interested parties from the hardware and software industry, government and academia


OpenMP IS NOT…

Meant for distributed-memory parallel systems by itself
Necessarily implemented identically by all vendors
Guaranteed to make the most efficient use of shared memory
– There are no data locality constructs


GOALS OF OpenMP

Standardization
– Provide a standard among a variety of shared-memory architectures (platforms)
– High-level interfaces to thread programming
Lean and Mean
– A simple and limited set of directives for shared address space programming
– Just 3 or 4 directives are enough to represent significant parallelism


HELLO WORLD: OpenMP VERSION

#include <omp.h>
#include <stdio.h>

int main(void)
{
    /* every thread in the team executes the printf once */
    #pragma omp parallel
    printf("hello from thread %d\n", omp_get_thread_num());
    return 0;
}


GOALS OF OpenMP (cont’d)

Ease of use
– Incrementally parallelize a serial program
    – Unlike the all-or-nothing approach of message passing
– Implement both coarse-grain and fine-grain parallelism
Portability
– Fortran (77, 90, and 95), C, and C++
– Public forum for API and membership


MATRIX MULTIPLICATION: SEQUENTIAL VERSION

for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        temp = 0;
        for (k = 0; k < N; k++)
            temp += a[i][k] * b[k][j];
        c[i][j] = temp;
    }
}


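MATRIX MULTIPLICATION: OpenMP VERSION

Add a directive before the outer loop. The outer index i is made private automatically, but j, k and temp must be listed as private when they are declared outside the loop; otherwise all threads would share them.

#pragma omp parallel for private(j, k, temp) schedule(static)
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        temp = 0;
        for (k = 0; k < N; k++)
            temp += a[i][k] * b[k][j];
        c[i][j] = temp;
    }
}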


PROGRAMMING MODEL

Thread-Based Parallelism
– A shared-memory process with multiple threads
– Based upon multiple threads in the shared-memory programming paradigm
Explicit Parallelism
– Explicit (not automatic) programming model
– Offers the programmer full control over parallelization


PROGRAMMING MODEL (cont’d)

Fork-Join Model
– All OpenMP programs begin as a single sequential process: the master thread
– Fork at the beginning of parallel constructs
    – The master thread creates a team of parallel threads
    – The statements enclosed by the parallel region construct are executed in parallel
– Join at the end of parallel constructs
    – The threads synchronize and terminate after completing the statements in the parallel construct
    – Only the master thread remains


FORK-JOIN MODEL


PROGRAMMING MODEL (cont’d)

Compiler Directive Based
– Parallelism is specified through compiler directives embedded in C/C++ or Fortran source code
Nested Parallelism Support
– Parallel constructs may include other parallel constructs inside
– Implementation-dependent
Dynamic Threads
– Alter the number of threads used to execute parallel regions
– Implementation-dependent


GENERAL CODE STRUCTURE

#include <omp.h>

int main(void)
{
    int var1, var2, var3;

    /* Serial code ... */

    /* Beginning of parallel section. Fork a team of threads.
       Specify variable scoping. */
    #pragma omp parallel private(var1, var2) shared(var3)
    {
        /* Parallel section executed by all threads ... */

        /* All threads join the master thread and disband */
    }

    /* Resume serial code ... */
    return 0;
}


OPENMP COMPONENTS

Directives
– Work-sharing constructs
– Data environment clauses
– Synchronization constructs
Runtime libraries
Environment variables


COMPARISON OF 5 SHARED-MEMORY PROGRAMMING STANDARDS

Attribute                    X3H5  MPI     Pthreads   HPF     OpenMP
Scalable                     No    Yes     Sometimes  Yes     Yes
Fortran binding              Yes   Yes     No         Yes     Yes
C binding                    Yes   Yes     Yes        No      Planned
High level                   Yes   No      No         Yes     Yes
Performance oriented         No    Yes     No         Yes     Yes
Supports data parallelism    Yes   No      No         Yes     Yes
Portable                     Yes   Yes     Yes        Yes     Yes
Vendor support               No    Widely  Unix SMP   Widely  Starting
Incremental parallelization  Yes   No      No         No      Yes

Courtesy: OpenMP Standards Board, 1997