Parallel Programming with MPI
Prof. Sivarama Dandamudi
School of Computer Science
Carleton University
Introduction
- Problem: lack of a standard for message-passing routines
  - Big issue: portability problems
- MPI defines a core set of library routines (API) for message passing (more than 125 functions in total!)
- Several commercial and public-domain implementations
  - Cray, IBM, Intel
  - MPICH from Argonne National Laboratory
  - LAM from Ohio Supercomputer Center/Indiana University
Introduction (cont'd)
- Some additional goals [Snir et al. 1996]
  - Allows efficient communication
    - Avoids memory-to-memory copying
  - Allows computation and communication overlap
    - Non-blocking communication
  - Allows implementation in a heterogeneous environment
  - Provides a reliable communication interface
    - Users don't have to worry about communication failures
MPI
- MPI is large but not complex
  - 125 functions
- But... only 6 functions are needed to write a simple MPI program:
  - MPI_Init
  - MPI_Finalize
  - MPI_Comm_size
  - MPI_Comm_rank
  - MPI_Send
  - MPI_Recv
MPI (cont'd)
- Before any other MPI function is called, we must initialize:
  - MPI_Init(&argc, &argv)
- To indicate the end of MPI calls:
  - MPI_Finalize()
    - Cleans up the MPI state
    - Should be the last MPI function call
MPI (cont'd)
- A typical program structure:

      #include <mpi.h>

      int main(int argc, char **argv)
      {
          MPI_Init(&argc, &argv);
          . . .    /* main program */
          . . .
          MPI_Finalize();
      }
MPI (cont'd)
- MPI uses communicators to group processes that communicate with each other
- Predefined communicator: MPI_COMM_WORLD
  - Consists of all processes running when the program begins execution
  - Sufficient for simple programs
MPI (cont'd)
- Process rank
  - Similar to mytid in PVM
- MPI_Comm_rank(MPI_Comm comm, int *rank)
  - First argument: communicator
  - Second argument: returns the process rank
MPI (cont'd)
- Number of processes
- MPI_Comm_size(MPI_Comm comm, int *size)
  - First argument: communicator
  - Second argument: returns the number of processes
- Example: MPI_Comm_size(MPI_COMM_WORLD, &nprocs)
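As an illustrative sketch (not part of the original slides), the basic functions introduced so far can be combined into a minimal program in which every process reports its rank and the total number of processes; the variable names are arbitrary:

      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv)
      {
          int rank, nprocs;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank */
          MPI_Comm_size(MPI_COMM_WORLD, &nprocs);  /* total number of processes */

          printf("Process %d of %d\n", rank, nprocs);

          MPI_Finalize();
          return 0;
      }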
MPI (cont'd)
- Sending a message (blocking version):
  MPI_Send(void* buf, int count, MPI_Datatype datatype,
           int dest, int tag, MPI_Comm comm)
  - Buffer description: buf, count, datatype
  - Destination specification: dest, tag, comm
- Data types: MPI_CHAR, MPI_INT, MPI_LONG, MPI_FLOAT, MPI_DOUBLE
MPI (cont'd)
- Receiving a message (blocking version):
  MPI_Recv(void* buf, int count, MPI_Datatype datatype,
           int source, int tag, MPI_Comm comm,
           MPI_Status *status)
- Wildcard specification allowed:
  - MPI_ANY_SOURCE
  - MPI_ANY_TAG
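A small point-to-point sketch (assumed, not from the slides) in which process 0 sends four integers to process 1 with a blocking MPI_Send and process 1 receives them with a blocking MPI_Recv; it assumes the program is run with at least two processes:

      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv)
      {
          int rank, data[4] = {1, 2, 3, 4};
          MPI_Status status;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          if (rank == 0) {
              /* send 4 ints to process 1 with tag 99 */
              MPI_Send(data, 4, MPI_INT, 1, 99, MPI_COMM_WORLD);
          } else if (rank == 1) {
              /* blocking receive: returns only after the message is in data[] */
              MPI_Recv(data, 4, MPI_INT, 0, 99, MPI_COMM_WORLD, &status);
              printf("Received %d %d %d %d\n", data[0], data[1], data[2], data[3]);
          }

          MPI_Finalize();
          return 0;
      }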
MPI (cont'd)
- Receiving a message: status of the received message
- Status gives two pieces of information directly
  - Useful when wildcards are used
  - status.MPI_SOURCE gives the identity of the source
  - status.MPI_TAG gives the tag information
MPI (cont'd)
- Receiving a message: status also gives message size information indirectly
  MPI_Get_count(MPI_Status *status, MPI_Datatype datatype,
                int *count)
  - Takes the status and the datatype as inputs and returns the number of received elements via count
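A hedged fragment (assumed to run on the receiving process, between MPI_Init and MPI_Finalize) showing how the status object and MPI_Get_count recover the source, tag, and size of a message received with wildcards:

      /* Receive up to 100 ints from any source with any tag */
      int buf[100], count;
      MPI_Status status;

      MPI_Recv(buf, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
               MPI_COMM_WORLD, &status);

      /* Query how many ints actually arrived, and from whom */
      MPI_Get_count(&status, MPI_INT, &count);
      printf("Got %d ints from process %d (tag %d)\n",
             count, status.MPI_SOURCE, status.MPI_TAG);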
MPI (cont'd)
- Non-blocking communication
  - Prefix send and recv with I (for immediate):
    - MPI_Isend
    - MPI_Irecv
  - Need completion operations to see if the operation has completed:
    - MPI_Wait
    - MPI_Test
MPI (cont'd)
- Sending a message (non-blocking version):
  MPI_Isend(void* buf, int count, MPI_Datatype datatype,
            int dest, int tag, MPI_Comm comm,
            MPI_Request *request)
  - Returns the request handle via request
MPI (cont'd)
- Receiving a message (non-blocking version):
  MPI_Irecv(void* buf, int count, MPI_Datatype datatype,
            int source, int tag, MPI_Comm comm,
            MPI_Request *request)
  - Same arguments as MPI_Isend
MPI (cont'd)
- How do we know when a non-blocking operation is done?
  - Use MPI_Test or MPI_Wait
- Completion of a send indicates:
  - The sender can access the send buffer
- Completion of a receive indicates:
  - The receive buffer contains the message
MPI (cont'd)
- MPI_Test returns the status
  - Does not wait for the operation to complete
  MPI_Test(MPI_Request *request, int *flag,
           MPI_Status *status)
  - request: request handle
  - flag: operation status, true if completed
  - status: valid if flag = true
MPI (cont'd)
- MPI_Wait waits until the operation is completed
  MPI_Wait(MPI_Request *request, MPI_Status *status)
  - request: request handle
  - status: gives the status
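A non-blocking sketch (assumed, not from the slides, and requiring at least two processes): each side posts an immediate operation, does other work, and then calls MPI_Wait before touching the buffer; MPI_Test could be used instead inside a polling loop:

      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv)
      {
          int rank, value = 42;
          MPI_Request request;
          MPI_Status status;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          if (rank == 0) {
              MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);
              /* ... do other work while the send is in progress ... */
              MPI_Wait(&request, &status);   /* safe to reuse 'value' after this */
          } else if (rank == 1) {
              MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);
              /* ... do other work while the receive is in progress ... */
              MPI_Wait(&request, &status);   /* 'value' now holds the message */
              printf("Received %d\n", value);
          }

          MPI_Finalize();
          return 0;
      }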
MPI Collective Communication
- Several functions are provided to support collective communication
- Some examples:
  - MPI_Barrier (barrier synchronization)
  - MPI_Bcast (broadcast)
  - MPI_Scatter
  - MPI_Gather
  - MPI_Reduce (global reduction)
[Figure: collective communication operations, from Snir et al. 1996]
MPI Collective Communication (cont'd)
- MPI_Barrier blocks the caller until all group members have called it
  MPI_Barrier(MPI_Comm comm)
  - The call returns at any process only after all group members have entered the call
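A brief fragment (assumed to run between MPI_Init and MPI_Finalize, with rank already obtained) showing the typical use of a barrier to separate two phases of a computation:

      printf("Process %d: finished the first phase\n", rank);

      /* No process continues past this point until every process reaches it */
      MPI_Barrier(MPI_COMM_WORLD);

      if (rank == 0)
          printf("All processes have completed the first phase\n");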
MPI Collective Communication (cont'd)
- MPI_Bcast broadcasts a message from root to all processes of the group
  MPI_Bcast(void* buf, int count, MPI_Datatype datatype,
            int root, MPI_Comm comm)
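A hedged fragment (same assumed context as above): process 0 broadcasts an integer to every process in MPI_COMM_WORLD; note that all processes, including the root, make the same call:

      int n;

      if (rank == 0)
          n = 1000;    /* only the root has the value initially */

      /* Every process calls MPI_Bcast with root = 0 */
      MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

      /* After the call, n == 1000 on every process */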
MPI Collective Communication (cont'd)
- MPI_Scatter distributes data from the root process to all the others in the group
  MPI_Scatter(void* send_buf, int send_count,
              MPI_Datatype send_type, void* recv_buf,
              int recv_count, MPI_Datatype recv_type,
              int root, MPI_Comm comm)
MPI Collective Communication (cont'd)
- MPI_Gather is the inverse of the scatter operation (gathers data and stores it in rank order)
  MPI_Gather(void* send_buf, int send_count,
             MPI_Datatype send_type, void* recv_buf,
             int recv_count, MPI_Datatype recv_type,
             int root, MPI_Comm comm)
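A scatter/gather sketch (assumed, not from the slides; it presumes at most 16 processes so the root's array is large enough): the root scatters equal chunks of an array, each process doubles its chunk, and the root gathers the results back in rank order:

      #include <mpi.h>
      #include <stdio.h>

      #define N_PER_PROC 4

      int main(int argc, char **argv)
      {
          int rank, nprocs, i;
          int chunk[N_PER_PROC];
          int full[64];        /* assumes nprocs * N_PER_PROC <= 64 */

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

          if (rank == 0)       /* root fills the array to be distributed */
              for (i = 0; i < nprocs * N_PER_PROC; i++)
                  full[i] = i;

          /* Each process receives N_PER_PROC elements of full[] into chunk[] */
          MPI_Scatter(full, N_PER_PROC, MPI_INT,
                      chunk, N_PER_PROC, MPI_INT, 0, MPI_COMM_WORLD);

          for (i = 0; i < N_PER_PROC; i++)   /* local work on the chunk */
              chunk[i] *= 2;

          /* Root collects the chunks back into full[], in rank order */
          MPI_Gather(chunk, N_PER_PROC, MPI_INT,
                     full, N_PER_PROC, MPI_INT, 0, MPI_COMM_WORLD);

          if (rank == 0)
              printf("full[0] = %d, full[last] = %d\n",
                     full[0], full[nprocs * N_PER_PROC - 1]);

          MPI_Finalize();
          return 0;
      }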
MPI Collective Communication (cont'd)
- MPI_Reduce performs global reduction operations such as sum, max, min, AND, etc.
  MPI_Reduce(void* send_buf, void* recv_buf, int count,
             MPI_Datatype datatype, MPI_Op operation,
             int root, MPI_Comm comm)
MPI Collective Communication (cont'd)
- Predefined reduce operations include:
  - MPI_MAX   maximum
  - MPI_MIN   minimum
  - MPI_SUM   sum
  - MPI_PROD  product
  - MPI_LAND  logical AND
  - MPI_BAND  bitwise AND
  - MPI_LOR   logical OR
  - MPI_BOR   bitwise OR
  - MPI_LXOR  logical XOR
  - MPI_BXOR  bitwise XOR
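A reduction fragment (assumed to run between MPI_Init and MPI_Finalize, with rank already obtained): each process contributes a local value and the sum is delivered to the root:

      int local_count = rank + 1;   /* some per-process value */
      int total;

      /* Sum local_count across all processes; the result lands on rank 0 */
      MPI_Reduce(&local_count, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

      if (rank == 0)
          printf("Global sum = %d\n", total);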
[Final slide: figure from Snir et al. 1996]