CA4/CAE4
Concurrent Programming
Dr. Martin Crane
Recommended Text:
“Foundations of Multithreaded, Parallel and Distributed Programming”, G.R. Andrews, Addison-Wesley, 2000. ISBN 0-201-35752-6
Additional Texts:
“Principles of Concurrent and Distributed Programming”, M. Ben-Ari, Prentice Hall, 1990.
“The SR Programming Language, Concurrency in Practice”, G.R. Andrews and R.A. Olsson, Benjamin/Cummings, 1993.
“Using MPI: Portable Parallel Programming with the Message Passing Interface”, W. Gropp, E. Lusk, A. Skjellum, 2nd Edition, MIT Press, 1999.
Course Outline
• Introduction to Concurrent Processing
• Critical Sections and Mutual Exclusion
• Semaphores
• Monitors
• Message Passing (Java and MPI)
• Remote Procedure Calls (RPC)
• Rendezvous
• Languages for Concurrent Processing (SR, Java, Linda)
• Load Balancing and Resource Allocation
• Fault Tolerance
Parallel vs. Distributed Computing
The individual processors of a distributed system are physically distributed (loosely coupled), whereas parallel systems are “in the same box” (tightly coupled). Both are examples of multiprocessing, but distributed computing introduces extra issues.
Why bother?
The advantages of concurrent processing are:
1) Faster processing
2) Better resource usage
3) Fault tolerance
[Figure: diagram relating Parallel Computing, Distributed Computing and Concurrent Computing]
Types of Concurrent Processing
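Overlapping I/O and Processing
[Figure: timeline of a Main task and an I/O task, each alternating between Active and Waiting]
Multi-Programming
One processor time-slices between several programs.
[Figure: timeline of programs P1 and P2 alternating between Active and Waiting, with the processor sequence P1, I/O, P2, I/O, P1, I/O]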
Multi-tasking
This is a generalisation of the multi-programming concept. Each program is decomposed into a set of concurrent tasks. The processor time-slices between all the tasks.
Multi-Processing
The set of concurrent tasks is processed by a set of interconnected processors. Each processor may have more than one task allocated to it, and may time-slice between its allocated tasks.
Applications of Concurrent Processing
Predictive Modelling and Simulation
Weather forecasting, Oceanography and astrophysics, Socioeconomics and governmental use, etc.
Engineering Design and Automation
Finite-element analysis, Computational Aerodynamics, Artificial Intelligence (Image processing, pattern recognition, speech understanding, expert systems, etc.), Remote Sensing, etc.
Energy Resource Exploration
Seismic Exploration, Reservoir Modelling, Plasma Fusion Power, Nuclear Reactor Safety, etc.
Medical
Computer-assisted Tomography.
Military
Weapons Research, Intelligence gathering surveillance, etc.
Research
Computational Chemistry and Physics, Genome Research, VLSI analysis and design, etc.
Parallel Speed-up
Minsky’s Conjecture
Minsky argued that algorithms such as binary search, branch and bound, etc. would give the best speed-up. All processors would be used in the 1st iteration, half in the 2nd, and so on, giving a speed-up of $\log_2 P$.
Amdahl’s Law
Gene Amdahl divided a program into 2 sections, one that is inherently serial and the other which can be parallel. Let α be the fraction of the program which is inherently serial. Then,
$$\text{Speed-up} = S = \frac{T(1)}{T(P)} = \frac{1}{\alpha + \dfrac{1-\alpha}{P}} = \frac{P}{1 + (P-1)\alpha} \le \frac{1}{\alpha}, \quad \forall P \ge 1$$
If α = 5%, then S ≤ 20.
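As a quick numerical check of the bound, here is a minimal Python sketch of Amdahl's Law. The function name `amdahl_speedup` is ours (not from the notes), and the model assumes the parallel part divides perfectly among the processors.

```python
def amdahl_speedup(alpha: float, processors: int) -> float:
    """Speed-up predicted by Amdahl's Law for an inherently serial fraction alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    # Serial part takes alpha of the time; the rest is shared among the processors.
    return 1.0 / (alpha + (1.0 - alpha) / processors)


if __name__ == "__main__":
    # With alpha = 5% the speed-up approaches, but never reaches, 1/alpha = 20.
    for p in (1, 16, 1024):
        print(p, round(amdahl_speedup(0.05, p), 2))
```

However many processors are added, the value stays below 1/α.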
The Sandia Experiments
The Karp prize was established for the first program to achieve a speed-up of 200 or better. In 1988, a team from Sandia Laboratories reported a speed-up of over 1,000 on a 1,024-processor system on three different problems.
How?
Moler’s Law
An implicit assumption of Amdahl’s Law is that the fraction of the program which is inherently serial, α, is independent of the size of the program. The Sandia experiments showed this to be false. As the problem size increased, the inherently serial parts of a program remained the same or increased at a slower rate than the problem size. So Amdahl’s law should be
$$S \le \frac{1}{\alpha(n)}$$
so as the problem size, n, increases, α(n) decreases and the bound on S increases.
Moler defined an efficient parallel algorithm as one where α(n) → 0 as n → ∞ (Moler’s law).
Sullivan’s First Theorem
The speed-up of a program is min(P, C), where P is the number of processors and C is the concurrency of the program.
[Figure: execution graph (directed acyclic graph)]
If N is the number of operations in the execution graph, and D is the length of the longest path through the graph, then the concurrency C = N/D.
The maximum speed-up is a property of the structure of the parallel program.
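The concurrency of a small execution graph can be computed directly. The sketch below is illustrative only: the graph encoding (a dict mapping each operation to its successors) and the function names are our own, and we assume D is measured in operations along the longest path.

```python
from functools import lru_cache


def concurrency(graph: dict[str, list[str]]) -> float:
    """Return C = N / D for a DAG given as operation -> list of successor operations."""
    nodes = set(graph) | {s for succs in graph.values() for s in succs}

    @lru_cache(maxsize=None)
    def longest_from(node: str) -> int:
        # Number of operations on the longest path that starts at `node`.
        return 1 + max((longest_from(s) for s in graph.get(node, [])), default=0)

    n = len(nodes)                                   # N: total operations
    d = max(longest_from(node) for node in nodes)    # D: longest path, in operations
    return n / d


def max_speedup(processors: int, graph: dict[str, list[str]]) -> float:
    """Sullivan's bound: the speed-up is min(P, C)."""
    return min(processors, concurrency(graph))


# A diamond-shaped graph: a feeds b and c, which both feed d.
DIAMOND = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
```

For the diamond graph N = 4 and D = 3, so C = 4/3 and adding processors beyond two cannot help.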
Architectural Classification Schemes
Flynn’s Classification
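Based on the multiplicity of instruction streams and data streams.
SISD Single Instruction Single Data
[Figure: SISD block diagram with one CU, one PU and one MM, linked by IS and DS]
SIMD Single Instruction Multiple Data
Array processors
[Figure: SIMD block diagram with one CU, several PUs and several MMs, linked by IS and DS]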
MISD Multiple Instruction Single Data
Some say this is impractical - no MISD machine exists.
[Figure: MISD block diagram with several CUs and PUs, each pair with its own IS, sharing a single DS through the MMs]
MIMD Multiple Instruction Multiple Data
Most parallel systems - tightly vs. loosely coupled
[Figure: MIMD block diagram with several CUs, PUs and MMs, each with its own IS and DS]
Processor Topologies
Farm (Star)
Ring
Used in image processing applications, or anything which uses a nearest-neighbour operator.
Mesh
Image processing, finite-element analysis, nearest-neighbour operations.
Torus
A variant of the mesh.
Hypercube
A structure with the minimum diameter property.
3-d hypercube
4-d hypercube
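A hypercube is easy to explore in code. The sketch below assumes the usual labelling in which each of the 2^d nodes has a d-bit address and two nodes are linked when their addresses differ in exactly one bit; that labelling and the function names are our assumptions, not part of the notes.

```python
def neighbours(node: int, dim: int) -> list[int]:
    """Nodes adjacent to `node` in a dim-dimensional hypercube: flip one address bit."""
    return [node ^ (1 << bit) for bit in range(dim)]


def distance(a: int, b: int) -> int:
    """Hop count between two nodes: the number of address bits in which they differ."""
    return bin(a ^ b).count("1")


def diameter(dim: int) -> int:
    """Largest hop count over all node pairs (brute force, intended for small dim)."""
    size = 1 << dim
    return max(distance(a, b) for a in range(size) for b in range(size))
```

Under this labelling the diameter equals the dimension, so 2^d processors are never more than d hops apart.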
A Model of Concurrent Programming
A concurrent program is the interleaving of sets of sequential atomic instructions.
A concurrent program can be considered as a set of interacting sequential processes. These sequential processes execute at the same time, on the same or different processors. The processes are said to be interleaved; that is, at any given time each processor is executing one of the instructions of the sequential processes. The relative rate at which the instructions of each process are executed is not important.
Each sequential process consists of a series of atomic instructions. An atomic instruction is an instruction that, once it starts, proceeds to completion without interruption. Different processors have different atomic instructions, and this can have a big effect.
    N : Integer := 0;

    task body P1 is
    begin
        N := N + 1;
    end P1;

    task body P2 is
    begin
        N := N + 1;
    end P2;
If the processor includes instructions like INC then this program will be correct no matter which instruction is executed first. But if all arithmetic must be performed in registers then the following interleaving does not produce the desired result. Each process has its own register, so both store 1 and N ends up as 1 rather than 2.
    P1: load reg, N
    P2: load reg, N
    P1: add reg, #1
    P2: add reg, #1
    P1: store reg, N
    P2: store reg, N
A concurrent program must be correct under all possible interleavings.
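To see what "all possible interleavings" means for the two incrementing tasks, the following sketch enumerates every interleaving of the register-based load/add/store sequence and collects the final values of N. It is a toy model under our own assumptions (one private register per process, three atomic steps each); the names are illustrative.

```python
from itertools import combinations

STEPS = ("load", "add", "store")  # register-based version of N := N + 1


def run(schedule):
    """Execute one interleaving; `schedule` is a tuple of process ids (0 or 1)."""
    n = 0
    reg = [0, 0]  # one private register per process
    pc = [0, 0]   # index of the next step of each process
    for p in schedule:
        step = STEPS[pc[p]]
        if step == "load":
            reg[p] = n
        elif step == "add":
            reg[p] += 1
        else:  # store
            n = reg[p]
        pc[p] += 1
    return n


def all_schedules():
    """Every way to interleave three steps of P1 (id 0) with three steps of P2 (id 1)."""
    for slots in combinations(range(6), 3):
        yield tuple(0 if i in slots else 1 for i in range(6))


def final_values():
    """Set of values N can hold after both processes finish."""
    return {run(s) for s in all_schedules()}
```

Some schedules leave N = 2 and others leave N = 1, so the register-based program is not correct under all interleavings.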
Correctness
If $P(\vec{a})$ is a property of the input (pre-condition), and $Q(\vec{a}, \vec{b})$ is a property of the input and output (post-condition), then correctness is defined as:
Partial correctness
$$\bigl(P(\vec{a}) \land \text{terminates}(\text{Prog}(\vec{a}, \vec{b}))\bigr) \Rightarrow Q(\vec{a}, \vec{b})$$
Total correctness
$$P(\vec{a}) \Rightarrow \bigl(\text{terminates}(\text{Prog}(\vec{a}, \vec{b})) \land Q(\vec{a}, \vec{b})\bigr)$$
Totally correct programs terminate. A totally correct specification of the incrementing tasks is:
$$(a \in \mathbb{N}) \Rightarrow \bigl(\text{terminates}(\text{INC}(a, a')) \land a' = a + 1\bigr)$$
There are 2 types of correctness properties:
Safety properties: These must always be true.
    Mutual exclusion: Two processes must not interleave certain sequences of instructions.
    Absence of deadlock: Deadlock is when a non-terminating system cannot respond to any signal.
Liveness properties: These must eventually be true.
    Absence of starvation: Information sent is delivered.
    Fairness: Any contention must be resolved.
There are 4 different ways to specify fairness.
    Weak fairness: If a process continuously makes a request, eventually it will be granted.
    Strong fairness: If a process makes a request infinitely often, eventually it will be granted.
    Linear waiting: If a process makes a request, it will be granted before any other process is granted the request more than once.
    FIFO: If a process makes a request, it will be granted before any later request made by another process.
Mutual Exclusion
A concurrent program must be correct in all allowable interleavings. Therefore there must be some sections of the different processes which cannot be allowed to be interleaved. These are called critical sections.
    do true ->
        Non_Critical_Section
        Pre_protocol
        Critical_Section
        Post_protocol
    od
First proposed solution
    var Turn: int := 1;

    process P1
        do true ->
            Non_Critical_Section
            do Turn != 1 -> od
            Critical_Section
            Turn := 2
        od
    end

    process P2
        do true ->
            Non_Critical_Section
            do Turn != 2 -> od
            Critical_Section
            Turn := 1
        od
    end
This solution satisfies mutual exclusion.
This solution cannot deadlock, since both processes would have to loop on the test on Turn infinitely and fail. This would imply Turn = 1 and Turn = 2 at the same time.
There is no starvation. This would require one task to execute its critical section infinitely often and the other task to be stuck in its pre-protocol.
However, this solution can fail in the absence of contention. If one process halts in its non-critical section, the other process will eventually be stuck in its pre-protocol forever. Even if the processes are guaranteed not to halt, both processes are forced to execute at the same rate. This, in general, is not acceptable.
Second proposed solution
The first attempt failed because both processes shared the same variable.
    var C1:int := 1
    var C2:int := 1

    process P1
        do true ->
            Non_Critical_Section
            do C2 != 1 -> od
            C1 := 0
            Critical_Section
            C1 := 1
        od
    end

    process P2
        do true ->
            Non_Critical_Section
            do C1 != 1 -> od
            C2 := 0
            Critical_Section
            C2 := 1
        od
    end
This unfortunately violates the mutual exclusion requirement. To prove this we need to find only one interleaving which allows P1 and P2 into their critical sections simultaneously. Starting from the initial state, we have:
    P1 checks C2 and finds C2 = 1.
    P2 checks C1 and finds C1 = 1.
    P1 sets C1 = 0.
    P2 sets C2 = 0.
    P1 enters its critical section.
    P2 enters its critical section. QED
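The search for a bad interleaving can be mechanised. The sketch below is a toy state-space search under our own modelling assumptions: each process has four atomic steps (wait loop, set flag to 0, critical section, set flag to 1), the non-critical section is ignored, and all names are illustrative.

```python
from collections import deque


def step(state, p):
    """Successor state when process p (0 or 1) takes one atomic step, or None if it is blocked."""
    pc, c = list(state[0]), list(state[1])
    other = 1 - p
    if pc[p] == 0:            # do C_other != 1 -> od
        if c[other] != 1:
            return None       # still spinning, nothing changes
        pc[p] = 1
    elif pc[p] == 1:          # C_me := 0
        c[p] = 0
        pc[p] = 2
    elif pc[p] == 2:          # Critical_Section
        pc[p] = 3
    else:                     # C_me := 1
        c[p] = 1
        pc[p] = 0
    return (tuple(pc), tuple(c))


def find_violation():
    """Breadth-first search for a reachable state with both processes in their critical sections."""
    start = ((0, 0), (1, 1))
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state[0] == (2, 2):
            trace = []
            while parent[state] is not None:
                state, p = parent[state]
                trace.append(f"P{p + 1}")
            return trace[::-1]    # which process moved at each step
        for p in (0, 1):
            nxt = step(state, p)
            if nxt is not None and nxt not in parent:
                parent[nxt] = (state, p)
                queue.append(nxt)
    return None
```

The search returns a four-step schedule in which each process passes its test before either has set its flag.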
Third proposed solution
The problem with the last attempt is that once the pre-protocol loop is completed you cannot stop a process from entering its critical section. So the pre-protocol loop should be considered as part of the critical section.
    var C1:int := 1
    var C2:int := 1

    process P1
        do true ->
            Non_Critical_Section    # a1
            C1 := 0                 # b1
            do C2 != 1 -> od        # c1
            Critical_Section        # d1
            C1 := 1                 # e1
        od
    end

    process P2
        do true ->
            Non_Critical_Section    # a2
            C2 := 0                 # b2
            do C1 != 1 -> od        # c2
            Critical_Section        # d2
            C2 := 1                 # e2
        od
    end
We can prove that the mutual exclusion property is valid. To do this we need to prove that the following equations are invariants.
$$(C1 = 0) \equiv \mathrm{at}(c1) \lor \mathrm{at}(d1) \lor \mathrm{at}(e1) \qquad \text{(eqn 1)}$$
$$(C2 = 0) \equiv \mathrm{at}(c2) \lor \mathrm{at}(d2) \lor \mathrm{at}(e2) \qquad \text{(eqn 2)}$$
$$\lnot\bigl(\mathrm{at}(d1) \land \mathrm{at}(d2)\bigr) \qquad \text{(eqn 3)}$$
at(x) means that x is the next instruction to be executed in that process.
Eqn 1 is initially true. Only the b1→c1 and e1→a1 transitions can affect its truth. But each of these transitions also changes the value of C1, so the equivalence is preserved.
A similar proof is true for eqn 2.
Eqn 3 is initially true, and can only be made false by a c2→d2 transition while at(d1) is true. But by eqn 1, at(d1) ⇒ C1 = 0, so c2→d2 cannot occur since this requires C1 = 1. Similar proof for process P2.
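The three invariants can also be checked by brute force over every reachable state. The sketch below models each labelled line (a to e) as one atomic step; this model and its names are our own assumptions, intended only as a sanity check of the proof.

```python
from collections import deque


def successors(state):
    """Yield every state reachable by one atomic step of either process."""
    pcs, cs = state
    for p in (0, 1):
        pc, c = list(pcs), list(cs)
        at = pc[p]
        if at == "a":                  # Non_Critical_Section
            pc[p] = "b"
        elif at == "b":                # C_me := 0
            c[p], pc[p] = 0, "c"
        elif at == "c":                # do C_other != 1 -> od
            if c[1 - p] != 1:
                continue               # still spinning
            pc[p] = "d"
        elif at == "d":                # Critical_Section
            pc[p] = "e"
        else:                          # C_me := 1
            c[p], pc[p] = 1, "a"
        yield (tuple(pc), tuple(c))


def reachable():
    """All states reachable from the initial state (both at a, C1 = C2 = 1)."""
    start = (("a", "a"), (1, 1))
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invariants_hold(state):
    """Check eqn 1, eqn 2 and eqn 3 in one state."""
    (pc1, pc2), (c1, c2) = state
    eqn1 = (c1 == 0) == (pc1 in "cde")
    eqn2 = (c2 == 0) == (pc2 in "cde")
    eqn3 = not (pc1 == "d" and pc2 == "d")
    return eqn1 and eqn2 and eqn3
```

In this model every reachable state satisfies all three equations. The same search also reaches the state where both processes are at c with C1 = C2 = 0, from which neither can advance.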
Fourth proposed solution
The problem with the last proposed solution was that once a process indicated its intention to enter its critical section, it also insisted on entering its critical section. What we need is some way for a process to relinquish its attempt if it fails to gain immediate access to its critical section, and try again.
    var C1:int := 1
    var C2:int := 1

    process P1
        do true ->
            Non_Critical_Section
            C1 := 0
            do true ->
                if C2 = 1 -> exit fi
                C1 := 1
                C1 := 0
            od
            Critical_Section
            C1 := 1
        od
    end

    process P2
        do true ->
            Non_Critical_Section
            C2 := 0
            do true ->
                if C1 = 1 -> exit fi
                C2 := 1
                C2 := 0
            od
            Critical_Section
            C2 := 1
        od
    end
This proposal has two drawbacks.
1) A process can be starved. You can find an interleaving in which a process never gets to enter its critical section.
2) The program can livelock. This is a form of deadlock. In deadlock there is no possible interleaving which allows the processes to enter their critical sections. In livelock, some interleavings succeed, but there are sequences which do not succeed.
Dekker’s Algorithm
This is a combination of the first and fourth proposals. The first proposal explicitly passed the right to enter the critical sections between the processes, whereas the fourth proposal had its own variable to prevent problems in the absence of contention. In Dekker’s algorithm the right to insist on entering a critical section is explicitly passed between processes.
    var C1:int := 1
    var C2:int := 1
    var Turn:int := 1
    process P1
        do true ->
            Non_Critical_Section
            C1 := 0
            do true ->
                if C2 = 1 -> exit fi
                if Turn = 2 ->
                    C1 := 1
                    do Turn != 1 -> od
                    C1 := 0
                fi
            od
            Critical_Section
            C1 := 1
            Turn := 2
        od
    end

    process P2
        do true ->
            Non_Critical_Section
            C2 := 0
            do true ->
                if C1 = 1 -> exit fi
                if Turn = 1 ->
                    C2 := 1
                    do Turn != 2 -> od
                    C2 := 0
                fi
            od
            Critical_Section
            C2 := 1
            Turn := 1
        od
    end
This is a solution for mutual exclusion for 2 processes.
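Dekker's algorithm can be tried out with two Python threads. This is only a sketch: it relies on CPython's global interpreter lock to make individual reads and writes atomic and ordered, it uses indices 0 and 1 for P1 and P2, and the counter workload, the iteration count and the `time.sleep(0)` yields are our additions.

```python
import threading
import time

ITERATIONS = 200  # hypothetical demo size, not from the notes


def run_dekker(iterations: int = ITERATIONS) -> int:
    """Run two threads that each increment a shared counter under Dekker's protocol."""
    c = [1, 1]                          # c[i] == 0 means process i wants to enter
    state = {"turn": 0, "counter": 0}   # turn 0 plays the role of Turn = 1

    def process(me: int) -> None:
        other = 1 - me
        for _ in range(iterations):
            c[me] = 0                           # announce intent
            while c[other] != 1:                # the other process also wants in
                if state["turn"] == other:      # not our turn to insist: back off
                    c[me] = 1
                    while state["turn"] != me:
                        time.sleep(0)           # yield while spinning
                    c[me] = 0
                else:
                    time.sleep(0)
            # Critical section: a deliberately non-atomic read-modify-write.
            value = state["counter"]
            time.sleep(0)
            state["counter"] = value + 1
            # Post-protocol, in the same order as the notes.
            c[me] = 1
            state["turn"] = other

    threads = [threading.Thread(target=process, args=(i,)) for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]
```

If mutual exclusion holds, no increment is lost and the result is twice the iteration count.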
Mutual Exclusion for N Processes
There are many N-process mutual exclusion algorithms; all are complicated and relatively slow compared to other methods. One such algorithm is the Bakery Algorithm. The idea is that each process takes a numbered ticket (whose value constantly increases) when it wants to enter its critical section. The process with the lowest current ticket gets to enter its critical section.
    var Choosing: [N] int
    var Number: [N] int
    # Choosing and Number arrays initialised to zero

    process P(i:int)
        do true ->
            Non_Critical_Section
            Choosing [i] := 1
            Number [i] := 1 + max (Number)
            Choosing [i] := 0
            fa j := 1 to N ->
                if j != i ->
                    do Choosing [j] != 0 -> od
                    do true ->
                        if (Number [j] = 0) or
                           (Number [i] < Number [j]) or
                           ((Number [i] = Number [j]) and (i < j)) ->
                            exit
                        fi
                    od
                fi
            af
            Critical_Section
            Number [i] := 0
        od
    end
The bakery algorithm is not practical because:
a) the ticket numbers will be unbounded if some process is always in its critical section, and
b) even in the absence of contention it is very inefficient as each process must query the other processes for their ticket number.
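The bakery algorithm can be sketched with Python threads in the same spirit. As before, this leans on CPython's global interpreter lock for atomic reads and writes, numbers the processes from 0 rather than 1, and uses a counter workload, thread count and round count that are our own choices.

```python
import threading
import time


def run_bakery(n: int = 3, rounds: int = 100) -> int:
    """Run n threads that each increment a shared counter under the bakery protocol."""
    choosing = [0] * n
    number = [0] * n
    shared = {"counter": 0}

    def process(i: int) -> None:
        for _ in range(rounds):
            # Take a ticket larger than every ticket currently held.
            choosing[i] = 1
            number[i] = 1 + max(number)
            choosing[i] = 0
            for j in range(n):
                if j == i:
                    continue
                while choosing[j] != 0:     # wait until j has finished choosing
                    time.sleep(0)
                # Wait while j holds a ticket that beats ours (ties go to the lower index).
                while not (number[j] == 0 or (number[i], i) < (number[j], j)):
                    time.sleep(0)
            # Critical section: a deliberately non-atomic read-modify-write.
            value = shared["counter"]
            time.sleep(0)
            shared["counter"] = value + 1
            number[i] = 0                   # give the ticket back

    threads = [threading.Thread(target=process, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["counter"]
```

Each entry scans all n tickets even when no other thread is competing, which illustrates drawback b) above.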
Hardware-Assisted Mutual Exclusion
If the atomic instructions available allow a load and store in a single atomic instruction all our problems disappear. For example if there is an atomic test and set instruction equivalent to li := C; C := 1 in one instruction, then we could have mutual exclusion as follows.
    var C:int := 0

    process P
        var li:int
        do true ->
            Non_Critical_Section
            li := 1                                # force at least one test_and_set
            do li != 0 -> test_and_set (li) od
            Critical_Section
            C := 0
        od
    end
A similar solution exists with atomic exchange instructions.
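Python does not expose a hardware test-and-set instruction, so the sketch below emulates one with a non-blocking `threading.Lock.acquire`, which atomically tests whether the lock is free and takes it if so. The class `AtomicFlag`, the counter workload and the thread counts are our own illustrative choices.

```python
import threading
import time


class AtomicFlag:
    """Emulates the shared variable C together with an atomic test_and_set."""

    def __init__(self) -> None:
        self._lock = threading.Lock()   # unlocked means C = 0, locked means C = 1

    def test_and_set(self) -> int:
        """Atomically return the old value of C and leave C set to 1."""
        return 0 if self._lock.acquire(blocking=False) else 1

    def clear(self) -> None:
        """C := 0"""
        self._lock.release()


def run_spinlock(threads: int = 4, rounds: int = 100) -> int:
    """Run several threads that increment a shared counter inside the critical section."""
    flag = AtomicFlag()
    shared = {"counter": 0}

    def process() -> None:
        for _ in range(rounds):
            li = 1
            while li != 0:                  # pre-protocol: spin until the old value was 0
                li = flag.test_and_set()
                if li != 0:
                    time.sleep(0)           # yield while spinning
            # Critical section: a deliberately non-atomic read-modify-write.
            value = shared["counter"]
            time.sleep(0)
            shared["counter"] = value + 1
            flag.clear()                    # post-protocol: C := 0

    workers = [threading.Thread(target=process) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return shared["counter"]
```

Unlike the software-only solutions, the same pre-protocol works unchanged for any number of processes.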