

Shared-Memory Multiprocessors

Prof. Sivarama Dandamudi

School of Computer Science

Carleton University


Roadmap

UI Cedar
- Architecture overview
- Operating system primitives
- Multiprocessing primitives

Run queue organization
- Centralized
- Distributed
- Hierarchical organization


UI Cedar Architecture

Shared-memory MIMD system
- An experimental system built at the University of Illinois

Processors are grouped into clusters
- Uses a hierarchical organization
- Three levels of memory hierarchy:
  - Local memory
  - Cluster memory
  - Global memory
- These levels refer to the same physical memory


UI Cedar Architecture (cont’d)

[Figure: Cedar system architecture; CCU = Cluster Control Unit]


UI Cedar Architecture (cont’d)

Local memory
- Local to each processor; no need to go through any network

Cluster memory
- Accessible to all processors in a cluster
- Access is via the local interconnection network

Global memory
- Accessible to any processor
- Access is via the global interconnection network


UI Cedar Architecture (cont’d)

Processor cluster (PC)
- The smallest execution unit
- Typically 8 processors
- A compound function (CF), a chunk of a program, can be assigned to one or more PCs
- Each processor contains an FP unit but no local data registers (unusual)
- Local memory can be used as a large data store; it can be dynamically partitioned into pseudo-vector registers of different sizes


UI Cedar Architecture (cont’d)

Processor cluster (PC)
- Controlled by the CCU
- The CCU serves as a synchronization unit
  - Starts all processors when data has been moved from global to local memory
  - Signals the GCU (Global Control Unit) when a CF is done

Local network
- Either a crossbar or a bus

Global network
- Based on an extension of the Omega network


UI Cedar Architecture (cont’d)

At least 2 paths from every switch (except the last stage)
- Adds redundancy to the original Omega network
- Improves fault tolerance and reduces conflicts


UI Cedar Architecture (cont’d)

Memory system
- Each PC contains eight 16K memory modules
- The memory hierarchy is user transparent
  - CCUs and the GCU move program code from global to local memory in large blocks
  - Transfer time is overlapped with computation


UI Cedar Architecture (cont’d)

Cache system
- Implemented in local memories for global memory accesses
- Not all accesses are cached; only those predetermined by the programmer or compiler
- To avoid cache consistency problems, caches hold only:
  - Read-only data, or
  - Data written by a single processor (i.e., private data)


UI Cedar Architecture (cont’d)

GCU uses macro-dataflow
- To reduce scheduling and other overheads
- Considers large structures (arrays) as single objects
- Several operations are combined to reduce scheduling overhead
- Each PC is treated as an execution unit; each PC executes a compound function
- Views a program as a directed graph whose nodes are CFs
- Large data structures are stored in global memory, so there is no structure-copying problem


Synchronization Primitive

Synchronization is supported via a sync variable
- A special data type supported by the hardware
- Consists of two contiguous items in global memory
- Each item is either 4 bytes (single precision) or 8 bytes (double precision)
- First item: key (always an integer)
- Second item: data (unspecified type: integer, floating point, logical, or address)


Synchronization Primitive (cont’d)

Sync expression:

    sync(key-relation; key-op; data-op)

key-relation:  key relop expression | void
key-op:        lvalue = key | key = expression | lvalue = ++key |
               lvalue = --key | ++key | --key | void
data-op:       lvalue = data | data = expression | void


Synchronization Primitive (cont’d)

Sync expression semantics
- The key-relation is evaluated
- If it is true, the key-op and data-op are performed indivisibly
- The result of the sync expression is the value of the key-relation
- If the key-relation is omitted, the key-op and data-op are performed unconditionally
- When the data-op is missing, the key does not have to be the key field of a sync variable; it can be any integer


Synchronization Primitive (cont’d)

Sync expression example:

while (!sync(lock == 0; ++lock))
    ;               /* spin-wait until lock is free, then set lock */
accum += delta;     /* critical section */
lock = 0;           /* unlock */
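
For comparison, here is a minimal C11 sketch (an assumption, not Cedar code) that emulates the test-and-atomic-update behavior of sync(lock == 0; ++lock) with a compare-and-swap; the names sync_var_t and sync_try are hypothetical:

#include <stdatomic.h>

typedef struct {
    _Atomic int key;   /* first item: the integer key           */
    int data;          /* second item: data of unspecified type */
} sync_var_t;

/* Evaluates the key-relation (key == 0) and, if true, performs the
   key-op (++key) indivisibly; returns the key-relation's value. */
static int sync_try(sync_var_t *s)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&s->key, &expected, 1);
}

With these definitions, the spin-wait above becomes while (!sync_try(&lock)) ; and unlocking is atomic_store(&lock.key, 0).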


Memory Attributes

Three attribute types:
- Locality: global or cluster
- Page type: shared or private
- Access privilege: read, write, execute, or a combination of these
(An illustrative encoding is sketched below.)
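
As a concrete illustration, one possible C encoding of these three attributes; the names and layout are assumptions, not Xylem's actual data structures:

enum locality  { LOC_GLOBAL, LOC_CLUSTER };                    /* locality attribute  */
enum page_type { PAGE_SHARED, PAGE_PRIVATE };                  /* page type attribute */
enum access    { ACC_READ = 1, ACC_WRITE = 2, ACC_EXEC = 4 };  /* combinable flags    */

struct page_attr {
    enum locality  locality;    /* where the page lives in the hierarchy */
    enum page_type type;        /* shared or private                     */
    unsigned       privilege;   /* bitwise OR of access flags            */
};

A read-only shared global page would then be { LOC_GLOBAL, PAGE_SHARED, ACC_READ }.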


Memory Attributes (cont’d)

Locality attribute
- Specifies where the page should be located in the hierarchy
- Global pages are mapped to physical global memory
- Cluster pages are mapped to cluster memory
- Details of the physical mapping are not visible to a user program
- Xylem always places a page according to its attribute when a user program references it


Memory Attributes (cont’d)

Page type attribute
- Specifies whether the page is shared or private
- Indicates how a task logically sees the page
- Private pages belong to a single task
  - Any modifications can be seen only by that task; other tasks do not see these changes
- Modifications made to a shared page can be seen by other tasks


Multiprocessing Support

Cedar compiler takes FORTRAN source code Analyzes for implicit parallelism Generates a control flow graph

User Control Block (UCB) Created when the user first logs in Multiple logins do not create multiple UCBs (one UCB per user)

Process Control Block (PCB)When a process is created (via Unix fork)

One PCB and a single task control block (TCB) are created The new task is scheduled This can create other tasks linked to the same PCB


Multiprocessing Support (cont’d)

[Figure: relationship among UCB, PCBs, and TCBs]


Multiprocessing Support (cont’d)

Five primitives are provided (C-style declarations are sketched after the following slides):
- create_task()
- delete_task()
- start_task()
- end_task(): stop the calling task
- wait_task(): wait for another task


Multiprocessing Support (cont’d)

create_task()
- Creates a new TCB attached to the caller’s PCB
- The task is not scheduled for execution; it is in the idle state
- Returns an integer to identify the task
- No child-parent relationship


Multiprocessing Support (cont’d)

delete_task(tasknum)
- Deletes the task identified by tasknum
- The TCB and associated resources are deallocated
- If the task was executing, it is terminated
- Error if tasknum is unknown


Multiprocessing Support (cont’d)

start_task(tasknum, pc)
- Forces the task identified by tasknum to begin execution at location pc
- The task is marked busy and scheduled for execution
- If the task is already busy, it is interrupted with no way of returning
- Error if tasknum is unknown


Multiprocessing Support (cont’d)

end_task()
- Marks the calling task as idle and stops its execution
- All tasks waiting for this task are unblocked
- Does not deallocate resources allocated to the task
- A task that waits for this one can:
  - Delete it
  - Start it at another location, or
  - Let it remain idle


Multiprocessing Support (cont’d)

wait_task(tasknum)
- Blocks the calling task until the specified task (i.e., tasknum) enters the idle state
- A task enters the idle state when it is created and when it calls end_task
- If the specified task is already idle, the calling task continues immediately
- Error if tasknum is unknown
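
As a compact summary, here is a hedged C-style sketch of the five primitives described above; the parameter and return types are assumptions, and representing the start location pc as a function pointer is purely illustrative:

/* Hypothetical C declarations for the Xylem task primitives. */
int  create_task(void);                /* new idle TCB; returns a task id   */
void delete_task(int tasknum);         /* free the TCB and its resources    */
void start_task(int tasknum,
                void (*pc)(void));     /* begin execution at location pc    */
void end_task(void);                   /* mark caller idle; unblock waiters */
void wait_task(int tasknum);           /* block until the task is idle      */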


Example 1

global shared integer: FLAG, MIDDLE
local private integer: RIGHT

A: <body of node A>
   FLAG = 0
   RIGHT = create_task()
   call start_task(RIGHT, C)
   goto B

B: <body of B>
   if (.NOT. SYNC(FLAG == 0; ++FLAG)) then
      MIDDLE = create_task()
      call start_task(MIDDLE, E)
   endif
   goto D

[Figure: control flow graph; node A at the top, B and C below it, D, E, and F below them, and G at the bottom]


Example 1 (cont’d)

C: <body of C>
   if (.NOT. SYNC(FLAG == 0; ++FLAG)) then
      MIDDLE = create_task()
      call start_task(MIDDLE, E)
   endif
   goto F

D: <body of D>
   goto GD

E: <body of E>
   goto GE

F: <body of F>
   goto GF



Example 1 (cont’d)

GE:
GF: call end_task()

GD: call wait_task(RIGHT)
    call wait_task(MIDDLE)
    call delete_task(RIGHT)
    call delete_task(MIDDLE)



Example 2

      DO 101 I = 1,N
      DO 101 J = 1,210
101   A(I,J) = B(I,J) + C(J)

      DO 102 I = 1,10000
      F(I) = ABS(F(I))
102   IF (G(I) .LT. 0) F(I) = -F(I)

[Figure: task graph; node A, then B (DOALL 101) and C (DOALL 102) in parallel, then D]


Example 2 (cont’d)

local private integer T

A: T = create_task()
   call start_task(T, C)
   goto B

B: doall 101
   goto DB

C: doall 102
   goto DC

DB: call wait_task(T)
    call delete_task(T)
    goto next_node

DC: call end_task()


Example 2 (cont’d)

C:  N = 10
    local private integer tasknum(N), T, J
    global shared integer I

    I = 0
    do J = 1, N
       T = create_task()
       tasknum(J) = T
       call start_task(T, CC)
    enddo

    do J = 1, N
       T = tasknum(J)
       call wait_task(T)
       call delete_task(T)
    enddo

CC: local private integer J, K

    dowhile (SYNC(I < 100; J = ++I))
       J = J*100 - 99
       do K = J, J+99
          F(K) = abs(F(K))
          if (G(K) .LT. 0) F(K) = -F(K)
       enddo
    endwhile
    call end_task()

Each successful SYNC claims chunk J; chunk J covers elements (J-1)*100+1 through J*100 of the 10000-element arrays.


Run Queue Organization

Run queue organizations
- Centralized: a single global queue
- Distributed: local queues
- Hybrid: multiple queues
  - Hierarchical organization


Run Queue Organization (cont’d)

Centralized organization
- A single global queue
- Tasks are accessible to all processors
- Mutually exclusive access to the global queue is required (see the sketch below)
- Can lead to queue-access contention for a large number of processors
- Good for small systems
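
A minimal C sketch (an assumption, not from the slides) of such a queue: every processor dequeues from one shared list, so all accesses are serialized by a single lock, which is exactly where contention appears as processors are added:

#include <stdatomic.h>
#include <stddef.h>

typedef struct task { struct task *next; } task_t;

static task_t *run_queue;          /* the single global queue */
static _Atomic int queue_lock;     /* 0 = free, 1 = held      */

task_t *dequeue_task(void)
{
    int expected = 0;
    while (!atomic_compare_exchange_weak(&queue_lock, &expected, 1))
        expected = 0;              /* spin until the lock is free */
    task_t *t = run_queue;         /* critical section: pop head  */
    if (t != NULL)
        run_queue = t->next;
    atomic_store(&queue_lock, 0);  /* unlock */
    return t;
}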


Run Queue Organization (cont’d)

Distributed organization
- A local queue at each processor
- Tasks are accessible only to the associated processor
- Needs a task placement policy
- Excellent scalability; good for large systems
- Load balancing is a problem


Run Queue Organization (cont’d)

Performance comparison
- Run queue access time is not negligible
- Number of processors = 64
- Average number of tasks per job = 64 (exponentially distributed)
- Average task service time = 1 time unit (exponentially distributed)
- Run queue access time f = 0% to 4% of the task service time


Run Queue Organization (cont’d)

[Figure: mean response time vs. utilization for the centralized organization, f = 0% to 4%]


Run Queue Organization (cont’d)

[Figure: mean response time vs. utilization for the distributed organization, f = 0% and f = 4%]


Run Queue Organization (cont’d)

[Figure: mean response time vs. service time CV for the distributed and centralized organizations]


Improving Performance

Centralized organization
- Need to minimize access contention

Autonomous policy (Nelson & Squillante)
- Every access brings a set of tasks
- Reduces the number of accesses to the central queue
- Potential problems:
  - Load imbalance
  - The optimal set size depends on the system load
  - A large service time CV can cause performance deterioration


Improving Performance (cont’d)

Cooperative policy (Nelson & Squillante)
- Every access brings tasks for other processors as well
- Moves tasks from the central queue to other processors’ local queues
- Uses the “join the shortest queue” policy
- Improves load balancing
- Performs better than the autonomous policy and the distributed organization
- Potential problems:
  - Difficult to implement for large systems
  - The scheduler needs to maintain state information on other processors (their local queue lengths)


Improving Performance (cont’d)

Distributed organization
- We have to address the load imbalance problem
- Oblivious placement policies: random, round robin (cyclic); a round-robin sketch follows
- Use adaptive placement policies: shortest queue, shortest response time (SRT) queue
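
As a baseline, a minimal sketch (not from the slides) of oblivious round-robin placement; no load information is consulted, and atomicity of the shared counter is ignored for brevity:

static int next_proc;   /* cyclic placement counter */

int place_round_robin(int num_procs)
{
    int p = next_proc;
    next_proc = (next_proc + 1) % num_procs;
    return p;   /* processor that receives the next task */
}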


Improving Performance (cont’d)

[Figure: mean response time vs. utilization for random, round robin, shortest queue, and SRT queue placement]


Improving Performance (cont’d)

[Figure: mean response time vs. service time CV for random, round robin, shortest queue, and SRT queue placement]


Improving Performance (cont’d)

Implementation problems with adaptive policies
- System state overhead
  - Both the shortest queue and SRT queue policies need system state information
  - To reduce this overhead, state information is collected from only a subset of P (P < N) processors
  - P = number of probes used to collect state information
  - If P is small, we succeed in reducing the overhead
  - In practice, a small number of probes is sufficient (see the sketch below)
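
A minimal sketch (an assumption, not the evaluated implementation) of probe-limited shortest-queue placement; queue_len is a hypothetical query of a processor's local queue length:

#include <stdlib.h>

extern int queue_len(int proc);   /* hypothetical state query */

/* Probe num_probes randomly chosen processors and place the task on
   the one with the shortest queue; a small P keeps overhead low. */
int place_shortest_of_probes(int num_procs, int num_probes)
{
    int best = rand() % num_procs;
    for (int i = 1; i < num_probes; i++) {
        int cand = rand() % num_procs;
        if (queue_len(cand) < queue_len(best))
            best = cand;
    }
    return best;
}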


Improving Performance (cont’d)

[Figure: mean response time vs. number of probes (1 to 10) for the shortest queue and SRT queue policies]


Improving Performance (cont’d)

A problem with the SRT queue policy
- Needs a priori knowledge of execution times
- Often we may get only an estimate, subject to estimation errors

ESRT queue policy
- Uses an estimate that is within X% of the actual service time (the experiments used 30%)

SRT queue policy
- Assumes the exact service time is known beforehand


Improving Performance (cont’d)

[Figure: mean response time vs. service time CV for the shortest queue, SRT queue, and ESRT queue policies]


Hierarchical Organization

The goal is to have the best of both organizations
- Avoid bottleneck problems, like the distributed organization
- Good load sharing, as in the centralized organization
- Should be self-scheduling: no state information collection

The hierarchical organization provides all these desired features (a self-scheduling sketch follows)
- Performs close to the centralized organization but scales well, like the distributed organization
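
A minimal sketch (an assumption, not the evaluated algorithm) of self-scheduling in a hierarchical run-queue tree: an idle processor takes work from its leaf queue, and an empty queue refills from its parent, so no global state is collected. Tr is the transfer factor used on the following slides; locking is omitted for brevity:

struct node {
    struct node *parent;   /* NULL at the root           */
    int procs_below;       /* processors in this subtree */
    int ntasks;            /* tasks currently queued     */
};

/* Pull tasks down toward a leaf; returns the tasks now at q. */
int refill(struct node *q, double Tr)
{
    if (q->ntasks == 0 && q->parent != NULL) {
        int want  = (int)(Tr * q->procs_below);  /* static transfer amount  */
        int avail = refill(q->parent, Tr);       /* parent refills likewise */
        int take  = avail < want ? avail : want;
        q->parent->ntasks -= take;               /* move tasks down a level */
        q->ntasks         += take;
    }
    return q->ntasks;
}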


Hierarchical Organization (cont’d)

[Figure: hierarchical organization of run queues]


Hierarchical Organization (cont’d)

[Figure: task transfers with transfer factors Tr = 1 and Tr = 2]


Hierarchical Organization (cont’d)

[Figure: mean response time vs. utilization for the centralized organization, f = 0% to 4%]


Hierarchical Organization (cont’d)

[Figure: mean response time vs. utilization for the distributed and hierarchical organizations]


Hierarchical Organization (cont’d)

[Figure: mean response time vs. number of tasks for the centralized organization, f = 0% to 4%, fixed task size]


Hierarchical Organization (cont’d)

[Figure: mean response time vs. number of tasks for the distributed and hierarchical organizations, fixed task size]


Hierarchical Organization (cont’d)

[Figure: mean response time vs. number of tasks for the distributed and hierarchical organizations, fixed job size]


Hierarchical Organization (cont’d)

[Figure: mean response time vs. service time CV for the distributed, hierarchical, and centralized organizations]


Hierarchical Organization (cont’d)

[Figure: mean response time vs. service time CV for the distributed and hierarchical organizations at utilizations 0.5 and 0.75]


Hierarchical Organization (cont’d)

[Figure: mean response time vs. utilization for the distributed and hierarchical organizations, N = 64 and N = 128]


Hierarchical Organization (cont’d)

[Figure: mean response time vs. utilization for the distributed and hierarchical organizations, N = 64 and N = 128]


Hierarchical Organization (cont’d)

[Figure: ratio of mean response times vs. utilization for branching factor B = 2 vs. B = 4 and B = 8 vs. B = 4, at f = 2% and f = 4%]


Hierarchical Organization (cont’d)

[Figure: ratio of mean response times vs. utilization for Tr = 2 and Tr = 0.5, at f = 2% and f = 4%]


Hierarchical Organization (cont’d)

Adaptive number of tasks: Policy 1
- Moves a number of tasks proportional to the number of tasks queued at the parent
- At least as many as in the static policy: Tr * (number of processors below the child queue); see the sketch below
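
A minimal sketch of Policy 1’s transfer rule under these assumptions; alpha, the proportionality constant, is hypothetical:

/* Tasks to move down to a child: proportional to the parent's queue
   length, but never fewer than the static amount Tr * procs_below. */
int tasks_to_move(int parent_ntasks, double alpha, double Tr, int procs_below)
{
    int proportional = (int)(alpha * parent_ntasks);
    int static_floor = (int)(Tr * procs_below);
    return proportional > static_floor ? proportional : static_floor;
}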


Hierarchical Organization (cont’d)

Adaptive number of tasks: Policy 2
- Moves a number of tasks proportional to the number of tasks queued at the parent
- But keeps this value the same for all children of the parent
- At least as many as in the static policy: Tr * (number of processors below the child queue)


Hierarchical Organization (cont’d)

[Figure: ratio of mean response times vs. utilization for adaptive Policy 1 and Policy 2]
