
1

Multiprocessor Scheduling

Module 3.1

For a good summary on multiprocessor and real-time scheduling,

visit: http://www.cs.uah.edu/~weisskop/osnotes_html/M8.html

2

Classifications of Multiprocessor Systems

• Loosely coupled multiprocessors, or clusters
– Each processor has its own memory and I/O channels

• Functionally specialized processors
– Such as an I/O processor
– e.g., an Nvidia GPGPU
– Controlled by a master processor

• Tightly coupled multiprocessing (MCMP)
– Processors share main memory
– Controlled by the operating system
– More economical than clusters

3

Types of parallelism

• Bit-level parallelism

• Instruction-level parallelism

• Data parallelism

• Task parallelism (our focus)

4

Synchronization Granularity

• Refers to the frequency of synchronization among processes in the system

• Five classes exist
– Independent (SI is not applicable)
– Very coarse (2000 < SI < 1M)
– Coarse (200 < SI < 2000)
– Medium (20 < SI < 200)
– Fine (SI < 20)

SI is the synchronization interval, and is measured in instructions.
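The class boundaries above can be turned into a small lookup. A minimal sketch; how to treat an SI falling exactly on a boundary (20, 200, 2000) is my assumption, since the slide leaves it ambiguous:

```python
def classify_granularity(si):
    """Map a synchronization interval SI (in instructions) to its
    granularity class, per the boundaries on this slide.
    Boundaries are treated as half-open intervals (an assumption)."""
    if si is None:
        return "independent"   # SI is not applicable
    if si < 20:
        return "fine"
    if si < 200:
        return "medium"
    if si < 2000:
        return "coarse"
    return "very coarse"       # 2000 < SI < 1M

print(classify_granularity(50))    # medium
print(classify_granularity(None))  # independent
```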

5

Wikipedia on Fine-grained, coarse-grained, and embarrassing parallelism

• Applications are often classified according to how often their subtasks need to synchronize or communicate with each other.

• An application exhibits fine-grained parallelism if its subtasks must communicate many times per second;

• it exhibits coarse-grained parallelism if they do not communicate many times per second,

• and it is embarrassingly parallel if they rarely or never have to communicate. Embarrassingly parallel applications are considered the easiest to parallelize.

6

Independent Parallelism

• Multiple unrelated processes

• Separate application or job, e.g. spreadsheet, word processor, etc.

• No synchronization

• When more than one processor is available
– The average response time to users is lower

7

Coarse and Very Coarse-Grained Parallelism

• Very coarse: distributed processing across network nodes to form a single computing environment

• Coarse: Synchronization among processes at a very gross level

• Good for concurrent processes running on a multiprogrammed uniprocessor
– Can be supported on a multiprocessor with little change

8

Medium-Grained Parallelism

• Parallel processing or multitasking within a single application

• Single application is a collection of threads

• Threads usually interact frequently, leading to medium-level synchronization

9

Fine-Grained Parallelism

• Highly parallel applications

• Synchronization every few instructions (on very short events)

• Fills the gap between ILP (instruction-level parallelism) and medium-grained parallelism

• Can be found in small inner loops
– Use of MPI and OpenMP programming models

• The OS should not intervene; this is usually handled in hardware

• In practice, this is a very specialized and fragmented area

10

Scheduling

• Scheduling on a multiprocessor involves three interrelated design issues:
– Assignment of processes to processors
– Use of multiprogramming on individual processors
• Makes sense for processes (coarse-grained)
• May not be good for threads (medium-grained)
– Actual dispatching of a process
• What scheduling policy should we use: FCFS, RR, etc.?

Sometimes a very sophisticated policy becomes counterproductive.

11

Assignment of Processes to Processors

• Treat processors as a pooled resource and assign processes to processors on demand

• Or permanently assign each process to a processor
– Dedicate a short-term queue to each processor
– Less overhead: each processor does its own scheduling on its own queue
– Disadvantage: a processor could sit idle (with an empty queue) while another processor has a backlog
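The idle-while-backlogged drawback can be seen in a toy simulation (the workload numbers are invented for illustration): with static per-processor queues, one queue drains while the other still holds work, and no rebalancing occurs.

```python
from collections import deque

# Hypothetical static assignment: CPU 0's queue drains early while
# CPU 1 still has a backlog; permanent assignment cannot rebalance.
queues = [deque([2]), deque([3, 3, 3])]   # remaining burst times per CPU

idle_time = [0, 0]
for t in range(9):                        # 9 unit-time ticks
    for cpu, q in enumerate(queues):
        if q:
            q[0] -= 1                     # run head-of-queue for one tick
            if q[0] == 0:
                q.popleft()               # job finished
        else:
            idle_time[cpu] += 1           # idles while the other CPU works

print(idle_time)   # CPU 0 idles for 7 ticks; CPU 1 never idles
```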

12

Assignment of Processes to Processors

• Global queue
– Schedule to any available processor
– During its lifetime, a process may run on different processors at different times
– In an SMP architecture, context switching can be done at small cost

• Master/slave architecture
– Key kernel functions always run on a particular processor
– The master is responsible for scheduling
– A slave sends service requests to the master
– Synchronization is simplified
– Disadvantages:
• Failure of the master brings down the whole system
• The master can become a performance bottleneck

13

Assignment of Processes to Processors

• Peer architecture
– The operating system can execute on any processor
– Each processor does self-scheduling from a pool of available processes
– Complicates the operating system:
• Must make sure two processors do not choose the same process
• Needs lots of synchronization

14

Process Scheduling in Today’s SMP

• M/M/M/K queueing system
– A single queue holds all processes
– Multiple queues may be used for priorities
– All queues feed into the common pool of processors

• The specific scheduling discipline is less important with more than one processor
– A simple FCFS discipline with static priorities may suffice for a multiprocessor system
– Illustrated by the graph on p. 460
– In conclusion, the specific scheduling discipline matters much less with SMP than with a uniprocessor
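FCFS dispatch of a single queue onto a pool of processors can be sketched in a few lines (the burst times are hypothetical, and all jobs are assumed to arrive at t = 0):

```python
import heapq

def fcfs_completion_times(bursts, num_cpus):
    """Dispatch jobs in FCFS order to whichever processor frees up
    first; return each job's completion time (all arrive at t = 0)."""
    free_at = [0] * num_cpus          # time at which each CPU is next free
    heapq.heapify(free_at)
    done = []
    for burst in bursts:              # FCFS: queue order only, no priorities
        start = heapq.heappop(free_at)  # earliest-available processor
        finish = start + burst
        done.append(finish)
        heapq.heappush(free_at, finish)
    return done

print(fcfs_completion_times([4, 2, 3, 1], num_cpus=2))   # [4, 2, 5, 5]
```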

15

Threads

• A thread executes separately from the rest of its process

• An application can be a set of threads that cooperate and execute concurrently in the same address space

• Running threads on separate processors can yield a dramatic gain in performance

16

Multiprocessor Thread Scheduling – Four General Approaches (1/2)

• Load sharing
– Processes are not assigned to a particular processor

• Gang scheduling
– A set of related threads is scheduled to run on a set of processors at the same time

17

Multiprocessor Thread Scheduling – Four General Approaches (2/2)

• Dedicated processor assignment
– Threads are assigned to a specific processor (each thread runs on its own processor)
– When the program terminates, the processors are returned to the pool of available processors

• Dynamic scheduling
– The number of threads can be altered during the course of execution

18

Load Sharing

• Load is distributed evenly across the processors

• No centralized scheduler required
– The OS runs on every processor to select the next thread

• Uses a global queue of ready threads
– Usually an FCFS policy
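A load-sharing sketch using Python's thread-safe `queue.Queue` as the global ready queue. Here the "processors" are plain threads and the "ready threads" are numbered work items; this is an illustration of the structure, not an OS scheduler:

```python
import queue
import threading

# One global FCFS ready queue shared by all "processors".
ready = queue.Queue()
for job in range(6):
    ready.put(job)

completed = []
lock = threading.Lock()

def processor():
    """Scheduler loop run by every processor: pull the next ready
    item from the global queue until it is empty."""
    while True:
        try:
            job = ready.get_nowait()   # mutual exclusion is inside Queue
        except queue.Empty:
            return
        with lock:
            completed.append(job)      # "run" the job

cpus = [threading.Thread(target=processor) for _ in range(3)]
for c in cpus:
    c.start()
for c in cpus:
    c.join()

print(sorted(completed))   # all six jobs completed; pull order varies
```

Note that `Queue` serializes access internally, which is exactly the central-queue mutual exclusion the next slide lists as a bottleneck.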

19

Disadvantages of Load Sharing

• The central queue needs mutual exclusion
– It may become a bottleneck when more than one processor looks for work at the same time
– A noticeable problem when there are many processors

• A preempted thread is unlikely to resume execution on the same processor
– Cache use is less efficient

• If all threads are in the global queue, the threads of a given program will not gain access to the processors at the same time
– Performance is compromised if coordination among the threads is high

20

Gang Scheduling

• Simultaneous scheduling of threads that make up a single process

• Useful for applications where performance severely degrades when any part of the application is not running

• Threads often need to synchronize with each other
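A toy gang schedule can be laid out as one time slice per gang, with every thread of a gang running together across the processors (the gang sizes and the 4-CPU count are invented for illustration):

```python
# Hypothetical gangs: each maps to its number of related threads.
gangs = {"A": 4, "B": 2, "C": 3}
num_cpus = 4

schedule = []
for name, nthreads in gangs.items():
    # All of a gang's threads occupy CPUs in the same time slice, so
    # they can synchronize without blocking on a descheduled peer.
    # Leftover CPUs idle for that slice, a known cost of gang scheduling.
    slice_ = [f"{name}{i}" for i in range(nthreads)]
    slice_ += ["idle"] * (num_cpus - nthreads)
    schedule.append(slice_)

for t, slice_ in enumerate(schedule):
    print(t, slice_)
```

Slice 1 shows the trade-off directly: gang B has only two threads, so two of the four processors idle for that slice.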

21

Advantages of Gang Scheduling

• Closely related threads execute in parallel

• Synchronization blocking is reduced

• Less context switching

• Scheduling overhead is reduced, since a single synchronization action (e.g., one signal()) may affect many threads

22

Scheduling Groups – two time-slicing divisions

23

Dedicated Processor Assignment – Affinitization (1/2)

• When an application is scheduled, each of its threads is assigned a processor that remains dedicated (affinitized) to that thread until the application runs to completion

• No multiprogramming of processors, i.e., one processor per thread, and no other thread on that processor

• Some processors may sit idle when their threads block

• Eliminates context switches; with certain types of applications, this can save enough time to compensate for the possible idle-time penalty

• In a highly parallel environment with hundreds of processors, processor utilization is not a major issue, but performance is; the total avoidance of switching can result in substantial speedup

24

Dedicated Processor Assignment – Affinitization (2/2)

• In dedicated assignment, when the number of threads exceeds the number of available processors, efficiency drops
– Due to thread preemption, context switching, suspension of other threads, and cache pollution

• Notice that both gang scheduling and dedicated assignment are more concerned with allocation issues than scheduling issues

• The important question becomes "How many processors should a process be assigned?" rather than "How shall I choose the next process?"

25

Dynamic Scheduling

• The number of threads within a process is variable

• The work of scheduling is shared between the OS and the application

• When a job originates, it requests a certain number of processors; the OS grants some or all of the request, based on the number of processors currently available

• The application itself can then decide which threads run when, and on which processors; this requires language support, as provided by thread libraries

• When processors become free due to the termination of threads or processes, the OS can allocate them as needed to satisfy pending requests

• Simulations have shown that this approach is superior to both gang scheduling and dedicated assignment
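The OS side of this protocol, granting some or all of a request, queueing the remainder, and handing processors back as they free up, can be sketched as follows (the class and method names are hypothetical, chosen for illustration):

```python
class ProcessorPool:
    """Toy OS-side allocator for dynamic scheduling."""

    def __init__(self, total):
        self.free = total
        self.pending = []                  # FCFS list of (job, still_wanted)

    def request(self, job, wanted):
        """Grant some or all of the request; queue the shortfall."""
        granted = min(wanted, self.free)
        self.free -= granted
        if granted < wanted:
            self.pending.append((job, wanted - granted))
        return granted

    def release(self, count):
        """Processors freed by finished threads go to pending requests."""
        self.free += count
        while self.pending and self.free:
            job, need = self.pending.pop(0)
            grant = min(need, self.free)
            self.free -= grant
            if grant < need:               # still short: stay at queue head
                self.pending.insert(0, (job, need - grant))

pool = ProcessorPool(total=8)
print(pool.request("A", 6))   # 6: fully granted
print(pool.request("B", 4))   # 2: partially granted, 2 left pending
pool.release(3)               # some of A's threads finish; B gets its 2
print(pool.free)              # 1
```

Which of its granted processors each of B's threads then runs on is the application's decision, matching the OS/application split described above.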