Task scheduling algorithm for multicore processor system for minimizing recovery time in case of single node fault. Shohei Gotoda, Naoki Shibata, Minoru Ito (Nara Institute of Science and Technology, Shiga University)

(Slides) Task scheduling algorithm for multicore processor system for minimizing recovery time in case of single node fault


DESCRIPTION

Shohei Gotoda, Naoki Shibata and Minoru Ito: "Task scheduling algorithm for multicore processor system for minimizing recovery time in case of single node fault," Proceedings of IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2012), pp.260-267, DOI:10.1109/CCGrid.2012.23, May 15, 2012. In this paper, we propose a task scheduling algorithm for a multicore processor system which reduces the recovery time in case of a single fail-stop failure of a multicore processor. Many of the recently developed processors have multiple cores on a single die, so that one failure of a computing node results in the failure of many processors. In the case of a failure of a multicore processor, all tasks which have been executed on the failed multicore processor have to be recovered at once. The proposed algorithm is based on an existing checkpointing technique, and we assume that the state is saved when nodes send results to the next node. If a series of computations that depends on former results is executed on a single die, we need to execute all parts of the series of computations again in the case of failure of the processor. The proposed scheduling algorithm therefore tries not to concentrate tasks on the processors of a single die. We designed our algorithm as a parallel algorithm that achieves O(n) speedup, where n is the number of processors. We evaluated our method using simulations and experiments with four PCs. We compared our method with an existing scheduling method, and in the simulation, the execution time including recovery time in the case of a node failure was reduced by up to 50%, while the overhead in the case of no failure was a few percent in typical scenarios.


Page 1: (Slides) Task scheduling algorithm for multicore processor system for minimizing recovery time in case of single node fault

Task scheduling algorithm for multicore processor system for minimizing recovery time in case of single node fault

Shohei Gotoda†, Naoki Shibata‡, Minoru Ito†

†Nara Institute of Science and Technology, ‡Shiga University

Page 2

Background

• Multicore processors: almost all recently designed processors are multicore processors
• A computing cluster consisting of 1800 nodes experiences about 1000 failures in the first year after deployment [1]

[1] "Google spotlights data center inner workings," cnet.com article, May 30, 2008.

Page 3

Objective of Research

• Fault tolerance: we assume a single fail-stop failure of a multicore processor
• Network contention: taken into account to generate schedules that are reproducible on real systems

Devise a new scheduling method that minimizes recovery time, taking account of the above points

Page 4

Task Graph
• A group of tasks that can be executed in parallel
• Vertex (task node): a task to be executed on a single CPU core
• Edge (task link): data dependence between tasks

[Figure: task graph, with a task node and a task link labeled]
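A task graph can be held as a weighted DAG. The following is a minimal sketch under our own assumptions: the class name `TaskGraph`, a computation time per task node, and a communication cost per task link are illustrative choices, not taken from the slides. It also computes a topological order, which any scheduler that respects the data dependences needs.

```python
from collections import defaultdict, deque


class TaskGraph:
    """Hypothetical task graph: vertices are tasks, edges are data dependences."""

    def __init__(self):
        self.comp = {}                  # task -> computation time on one core
        self.succ = defaultdict(dict)   # task -> {successor: communication cost}
        self.pred = defaultdict(dict)   # task -> {predecessor: communication cost}

    def add_task(self, task, comp_time):
        self.comp[task] = comp_time

    def add_link(self, src, dst, comm_cost):
        # Task link: dst needs the output data of src before it can start.
        self.succ[src][dst] = comm_cost
        self.pred[dst][src] = comm_cost

    def topological_order(self):
        # Kahn's algorithm; raises ValueError if the graph has a cycle.
        indegree = {t: len(self.pred[t]) for t in self.comp}
        ready = deque(sorted(t for t, d in indegree.items() if d == 0))
        order = []
        while ready:
            t = ready.popleft()
            order.append(t)
            for s in sorted(self.succ[t]):
                indegree[s] -= 1
                if indegree[s] == 0:
                    ready.append(s)
        if len(order) != len(self.comp):
            raise ValueError("task graph contains a cycle")
        return order


# The three-task example used later in the slides: tasks 1 and 2 feed task 3.
g = TaskGraph()
for t in (1, 2, 3):
    g.add_task(t, 10)
g.add_link(1, 3, 2)
g.add_link(2, 3, 2)
print(g.topological_order())  # [1, 2, 3]
```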

Page 5

Processor Graph
• Topology of the computer network
• Vertex (processor node): a CPU core (circle), which has only one link, or a switch (rectangle), which has more than 2 links
• Edge (processor link): communication path between processors

[Figure: processor graph, with a processor node, a processor link, and a switch labeled]
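A processor graph can be stored as an undirected graph whose vertices are cores and switches. In this sketch (our assumption, not the authors' code), a breadth-first search returns the processor links a message traverses between two cores; the node names `core1`, `sw0`, and so on are hypothetical.

```python
from collections import deque


def shortest_route(links, src, dst):
    """Return the processor links (u, v) on a shortest path from src to dst.

    links: iterable of undirected (u, v) pairs between cores and switches.
    """
    adj = {}
    for u, v in links:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    if dst not in parent:
        raise ValueError(f"no route from {src} to {dst}")
    route = []
    node = dst
    while parent[node] is not None:   # walk back from dst to src
        route.append((parent[node], node))
        node = parent[node]
    return list(reversed(route))


# Hypothetical topology: each core has one link; two switches are joined by a link.
links = [("core1", "sw0"), ("core2", "sw0"), ("sw0", "sw1"), ("core3", "sw1")]
print(shortest_route(links, "core1", "core3"))
# [('core1', 'sw0'), ('sw0', 'sw1'), ('sw1', 'core3')]
```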

Page 6

Task Scheduling
• The task scheduling problem assigns a processor node to each task node so as to minimize the total execution time
• It is an NP-hard problem

[Figure: one processor node is assigned to each task node of the task graph, mapped onto the processor graph]
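To make the objective concrete, the sketch below evaluates one candidate assignment: given a core for each task, it computes start and finish times in topological order and returns the total execution time. Assumptions (ours, for illustration): a fixed communication cost per task link, charged only when producer and consumer run on different cores; tasks on one core run in the given order; and no network contention yet.

```python
def makespan(order, comp, pred, assign):
    """Total execution time of a schedule, ignoring network contention.

    order:  tasks in a topological order (also the execution order on each core)
    comp:   task -> computation time
    pred:   task -> {predecessor: communication cost}
    assign: task -> core (the schedule being evaluated)
    """
    core_free = {}   # core -> time at which it becomes idle
    finish = {}      # task -> finish time
    for t in order:
        core = assign[t]
        # Data from a predecessor on another core arrives after the communication cost.
        ready = max(
            (finish[p] + (0 if assign[p] == core else c) for p, c in pred.get(t, {}).items()),
            default=0,
        )
        start = max(ready, core_free.get(core, 0))
        finish[t] = start + comp[t]
        core_free[core] = finish[t]
    return max(finish.values())


comp = {1: 10, 2: 10, 3: 10}
pred = {3: {1: 2, 2: 2}}
print(makespan([1, 2, 3], comp, pred, {1: "A", 2: "B", 3: "A"}))  # 22
print(makespan([1, 2, 3], comp, pred, {1: "A", 2: "A", 3: "A"}))  # 30
```

Since the problem is NP-hard, enumerating all assignments and calling such an evaluator on each is only feasible for tiny graphs.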

Page 7

Inputs and Outputs for Task Scheduling

• Inputs: task graph and processor graph
• Output: a schedule, which is an assignment of a processor node to each task node
• Objective function: minimize task execution time

[Figure: task graph and processor graph]
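Because exact scheduling is NP-hard, schedulers usually rely on heuristics. One widely used family is list scheduling. The sketch below is a generic earliest-finish-time variant, not the proposed method and without contention, that takes the inputs listed above (a task graph and a set of cores) and returns the output, a core for each task, together with the resulting execution time.

```python
def list_schedule(order, comp, pred, cores):
    """Greedy list scheduling: put each task on the core where it finishes earliest.

    order: tasks in a topological order (used as the priority list)
    comp:  task -> computation time
    pred:  task -> {predecessor: communication cost}
    cores: list of core names
    Returns (assignment, makespan).
    """
    assign, finish = {}, {}
    core_free = {core: 0 for core in cores}
    for t in order:
        best = None
        for core in cores:
            # Inputs from other cores pay the communication cost; same core is free.
            ready = max(
                (finish[p] + (0 if assign[p] == core else c) for p, c in pred.get(t, {}).items()),
                default=0,
            )
            end = max(ready, core_free[core]) + comp[t]
            if best is None or end < best[0]:
                best = (end, core)
        end, core = best
        assign[t], finish[t] = core, end
        core_free[core] = end
    return assign, max(finish.values())


assignment, total = list_schedule([1, 2, 3], {1: 10, 2: 10, 3: 10}, {3: {1: 2, 2: 2}}, ["A", "B"])
print(assignment, total)  # {1: 'A', 2: 'B', 3: 'A'} 22
```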

Page 8

Network Contention Model

• Communication is delayed if a processor link is occupied by another communication
• We use an existing network contention model [2]

[Figure: contention on a processor link between two communications of the task graph]

[2] O. Sinnen and L.A. Sousa, "Communication Contention in Task Scheduling," IEEE Trans. Parallel and Distributed Systems, vol. 16, no. 6, pp. 503-515, 2005.
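The core idea of a contention model is that a processor link carries only one message at a time, so a message may have to wait. The sketch below is a deliberately simplified version, our assumption rather than Sinnen's exact model: a message occupies every link on its route at once, starts when its data is ready and all those links are free, and links are only appended to (no insertion into idle gaps).

```python
def schedule_message(route, ready_time, duration, link_free):
    """Reserve every link on `route` for one message and return its arrival time.

    Simplified contention model (assumption): the message holds all links of its
    route for `duration`, and waits until every one of them is free.
    link_free: dict mapping a link (u, v) to the time it becomes free; updated in place.
    """
    if not route:   # producer and consumer on the same core: no network transfer
        return ready_time
    start = max([ready_time] + [link_free.get(link, 0) for link in route])
    end = start + duration
    for link in route:
        link_free[link] = end
    return end


link_free = {}
shared = [("die0", "sw"), ("sw", "die1")]
print(schedule_message(shared, 10, 5, link_free))  # 15
print(schedule_message(shared, 10, 5, link_free))  # 20: delayed by contention
```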

Page 9

Multicore Processor Model

• Each core executes a task independently from other cores

• Communication between cores finishes instantaneously

• One network interface is shared among all cores on a die

• If there is a failure, all cores on a die stop execution simultaneously

[Figure: processor graph of one CPU die containing Core1 and Core2]
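The four rules of this multicore model translate into a few small functions. This is a sketch under our own assumptions: `die_of` maps each core to its die, off-die data moves at a fixed `bandwidth`, and the shared network interface is modelled as a single link from each die to the switch, so all off-die traffic of a die contends for that one link.

```python
def comm_time(src_core, dst_core, die_of, data_size, bandwidth):
    """Cores on the same die communicate instantaneously; otherwise use the network."""
    if die_of[src_core] == die_of[dst_core]:
        return 0.0
    return data_size / bandwidth


def nic_link(core, die_of):
    # All cores of a die share one network interface, modelled as one die-to-switch link.
    return (die_of[core], "switch")


def cores_lost(failed_die, die_of):
    """A die failure stops every core on that die at the same time."""
    return sorted(c for c, d in die_of.items() if d == failed_die)


die_of = {"A": 0, "B": 0, "C": 1, "D": 1}    # two dual-core dies
print(comm_time("A", "B", die_of, 100, 50))  # 0.0
print(comm_time("A", "C", die_of, 100, 50))  # 2.0
print(cores_lost(0, die_of))                 # ['A', 'B']
```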

Page 10

Influence of Multicore Processors

• Multicore processors need to be considered in scheduling: there is a high-speed communication link among processors on a single die
• Existing schedulers try to utilize this high-speed link
• As a result, many dependent tasks are assigned to cores on a single die

[Figure: dependent tasks of the task graph assigned to cores on the same die of the processor graph]

Page 11

Influence of Multicore Processors

• Existing schedulers assign many dependent tasks to cores on a single die to utilize the high-speed link
• In case of a fault, dependent tasks tend to be destroyed at the same time

[Figure: dependent tasks assigned to cores on the same die]

Page 12

Related Work (1/2)
• Checkpointing [3]: node state is saved in each node, a backup node is allocated, and processing results are recovered from the saved state
• Multicore processors are not considered, and network contention is not considered

[3] Y. Gu, Z. Zhang, F. Ye, H. Yang, M. Kim, H. Lei, and Z. Liu, "An empirical study of high availability in stream processing systems," in Middleware '09: the 10th ACM/IFIP/USENIX International Conference on Middleware (Industrial Track), 2009.

[Figure: primary, secondary, and backup nodes with input and output queues]

Page 13

Related Work (2/2)
• A task scheduling method [5] in which multiple task graph templates are prepared beforehand and processors are assigned according to the templates
• This method is suitable for highly loaded systems

[5] J. Wolf et al., "SODA: An Optimizing Scheduler for Large-Scale Stream-Based Distributed Computer Systems," in ACM Middleware, 2008.

Page 14

Our Contribution
• There is no existing scheduling method that takes account of both multicore processor failure and network contention
• We propose a scheduling method that takes account of both network contention and multicore processor failure

Page 15

Assumptions
• Only a single fail-stop failure of a multicore processor can occur; the failed computing node automatically restarts after 30 sec.
• A failure can be detected in one second by the interruption of heartbeat signals
• A checkpointing technique is used to recover from the saved state
• Network contention: the contention model is the same as Sinnen's model
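The failure-detection and restart assumptions can be expressed as a small heartbeat monitor. This is a hypothetical illustration: the class `HeartbeatMonitor` and its methods are ours, and only the 1-second detection bound and the 30-second restart delay come from this slide.

```python
class HeartbeatMonitor:
    """Declares a die failed when its heartbeat has been silent for more than `timeout`."""

    def __init__(self, timeout=1.0, restart_delay=30.0):
        self.timeout = timeout              # detection bound assumed on this slide
        self.restart_delay = restart_delay  # automatic restart delay assumed on this slide
        self.last_seen = {}                 # die -> time of its latest heartbeat

    def heartbeat(self, die, now):
        self.last_seen[die] = now

    def failed_dies(self, now):
        return sorted(d for d, t in self.last_seen.items() if now - t > self.timeout)

    def back_online_at(self, failure_time):
        # A fail-stop die restarts automatically after restart_delay seconds.
        return failure_time + self.restart_delay


m = HeartbeatMonitor()
m.heartbeat("die0", 0.0)
m.heartbeat("die1", 0.0)
m.heartbeat("die0", 0.9)      # die1 sends nothing after t = 0.0
print(m.failed_dies(1.5))     # ['die1']
print(m.back_online_at(0.0))  # 30.0
```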

Page 16

Checkpointing and Recovery

• Each processor node saves its state to main memory when each task is finished
• The saved state is the data transferred to the succeeding processor nodes: only the output data of each task node is saved as state, which is much smaller than the complete memory image
• We assume that saving state finishes instantaneously, since it is just copying a small amount of data within memory
• Recovery: saved state that is not affected by the failure is found in the ancestor task nodes, and some tasks are executed again using that saved state
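Which tasks have to run again can be worked out by walking the task graph backwards. The sketch below rests on our own simplifying assumptions: a task's saved output lives on the die that executed it and is destroyed if that die fails; a task already running on a surviving die holds its inputs and simply continues; and a task must run if its output is not saved on a surviving die and either it is a final task or some task that must (re)start consumes its output.

```python
def tasks_to_execute(order, succ, die_of_task, finished, started, failed_die):
    """Return the tasks that still have to run after `failed_die` fails.

    order:       tasks in a topological order
    succ:        task -> list of successor tasks
    die_of_task: task -> die the task was assigned to
    finished:    tasks that had finished (output saved) before the failure
    started:     tasks that had started (inputs received) before the failure
    """
    def survived(t):
        return die_of_task[t] != failed_die

    # Saved output state survives only on the dies that did not fail.
    available = {t for t in finished if survived(t)}
    # A task already running on a surviving die holds its inputs and just continues.
    continuing = {t for t in started if survived(t)} - available
    must_run = set()
    for t in reversed(order):
        if t in available:
            continue
        consumers = succ.get(t, [])
        # Needed if it is a final task, or a task that must (re)start consumes its output.
        if not consumers or any(s in must_run and s not in continuing for s in consumers):
            must_run.add(t)
    return must_run


succ = {1: [3], 2: [3]}
order = [1, 2, 3]
# All tasks on die 0, which fails while task 3 runs: everything is executed again.
print(sorted(tasks_to_execute(order, succ, {1: 0, 2: 0, 3: 0}, {1, 2}, {1, 2, 3}, 0)))  # [1, 2, 3]
# Task 3 on die 1, which fails while task 3 runs: the saved outputs of 1 and 2 are reused.
print(sorted(tasks_to_execute(order, succ, {1: 0, 2: 0, 3: 1}, {1, 2}, {1, 2, 3}, 1)))  # [3]
```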

Page 17

What Proposed Method Tries to Do

• Reduce recovery time in case of failure
• Minimize the worst-case total execution time (execution time before failure + recovery)
• The worst case is taken over all possible patterns of failure: each of the dies can fail
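In symbols (our notation, not taken from the paper): for a schedule S, let T_S be its failure-free execution time, D the set of dies, and R_S(d, t) the recovery time when die d fails at time t. If both the failed die and the failure time are treated as part of the failure pattern, the objective on this slide can be written as

```latex
\min_{S} \; \max_{d \in D} \; \max_{0 \le t \le T_S} \bigl( t + R_S(d, t) \bigr)
```

where t is the execution time before the failure.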

Page 18

Worst Case Scenario
• Critical path: the path in the task graph from the first to the last task with the longest execution time
• The worst-case scenario: all tasks in the critical path are assigned to processors on one die, and the failure happens while the last task is being executed; then we need twice the total execution time

[Figure: example task graph with the first and last tasks marked]
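The critical path is the longest path in the DAG and can be found with one backward pass in reverse topological order. A sketch, assuming per-task computation times and ignoring communication costs for brevity; the example graph is made up.

```python
def critical_path(order, comp, succ):
    """Longest path (by computation time) from a first task to a last task.

    order: topological order; comp: task -> time; succ: task -> list of successors.
    Communication costs are ignored here (an assumption for brevity).
    """
    best = {}   # task -> length of the longest path starting at this task
    nxt = {}    # task -> next task on that path (None for a last task)
    for t in reversed(order):
        tail = max(succ.get(t, []), key=lambda s: best[s], default=None)
        best[t] = comp[t] + (best[tail] if tail is not None else 0)
        nxt[t] = tail
    start = max(order, key=lambda t: best[t])
    path = [start]
    while nxt[path[-1]] is not None:
        path.append(nxt[path[-1]])
    return path, best[start]


order = [1, 2, 3, 4]
comp = {1: 3, 2: 5, 3: 2, 4: 4}
succ = {1: [2, 3], 2: [4], 3: [4]}
path, length = critical_path(order, comp, succ)
print(path, length)  # [1, 2, 4] 12
# Worst case on this slide: the whole path is on one die, which fails just as the
# last task ends, so the path runs twice (detection and restart time ignored).
print(2 * length)    # 24
```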

Page 19

Idea of Proposed Method
• We distribute tasks on the critical path over dies, but this incurs communication overhead; if we distribute too many tasks, the overhead becomes too large
• Usually, the last tasks in the critical path have a larger influence: we check tasks starting from the last task in the critical path, assign the last k tasks in the critical path to other dies, and find the best k
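The search over k can be sketched as a loop that tries k = 0, 1, ... and keeps the candidate with the smallest worst-case time. Both callables are hypothetical placeholders: `place_off_die` stands for whatever rescheduling step moves the chosen tasks to other dies, and `worst_case_time` for an evaluation of the objective from slide 17. The toy cost model in the usage lines is invented purely to show the trade-off (recovery shrinks, communication overhead grows).

```python
def best_k(critical_path, place_off_die, worst_case_time):
    """Try moving the last k critical-path tasks to other dies and keep the best k.

    critical_path:   list of tasks from first to last
    place_off_die:   hypothetical callable(tasks) -> schedule with `tasks` on other dies
    worst_case_time: hypothetical callable(schedule) -> worst-case total execution time
    Returns (k, schedule, time); k = 0 keeps the original placement.
    """
    best = None
    n = len(critical_path)
    for k in range(n + 1):
        moved = critical_path[n - k:]   # the last k tasks (empty when k = 0)
        schedule = place_off_die(moved)
        t = worst_case_time(schedule)
        if best is None or t < best[2]:
            best = (k, schedule, t)
    return best


# Toy cost model (made-up numbers): each moved task trims recovery but adds overhead.
k, _, t = best_k([1, 2, 3, 4], frozenset, lambda s: 100 - 15 * len(s) + 4 * len(s) ** 2)
print(k, t)  # 2 86
```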

Page 20

Problem with Existing Method

[Figure: existing schedule of tasks 1, 2, and 3 on cores A, B, C, and D, and the resulting execution over time]

• Task 1 is assigned to core A
• Task 2 is assigned to core B
• Task 3 is assigned to the same die because of its high communication speed

Page 21

Problem with Existing Method

• Suppose that a failure happens while Task 3 is being executed
• All results are lost

[Figure: existing schedule and resulting execution over time]

Page 22

Problem with Existing Method

[Figure: existing schedule and resulting execution over time, with tasks 1', 2', and 3' re-executed after the failure]

• Suppose that a failure happens while Task 3 is being executed
• All results are lost, so we need to execute all tasks again from the beginning on another die

Page 23

Improvement in Proposed Method

• Distribute influential tasks to other dies; in this case, task 3 is the most influential

[Figure: proposed schedule with task 3 on another die, and the resulting execution showing the communication overhead]

Page 24

Recovery in Proposed Method

• Suppose that a failure happens while Task 3 is being executed
• The results of Tasks 1 and 2 are saved

[Figure: proposed schedule and resulting execution over time]

Page 25

Recovery in Proposed Method

• Suppose that a failure happens while Task 3 is being executed
• The results of Tasks 1 and 2 are saved
• Execution can be continued from the saved state

[Figure: proposed schedule and resulting execution over time, with task 3 re-executed as 3']

Page 26

Communication Overhead

• The proposed method incurs communication overhead

[Figure: existing schedule vs. proposed schedule of tasks 1, 2, and 3 on cores A, B, C, and D over time, with the overhead marked]

Page 27

Speed-up in Recovery

• The proposed method has a larger effect if computation time is longer than communication time

[Figure: recovery with the existing schedule vs. recovery with the proposed schedule over time, showing the speed-up]
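The trade-off on slides 20 to 27 can be checked with simple arithmetic. Assume, for illustration only, that each of tasks 1, 2, and 3 takes c seconds, the input transfer to task 3 across dies takes m seconds in total, transfers within a die are free, detection and restart time are ignored, and the failure hits just before the running tasks would finish. The existing schedule then needs 2c without failure and 4c with the failure, while the proposed schedule needs 2c + m without failure and at most 3c + m with a failure, so it wins whenever m < c.

```python
def existing_schedule(c, m):
    """Tasks 1, 2, 3 all on die {A, B}; m is unused since no data crosses dies."""
    no_failure = 2 * c                 # tasks 1 and 2 in parallel, then task 3
    with_failure = no_failure + 2 * c  # everything is lost and re-run on another die
    return no_failure, with_failure


def proposed_schedule(c, m):
    """Tasks 1, 2 on die {A, B}; task 3 moved to die {C, D}."""
    no_failure = 2 * c + m             # one inter-die transfer before task 3
    # Die {C, D} fails just before task 3 ends; task 3 is re-run on die {A, B},
    # where the saved outputs of tasks 1 and 2 are still available.
    fail_cd = no_failure + c
    # Die {A, B} fails just before tasks 1 and 2 end; all three are re-run on {C, D}.
    fail_ab = c + 2 * c
    return no_failure, max(fail_cd, fail_ab)


for c, m in [(10, 2), (10, 15)]:
    print((c, m), existing_schedule(c, m), proposed_schedule(c, m))
# (10, 2) (20, 40) (22, 32)
# (10, 15) (20, 40) (35, 45)
```

With m = 2 the proposed schedule pays 2 seconds of overhead but saves 8 seconds in the worst case; with m = 15 the communication cost outweighs the gain.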

Page 28

Comparison of Schedules

[Figure: existing schedule vs. proposed schedule over time for an example task graph and processor graph]

Page 29

Comparison of Recovery

[Figure: recovery with the existing schedule vs. the proposed schedule over time, for the same task graph and processor graph; one region is labeled "Not available"]

Page 30

Evaluation
• Items to compare: recovery time in case of a failure, and overhead in case of no failure
• Compared methods: PROPOSED; CONTENTION, Sinnen's method considering network contention; and INTERLEAVED, a scheduling algorithm that tries to spread tasks over all dies as much as possible
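INTERLEAVED is described only as spreading tasks over all dies as much as possible. One plausible reading, which is our assumption rather than the authors' definition, is a round-robin over dies in topological order, rotating through the cores within each die.

```python
from itertools import cycle


def interleaved_assignment(order, cores_by_die):
    """Assign tasks to dies in round-robin order, rotating cores within each die.

    order:        tasks in a topological order
    cores_by_die: list of core lists, one per die
    """
    die_cycle = cycle(range(len(cores_by_die)))
    next_core = [0] * len(cores_by_die)   # per-die rotation index
    assign = {}
    for t in order:
        d = next(die_cycle)
        cores = cores_by_die[d]
        assign[t] = cores[next_core[d] % len(cores)]
        next_core[d] += 1
    return assign


print(interleaved_assignment([1, 2, 3, 4, 5], [["A", "B"], ["C", "D"]]))
# {1: 'A', 2: 'C', 3: 'B', 4: 'D', 5: 'A'}
```

Consecutive (and often dependent) tasks land on different dies, which limits what a single die failure destroys but makes almost every data transfer cross the network, consistent with INTERLEAVED's large overhead at high CCR.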

Page 31

Test Environment
• Devices: 4 PCs, each with an Intel Core i7 920 (2.67GHz, quad core), an Intel network interface card (Intel Gigabit CT Desktop Adapter, PCI Express x1), and 6.0GB of memory
• Program to measure execution time: Windows 7 (64bit), Java(TM) SE Runtime Environment (64bit), standard TCP sockets

Page 32

Task Graph with Low Parallelism

Configuration
• Number of task nodes: 90
• Number of cores on a die: 2
• Number of dies: 2 to 4
• Task graph: robot control [4]

[Figure: robot control task graph, and processor graph of dual-core dies connected through a switch; the number of dies is varied]

[4] Standard Task Graph Set http://www.kasahara.elec.waseda.ac.jp/schedule/index.html

Page 33

Results with Robot Control Task

• We varied the number of dies
• In case of a failure, the proposed method reduced total execution time by 40%
• In case of no failure, the overhead was up to 6%

[Figure: execution time (sec) vs. number of dies for PROPOSED, CONTENTION, and INTERLEAVED, in two panels: in case of a failure (40% reduction marked) and no failure (6% overhead marked)]

Page 34

Task Graph with High Parallelism

Configuration
• Number of task nodes: 98
• Number of cores on a die: 4
• Number of dies: 2 to 4
• Task graph: sparse matrix solver [4]

[Figure: sparse matrix solver task graph, and processor graph of quad-core dies connected through a switch; the number of dies is varied]


Page 35

Results with Sparse Matrix Solver

• We varied the number of dies
• In case of a failure, execution time including recovery was reduced by up to 25%
• In case of no failure, the overhead was up to 7%

[Figure: execution time (sec) vs. number of dies for PROPOSED, CONTENTION, and INTERLEAVED, in two panels: in case of a failure (25% reduction marked) and no failure (7% overhead marked)]

Page 36

Simulation with Varied CCR

• CCR: ratio between communication time and computation time; a high CCR means long communication time
• Number of tasks: 50
• Number of cores on a die: 4
• Number of dies: 4
• Task graph: 18 random graphs

[Figure: processor graph of quad-core dies connected through a switch]
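CCR can be computed directly from a task graph, and random graphs can be rescaled to a target CCR. The sketch assumes the common definition of total communication time divided by total computation time; the slides only say "ratio", and the tiny graph below is made up.

```python
def ccr(comp, comm):
    """Communication-to-computation ratio of a task graph.

    comp: task -> computation time; comm: (src, dst) -> communication time.
    Assumed definition: total communication time / total computation time.
    """
    return sum(comm.values()) / sum(comp.values())


def scale_to_ccr(comp, comm, target):
    """Rescale all communication times so the graph has the requested CCR."""
    factor = target * sum(comp.values()) / sum(comm.values())
    return {edge: w * factor for edge, w in comm.items()}


comp = {1: 10, 2: 10, 3: 10}
comm = {(1, 3): 2, (2, 3): 4}
print(ccr(comp, comm))                          # 0.2
print(ccr(comp, scale_to_ccr(comp, comm, 10)))  # 10.0
```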

Page 37

Results with Varied CCR

• We varied CCR
• INTERLEAVED has a large overhead when CCR = 10 (communication heavy)
• PROPOSED has 30% overhead, but reduced execution time in case of no failure

[Figure: execution time (sec) vs. CCR for PROPOSED, CONTENTION, and INTERLEAVED, in two panels: in case of a failure and no failure; differences of 5% and 30% are marked]

Page 38

Effect of Parallelization of Proposed Scheduler

• The proposed algorithm is parallelized
• We compared the times to generate schedules for 20 task graphs, multi-threaded vs. single-threaded: speed-up of up to x4
• Environment: Intel Core i7 920 (2.67GHz), Windows 7 (64bit), Java(TM) SE 6 (64bit)

[Figure: time to generate schedule, single thread vs. multi thread]
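The slides do not say which part of the scheduler is parallelized. One natural candidate, which is our assumption, is evaluating the candidate values of k independently, since each evaluation only reads the original schedule. The sketch uses Python's `concurrent.futures.ThreadPoolExecutor` to stay self-contained; in CPython the global interpreter lock limits speed-up for CPU-bound work, so `ProcessPoolExecutor` with a module-level evaluation function would be the choice for real parallelism (the measurements above were taken with Java SE 6). The evaluation function is a hypothetical placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def best_k_parallel(n_candidates, evaluate, workers=4):
    """Evaluate k = 0..n_candidates concurrently and return (k, cost) with the lowest cost.

    evaluate: hypothetical callable(k) -> worst-case total execution time for that k;
    it must not modify shared state, because calls run concurrently.
    """
    ks = range(n_candidates + 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        costs = list(pool.map(evaluate, ks))   # results come back in the order of ks
    return min(zip(ks, costs), key=lambda kc: kc[1])


print(best_k_parallel(4, lambda k: 100 - 15 * k + 4 * k * k))  # (2, 86)
```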

Page 39

Conclusion

• We proposed a task scheduling method that considers network contention, a single fail-stop failure, and multicore processors
• Future work: evaluation on a larger computer system