

Research Article

Effective Task Scheduling and IP Mapping Algorithm for Heterogeneous NoC-Based MPSoC

Peng-Fei Yang and Quan Wang

School of Computer, Xidian University, Xi'an 710071, China

Correspondence should be addressed to Peng-Fei Yang; yangppf@163.com

Received 8 May 2014; Accepted 17 June 2014; Published 10 July 2014

Academic Editor: Yuping Wang

Copyright © 2014 P.-F. Yang and Q. Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The quality of task scheduling is critical to the network communication efficiency and the performance of the entire NoC- (Network-on-Chip-) based MPSoC (multiprocessor System-on-Chip). In this paper, the NoC-based MPSoC design process is divided into two steps: scheduling subtasks to processing elements (PEs) of appropriate type and quantity, and then mapping these PEs onto the switching nodes of the NoC topology. The task model is first improved so that it better reflects the real intertask relations; an optimized particle swarm optimization (PSO) algorithm then carries out the first step, aiming at lower task running and transfer cost as well as the least task execution time. By referring to the topology of the NoC and the communication diagram produced by the first step, the second step is carried out with the minimal expected network transmission delay, lower resource consumption, and balanced power consumption. Comparative experiments show that the algorithm achieves favorable resource and power consumption when it is adopted in a system design.

1. Introduction

The development of integrated circuits has provided strong support for integrating multiple processing elements (PEs) on a single chip, and on-chip communication between cores has developed from the bus-based approach to two-dimensional and three-dimensional Network-on-Chip (NoC). The network-based, highly parallel System-on-Chip (SoC) structure has become the inevitable choice for the next generation of complex computer architectures [1]. Nevertheless, the dramatic increase in the number of PEs that can be integrated and in the size of executable tasks has brought new problems and challenges to system design, among which the dividing and scheduling of tasks and IP mapping have become the focus of study.

Given the tasks, the types and number of available PEs, and the topology of the NoC, NoC-based task scheduling and IP mapping assign tasks to suitable PEs and map the PEs onto the network topology so as to improve system efficiency as much as possible while the whole system meets its power consumption and delay requirements. Its significance includes the following: (1) it serves as the bridge between applications and architecture and determines how tasks are implemented on the architecture, as well as their processing performance and efficiency; (2) as a heterogeneous multicore architecture is usually associated with a particular field, efficient task scheduling can better support applications in that field; and (3) as the size of tasks and of multicore system architectures increases, an efficient division of the mapping problem helps improve the quality and efficiency of exploring the mapping space and thereby the performance and efficiency of the entire SoC.

2. Related Work

Current research seldom distinguishes clearly between task scheduling and IP mapping, and modeling and analysis are conducted on the assumption that a PE performs only one subtask (in some algorithms, subtasks are simply treated as PEs). That is to say, the task is abstracted into a simple task model that gives only the calling relationship between subtasks; based on this information, the scheduling algorithm tries to keep the running time as short as possible [2–4]. This approach has several drawbacks: (1) the heterogeneous nature of NoCs and the communication delay between tasks are usually neglected; (2) although the interdependence among tasks is complex, the model abstracts only the calling relationship between subtasks, so other factors are not fully reflected and transfer costs among different PEs are inadequately considered. The scheduling order produced by these models is unsatisfactory in practice, so continuous recalculation and adjustment are required during system operation, which inevitably places an additional burden on the system and threatens operating efficiency.

Hindawi Publishing Corporation, Mathematical Problems in Engineering, Volume 2014, Article ID 202748, 8 pages, http://dx.doi.org/10.1155/2014/202748

In addition, in terms of when the scheduling decision is made, task scheduling can be divided into static scheduling and dynamic scheduling. Static scheduling means that the compiler makes the scheduling decision at compile time; examples are list-based algorithms [5, 6], clustering algorithms [7–9], and duplication-based algorithms [10, 11]. However, the static scheduling model has a drawback: because the model only approximates the communication and execution times among processors, it may disagree with the actual execution of the program or even produce poor scheduling results.

Dynamic scheduling means that a scheduler assigns tasks to appropriate processors at run time according to their performance, so that the various requirements of the system can be met. Research in this area mainly employs heuristic algorithms, such as genetic algorithm (GA) [12] and ant-colony-based optimization (ACO) [13, 14] heuristic task scheduling, the dynamic scheduling algorithm based on a task pool [15], particle swarm optimization (PSO) [16, 17], optimized evolutionary algorithms [18, 19], and the dynamic scheduling algorithm based on real-time constraints [20]. Although good scheduling results can be attained when these approaches are applied to task partitioning and mapping, their inherent defects cause problems in practice. For example, convergence is slow in the late stage of the genetic algorithm and in the early stage of the ant colony algorithm; inadequate coverage of the search space leads to a gap between the result and the optimum value; and particle swarm optimization is prone to becoming trapped in local optima.

Meanwhile, with regard to NoC topology, through-silicon via (TSV) technology [21] and optical interconnection technology [22, 23] have made possible higher IP core density, wider bandwidth, lower power consumption, and smaller size on integrated circuit chips. However, the resource occupancy and power consumption introduced by the NoC itself must be considered. In order to reduce the NoC's share of limited resources and further decrease power consumption, various kinds of heterogeneous NoC topologies have been designed [24–26] to suit the differing network transmission delay and bandwidth needs of different types of PEs. Currently, most algorithms do not take the effect of heterogeneous topology on system performance into consideration. If PEs of different types are mapped, under balanced power consumption, to suitable areas according to their performance requirements, and data transmission delay is minimized, the performance of the system can be greatly improved.

Based on the analysis above, the whole design process is divided into two stages, as shown in Figure 1. The first stage is task dividing and scheduling. With the improved task model faithfully reflecting the real intertask relations and the local-optimum problem of the particle swarm algorithm addressed, the optimized PSO algorithm divides a big task into small tasks of proper granularity, featuring high cohesion and low coupling, according to traffic and calling relationships. There exists high parallelism among these small tasks. The small tasks are then assigned to the corresponding PEs according to their nature, and a communication diagram is generated, which completes the first step with lower expected transfer cost as well as the least task execution time. The process then moves to the IP mapping stage. In this stage, by referring to the communication diagram and to the performance disparity and delay information of the NoC topology, the PEs are mapped onto the switching nodes of the NoC so as to achieve the least network transmission delay with lower resource occupancy, balanced power consumption, and fewer resource fragments, so that system performance does not fluctuate when new tasks need scheduling.

The rest of the paper is organized as follows. Section 3 gives a detailed description of task dividing and scheduling. Section 4 illustrates the process of IP mapping. Comparative experimental results are shown in Section 5. Section 6 concludes the paper.

3. Task Dividing and Scheduling

Although the types and quantities of PEs integrated in NoC-based heterogeneous multicore systems are expanding, the size of application tasks varies, and current task scheduling algorithms often assign and map a task according to the number of usable PEs. For some small tasks this causes problems: on the one hand, as the tasks are divided into extremely small subtasks, communication among subtasks becomes too frequent, which may prolong task execution time; on the other hand, inadequate utilization of the PEs may increase system power consumption and reduce overall system efficiency.

This paper superimposes tasks on a PE until the computing resource of the PE is occupied at an appropriate ratio (set according to the performance requirements of the system and of the PEs), and only then are new PEs added. The approach ensures not only that tasks are divided into subtasks of appropriate size but also that every PE invoked is used efficiently, which benefits overall performance.
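The superimposing rule described above can be illustrated with a greedy fill. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `fill_pes`, the per-subtask `loads`, the PE `capacity`, and the `max_ratio` threshold are all hypothetical, because the paper does not say how the occupancy ratio is measured.

```python
def fill_pes(loads, capacity, max_ratio=0.8):
    """Stack subtasks on the current PE until its occupancy would exceed
    max_ratio * capacity, then open a new PE.

    loads, capacity, and max_ratio are illustrative assumptions; the paper
    only states that the ratio is set from system and PE requirements.
    """
    limit = capacity * max_ratio
    pes = []    # pes[k] lists the subtask indices placed on PE k
    used = []   # used[k] is the load already placed on PE k
    for idx, load in enumerate(loads):
        if load > limit:
            raise ValueError(f"subtask {idx} does not fit on an empty PE")
        if pes and used[-1] + load <= limit:
            pes[-1].append(idx)        # keep superimposing on the current PE
            used[-1] += load
        else:
            pes.append([idx])          # add a new PE only when needed
            used.append(load)
    return pes


groups = fill_pes([3, 2, 4, 1, 5], capacity=10, max_ratio=0.8)
print(groups)
```

With the values shown, a PE accepts work up to a load of 8, so a new PE is opened only when the next subtask would push the current one past that limit.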

3.1. Task Model. A task can be divided into $N$ subtasks, among which there exists a certain execution sequence or control logic, and these subtasks are processed by $M$ PEs ($m$ types, $m \le M$). Assuming that the processing time of each of the $m$ types of PEs for every subtask, the communication overhead among PEs, and the amount of data transmitted among interdependent subtasks are known, the task on a heterogeneous multicore system can be abstracted into a quintuple:

$$\mathrm{DAG} = (V, E, \mathrm{Type}, \mathrm{PCU}, C). \tag{1}$$

Figure 1: Two stages of task scheduling and IP mapping (diagram labels: subtasks t1 to t8, PE1 to PE4, switching nodes S1 and S2).

(1) $V$: the set of task nodes in the DAG application; that is, a vertex $v \in V$ is a subtask. The number of subtasks in the DAG application is $N$.

(2) $E$: the set of edges in the DAG application; that is, $e_{ij} \in E$ means that there exists data communication between $v_i$ and $v_j$, and the direction of the arrow indicates the direction of data transmission.

(3) $\mathrm{Type}(v)$: the type of the task. For instance, we can use 1, 2, 3, ... to represent different computing types. In addition, the type set of tasks corresponds with that of PEs, which means that a task can only be scheduled to a PE matching its type. This can be expressed by the matrix $D = \{d_{ij}\}$, where the rows represent the tasks and the columns represent the PEs; $d_{ij} = \infty$ means that task $v_i$ cannot be executed on $P_j$, and $d_{ij} = a$ means that task $v_i$ can be executed on $P_j$ with an execution time of $a$.

(4) $\mathrm{PCU}$: the running cost of every type of PE per unit time, in which element $\mathrm{PCU}_r$ ($1 \le r \le m$) represents the running cost of the $r$th type of PE per unit time.

(5) $C$: the collection of communication overheads of the directed edges; $C_{ij}$ represents the transfer cost between subtasks $v_i$ and $v_j$ over the directed edge $e_{ij}$. When $v_i$ and $v_j$ are scheduled to the same PE, $C_{ij}$ equals zero.
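The quintuple can be held in a small data structure. The sketch below is one possible layout, assuming 0-based task and PE indices; the class name `TaskDAG`, its field names, and the helper methods are illustrative choices that do not come from the paper.

```python
import math
from dataclasses import dataclass

INF = math.inf  # d_ij = INF marks "task i cannot be executed on PE j"


@dataclass
class TaskDAG:
    """Illustrative container for DAG = (V, E, Type, PCU, C); names are assumed."""
    n_tasks: int      # |V| = N
    edges: set        # E: directed pairs (i, j)
    task_type: list   # Type(v) for each task
    D: list           # D[i][j]: execution time of task i on PE j, or INF
    pcu: list         # PCU_r: running cost per unit time of PE type r
    C: dict           # C[(i, j)]: transfer cost over edge e_ij

    def can_run(self, i, j):
        """True when task i may be executed on PE j."""
        return self.D[i][j] != INF

    def transfer_cost(self, i, j, pe_of):
        """C_ij, which drops to zero when both tasks are on the same PE."""
        return 0 if pe_of[i] == pe_of[j] else self.C[(i, j)]


dag = TaskDAG(
    n_tasks=3,
    edges={(0, 1), (0, 2)},
    task_type=[1, 2, 1],
    D=[[4, INF], [INF, 6], [5, INF]],
    pcu=[2.0, 3.0],
    C={(0, 1): 7, (0, 2): 3},
)
print(dag.can_run(0, 1), dag.transfer_cost(0, 1, pe_of=[0, 1, 0]))
```

Storing $D$ as a dense matrix mirrors the description in item (3); a sparse mapping would work equally well for large task sets.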

The goal of task dividing and scheduling is to find an assignment and scheduling strategy that respects the task processing sequence and the resource limits, assigns the $N$ subtasks to a proper number of PEs, and orders the execution of the subtasks so that the overall task completes in minimum time while every subtask obeys the dependency graph. Based on the task model, an improved particle swarm algorithm is used to carry out the computation.

3.2. Coding and Decoding. The resource occupation of every subtask is encoded by indirect encoding. The encoding length depends on the number of subtasks. Every particle corresponds to a certain task assignment strategy.

Assume that a task contains $N$ sequentially numbered subtasks and that $M$ PEs, classified into $m$ types, are available. For example, when $N = 10$ and $m = 3$, the particle (3, 2, 1, 1, 3, 2, 1, 2, 3, 3) is a feasible scheduling scheme; the particle is encoded as shown in Table 1, and, as shown in Table 2, decoding the particle yields the subtasks assigned to every type of PE. Then, as shown in Table 3, after the subtasks are assigned, a reasonable number of PEs is allocated to every PE type in accordance with the processing ability and the total amount of work to be processed.
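Decoding amounts to grouping subtask numbers by the PE type stored at each position of the particle. The short sketch below reproduces the Table 2 grouping for the example particle; the function name `decode` is an assumption made for illustration.

```python
from collections import defaultdict


def decode(particle):
    """Group 1-based subtask numbers by the PE type chosen for each of them."""
    by_type = defaultdict(list)
    for subtask, pe_type in enumerate(particle, start=1):
        by_type[pe_type].append(subtask)
    return dict(sorted(by_type.items()))


particle = (3, 2, 1, 1, 3, 2, 1, 2, 3, 3)
print(decode(particle))
# {1: [3, 4, 7], 2: [2, 6, 8], 3: [1, 5, 9, 10]}
```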

It follows from the task model that the running time of every subtask on the different PEs is already known. The running time on the $r$th type of PE is defined as

$$\mathrm{Sub\_TFT}_r = \sum_{i=1}^{n} T_{ir}, \tag{2}$$

where $T_{ir}$ represents the running time of subtask $i$ on the $r$th type of PE and $n$ represents the number of subtasks assigned to the $r$th type of PE. The execution time of the entire task is obtained as follows:

$$\mathrm{TFT} = \max_{r = 1, \ldots, k} \mathrm{Sub\_TFT}_r. \tag{3}$$

The overall operation cost is given as

$$\mathrm{Run\_Cost} = \sum_{r=1}^{k} \mathrm{Sub\_TFT}_r \cdot \mathrm{PCU}_r. \tag{4}$$

Assuming that the task set assigned to the $m$th type of PE is $V_m$ and the task set assigned to the $n$th type of PE is $V_n$, the transfer cost between $\mathrm{PE}_m$ and $\mathrm{PE}_n$ is defined as

$$\mathrm{Tran\_Cost}_{mn} = \sum_{\forall i, j} C_{ij} \quad (v_i \in V_m,\ v_j \in V_n). \tag{5}$$

The overall transfer cost is obtained as follows:

$$\mathrm{Tran\_Cost} = \sum_{\forall m \ne n} \mathrm{Tran\_Cost}_{mn}. \tag{6}$$
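Equations (2) to (6) can be evaluated for one particle in a few lines. The sketch below is illustrative only: the function name `schedule_costs` and the argument layout are assumptions, subtasks are indexed from 0 while PE types are numbered from 1, and, like the equations, it treats all subtasks of one type as running on one PE.

```python
def schedule_costs(particle, T, pcu, C):
    """Evaluate equations (2)-(6) for one particle (argument layout is assumed).

    particle[i] -- PE type (1-based) chosen for subtask i (0-based)
    T[i][r - 1] -- running time of subtask i on PE type r
    pcu[r - 1]  -- running cost of PE type r per unit time
    C[(i, j)]   -- transfer cost over the directed edge e_ij
    """
    k = len(pcu)
    sub_tft = [0.0] * k
    for i, r in enumerate(particle):
        sub_tft[r - 1] += T[i][r - 1]                      # equation (2)
    tft = max(sub_tft)                                     # equation (3)
    run_cost = sum(s * p for s, p in zip(sub_tft, pcu))    # equation (4)
    # Equations (5)-(6): an edge contributes only when its two endpoints
    # are assigned to different PE types.
    tran_cost = sum(c for (i, j), c in C.items() if particle[i] != particle[j])
    return tft, run_cost, tran_cost


T = [[2, 3], [4, 1], [5, 6]]
pcu = [1.0, 2.0]
C = {(0, 1): 3, (0, 2): 4, (1, 2): 2}
print(schedule_costs((1, 2, 1), T, pcu, C))
```

In this toy instance subtasks 0 and 2 share PE type 1, so the edge between them costs nothing, while the two edges touching subtask 1 are charged.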

Table 1: Example of particle coding.

Subtask number   1  2  3  4  5  6  7  8  9  10
Type of PE       3  2  1  1  3  2  1  2  3  3

Table 2: Example of decoding.

Type of PE   Subtask numbers
1            3, 4, 7
2            2, 6, 8
3            1, 5, 9, 10

Table 3: Task dividing.

Type of PE   Number of PE   Subtask numbers
1            1              3, 4, 7
2            2              2, 6, 8
3            3              1, 5
3            4              9, 10

3.3. Initialization and Fitness Function. Assuming that the population size is $s$, the number of subtasks is $N$, and the number of PE types is $m$, the initialization of the population can be described as follows: among the $s$ randomly generated particles, the position of the $i$th particle is represented by the vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ ($1 \le i \le s$, $1 \le n \le N$), in which $x_{ij}$ ($1 \le x_{ij} \le m$) means that, in the $i$th particle, task $j$ is assigned to a PE of type $x_{ij}$ for operation; the velocity is represented by the vector $v_i = (v_{i1}, v_{i2}, \ldots, v_{in})$ ($1 \le i \le s$, $1 \le n \le N$), in which $-m \le v_{ij} \le m$.

The fitness function of time is defined as

$$\mathrm{Fit\_Time}(i) = \frac{1}{\mathrm{TFT}_i} \quad (1 \le i \le s), \tag{7}$$

where $\mathrm{TFT}_i$ represents the overall completion time of the $i$th particle. The fitness function of cost is obtained as follows:

$$\mathrm{Fit\_Cost}(i) = \frac{1}{\mathrm{Run\_Cost}_i + \mathrm{Tran\_Cost}_i} \quad (1 \le i \le s). \tag{8}$$

The overall fitness function is obtained as follows:

$$\mathrm{Fitness} = \mathrm{Fit\_Time}(i) + \mathrm{Fit\_Cost}(i). \tag{9}$$
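As a small worked illustration of equations (7) to (9), the function below combines a completion time and the two costs into one fitness value. The name `fitness` and the sample numbers are assumptions chosen for the example, and the inputs are assumed to be positive.

```python
def fitness(tft, run_cost, tran_cost):
    """Overall fitness of one particle, following equations (7)-(9).

    Assumes tft > 0 and run_cost + tran_cost > 0.
    """
    fit_time = 1.0 / tft                       # equation (7)
    fit_cost = 1.0 / (run_cost + tran_cost)    # equation (8)
    return fit_time + fit_cost                 # equation (9)


slow = fitness(tft=7.0, run_cost=9.0, tran_cost=5.0)
fast = fitness(tft=5.0, run_cost=9.0, tran_cost=5.0)
print(slow, fast)
```

Because both terms are reciprocals, a shorter completion time or a lower total cost raises the fitness. The two terms are expressed in different units, so their relative influence depends on the scales chosen for time and cost.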

The algorithm selects particles with higher fitness values, which provide a good basis for generating the particles of the next generation.

3.4. Position and Velocity Updating. In every iteration, a particle updates its velocity and position by (10) in accordance with its own best historical position and the best position of the population. The historical best position is replaced by the current position only when the current position has a better fitness value.

$$\begin{aligned} v_i^{k+1} &= w_k \cdot v_i^k + c_1 \cdot r_1 \cdot (P\_best_i - x_i^k) + c_2 \cdot r_2 \cdot (G\_best_i - x_i^k), \\ x_i^{k+1} &= x_i^k + v_i^k. \end{aligned} \tag{10}$$

$P\_best_i$ is the best position experienced by the $i$th particle, and $G\_best_i$ is the best position experienced by all particles in the population. $c_1$ and $c_2$ are the acceleration coefficients, and $r_1$ and $r_2$ are random numbers drawn uniformly from [0, 1], as in standard PSO. The inertia weight $w_k$ balances the algorithm's capability for global and local search, and the paper adopts a decreasing inertia weight as follows:

$$w_k = w_{\mathrm{end}} + \frac{(w_{\mathrm{start}} - w_{\mathrm{end}})(\mathrm{Gen} - k)}{\mathrm{Gen}}. \tag{11}$$

$w_{\mathrm{start}}$ and $w_{\mathrm{end}}$ represent, respectively, the initial inertia weight and the inertia weight when the maximum number of iterations $\mathrm{Gen}$ is reached, so that $w_0 = w_{\mathrm{start}}$ and $w_{\mathrm{Gen}} = w_{\mathrm{end}}$; $k$ is the current iteration. With this inertia weight, the algorithm has strong global search capability in the early stage of iteration and more accurate local search capability in the late stage.
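The update rule and the decreasing inertia weight can be sketched for a single particle as follows. Several details are assumptions rather than statements from the paper: the default values `w_start=0.9`, `w_end=0.4`, `c1=c2=2.0`, the rounding of positions to integer PE types, the clamping to the stated bounds, and the use of the freshly computed velocity in the position step (the usual PSO convention).

```python
import random


def inertia_weight(k, gen, w_start=0.9, w_end=0.4):
    """Linearly decreasing weight: w_start at k = 0 and w_end at k = gen."""
    return w_end + (w_start - w_end) * (gen - k) / gen


def update_particle(x, v, p_best, g_best, k, gen, m, c1=2.0, c2=2.0, rng=random):
    """One velocity and position update for a particle over PE types 1..m.

    Rounding, clamping, and the default coefficients are illustrative
    assumptions; the paper only bounds x_ij to [1, m] and v_ij to [-m, m].
    """
    w = inertia_weight(k, gen)
    new_x, new_v = [], []
    for xj, vj, pj, gj in zip(x, v, p_best, g_best):
        r1, r2 = rng.random(), rng.random()
        vj = w * vj + c1 * r1 * (pj - xj) + c2 * r2 * (gj - xj)
        vj = max(-m, min(m, vj))               # keep -m <= v_ij <= m
        xj = max(1, min(m, round(xj + vj)))    # keep 1 <= x_ij <= m, integer type
        new_x.append(xj)
        new_v.append(vj)
    return new_x, new_v


rng = random.Random(0)
x, v = update_particle([1, 2, 3], [0.0, 0.0, 0.0], p_best=[3, 2, 1],
                       g_best=[2, 2, 2], k=0, gen=200, m=3, rng=rng)
print(x, v)
```

Passing a seeded `random.Random` instance makes a run repeatable, which is convenient when comparing parameter settings.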

3.5. Flow of Algorithm

(1) Randomly initialize the position and velocity of the particle swarm as described in "Initialization and Fitness Function."

(2) Compute the velocity and position of every particle.

(3) Compute the fitness value of every particle and set $P\_best_i$ and $G\_best_i$.

(4) If $P\_best_i$ and $G\_best_i$ remain unchanged after many iterations, or the algorithm has reached the maximum number of iterations, output the optimum solution and go to step 6.

(5) Go to step 2.

(6) Assign a reasonable number of PEs to every type of PE in accordance with the processing ability and the total amount of tasks to be processed.
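Steps (1) to (5) can be put together as a compact loop. The following is a minimal sketch, not the authors' optimized PSO: it assumes positive, finite running times and costs, rounds positions to integer PE types, stops when only the global best has stalled for `patience` iterations, and leaves out step (6). The function names and default parameters are illustrative.

```python
import random


def evaluate(particle, T, pcu, C):
    """Fitness (9) built from equations (2)-(8); inputs assumed positive."""
    sub_tft = [0.0] * len(pcu)
    for i, r in enumerate(particle):
        sub_tft[r - 1] += T[i][r - 1]
    tft = max(sub_tft)
    run_cost = sum(s * p for s, p in zip(sub_tft, pcu))
    tran_cost = sum(c for (i, j), c in C.items() if particle[i] != particle[j])
    return 1.0 / tft + 1.0 / (run_cost + tran_cost)


def pso_schedule(T, pcu, C, s=20, gen=200, patience=30,
                 w_start=0.9, w_end=0.4, c1=2.0, c2=2.0, seed=0):
    """Return (best particle, best fitness); all defaults are assumptions."""
    rng = random.Random(seed)
    n, m = len(T), len(pcu)
    # Step 1: random positions (PE types 1..m) and velocities in [-m, m].
    xs = [[rng.randint(1, m) for _ in range(n)] for _ in range(s)]
    vs = [[rng.uniform(-m, m) for _ in range(n)] for _ in range(s)]
    p_best = [x[:] for x in xs]
    p_fit = [evaluate(x, T, pcu, C) for x in xs]
    g = max(range(s), key=p_fit.__getitem__)
    g_best, g_fit = p_best[g][:], p_fit[g]
    stale = 0
    for k in range(gen):
        w = w_end + (w_start - w_end) * (gen - k) / gen
        improved = False
        for i in range(s):
            # Step 2: velocity and position update for every dimension.
            for j in range(n):
                r1, r2 = rng.random(), rng.random()
                v = (w * vs[i][j] + c1 * r1 * (p_best[i][j] - xs[i][j])
                     + c2 * r2 * (g_best[j] - xs[i][j]))
                vs[i][j] = max(-m, min(m, v))
                xs[i][j] = max(1, min(m, round(xs[i][j] + vs[i][j])))
            # Step 3: fitness, then refresh the personal and global bests.
            f = evaluate(xs[i], T, pcu, C)
            if f > p_fit[i]:
                p_best[i], p_fit[i] = xs[i][:], f
            if f > g_fit:
                g_best, g_fit, improved = xs[i][:], f, True
        # Step 4: stop once the global best has stalled; step 5 is the loop.
        stale = 0 if improved else stale + 1
        if stale >= patience:
            break
    return g_best, g_fit


T = [[2, 3], [4, 1], [5, 6]]
pcu = [1.0, 2.0]
C = {(0, 1): 3, (0, 2): 4, (1, 2): 2}
best, best_fit = pso_schedule(T, pcu, C, s=10, gen=50)
print(best, best_fit)
```

The returned particle can then be decoded by PE type and split across a suitable number of PEs, which corresponds to step (6).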

4. IP Mapping

After task dividing and scheduling, the IP communication diagram is formed. In a NoC-based multicore system, the remaining problem is how to map these PEs onto NoC nodes so as to minimize the network transmission delay during task execution while occupying few resources and keeping energy consumption balanced. This is the IP mapping problem.

There are usually two orientations in IP mapping: minimizing the internal communication cost or minimizing the external communication cost [27, 28]. Both have their pros and cons. The former may lead to increased competition for external resources and adds computation overhead later in mapping, as the utilization of system resources rises. The latter tends to arrange surplus resources well and decreases competition for external resources with little change in computation overhead; however, as each local mapping area is incomplete, it produces only second-best mapping solutions, thus undermining global mapping optimization. When designing an IP mapping algorithm, it is necessary to strike a careful balance between these two orientations.

In the meantime, as described above, PEs of different types have different requirements on the communication capability of the NoC. In order to save on-chip resources and decrease system power consumption, various heterogeneous network topologies have been designed. Therefore, during IP mapping, the match between the communication requirements and the on-chip communication capability must be considered comprehensively.

Based on the properties of the PEs to be mapped and the distribution of transmission capability over the topology, this paper maps the PEs with high communication requirements to high-capability areas, balances the internal communication cost against the external one, and achieves on-chip communication with minimum transmission delay and low resource occupancy. The mapping algorithm consists of two parts: the expression of the network topology by a two-dimensional matrix and the IP mapping itself. They are detailed as follows.

4.1. IP Communication Diagram and NoC Topology. The communication diagram can be abstracted into a triple $\mathrm{CDAG} = (P, E, C)$, where

(1) $P$ represents the set of PEs in the communication diagram; that is, $p_i \in P$ is a PE with tasks to execute;

(2) $E$ represents the set of edges in the diagram; that is, $e_{ij} \in E$ indicates that there exists data exchange between $p_i$ and $p_j$;

(3) $C$ represents the communication cost on the undirected edges, and $C_{ij}$ represents the total communication data between $p_i$ and $p_j$.

It is complicated to express a NoC topology directly, especially a three-dimensional NoC. A two-dimensional matrix, however, expresses topology well, and many properties of matrices can also be applied to topology computation. Therefore, the paper expresses the topology as a two-dimensional matrix before IP mapping.

A three-dimensional mesh topology can be taken as an example. Figure 2(a) shows a 4 × 4 × 2 three-dimensional NoC topology; the red vertices represent bottom switching nodes and the black ones represent upper switching nodes. Figure 2(b) is its two-dimensional expansion diagram, which removes the complexity of studying the three-dimensional topology directly. For convenience of expression and computation, the positions of nodes in the expansion diagram are expressed by a matrix; the positions of the nodes in Figure 2(b) are shown in Figure 2(c). There may exist areas whose communication transmission capability is higher than that of others, to fulfill the higher communication requirements of some PEs; in Figure 2(c), the green areas are those containing switching nodes with higher communication performance. To keep the matrix complete, areas without switching nodes are shaded; in the later computation, nodes in these areas are treated as already assigned.

Through this approach, a one-to-one correspondence is formed between the position of every node in the three-dimensional NoC topology and that of every element in the matrix. IP mapping carries out its optimization on the basis of this matrix.
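One way to hold such a matrix in code is a list of rows whose cells contain either a node identifier or a marker for a shaded (empty) position, with the high-capability cells tracked separately. The sketch below is an assumption-laden illustration: the names `build_grid` and `free_neighbors`, the `(x, y)` coordinate convention, and the use of 4-adjacency in the matrix as a stand-in for physical links are not taken from the paper, which does not specify how the three-dimensional links appear in the expansion.

```python
SHADOW = None  # cell without a switching node; treated as already assigned


def build_grid(rows, cols, shadow=(), high=()):
    """Return (grid, high_cells); grid[y][x] is a node id or SHADOW.

    shadow and high are iterables of (x, y) positions (assumed convention).
    """
    shadow = set(shadow)
    grid, node_id = [], 1
    for y in range(rows):
        row = []
        for x in range(cols):
            if (x, y) in shadow:
                row.append(SHADOW)
            else:
                row.append(node_id)
                node_id += 1
        grid.append(row)
    return grid, set(high) - shadow


def free_neighbors(grid, occupied, x, y):
    """Unoccupied, non-shadow cells that are 4-adjacent to (x, y)."""
    out = []
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= ny < len(grid) and 0 <= nx < len(grid[0]):
            if grid[ny][nx] is not SHADOW and (nx, ny) not in occupied:
                out.append((nx, ny))
    return out


grid, high_cells = build_grid(3, 3, shadow=[(0, 0)], high=[(1, 1)])
print(grid)
print(free_neighbors(grid, occupied={(1, 0)}, x=1, y=1))
```

Counting the free neighbors of a cell is the quantity that the mapping steps later compare with the node degree of a PE.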

4.2. IP Mapping. Before introducing the concrete algorithm, three definitions are given as follows.

Definition 1. Manhattan Distance $\mathrm{MD}(i, j)$: in a plane, the Manhattan Distance between points $P_i(x_1, y_1)$ and $P_j(x_2, y_2)$ is defined as

$$\mathrm{MD}(i, j) = |x_1 - x_2| + |y_1 - y_2|. \tag{12}$$

Definition 2. Euclidean Distance $\mathrm{ED}(i, j)$: in a plane, the Euclidean Distance between points $P_i(x_1, y_1)$ and $P_j(x_2, y_2)$ is defined as

$$\mathrm{ED}(i, j) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}. \tag{13}$$

Definition 3. The communication cost in the mapped area is obtained as follows:

$$\mathrm{Com\_cost} = \sum_{\forall C_{ij} \in C} C_{ij} \cdot \mathrm{MD}(L(p_i), L(p_j)), \tag{14}$$

in which $C_{ij}$ represents the total communication traffic between $p_i$ and $p_j$ in the communication diagram and $\mathrm{MD}(L(p_i), L(p_j))$ represents the Manhattan Distance between the positions to which $p_i$ and $p_j$ are mapped on the topology.
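The three definitions translate directly into code. In the sketch below, the dictionary `loc` plays the role of $L(\cdot)$ by giving each PE its mapped matrix position; the function names and the sample placement are illustrative assumptions.

```python
import math


def manhattan(a, b):
    """Equation (12) for two (x, y) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def euclidean(a, b):
    """Equation (13) for two (x, y) positions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def com_cost(C, loc):
    """Equation (14): traffic-weighted Manhattan distance over all edges."""
    return sum(c * manhattan(loc[i], loc[j]) for (i, j), c in C.items())


C = {("p1", "p2"): 10, ("p2", "p3"): 4}
loc = {"p1": (0, 0), "p2": (1, 0), "p3": (1, 2)}
print(com_cost(C, loc))  # 10 * 1 + 4 * 2 = 18
```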

The goal of the algorithm is to map PEs with high communication requirements to topology areas with high communication capability and to find, among the resulting mappings, the scheme with minimum $\mathrm{Com\_cost}$.

The algorithm divides the communication diagrams into two collections, $H$ and $L$, according to whether or not they include PEs that need to be mapped to a high-capability area. In the collection $H = \{h_1, h_2, \ldots, h_i\}$ with high communication requirements, the diagrams are ordered as $|h_1| \ge |h_2| \ge \cdots \ge |h_i|$ according to the number of PEs with high communication requirements; in the collection $L = \{l_1, l_2, \ldots, l_i\}$ without high communication requirements, they are ordered as $|l_1| \ge |l_2| \ge \cdots \ge |l_i|$ according to the number of PEs contained. The execution steps of the mapping algorithm are as follows.

(1) Start the mapping computation from $h_1$: choose, on the topology, a communication area with high communication capability that can contain the minimum set of PEs with high communication requirements in $h_1$ as the starting area of mapping. The PEs that have been mapped are called the assigned area, and the switching nodes they occupy on the topology are called the mapped area.

(2) Start from the PE with the maximum communication traffic (sum of input and output) and map it to the switching node in the high-capability area whose number of available neighboring nodes is nearest to the node degree of the PE.

(3) Choose the PE that exchanges the most communication data with the assigned area as the next PE to be mapped.

(4) Map this PE to the switching node with the minimum Manhattan Distance to the mapped area. If more than one node meets the requirement, choose the node whose number of available neighboring nodes is nearest to the node degree of the PE; if there is still more than one node, choose the switching node with the minimum Euclidean Distance from the center of the mapped area.

(5) Repeat steps 3 and 4 until all PEs are mapped, and then start the algorithm on the next PE diagram to be mapped.

Figure 2: Topology and its expression by matrix: (a) three-dimensional topology; (b) two-dimensional expansion diagram; (c) positions of the nodes in the matrix.

Figure 3: Description of mapping process.

Figure 4: Comparison of algorithm velocity (execution time in ms versus task scale of 9, 16, and 25 subtasks, for GA, ACO, PSO, and OPSO).
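Steps (2) to (5) of the mapping algorithm can be sketched as a greedy placement for a single communication diagram. This is a simplified illustration rather than the authors' code: the handling of the collections $H$ and $L$ and the choice of starting area in step (1) are omitted, the distance to the mapped area is taken as the minimum over already occupied cells, the center of the mapped area is taken as the centroid of those cells, matrix 4-adjacency stands in for physical links, and PEs without high requirements are kept off high-capability cells when possible. All names are hypothetical.

```python
import math


def map_pes(C, cells, high_cells, high_pes):
    """Greedy sketch of mapping steps (2)-(5); every name here is assumed.

    C[(p, q)]  -- undirected traffic between PEs p and q (non-empty)
    cells      -- set of usable (x, y) matrix positions (shadow cells excluded)
    high_cells -- subset of cells with high communication capability
    high_pes   -- PEs with high communication requirements
    """
    pes = sorted({p for edge in C for p in edge})
    if len(pes) > len(cells):
        raise ValueError("not enough switching nodes for all PEs")

    def traffic(p, q):
        return C.get((p, q), 0) + C.get((q, p), 0)

    degree = {p: sum(1 for q in pes if q != p and traffic(p, q) > 0) for p in pes}
    loc = {}  # PE -> (x, y)

    def free_neighbors(cell):
        x, y = cell
        used = set(loc.values())
        around = ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
        return sum(1 for c in around if c in cells and c not in used)

    def candidates(p, force_high=False):
        free = cells - set(loc.values())
        if force_high or p in high_pes:
            preferred = free & high_cells
        else:
            preferred = free - high_cells    # assumption: reserve fast nodes
        return sorted(preferred or free)     # fall back if the area is full

    def rank(p, cell):
        fit = abs(free_neighbors(cell) - degree[p])
        if not loc:                          # step (2): nothing mapped yet
            return (0, fit, 0.0)
        placed = list(loc.values())
        md = min(abs(cell[0] - c[0]) + abs(cell[1] - c[1]) for c in placed)
        cx = sum(c[0] for c in placed) / len(placed)
        cy = sum(c[1] for c in placed) / len(placed)
        return (md, fit, math.hypot(cell[0] - cx, cell[1] - cy))  # step (4)

    first = max(pes, key=lambda p: sum(traffic(p, q) for q in pes if q != p))
    loc[first] = min(candidates(first, force_high=True),
                     key=lambda c: rank(first, c))
    while len(loc) < len(pes):               # step (5): repeat (3) and (4)
        rest = [p for p in pes if p not in loc]
        nxt = max(rest, key=lambda p: sum(traffic(p, q) for q in loc))  # step (3)
        loc[nxt] = min(candidates(nxt), key=lambda c: rank(nxt, c))
    return loc


cells = {(x, y) for x in range(3) for y in range(3)}
C = {("a", "b"): 10, ("a", "c"): 5, ("b", "c"): 1}
placement = map_pes(C, cells, high_cells={(1, 1)}, high_pes={"a"})
print(placement)
```

The tuple returned by `rank` encodes the tie-breaking order of step (4): Manhattan Distance first, then the match between free neighbors and node degree, then Euclidean Distance to the center of the mapped area.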

Figure 3 gives a simple description of the mapping process. In the IP communication diagram, the red PEs are those with high communication requirements, and the blue area represents the assigned area; in the topology, the green area represents the switching nodes with high communication capability, and the area encircled by the red line represents the mapped area.

The mapping algorithm places PEs that communicate directly on neighboring nodes, so that the path between the source node and the destination node is the shortest and does not conflict with other transmission paths, thus minimizing the delay in the whole mapping area.

5. Experiment and Simulation

The performance of the designed algorithm is compared and evaluated from two aspects. The first is the speed of the task dividing and scheduling algorithm itself. Tasks of the same size are computed with GA, ACO, PSO, and the algorithm in this paper, respectively, and the running times are compared to demonstrate the efficiency of the algorithm. This part is conducted in Matlab with 200 iterations; the comparison of the time required to run the algorithms is shown in Figure 4.

Figure 5: Comparison of mapping effect. (a) Average packet delay (clock cycles) versus task scale (9, 16, and 25 PEs) for GA, ACO, PSO, and OPSO. (b) Power consumption versus task scale (9, 16, and 25 PEs) for GA, ACO, PSO, and OPSO.

The second is the comparison of the actual mapping effect (Figure 5). By running the scheduling results of the above algorithms in an NoC simulation environment and computing the delay and power consumption of the system, respectively, we can demonstrate the superiority of the algorithm of this paper in scheduling.

6. Conclusion

In this paper, the task scheduling model is further improved, and the operating cost per time unit is employed as a uniform measurement for PEs of different types, which simplifies the algorithm. Task dividing and scheduling and IP mapping are handled separately, so that the resulting scheduling is more efficient and realistic. The scheduling target considers not only the total time spent but also the time cost and resource cost during task running, so as to achieve comprehensive optimization of system performance.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] C. Addo-Quaye, "Thermal-aware mapping and placement for 3-D NoC designs," in Proceedings of the IEEE International SOC Conference, pp. 25–28, September 2005.

[2] A. K. Singh, W. Jigang, A. Prakash, and T. Srikanthan, "Mapping algorithms for NoC-based heterogeneous MPSoC platforms," in Proceedings of the 12th Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD '09), pp. 133–140, August 2009.

[3] K. Ganeshpure and S. Kundu, "On runtime task graph extraction in MPSoC," in Proceedings of the IEEE Computer Society Annual Symposium on VLSI, pp. 171–176, IEEE, 2013.

[4] Y. Z. Tei, M. N. Marsono, N. Shaikh-Husin, and Y. W. Hau, "Network partitioning and GA heuristic crossover for NoC application mapping," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '13), pp. 1228–1231, Beijing, China, May 2013.

[5] H. Topcuoglu, S. Hariri, and M. Wu, "Performance-effective and low-complexity task scheduling for heterogeneous computing," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 3, pp. 260–274, 2002.

[6] M. I. Daoud and N. Kharma, "Efficient compile-time task scheduling for heterogeneous distributed computing systems," in Proceedings of the 12th International Conference on Parallel and Distributed Systems (ICPADS '06), vol. 1, pp. 11–19, IEEE, Minneapolis, Minnesota, July 2006.

[7] M. Wu and D. D. Gajski, "Hypertool: a programming aid for message-passing systems," IEEE Transactions on Parallel and Distributed Systems, vol. 1, no. 3, pp. 330–343, 1990.

[8] T. Yang and A. Gerasoulis, "DSC: scheduling parallel tasks on an unbounded number of processors," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 9, pp. 951–967, 1994.

[9] S. J. Kim and J. C. Browne, "A general approach to mapping of parallel computation upon multiprocessor architectures," in Proceedings of the International Conference on Parallel Processing, vol. 2, pp. 1–8, 1988.

[10] Y.-C. Chung and S. Ranka, "Applications and performance analysis of a compile-time optimization approach for list scheduling algorithms on distributed memory multiprocessors," in Supercomputing, pp. 512–521, 1992.

[11] I. Ahmad and Y. Kwok, "A new approach to scheduling parallel programs using task duplication," in Proceedings of the International Conference on Parallel Processing, vol. 2, pp. 47–51, 1994.

[12] M. Sayuti and L. S. Indrusiak, "Real-time low-power task mapping in networks-on-chip," in Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI '13), pp. 14–19, 2013.

[13] F. Ferrandi, P. L. Lanzi, C. Pilato, D. Sciuto, and A. Tumeo, "Ant colony heuristic for mapping and scheduling tasks and communications on heterogeneous embedded systems," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 6, pp. 911–924, 2010.

[14] L. S. Junior, N. Nedjah, and L. de Macedo Mourelle, "CO approach in static routing for network-on-chips with 3D mesh topology," in Proceedings of the IEEE Fourth Latin American Symposium on Circuits and Systems (LASCAS '13), pp. 1–4, IEEE, Cusco, Peru, February 2013.

[15] R. Hoffmann, A. Prell, and T. Rauber, "Dynamic task scheduling and load balancing on cell processors," in Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '10), pp. 205–212, February 2010.

[16] M. B. Abdelhalim, "Task assignment for heterogeneous multiprocessors using re-excited particle swarm optimization," in Proceedings of the International Conference on Computer and Electrical Engineering (ICCEE '08), pp. 23–27, Phuket, Thailand, December 2008.

[17] M. S. Sidhu, P. Thulasiraman, and R. K. Thulasiram, "A load-rebalance PSO heuristic for task matching in heterogeneous computing systems," in Proceedings of the IEEE Symposium on Swarm Intelligence (SIS '13), pp. 180–187, IEEE, Singapore, April 2013.

[18] Y. Wang and C. Dang, "An evolutionary algorithm for global optimization based on level-set evolution and latin squares," IEEE Transactions on Evolutionary Computation, vol. 11, no. 5, pp. 579–595, 2007.

[19] Y.-P. Wang, Y.-C. Jiao, and H. Li, "An evolutionary algorithm for solving nonlinear bilevel programming based on a new constraint-handling scheme," IEEE Transactions on Systems, Man and Cybernetics C: Applications and Reviews, vol. 35, no. 2, pp. 221–232, 2005.

[20] O. Arnold and G. Fettweis, "Power aware heterogeneous MPSoC with dynamic task scheduling and increased data locality for multiple applications," in Proceedings of the International Conference on Embedded Computer Systems (SAMOS '10), pp. 110–117, 2010.

[21] G. De Micheli and L. Benini, Networks on Chips: Technology and Tools, Academic Press, 2006.

[22] D. A. B. Miller, "Rationale and challenges for optical interconnects to electronic chips," Proceedings of the IEEE, vol. 88, no. 6, pp. 728–749, 2000.

[23] D. A. B. Miller, "Device requirements for optical interconnects to silicon chips," Proceedings of the IEEE, vol. 97, no. 7, pp. 1166–1185, 2009.

[24] M. O. Agyeman and A. Ahmadinia, "Optimising heterogeneous 3D networks-on-chip," in Proceedings of the 6th IEEE International Symposium on Parallel Computing in Electrical Engineering (PARELEC '11), pp. 25–30, April 2011.

[25] Y. Ye, J. Xu, X. Wu, W. Zhang, W. Liu, and M. Nikdast, "A torus-based hierarchical optical-electronic network-on-chip for multiprocessor system-on-chip," ACM Journal on Emerging Technologies in Computing Systems, vol. 8, no. 1, article 5, 2012.

[26] H. A. Khouzani, S. Koohi, and S. Hessabi, "Fully contention-free optical NoC based on wavelength routing," in Proceedings of the 16th CSI International Symposium on Computer Architecture and Digital Systems (CADS '12), pp. 81–86, May 2012.

[27] C. Chou and R. Marculescu, "User-aware dynamic task allocation in networks-on-chip," in Proceedings of the Design, Automation and Test in Europe (DATE '08), vol. 1–3, pp. 1074–1079, March 2008.

[28] C. Chou and R. Marculescu, "Run-time task allocation considering user behavior in embedded multiprocessor networks-on-chip," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 1, pp. 78–91, 2010.


between tasks are usually neglected; (2) as the interdependence among tasks is complex, the model abstracts only the calling relationship between subtasks, with the result that other factors cannot be fully reflected and that transfer costs among different PEs are inadequately considered. The scheduling order produced by these models is not satisfactory in practical operation, so continuous recalculation and adjustment are required during system operation, which inevitably places an additional burden on the system and threatens operating efficiency.

In addition, in terms of the time of the scheduling decision, task scheduling can be divided into static scheduling and dynamic scheduling. Static scheduling means that the compiler makes the scheduling decision at compile time, for example, list-based algorithms [5, 6], clustering algorithms [7–9], and duplication-based algorithms [10, 11]. However, the static scheduling model has some drawbacks: as the model is only an approximation of the communication and execution times among processors, it might disagree with the actual execution of the program or even produce poor scheduling results.

Dynamic scheduling means that a scheduler assigns tasks to appropriate processors in real time according to their performance, so that the various requirements on the system can be met. Research in this area mainly employs heuristic algorithms, such as genetic algorithm (GA) [12] and ant-colony-based optimization (ACO) [13, 14] heuristic task scheduling, dynamic scheduling based on a task pool [15], particle swarm optimization (PSO) [16, 17], optimized evolutionary algorithms [18, 19], and dynamic scheduling based on real-time constraints [20]. Although good scheduling results can be attained when these approaches are applied to task partitioning and mapping, in practice their inherent defects easily cause drawbacks during operation. For example, convergence is slow in the late stage of the genetic algorithm and in the early stage of the ant colony algorithm; inadequate coverage of all collections leads to a disparity between the result and the optimum value; and particle swarm optimization is prone to being trapped in local optima.

Meanwhile, with respect to NoC topology, through-silicon via (TSV) technology [21] and optical interconnection technology [22, 23] have made possible higher IP core density, wider bandwidth, less power consumption, and smaller size on integrated circuit chips. However, the resource occupancy and power consumption introduced by the NoC must be considered. In order to reduce the NoC's occupancy of limited resources and further decrease power consumption, various kinds of heterogeneous NoC topology have been designed [24–26] to suit the differing needs of different types of PEs for network transmission delay and bandwidth. Currently, most algorithms do not take the effect of heterogeneous topology on system performance into consideration. If PEs of different types are mapped, on the premise of balanced power consumption, to suitable areas according to their performance requirements, and data transmission delay is minimized, the performance of the system could be greatly improved.

Based on the analysis above, the whole design process is divided into two stages. As shown in Figure 1, the first stage is task dividing and scheduling. With an improved task model that faithfully reflects the real intertask relations, and with the local optimum problem of the particle swarm algorithm addressed, the optimized PSO algorithm is used to divide a big task into small tasks of proper granularity featuring high cohesion and low coupling, according to traffic and calling relationships. There exists high parallelism among these small tasks. The small tasks are then assigned to corresponding PEs according to their nature, and a communication diagram is generated, completing the first step with lower expected transfer cost as well as the least task execution time. The process then comes to the IP mapping stage. In this stage, by referring to the communication diagram and to the performance disparity and delay information of the NoC topology, the PEs are mapped onto the switching nodes of the NoC so as to achieve the least network transmission delay with less resource occupancy, even power consumption, and fewer resource fragments, so that system performance does not fluctuate when new tasks need scheduling.

The rest of the paper is organized as follows. Section 3 gives a detailed description of task dividing and scheduling. Section 4 illustrates the process of IP mapping. Comparative experimental results are shown in Section 5. Section 6 concludes the paper.

3. Task Dividing and Scheduling

Although the types and quantities of PEs integrated in NoC-based heterogeneous multicore systems are expanding, the size of application tasks varies, and current task scheduling algorithms often assign and map a task in accordance with the number of usable PEs, which may cause problems for small tasks. On one hand, as the tasks are divided into subtasks of extremely small size, communication among subtasks becomes overfrequent, which may prolong task execution time; on the other hand, inadequate utilization of PE performance may increase system power consumption and reduce overall system efficiency.

This paper superimposes tasks on a PE until the computing resource of the PE is occupied at an appropriate ratio (set according to the performance requirements of the system and of the PEs), and only then are new PEs added. The approach ensures not only that tasks are divided into subtasks of appropriate size but also that every PE invoked is used efficiently, thus improving overall performance.

3.1. Task Model. A task can be divided into $N$ subtasks, among which there exists a certain execution sequence or control logic, and these subtasks are processed by $M$ PEs ($m$ types, $m \le M$). Assuming that the processing time of the $m$ types of PEs for every subtask, the communication overhead among PEs, and the amount of data transmitted among interdependent subtasks are known, the task on a heterogeneous multicore system can be abstracted into a quintuple:

$$\mathrm{DAG} = (V, E, \mathrm{Type}, \mathrm{PCU}, C). \tag{1}$$

Figure 1: Two stages of task scheduling and IP mapping.

(1) $V$: the task node set of the DAG application; a vertex $v \in V$ is a subtask in $V$, and the number of subtasks in the DAG application is $N$.

(2) $E$: the edge set of the DAG application; $e_{ij} \in E$ means that there exists data communication between $v_i$ and $v_j$, and the direction of the arrow indicates the direction of data transmission.

(3) $\mathrm{Type}(v)$: the type of the task. For instance, we can use 1, 2, 3 to represent different computing types. In addition, the type set of tasks corresponds with that of PEs, which means that a task can only be scheduled to a PE matching its type. This can be expressed by the matrix $D = \{d_{ij}\}$, where the rows represent the tasks and the columns represent the PEs; $d_{ij} = \infty$ means that task $v_i$ cannot be executed on $P_j$, and $d_{ij} = a$ means that task $v_i$ can be executed on $P_j$ with an execution time of $a$.

(4) $\mathrm{PCU}$: the running cost of every type of PE per unit time, in which element $\mathrm{PCU}_r$ ($1 \le r \le m$) represents the running cost of the $r$th type of PE per unit time.

(5) $C$: the set of communication overheads of the directed edges; $C_{ij}$ represents the transfer cost between subtasks $v_i$ and $v_j$ over the directed edge $e_{ij}$. When $v_i$ and $v_j$ are scheduled to the same PE, $C_{ij}$ equals zero.
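As a rough illustration of the quintuple, the task model could be held in a small container like the one below. All names here (`TaskDAG`, its fields, and `transfer_cost`) are hypothetical and chosen only for this sketch; the paper does not prescribe a data structure.

```python
import math
from dataclasses import dataclass


@dataclass
class TaskDAG:
    """Hypothetical container for DAG = (V, E, Type, PCU, C)."""
    subtasks: list    # V: subtask identifiers
    edges: set        # E: directed edges (i, j)
    task_type: dict   # Type: subtask id -> computing type
    pcu: dict         # PCU: PE type -> running cost per unit time
    comm_cost: dict   # C: edge (i, j) -> transfer cost C_ij

    def transfer_cost(self, i, j, pe_of):
        """Return C_ij, or zero when both subtasks run on the same PE."""
        if pe_of[i] == pe_of[j]:
            return 0
        return self.comm_cost[(i, j)]


# Execution-time matrix D: rows are tasks, columns are PEs;
# math.inf marks a PE whose type does not match the task.
D = [[4, math.inf],
     [math.inf, 6]]

dag = TaskDAG(
    subtasks=[1, 2],
    edges={(1, 2)},
    task_type={1: 1, 2: 2},
    pcu={1: 2.0, 2: 3.0},
    comm_cost={(1, 2): 5},
)
print(dag.transfer_cost(1, 2, {1: "PE1", 2: "PE2"}))  # 5
print(dag.transfer_cost(1, 2, {1: "PE1", 2: "PE1"}))  # 0
```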

The target of task dividing and scheduling is to find a proper assignment and scheduling strategy that, while respecting the task processing sequence and the resource limitations, assigns the $N$ subtasks to a proper number of PEs and orders the execution of every subtask reasonably, thus achieving the minimum completion time of the overall task with every task satisfying the dependency graph. Based on the task model, an improved particle swarm algorithm is used to conduct the computation.

3.2. Coding and Decoding. The resource occupation of every subtask is encoded by indirect encoding. The encoding length depends on the number of subtasks. Every particle corresponds to a certain task assignment strategy.

Assume that a task contains $N$ sequentially numbered subtasks and that $M$ PEs are available, classified into $m$ types. For example, when $N = 10$ and $m = 3$, the particle (3, 2, 1, 1, 3, 2, 1, 2, 3, 3) is a feasible scheduling scheme; the particle is encoded as shown in Table 1, and, as shown in Table 2, by decoding the particle we obtain the assignment of subtasks to every type of PE. Then, as shown in Table 3, after the subtasks are assigned, a reasonable number of PEs is allotted to every PE type in accordance with the processing ability and the total amount of tasks to be processed.
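The decoding step for the example particle can be sketched in a few lines. The function name `decode` is hypothetical; the sketch only groups subtask numbers by the PE type stored at each position and does not cover the later split of one type across several PEs.

```python
from collections import defaultdict


def decode(particle):
    """Group 1-based subtask numbers by the PE type the particle assigns."""
    by_type = defaultdict(list)
    for subtask, pe_type in enumerate(particle, start=1):
        by_type[pe_type].append(subtask)
    return dict(sorted(by_type.items()))


particle = (3, 2, 1, 1, 3, 2, 1, 2, 3, 3)
print(decode(particle))
# {1: [3, 4, 7], 2: [2, 6, 8], 3: [1, 5, 9, 10]}
```

The printed grouping matches the decoding of the example particle given for $N = 10$ and $m = 3$.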

It follows from the task model that the running time of every subtask on the different PEs is already known. The running time on every type of PE is defined as

$$\mathrm{Sub\_TFT}_r = \sum_{i=1}^{n} T_{ir}, \tag{2}$$

where $T_{ir}$ represents the running time of subtask $i$ on the $r$th type of PE and $n$ represents the number of subtasks assigned to the $r$th type of PE. The execution time of the entire task is obtained as follows:

$$\mathrm{TFT} = \max_{r=1}^{k} \mathrm{Sub\_TFT}_r. \tag{3}$$

The overall operation cost is given as

$$\mathrm{Run\_Cost} = \sum_{r=1}^{k} \mathrm{Sub\_TFT}_r \cdot \mathrm{PCU}_r. \tag{4}$$

Assuming that the task set assigned to the $m$th type of PE is $V_m$ and the task set assigned to the $n$th type of PE is $V_n$, the transfer cost between $\mathrm{PE}_m$ and $\mathrm{PE}_n$ is defined as

$$\mathrm{Tran\_Cost}_{mn} = \sum_{\forall i, j} C_{ij} \quad (v_i \in V_m,\ v_j \in V_n). \tag{5}$$

The overall transfer cost is obtained as follows:

$$\mathrm{Tran\_Cost} = \sum_{\forall m \ne n} \mathrm{Tran\_Cost}_{mn}. \tag{6}$$
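Equations (2) to (6) can be evaluated for one decoded particle roughly as follows. This is a sketch under stated assumptions: the dictionary layouts (`assigned`, `exec_time`, `pcu`, `comm`) and the function names are hypothetical, and the PE type of each subtask is used to decide whether an edge crosses PEs.

```python
def sub_tft(assigned, exec_time, r):
    """Eq. (2): total running time of the subtasks assigned to PE type r."""
    return sum(exec_time[i][r] for i in assigned[r])


def schedule_costs(assigned, exec_time, pcu, comm):
    """Return (TFT, Run_Cost, Tran_Cost) following Eqs. (3), (4), and (6).

    assigned  -- PE type -> list of subtask ids (a decoded particle)
    exec_time -- subtask id -> {PE type: running time T_ir}
    pcu       -- PE type -> running cost per unit time
    comm      -- directed edge (i, j) -> transfer cost C_ij
    """
    sub = {r: sub_tft(assigned, exec_time, r) for r in assigned}
    tft = max(sub.values())                       # Eq. (3)
    run_cost = sum(sub[r] * pcu[r] for r in sub)  # Eq. (4)
    type_of = {i: r for r, tasks in assigned.items() for i in tasks}
    # Eqs. (5) and (6): only edges whose endpoints sit on different
    # PE types contribute to the transfer cost.
    tran_cost = sum(c for (i, j), c in comm.items()
                    if type_of[i] != type_of[j])
    return tft, run_cost, tran_cost


assigned = {1: [1, 2], 2: [3]}
exec_time = {1: {1: 4}, 2: {1: 6}, 3: {2: 5}}
pcu = {1: 2, 2: 3}
comm = {(1, 2): 7, (2, 3): 3}
print(schedule_costs(assigned, exec_time, pcu, comm))  # (10, 35, 3)
```

In the example, type 1 runs for 4 + 6 = 10 time units and type 2 for 5, so TFT is 10, the running cost is 10 · 2 + 5 · 3 = 35, and only the edge from subtask 2 to subtask 3 crosses PE types.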

3.3. Initialization and Fitness Function. Assuming that the population size is $s$, the number of subtasks is $N$, and the number of PE types is $m$, the initialization of the population can be described as follows: among the $s$ randomly generated particles, the position of the $i$th particle is represented by the vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ ($1 \le i \le s$, $1 \le n \le N$), in which $x_{ij}$ ($1 \le x_{ij} \le m$) indicates that, in the $i$th particle, task $j$ is assigned to a PE of type $x_{ij}$ for operation; the velocity is represented by the vector $v_i = (v_{i1}, v_{i2}, \ldots, v_{in})$ ($1 \le i \le s$, $1 \le n \le N$), in which $-m \le v_{ij} \le m$.

Table 1: Example of particle coding.

| Subtask number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Type of PE | 3 | 2 | 1 | 1 | 3 | 2 | 1 | 2 | 3 | 3 |

Table 2: Example of decoding.

| Type of PE | Subtask number |
|---|---|
| 1 | 3, 4, 7 |
| 2 | 2, 6, 8 |
| 3 | 1, 5, 9, 10 |

Table 3: Task dividing.

| Type of PE | Number of PE | Subtask number |
|---|---|---|
| 1 | 1 | 3, 4, 7 |
| 2 | 2 | 2, 6, 8 |
| 3 | 3 | 1, 5 |
| 3 | 4 | 9, 10 |

The fitness function of time is defined as

$$\mathrm{Fit\_Time}(i) = \frac{1}{\mathrm{TFT}_i} \quad (1 \le i \le s), \tag{7}$$

where $\mathrm{TFT}_i$ represents the overall completion time of the $i$th particle. The fitness function of cost is obtained as follows:

$$\mathrm{Fit\_Cost}(i) = \frac{1}{\mathrm{Run\_Cost}_i + \mathrm{Tran\_Cost}_i} \quad (1 \le i \le s). \tag{8}$$

The overall fitness function is obtained as follows:

$$\mathrm{Fitness}(i) = \mathrm{Fit\_Time}(i) + \mathrm{Fit\_Cost}(i). \tag{9}$$

The algorithm selects particles with higher fitness values, which provide a good basis for generating the particles of the next generation.
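A direct transcription of the fitness definition is short enough to show in full. The function name `fitness` and its argument names are hypothetical; the inputs are the completion time and the two cost terms of a single particle.

```python
def fitness(tft, run_cost, tran_cost):
    """Overall fitness: reciprocal completion time plus reciprocal total cost."""
    fit_time = 1.0 / tft                      # time fitness
    fit_cost = 1.0 / (run_cost + tran_cost)   # cost fitness
    return fit_time + fit_cost


# Example: TFT = 10, Run_Cost = 35, Tran_Cost = 5 gives 0.1 + 0.025.
print(fitness(10, 35, 5))
```

Because the two terms are unweighted reciprocals, their relative influence depends on the units chosen for time and cost; a shorter schedule or a cheaper schedule both raise the value.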

3.4. Position and Velocity Updating. In every iteration, the particle updates its velocity and position by (10) in accordance with its own best historical position and the best position of the population. The historical best position is replaced by the current position only when the current position has a better fitness value:

$$\begin{aligned} v_i^{k+1} &= w_k \cdot v_i^k + c_1 \cdot r_1 \cdot (\mathrm{Pbest}_i - x_i^k) + c_2 \cdot r_2 \cdot (\mathrm{Gbest}_i - x_i^k), \\ x_i^{k+1} &= x_i^k + v_i^k. \end{aligned} \tag{10}$$

As in standard PSO, $c_1$ and $c_2$ are the acceleration coefficients and $r_1$ and $r_2$ are random numbers in $[0, 1]$.

$\mathrm{Pbest}_i$ is the best position experienced by the $i$th particle, and $\mathrm{Gbest}_i$ is the best position experienced by all particles in the population. $w_k$ is significant for balancing the algorithm's capability of global and local searching, and the paper adopts the decreasing inertia weight as follows:

$$w_k = w_{\mathrm{end}} + \frac{(w_{\mathrm{start}} - w_{\mathrm{end}})(\mathrm{Gen} - k)}{\mathrm{Gen}}. \tag{11}$$

$w_{\mathrm{start}}$ and $w_{\mathrm{end}}$ represent, respectively, the initial inertia weight and the inertia weight when the maximum number of iterations $\mathrm{Gen}$ is reached, so that $w_0 = w_{\mathrm{start}}$ and $w_{\mathrm{Gen}} = w_{\mathrm{end}}$; $k$ is the current iteration. With this inertia weight, the algorithm has strong global search capability in the early stage of iteration and more accurate local search capability in the late stage.
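One update step with a linearly decreasing inertia weight might look like the sketch below. Several things are assumptions rather than statements from the paper: the weight is taken to fall linearly from `w_start` to `w_end`, the default values 0.9, 0.4, and 2.0 are common PSO choices, positions are rounded and clamped to the PE types 1..m, and the position uses the freshly updated velocity (the usual PSO form). The names `inertia` and `update` are hypothetical.

```python
import random


def inertia(k, gen, w_start=0.9, w_end=0.4):
    """Inertia weight that falls linearly from w_start (k=0) to w_end (k=gen)."""
    return w_end + (w_start - w_end) * (gen - k) / gen


def update(x, v, p_best, g_best, w, m, c1=2.0, c2=2.0, rng=random):
    """Return the new (position, velocity) of one particle.

    x, p_best, g_best -- lists of PE types in 1..m, one entry per subtask
    v                 -- list of velocities, kept within [-m, m]
    """
    new_x, new_v = [], []
    for xj, vj, pj, gj in zip(x, v, p_best, g_best):
        r1, r2 = rng.random(), rng.random()
        vel = w * vj + c1 * r1 * (pj - xj) + c2 * r2 * (gj - xj)
        vel = max(-m, min(m, vel))      # keep -m <= v_ij <= m
        pos = int(round(xj + vel))      # assumption: round to a PE type
        pos = max(1, min(m, pos))       # keep 1 <= x_ij <= m
        new_v.append(vel)
        new_x.append(pos)
    return new_x, new_v


print(inertia(0, 200), inertia(100, 200), inertia(200, 200))
x, v = update([1, 3, 2], [0.5, -0.5, 1.0], [2, 3, 1], [3, 3, 1],
              w=inertia(10, 200), m=3)
print(x, v)
```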

3.5. Flow of Algorithm

(1) Randomly initialize the position and velocity of the particle swarm based on the description in "Initialization and Fitness Function."

(2) Compute the velocity and position of every particle.

(3) Compute the fitness value of every particle and set $\mathrm{Pbest}_i$ and $\mathrm{Gbest}_i$.

(4) If $\mathrm{Pbest}_i$ and $\mathrm{Gbest}_i$ remain unchanged after many iterations or the algorithm has reached the maximum number of iterations, output the optimum solution, end the iteration, and go to step 6.

(5) Go to step 2.

(6) Assign a reasonable number of PEs to every type of PE in accordance with the processing ability and the total amount of tasks to be processed.
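Steps (1) to (5) can be sketched as the loop below. This is a plain PSO loop written for illustration, not the paper's optimized variant, whose handling of local optima is not detailed in this section. Assumed details include the function name `pso_assign`, the parameter defaults, the rounding of positions to PE types, and a stopping rule that watches only the global best for `patience` stalled iterations. Step (6) is not covered.

```python
import random


def pso_assign(fitness_fn, n_tasks, m_types, swarm=20, gen=200,
               patience=30, seed=0):
    """Return (best particle, best fitness) for a type-assignment problem."""
    rng = random.Random(seed)
    # Step 1: random positions in 1..m and velocities in [-m, m].
    xs = [[rng.randint(1, m_types) for _ in range(n_tasks)]
          for _ in range(swarm)]
    vs = [[rng.uniform(-m_types, m_types) for _ in range(n_tasks)]
          for _ in range(swarm)]
    p_best = [x[:] for x in xs]
    p_fit = [fitness_fn(x) for x in xs]
    g_fit = max(p_fit)
    g_best = p_best[p_fit.index(g_fit)][:]
    stale = 0
    for k in range(gen):
        w = 0.4 + (0.9 - 0.4) * (gen - k) / gen  # assumed linear decrease
        improved = False
        for i in range(swarm):
            # Step 2: update velocity and position of particle i.
            for j in range(n_tasks):
                r1, r2 = rng.random(), rng.random()
                v = (w * vs[i][j]
                     + 2.0 * r1 * (p_best[i][j] - xs[i][j])
                     + 2.0 * r2 * (g_best[j] - xs[i][j]))
                vs[i][j] = max(-m_types, min(m_types, v))
                xs[i][j] = max(1, min(m_types, round(xs[i][j] + vs[i][j])))
            # Step 3: evaluate fitness and refresh the best positions.
            f = fitness_fn(xs[i])
            if f > p_fit[i]:
                p_fit[i], p_best[i] = f, xs[i][:]
            if f > g_fit:
                g_fit, g_best, improved = f, xs[i][:], True
        # Step 4: stop early once the global best has stalled.
        stale = 0 if improved else stale + 1
        if stale >= patience:
            break
    return g_best, g_fit


def toy_fitness(particle):
    """Toy objective (hypothetical): prefers assigning every task to type 1."""
    return 1.0 / (1 + sum(t - 1 for t in particle))


best, best_fit = pso_assign(toy_fitness, n_tasks=6, m_types=3,
                            swarm=10, gen=50, patience=10, seed=1)
print(best, best_fit)
```

In practice `fitness_fn` would decode the particle and combine the time and cost terms; the toy objective is there only to make the sketch runnable.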

4. IP Mapping

After task dividing and scheduling, the IP communication diagram is formed. In an NoC-based multicore system, the next question is how to map these PEs reasonably onto NoC nodes and minimize the network transmission delay during task execution, under the conditions that few resources are occupied and energy consumption is balanced. This is the IP mapping problem.

There are often two orientations in IP mapping: either to minimize the internal communication cost or to minimize the external communication cost [27, 28]. Both orientations have their pros and cons. The former might increase competition for external resources and add computation overhead later in mapping as the utilization of system resources rises; the latter tends to arrange surplus resources well and decreases competition for external resources with little change in computation overhead. However, as each local mapping area is incomplete, it produces only second-best mapping solutions, thus undermining global mapping optimization. In designing an IP mapping algorithm, it is necessary to strike a careful balance between the two orientations above.

In the meantime, as described above, PEs of different types have different requirements on the communication capability of an NoC. In order to save on-chip resources and decrease system consumption, various heterogeneous network topologies have been designed. Therefore, during IP mapping, the matching between communication requirements and on-chip communication capability needs comprehensive consideration.

Based on the properties of the PEs to be mapped and the distribution of transmission capability over the topology, this paper maps the PEs with high communication requirements to high-capability areas, balances internal and external communication cost, and achieves on-chip communication with minimum transmission delay and less resource occupancy. The mapping algorithm consists of two parts: the expression of the network topology by a two-dimensional matrix and the IP mapping itself. They are detailed as follows.

4.1. IP Communication Diagram and NoC Topology. The communication diagram can be abstracted into a triple $\mathrm{CDAG} = (P, E, C)$, where

(1) $P$ represents the set of PEs in the communication diagram; that is, $p_i \in P$ is a PE with tasks to execute;

(2) $E$ represents the edge set in the DAG application; that is, $e_{ij} \in E$ indicates that there exists data exchange between $p_i$ and $p_j$;

(3) $C$ represents the communication cost on the undirected edges, and $C_{ij}$ represents the total communication data between $p_i$ and $p_j$.

It is complicated to express an NoC topology directly, especially a three-dimensional NoC. Nevertheless, a two-dimensional matrix expresses the topology well, and many properties of matrices can also be applied to topology computation. Therefore, the paper expresses the topology by a two-dimensional matrix before IP mapping.

A three-dimensional mesh topology can be taken as an example. Figure 2(a) shows a 4 × 4 × 2 three-dimensional NoC topology; the red vertices represent bottom switching nodes and the black ones represent upper switching nodes. Figure 2(b) is its two-dimensional expansion diagram, which frees us from the complexity of studying the three-dimensional topology. For convenience of expression and computation, the positions of the nodes in the expansion diagram are expressed by a matrix. The positions of the nodes in Figure 2(b) can be seen in Figure 2(c). There may exist areas whose communication transmission capability is higher than that of others, to fulfill the higher communication requirements of some PEs; as shown in Figure 2(c), the green areas represent areas containing switching nodes with higher communication performance. For the integrity of the matrix expression, areas without switching nodes are shaded; in the later computation, nodes in these areas are assumed to be already assigned.

Through the approach above, a one-to-one correspondence is formed between the position of every node in the three-dimensional NoC topology and that of every element in the matrix. IP mapping conducts its optimization on the basis of this matrix.

4.2. IP Mapping. Before introducing the concrete algorithm, three definitions are given as follows.

Definition 1. Manhattan distance $\mathrm{MD}(i, j)$: in a plane, the Manhattan distance between points $P_i(x_1, y_1)$ and $P_j(x_2, y_2)$ is defined as

$$\mathrm{MD}(i, j) = |x_1 - x_2| + |y_1 - y_2|. \tag{12}$$

Definition 2. Euclidean distance $\mathrm{ED}(i, j)$: in a plane, the Euclidean distance between points $P_i(x_1, y_1)$ and $P_j(x_2, y_2)$ is defined as

$$\mathrm{ED}(i, j) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}. \tag{13}$$

Definition 3. The communication cost in the mapped area is obtained as follows:

$$\mathrm{Com\_cost} = \sum_{\forall C_{ij} \in C} C_{ij} \cdot \mathrm{MD}(L(p_i), L(p_j)), \tag{14}$$

in which $C_{ij}$ represents the total communication traffic between $p_i$ and $p_j$ in the communication diagram and $\mathrm{MD}(L(p_i), L(p_j))$ represents the Manhattan distance between the mapped positions of $p_i$ and $p_j$ on the topology.
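The three definitions translate into a few lines of code. The function names and the dictionary layout (`traffic` keyed by PE pairs, `location` giving the mapped matrix position of each PE) are hypothetical choices for this sketch.

```python
import math


def manhattan(p, q):
    """Manhattan distance between two (x, y) positions."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def euclidean(p, q):
    """Euclidean distance between two (x, y) positions."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def com_cost(traffic, location):
    """Traffic-weighted Manhattan distance summed over all communicating pairs.

    traffic  -- (pe_i, pe_j) -> total communication traffic C_ij
    location -- pe -> mapped (x, y) position L(p) on the topology matrix
    """
    return sum(c * manhattan(location[i], location[j])
               for (i, j), c in traffic.items())


traffic = {("P1", "P2"): 10, ("P2", "P3"): 4}
location = {"P1": (1, 1), "P2": (1, 2), "P3": (3, 2)}
print(manhattan((1, 1), (3, 2)))    # 3
print(euclidean((0, 0), (3, 4)))    # 5.0
print(com_cost(traffic, location))  # 18
```

Here the heavy P1 to P2 link spans one hop and the lighter P2 to P3 link spans two, giving 10 · 1 + 4 · 2 = 18; placing heavily communicating PEs on adjacent nodes is what lowers the value.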

The target of the algorithm is to map PEs with high communication requirements to topology areas with high communication capability and to find the mapping scheme with the minimum $\mathrm{Com\_cost}$.

The algorithm divides the communication diagrams into collections $H$ and $L$ according to whether or not they include PEs that need to be mapped to an area with high capability. In the collection $H = \{h_1, h_2, \ldots, h_i\}$ with high communication requirements, the elements are ordered as $|h_1| \ge |h_2| \ge \cdots \ge |h_i|$ according to the number of PEs with high communication requirements; in the collection $L = \{l_1, l_2, \ldots, l_i\}$ without high communication requirements, the elements are ordered as $|l_1| \ge |l_2| \ge \cdots \ge |l_i|$ according to the number of PEs contained. The execution steps of the mapping algorithm are as follows.

(1) Start the mapping computation from collection $h_1$: choose an area with high communication capability on the topology that can contain the minimum set of PEs with high communication requirements in $h_1$ as the beginning area of mapping. Name the mapped PEs the assigned area, and name the occupied switching-node area on the topology the mapped area.

(2) Start from the PE with the maximum communication traffic (sum of input and output) and map it to the switching node in the area of high communication capability whose number of available neighboring nodes is nearest to the PE's node degree.
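Selecting the starting PE of step (2) amounts to summing the traffic on every edge incident to each PE. The sketch below uses hypothetical names (`total_traffic`, `first_pe`) and assumes the communication diagram is given as a dictionary of undirected edges with their total traffic.

```python
def total_traffic(pe, traffic):
    """Sum of the traffic on all edges that touch the given PE."""
    return sum(c for (i, j), c in traffic.items() if pe in (i, j))


def first_pe(pes, traffic):
    """Return the PE with the largest total (input plus output) traffic."""
    return max(pes, key=lambda pe: total_traffic(pe, traffic))


traffic = {("P1", "P2"): 10, ("P2", "P3"): 4, ("P1", "P3"): 1}
print(first_pe(["P1", "P2", "P3"], traffic))  # P2
```

In the example the totals are 11 for P1, 14 for P2, and 5 for P3, so P2 is mapped first.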

Figure 2: Topology and its expression by matrix; panels (a), (b), and (c).

21 3

2

223

3

4

2

4

5

4 5

6

2

1

3

4

5

6

7

6

7

6 7

8

X

Y

21

1

3 4 5 6

2

1

3

4

5

6

7 8

X

Y

21 3 4 5 6

2

1

3

4

5

6

7 8

X

Y

21 3

3 32 32

1

4

5

5

1

1

1

1

2

223

3

2

4

6 7

4

5

5

1

1

1

1

2

223

3

2

4

6 7

4

5

5

1

1

1

1

2

223

3

2

4

6 7

4

5

5

1

1

1

1

32

4 5 6

2

1

3

4

5

6

7 8

X

Y

middot middot middotP2 P1

Figure 3 Description of mapping process

Exec

utio

n tim

e (m

s)

9 subtasks 16 subtasks 25 subtasksTask scale

GAACO

PSOOPSO

10k8k6k4k2k

Figure 4 Comparison of algorithm velocity

(3) Choose the node which has maximum communica-tion data with assigned area as the next PE to bemapped

(4) Correspond the PE to switching node which hasminimum Manhattan Distance with mapped area Ifmore than one node meet requirement choose thenode whose available neighboring nodes number isnearest to PE node degree if there are still morethan one node then choose the switching node whichhas minimum Euclidean Distance from the center ofmapped area

(5) Repeat step 3 and step 4 until all PEs are mapped andstart algorithm of another PE diagram to be mapped

Figure 3 is the simple description of mapping process InIP communication diagram the red PEs represent PEs withhigh communication requirement and blue area representsassigned area in the topology the green area represents areaof switching nodes with high communication capability andarea encircled by red line represents mapped area

The mapping algorithm places PEs that communicate directly with each other on neighboring nodes, so that the path between the source node and the destination node is the shortest and does not conflict with other transmission paths, thus minimizing the delay in the whole mapping area.

5. Experiment and Simulation

The performance of the designed algorithm is compared and evaluated from two aspects. The first is the speed of the task dividing and scheduling algorithm itself. Tasks of the same size are computed with GA, ACO, PSO, and the algorithm in this paper, respectively, and the running times are compared to demonstrate the efficiency of the algorithm. This part is conducted in Matlab with 200 iterations; the comparison of the time required for running the algorithms is shown in Figure 4.

Figure 5: Comparison of mapping effect. (a) Average packet delay (clock cycles) versus task scale (9, 16, and 25 PEs); (b) power consumption versus task scale (9, 16, and 25 PEs). Both panels compare GA, ACO, PSO, and OPSO.

The other is the comparison of the actual mapping effect (Figure 5). The scheduling results of the above algorithms are run in a NoC simulation environment, and the delay and power consumption of the system are computed for each of them, to demonstrate the superiority of the algorithm of this paper in scheduling.

6. Conclusion

In this paper, the task scheduling model is further improved, and the operating cost per time unit is employed as a uniform measurement for PEs of different types, which simplifies the algorithm. Task dividing and scheduling and IP mapping are handled separately, so that the resultant scheduling algorithm is more efficient and realistic. The scheduling target considers not only the total time spent but also the time cost and resource cost during task running, so as to achieve comprehensive optimization of system performance.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] C. Addo-Quaye, "Thermal-aware mapping and placement for 3-D NoC designs," in Proceedings of the IEEE International SOC Conference, pp. 25–28, September 2005.

[2] A. K. Singh, W. Jigang, A. Prakash, and T. Srikanthan, "Mapping algorithms for NoC-based heterogeneous MPSoC platforms," in Proceedings of the 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools (DSD '09), pp. 133–140, August 2009.

[3] K. Ganeshpure and S. Kundu, "On runtime task graph extraction in MPSoC," in Proceedings of the IEEE Computer Society Annual Symposium on VLSI, pp. 171–176, IEEE, 2013.

[4] Y. Z. Tei, M. N. Marsono, N. Shaikh-Husin, and Y. W. Hau, "Network partitioning and GA heuristic crossover for NoC application mapping," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '13), pp. 1228–1231, Beijing, China, May 2013.

[5] H. Topcuoglu, S. Hariri, and M. Wu, "Performance-effective and low-complexity task scheduling for heterogeneous computing," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 3, pp. 260–274, 2002.

[6] M. I. Daoud and N. Kharma, "Efficient compile-time task scheduling for heterogeneous distributed computing systems," in Proceedings of the 12th International Conference on Parallel and Distributed Systems (ICPADS '06), vol. 1, pp. 11–19, IEEE, Minneapolis, Minnesota, July 2006.

[7] M. Wu and D. D. Gajski, "Hypertool: a programming aid for message-passing systems," IEEE Transactions on Parallel and Distributed Systems, vol. 1, no. 3, pp. 330–343, 1990.

[8] T. Yang and A. Gerasoulis, "DSC: scheduling parallel tasks on an unbounded number of processors," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 9, pp. 951–967, 1994.

[9] S. J. Kim and J. C. Browne, "A general approach to mapping of parallel computation upon multiprocessor architectures," in Proceedings of the International Conference on Parallel Processing, vol. 2, pp. 1–8, 1988.

[10] Y.-C. Chung and S. Ranka, "Applications and performance analysis of a compile-time optimization approach for list scheduling algorithms on distributed memory multiprocessors," in Supercomputing, pp. 512–521, 1992.

[11] I. Ahmad and Y. Kwok, "A new approach to scheduling parallel programs using task duplication," in Proceedings of the International Conference on Parallel Processing, vol. 2, pp. 47–51, 1994.

[12] M. Sayuti and L. S. Indrusiak, "Real-time low-power task mapping in networks-on-chip," in Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI '13), pp. 14–19, 2013.

[13] F. Ferrandi, P. L. Lanzi, C. Pilato, D. Sciuto, and A. Tumeo, "Ant colony heuristic for mapping and scheduling tasks and communications on heterogeneous embedded systems," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 6, pp. 911–924, 2010.

[14] L. S. Junior, N. Nedjah, and L. de Macedo Mourelle, "ACO approach in static routing for network-on-chips with 3D mesh topology," in Proceedings of the IEEE Fourth Latin American Symposium on Circuits and Systems (LASCAS '13), pp. 1–4, IEEE, Cusco, Peru, February 2013.

[15] R. Hoffmann, A. Prell, and T. Rauber, "Dynamic task scheduling and load balancing on cell processors," in Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '10), pp. 205–212, February 2010.

[16] M. B. Abdelhalim, "Task assignment for heterogeneous multiprocessors using re-excited particle swarm optimization," in Proceedings of the International Conference on Computer and Electrical Engineering (ICCEE '08), pp. 23–27, Phuket, Thailand, December 2008.

[17] M. S. Sidhu, P. Thulasiraman, and R. K. Thulasiram, "A load-rebalance PSO heuristic for task matching in heterogeneous computing systems," in Proceedings of the IEEE Symposium on Swarm Intelligence (SIS '13), pp. 180–187, IEEE, Singapore, April 2013.

[18] Y. Wang and C. Dang, "An evolutionary algorithm for global optimization based on level-set evolution and latin squares," IEEE Transactions on Evolutionary Computation, vol. 11, no. 5, pp. 579–595, 2007.

[19] Y.-P. Wang, Y.-C. Jiao, and H. Li, "An evolutionary algorithm for solving nonlinear bilevel programming based on a new constraint-handling scheme," IEEE Transactions on Systems, Man and Cybernetics C: Applications and Reviews, vol. 35, no. 2, pp. 221–232, 2005.

[20] O. Arnold and G. Fettweis, "Power aware heterogeneous MPSoC with dynamic task scheduling and increased data locality for multiple applications," in Proceedings of the International Conference on Embedded Computer Systems (SAMOS '10), pp. 110–117, 2010.

[21] G. De Micheli and L. Benini, Networks on Chips: Technology and Tools, Academic Press, 2006.

[22] D. A. B. Miller, "Rationale and challenges for optical interconnects to electronic chips," Proceedings of the IEEE, vol. 88, no. 6, pp. 728–749, 2000.

[23] D. A. B. Miller, "Device requirements for optical interconnects to silicon chips," Proceedings of the IEEE, vol. 97, no. 7, pp. 1166–1185, 2009.

[24] M. O. Agyeman and A. Ahmadinia, "Optimising heterogeneous 3D networks-on-chip," in Proceedings of the 6th IEEE International Symposium on Parallel Computing in Electrical Engineering (PARELEC '11), pp. 25–30, April 2011.

[25] Y. Ye, J. Xu, X. Wu, W. Zhang, W. Liu, and M. Nikdast, "A torus-based hierarchical optical-electronic network-on-chip for multiprocessor system-on-chip," ACM Journal on Emerging Technologies in Computing Systems, vol. 8, no. 1, article 5, 2012.

[26] H. A. Khouzani, S. Koohi, and S. Hessabi, "Fully contention-free optical NoC based on wavelength routing," in Proceedings of the 16th CSI International Symposium on Computer Architecture and Digital Systems (CADS '12), pp. 81–86, May 2012.

[27] C. Chou and R. Marculescu, "User-aware dynamic task allocation in networks-on-chip," in Proceedings of the Design, Automation and Test in Europe (DATE '08), vol. 1–3, pp. 1074–1079, March 2008.

[28] C. Chou and R. Marculescu, "Run-time task allocation considering user behavior in embedded multiprocessor networks-on-chip," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 1, pp. 78–91, 2010.


Figure 1: Two stages of task scheduling and IP mapping.

(1) $V$: the task node set of the DAG application; a vertex $v \in V$ is a subtask in $V$, and the number of subtasks in the DAG application is $N$.

(2) $E$: the edge set of the DAG application; $e_{ij} \in E$ means that there exists data communication between $v_i$ and $v_j$, and the direction of the arrow indicates the direction of data transmission.

(3) Type($v$): the type of the task. For instance, we can use $\{1, 2, 3, \ldots\}$ to represent different computing types. In addition, the type set of tasks corresponds with that of PEs, which means that a task can only be scheduled to a PE matching its type. This can be expressed by the matrix $D = \{d_{ij}\}$, where the rows represent the tasks and the columns represent the PEs; element $d_{ij} = \infty$ means that task $v_i$ cannot be executed on $P_j$, and $d_{ij} = a$ means that task $v_i$ can be executed on $P_j$ with an execution time of $a$.

(4) PCU: the running cost of every type of PE per unit time, in which element $\mathrm{PCU}_r$ ($1 \le r \le m$) represents the running cost of the $r$th type of PE per unit time.

(5) $C$: the collection of the communication overheads of the directed edges; $C_{ij}$ represents the transfer cost between subtasks $v_i$ and $v_j$ over the directed edge $e_{ij}$. When $v_i$ and $v_j$ are scheduled to the same PE, $C_{ij}$ equals zero.

The target of task dividing and scheduling is to find a proper assigning and scheduling strategy that, while meeting the task processing sequence and the resource limitations, assigns the $N$ subtasks to a proper number of PEs and schedules the execution order of every subtask in a reasonable manner, thus achieving the minimum completion time of the overall task with every task obeying the dependency graph. Based on the task model, an improved particle swarm algorithm is used to conduct the computation.

3.2. Coding and Decoding. The resource occupation of every subtask is encoded by indirect encoding. The encoding length depends on the number of subtasks. Every particle corresponds to a certain task assigning strategy.

Assume that a task contains $N$ subtasks, which are encoded sequentially, and that $M$ PEs are available, which are classified into $m$ types. For example, when $N = 10$ and $m = 3$, the particle (3, 2, 1, 1, 3, 2, 1, 2, 3, 3) is a feasible scheduling scheme. The particle is encoded as shown in Table 1; as shown in Table 2, by decoding the particle we can acquire the assignment of subtasks to every type of PE. Then, as shown in Table 3, after the subtasks are assigned, a reasonable number of PEs is allocated to every type of PE in accordance with the processing ability and the total amount of tasks to be processed.
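The decoding step can be illustrated with a few lines of Python. This is only a sketch of the idea: the function name `decode_particle` is hypothetical, and the particle is the example (3, 2, 1, 1, 3, 2, 1, 2, 3, 3) given in the text.

```python
from collections import defaultdict

def decode_particle(particle):
    """Group 1-based subtask numbers by the PE type stored at each position."""
    groups = defaultdict(list)
    for subtask, pe_type in enumerate(particle, start=1):
        groups[pe_type].append(subtask)
    # Return a plain dict ordered by PE type for readability.
    return {pe_type: groups[pe_type] for pe_type in sorted(groups)}

particle = (3, 2, 1, 1, 3, 2, 1, 2, 3, 3)
decoded = decode_particle(particle)
print(decoded)  # {1: [3, 4, 7], 2: [2, 6, 8], 3: [1, 5, 9, 10]}
```

The result agrees with the assignment of subtasks to PE types described for the decoding example.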

It follows from the task model that the running time of every subtask on different PEs is already known. The running time on the $r$th type of PE is defined as

$$\mathrm{Sub\_TFT}_r = \sum_{i=1}^{n} T_{ir}, \tag{2}$$

where $T_{ir}$ represents the running time of subtask $i$ on the $r$th type of PE and $n$ represents the number of subtasks assigned to the $r$th type of PE. The execution time of the entire task is obtained as follows:

$$\mathrm{TFT} = \max_{r=1}^{k} \mathrm{Sub\_TFT}_r. \tag{3}$$

The overall operation cost is given as

$$\mathrm{Run\_Cost} = \sum_{r=1}^{k} \mathrm{Sub\_TFT}_r \cdot \mathrm{PCU}_r. \tag{4}$$

Assuming that the task set assigned to the $m$th type of PE is $V_m$ and the task set assigned to the $n$th type of PE is $V_n$, the transfer cost between $\mathrm{PE}_m$ and $\mathrm{PE}_n$ is defined as

$$\mathrm{Tran\_Cost}_{mn} = \sum_{\forall i, j} C_{ij} \quad (v_i \in V_m,\ v_j \in V_n). \tag{5}$$

The overall transfer cost is obtained as follows:

$$\mathrm{Tran\_Cost} = \sum_{\forall m \ne n} \mathrm{Tran\_Cost}_{mn}. \tag{6}$$
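The cost terms (2) to (6) can be evaluated for a single particle as sketched below. The function name `schedule_costs`, the container layout, and all numbers are hypothetical. The sketch sums over the PE types that actually receive subtasks, and it counts $C_{ij}$ whenever the two subtasks are assigned to different PE types, which mirrors the type-level sets $V_m$ and $V_n$ in (5).

```python
def schedule_costs(particle, run_time, pcu, comm):
    """Return (Sub_TFT per type, TFT, Run_Cost, Tran_Cost) for one particle.

    particle[i]    : PE type chosen for subtask i (0-based subtask index)
    run_time[i][r] : running time T_ir of subtask i on PE type r
    pcu[r]         : running cost per unit time of PE type r
    comm[(i, j)]   : transfer cost C_ij of the directed edge e_ij
    """
    types = sorted(set(particle))
    sub_tft = {r: 0 for r in types}
    for i, r in enumerate(particle):
        sub_tft[r] += run_time[i][r]                      # Eq. (2)
    tft = max(sub_tft.values())                           # Eq. (3)
    run_cost = sum(sub_tft[r] * pcu[r] for r in types)    # Eq. (4)
    tran_cost = sum(c for (i, j), c in comm.items()       # Eqs. (5), (6)
                    if particle[i] != particle[j])
    return sub_tft, tft, run_cost, tran_cost

# Hypothetical instance: four subtasks, two PE types.
particle = (1, 2, 1, 2)
run_time = [{1: 2, 2: 3}, {1: 4, 2: 1}, {1: 3, 2: 5}, {1: 6, 2: 2}]
pcu = {1: 2, 2: 1}
comm = {(0, 1): 4, (0, 2): 7, (1, 3): 5, (2, 3): 1}

sub_tft, tft, run_cost, tran_cost = schedule_costs(particle, run_time, pcu, comm)
print(sub_tft, tft, run_cost, tran_cost)  # {1: 5, 2: 3} 5 13 5
```

Here type 1 runs subtasks 0 and 2 (2 + 3 = 5 time units) and type 2 runs subtasks 1 and 3 (1 + 2 = 3), so TFT is 5; only the edges (0, 1) and (2, 3) cross PE types, giving a transfer cost of 4 + 1 = 5.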

Table 1: Example of particle coding.

Subtask number   1  2  3  4  5  6  7  8  9  10
Type of PE       3  2  1  1  3  2  1  2  3  3

Table 2: Example of decoding.

Type of PE   Subtask number
1            3, 4, 7
2            2, 6, 8
3            1, 5, 9, 10

Table 3: Task dividing.

Type of PE   Number of PE   Subtask number
1            1              3, 4, 7
2            2              2, 6, 8
3            3              1, 5
3            4              9, 10

3.3. Initialization and Fitness Function. Assuming that the population size is $s$, the number of subtasks is $N$, and the number of PE types is $m$, the initialization of the population can be described as follows: among the $s$ randomly generated particles, the position of the $i$th particle is represented by the vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ ($1 \le i \le s$, $1 \le n \le N$), in which $x_{ij}$ ($1 \le x_{ij} \le m$) indicates that, in the $i$th particle, task $j$ is assigned to a PE of type $x_{ij}$ for operation; the velocity is represented by the vector $v_i = (v_{i1}, v_{i2}, \ldots, v_{in})$ ($1 \le i \le s$, $1 \le n \le N$), in which $-m \le v_{ij} \le m$.

The fitness function of time is defined as

$$\mathrm{Fit\_Time}(i) = \frac{1}{\mathrm{TFT}_i} \quad (1 \le i \le s), \tag{7}$$

where $\mathrm{TFT}_i$ represents the overall completion time of the $i$th particle. The fitness function of cost is obtained as follows:

$$\mathrm{Fit\_Cost}(i) = \frac{1}{\mathrm{Run\_Cost}_i + \mathrm{Tran\_Cost}_i} \quad (1 \le i \le s). \tag{8}$$

The overall fitness function is obtained as follows:

$$\mathrm{Fitness} = \mathrm{Fit\_Time}(i) + \mathrm{Fit\_Cost}(i). \tag{9}$$

The algorithm selects particles with higher fitness values, so that they provide a good basis for generating good particles of the next generation.
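As a small worked illustration of (7) to (9), the sketch below compares two candidate particles. The function name `fitness` and the numeric values are hypothetical and serve only to show how a shorter completion time and a lower cost both raise the fitness value.

```python
def fitness(tft, run_cost, tran_cost):
    """Overall fitness of one particle: Fit_Time + Fit_Cost."""
    fit_time = 1.0 / tft                      # Eq. (7)
    fit_cost = 1.0 / (run_cost + tran_cost)   # Eq. (8)
    return fit_time + fit_cost                # Eq. (9)

# Two hypothetical particles: (TFT, Run_Cost, Tran_Cost).
slow_cheap = fitness(5, 13, 5)   # 1/5 + 1/18
fast_costly = fitness(4, 15, 5)  # 1/4 + 1/20
best = max(("slow_cheap", slow_cheap), ("fast_costly", fast_costly),
           key=lambda item: item[1])
print(best[0])
```

Because the two terms are simply added, their relative influence depends on the units chosen for time and cost; any rescaling of either term would be a design decision outside what the text specifies.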

3.4. Position and Velocity Updating. In every iteration, the particle updates its velocity and position by (10) in accordance with its own optimal historical position and the optimal position of the population. Only when the current position has a better fitness value than its historical optimal position is the historical position replaced by the current position:

$$v_i^{k+1} = w_k \cdot v_i^k + c_1 \cdot r_1 \cdot (P\mathrm{best}_i - x_i^k) + c_2 \cdot r_2 \cdot (G\mathrm{best}_i - x_i^k),$$
$$x_i^{k+1} = x_i^k + v_i^k. \tag{10}$$

$P\mathrm{best}_i$ is the best position experienced by the $i$th particle, and $G\mathrm{best}_i$ is the best position experienced by all particles in the population. $w_k$ is significant for balancing the algorithm's capability of global and local searching, and the paper adopts the decreasing inertia weight as follows:

$$w_k = w_{\mathrm{end}} + \frac{(w_{\mathrm{start}} - w_{\mathrm{end}})(\mathrm{Gen} - k)}{\mathrm{Gen}}. \tag{11}$$

$w_{\mathrm{start}}$ and $w_{\mathrm{end}}$ represent, respectively, the initial inertia weight and the inertia weight when the maximum number of iterations Gen is reached; $k$ is the current iteration. With this inertia weight, the algorithm has strong global search capability in the early stage of iteration and more accurate local search capability in the late stage.
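One update step with a linearly decreasing inertia weight (from $w_{\mathrm{start}}$ at the first iteration down to $w_{\mathrm{end}}$ at iteration Gen) might look as follows. Several details are assumptions and not taken from the paper: the default values 0.9, 0.4, and 2.0, the clamping of velocities to $[-m, m]$ and of positions to $[1, m]$, the rounding of positions to integer PE types, and the use of the freshly updated velocity in the position step, which is the common PSO convention.

```python
import random

def inertia_weight(k, gen, w_start=0.9, w_end=0.4):
    """Linearly decreasing weight: w_start at k = 0, w_end at k = gen."""
    return w_end + (w_start - w_end) * (gen - k) / gen

def update_particle(x, v, p_best, g_best, k, gen, m,
                    c1=2.0, c2=2.0, rng=random):
    """Return the new (position, velocity) lists of one particle."""
    w = inertia_weight(k, gen)
    new_x, new_v = [], []
    for xj, vj, pj, gj in zip(x, v, p_best, g_best):
        r1, r2 = rng.random(), rng.random()
        vel = w * vj + c1 * r1 * (pj - xj) + c2 * r2 * (gj - xj)
        vel = max(-m, min(m, vel))        # assumed clamp: -m <= v_ij <= m
        pos = round(xj + vel)             # assumed rounding to a PE type
        pos = max(1, min(m, pos))         # assumed clamp: 1 <= x_ij <= m
        new_v.append(vel)
        new_x.append(pos)
    return new_x, new_v

rng = random.Random(0)                    # fixed seed for repeatability
x, v = update_particle(x=[3, 2, 1], v=[0.0, 0.0, 0.0],
                       p_best=[1, 2, 3], g_best=[2, 2, 2],
                       k=10, gen=200, m=3, rng=rng)
print(x, v)
```

The clamps keep every component a valid PE type after each step; other repair rules (for example re-sampling an out-of-range component) would serve the same purpose.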

3.5. Flow of Algorithm

(1) Randomly initialize the position and velocity of the particle swarm based on the description in "Initialization and Fitness Function."

(2) Compute the velocity and position of every particle.

(3) Compute the fitness value of every particle and set $P\mathrm{best}_i$ and $G\mathrm{best}_i$.

(4) If $P\mathrm{best}_i$ and $G\mathrm{best}_i$ remain unchanged after many iterations or the algorithm has reached the maximum number of iterations, output the optimum solution, end the iteration, and go to step (6).

(5) Go to step (2).

(6) Assign a reasonable number of PEs to every type of PE in accordance with the processing ability and the total amount of tasks to be processed.

4. IP Mapping

After task dividing and scheduling, the IP communication diagram is formed. In a multicore system based on NoC, the next problem is how to map these PEs onto NoC nodes reasonably and minimize the network transmission delay during task execution, under the conditions that fewer resources are occupied and energy consumption is balanced. This is the problem of IP mapping.

There are often two orientations in IP mapping: either to minimize the internal communication cost or to minimize the external communication cost [27, 28]. Both orientations have their pros and cons. The former might lead to increased competition for external resources and add more computation overhead later in mapping as the utilization of system resources increases. The latter tends to arrange surplus resources well and successfully decreases competition for external resources with little change in computation overhead; however, as each local mapping area is incomplete, it produces only second-best mapping solutions, thus undermining the global mapping optimization. When designing an IP mapping algorithm, it is necessary to strike a careful balance between the two orientations above.

In the meantime, as described above, PEs of different types have different requirements on NoC communication capability. In order to save on-chip resources and decrease system consumption, various heterogeneous network topologies have been designed. Therefore, during IP mapping, the matching between the communication requirements and the on-chip communication capability entails comprehensive consideration.

Based on the properties of the PEs to be mapped and the distribution of transmission capability over the topology, the paper maps the PEs with high communication requirement to the high-capability area, balances the internal communication cost against the external one, and achieves on-chip communication with minimum transmission delay and less resource occupancy. The mapping algorithm consists of two parts: the expression of the network topology by a two-dimensional matrix and the IP mapping. They are detailed as follows.

4.1. IP Communication Diagram and NoC Topology. The communication diagram can be abstracted into a triple CDAG = $(P, E, C)$, where

(1) $P$ represents the set of PEs in the communication diagram; that is, $p_i \in P$ is a PE with execution tasks;

(2) $E$ represents the edge set in the DAG application; that is, $e_{ij} \in E$ indicates that there exists data exchange between $p_i$ and $p_j$;

(3) $C$ represents the communication cost on the undirected edges, and $C_{ij}$ represents the total communication data between $p_i$ and $p_j$.

It is complicated to express a NoC topology directly, especially a three-dimensional NoC. Nevertheless, a two-dimensional matrix expresses the topology well, and many properties of matrices can also be applied to topology computation. Therefore, the paper expresses the topology by a two-dimensional matrix before IP mapping.

A three-dimensional mesh topology can be taken as an example. Figure 2(a) shows a 4 × 4 × 2 three-dimensional NoC topology; the red vertices represent the bottom switching nodes and the black ones represent the upper switching nodes. Figure 2(b) is its two-dimensional expansion diagram, which frees us from the complexity of studying the three-dimensional topology. For the convenience of expression and computation, the positions of the nodes in the expansion diagram are expressed by a matrix; the positions of the nodes in Figure 2(b) can be seen in Figure 2(c). There may exist areas whose communication transmission capability is higher than that of others, in order to fulfill the higher communication requirement of some PEs; in Figure 2(c), the green areas represent areas in which there exist switching nodes with higher communication performance. For the integrity of the matrix expression, areas without switching nodes are shaded; in the later computation, nodes in these areas are assumed to be already assigned.

Through the approach above, a one-to-one correspondence is formed between the position of every node in the three-dimensional NoC topology and that of every element in the matrix. IP mapping conducts its computation and optimization on the basis of this matrix.
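A matrix expression of this kind could be held in a plain two-dimensional list, as in the sketch below. The cell codes `EMPTY`, `NORMAL`, and `HIGH`, the 3 × 4 layout, and the helper `available_neighbours` are hypothetical; in particular, treating the four matrix neighbours of a cell as its topological neighbours is an assumption that depends on how the three-dimensional topology is unfolded.

```python
EMPTY, NORMAL, HIGH = 0, 1, 2   # hypothetical cell codes

# Hypothetical 3 x 4 matrix: EMPTY cells have no switching node and are
# treated as already assigned; HIGH cells form the high-capability area.
grid = [
    [NORMAL, HIGH,   HIGH,   NORMAL],
    [NORMAL, HIGH,   HIGH,   NORMAL],
    [EMPTY,  NORMAL, NORMAL, EMPTY],
]

def available_neighbours(grid, assigned, row, col):
    """Count unassigned switching nodes adjacent to (row, col)."""
    count = 0
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        r, c = row + dr, col + dc
        inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
        if inside and grid[r][c] != EMPTY and (r, c) not in assigned:
            count += 1
    return count

high_area = [(r, c) for r, row in enumerate(grid)
             for c, cell in enumerate(row) if cell == HIGH]
print(high_area)
print(available_neighbours(grid, set(), 1, 1))
print(available_neighbours(grid, {(1, 1)}, 2, 1))
```

Counting available neighbours in this way gives the quantity that the mapping steps compare with the node degree of a PE.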

4.2. IP Mapping. Before introducing the concrete algorithm, three parameters are given as follows.

Definition 1. Manhattan Distance MD($i$, $j$): in a plane, the Manhattan Distance between points $P_i(x_1, y_1)$ and $P_j(x_2, y_2)$ is defined as

$$\mathrm{MD}(i, j) = |x_1 - x_2| + |y_1 - y_2|. \tag{12}$$

Definition 2. Euclidean Distance ED($i$, $j$): in a plane, the Euclidean Distance between points $P_i(x_1, y_1)$ and $P_j(x_2, y_2)$ is defined as

$$\mathrm{ED}(i, j) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}. \tag{13}$$

Definition 3. Communication cost in the mapped area is obtained as follows:

$$\mathrm{Com\_cost} = \sum_{\forall C_{ij} \in C} C_{ij} \cdot \mathrm{MD}(L(P_i), L(P_j)), \tag{14}$$

in which $C_{ij}$ represents the total communication traffic between $P_i$ and $P_j$ in the communication diagram and $\mathrm{MD}(L(P_i), L(P_j))$ represents the Manhattan Distance between the mapped positions of $P_i$ and $P_j$ on the topology.
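The three definitions translate directly into code. In the sketch below the function names, the PE labels, and the coordinates are hypothetical; `location` plays the role of the mapping $L$ from a PE to its position in the matrix.

```python
import math

def manhattan(p, q):
    """Manhattan Distance between points p = (x1, y1) and q = (x2, y2)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def euclidean(p, q):
    """Euclidean Distance between points p and q."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def com_cost(traffic, location):
    """Sum of C_ij times the Manhattan Distance of the mapped positions."""
    return sum(c * manhattan(location[i], location[j])
               for (i, j), c in traffic.items())

# Hypothetical mapping of three PEs and their pairwise traffic.
location = {"P1": (1, 1), "P2": (2, 1), "P3": (3, 3)}
traffic = {("P1", "P2"): 10, ("P1", "P3"): 4, ("P2", "P3"): 6}

cost = com_cost(traffic, location)
print(cost)  # 10*1 + 4*4 + 6*3 = 44
```

Moving P3 next to P2 would shorten the two longest paths, which is exactly the effect the mapping algorithm aims for when it minimizes Com_cost.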

The target of the algorithm is to map the PEs with high communication requirement to the topology area with high communication capability and to find the mapping scheme that has the minimum Com_cost among the results.

The algorithm divides the communication diagrams into collections $H$ and $L$ according to whether or not the PEs they contain need to be mapped to the area with high capability. In the collection $H = \{h_1, h_2, \ldots, h_i\}$, with high communication requirement, the order is $|h_1| \ge |h_2| \ge \cdots \ge |h_i|$ according to the number of PEs with high communication requirement; in the collection $L = \{l_1, l_2, \ldots, l_i\}$, without high communication requirement, the order is $|l_1| \ge |l_2| \ge \cdots \ge |l_i|$ according to the number of PEs contained. The execution steps of the mapping algorithm are as follows.

(1) Start the mapping computation from $h_1$: on the topology, choose an area with high communication capability that could contain the minimum set of PEs with high communication requirement in $h_1$ as the beginning area of mapping. Name the mapped PEs the assigned area, and name the area of occupied switching nodes on the topology the mapped area.


[25] Y Ye J Xu X Wu W Zhang W Liu and M NikdastldquoA torus-based hierarchical optical-electronic network-on-chipfor multiprocessor system-on-chiprdquo ACM Journal on EmergingTechnologies in Computing Systems vol 8 no 1 article 5 2012

[26] HA Khouzani S Koohi and SHessabi ldquoFully contention-freeoptical NoC based on wavelenght routingrdquo in Proceedings of the16thCSI International SymposiumonComputer Architecture andDigital Systems (CADS rsquo12) pp 81ndash86 May 2012

[27] C Chou and R Marculescu ldquoUser-aware dynamic task allo-cation in networks-on-chiprdquo in Proceedings of the DesignAutomation and Test in Europe (DATE rsquo08) vol 1ndash3 pp 1074ndash1079 March 2008

[28] C Chou and R Marculescu ldquoRun-time task allocation con-sidering user behavior in embedded multiprocessor networks-on-chiprdquo IEEE Transactions on Computer-Aided Design of Inte-grated Circuits and Systems vol 29 no 1 pp 78ndash91 2010


Table 1: Example of particle coding.

Subtask number   1   2   3   4   5   6   7   8   9   10
Type of PE       3   2   1   1   3   2   1   2   3   3

Table 2: Example of decoding.

Type of PE   Subtask number
1            3, 4, 7
2            2, 6, 8
3            1, 5, 9, 10

Table 3: Task dividing.

Type of PE   Number of PE   Subtask number
1            1              3, 4, 7
2            2              2, 6, 8
3            3              1, 5
3            4              9, 10
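The decoding from Table 1 to Table 2 can be illustrated with a short sketch. It assumes that a particle is stored as a plain list in which position k holds the PE type of subtask k; the function name `decode_particle` is hypothetical and does not come from the paper.

```python
from collections import defaultdict


def decode_particle(particle):
    """Group 1-based subtask numbers by the PE type stored at each position."""
    groups = defaultdict(list)
    for subtask, pe_type in enumerate(particle, start=1):
        groups[pe_type].append(subtask)
    # Sort by PE type so the result reads like Table 2.
    return dict(sorted(groups.items()))


# "Type of PE" row of Table 1.
particle = [3, 2, 1, 1, 3, 2, 1, 2, 3, 3]
decoded = decode_particle(particle)
print(decoded)  # {1: [3, 4, 7], 2: [2, 6, 8], 3: [1, 5, 9, 10]}
```

The printed dictionary corresponds row by row to Table 2.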

by vector $V_i = (V_{i1}, V_{i2}, \ldots, V_{in})$ ($1 \le i \le s$, $1 \le n \le N$), in which $-m \le V_{ij} \le m$.

The fitness function of time is defined as

$$\text{Fit\_Time}(i) = \frac{1}{\text{TFT}_i} \quad (1 \le i \le s), \tag{7}$$

where $\text{TFT}_i$ represents the overall completion time of the $i$th particle. The fitness function of cost is obtained as follows:

$$\text{Fit\_Cost}(i) = \frac{1}{\text{Run\_Cost}_i + \text{Tran\_Cost}_i} \quad (1 \le i \le s). \tag{8}$$

The overall fitness function is obtained as follows:

$$\text{Fitness} = \text{Fit\_Time}(i) + \text{Fit\_Cost}(i). \tag{9}$$

The algorithm selects the particles with higher fitness values, which provides a good basis for generating high-quality particles in the next generation.
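As a small worked illustration of (7)-(9), the sketch below evaluates the fitness of a few particles and picks the fittest one. The function name `fitness` and all numeric values are hypothetical; the paper does not list concrete TFT or cost figures at this point.

```python
def fitness(tft, run_cost, tran_cost):
    """Overall fitness of one particle, following equations (7)-(9)."""
    fit_time = 1.0 / tft                      # equation (7)
    fit_cost = 1.0 / (run_cost + tran_cost)   # equation (8)
    return fit_time + fit_cost                # equation (9)


# Hypothetical swarm: (TFT, Run_Cost, Tran_Cost) for each particle.
swarm = [(20.0, 30.0, 10.0), (25.0, 20.0, 5.0), (18.0, 40.0, 20.0)]
scores = [fitness(*particle) for particle in swarm]

# Index of the particle with the highest fitness value.
best = max(range(len(swarm)), key=scores.__getitem__)
print(best, scores)
```

With these made-up numbers the second particle (index 1) wins: its completion time is the longest, but its much lower cost outweighs that in the sum.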

3.4. Position and Velocity Updating. In every iteration, each particle updates its velocity and position by (10) in accordance with its own best historical position and the best position of the population. The historical best position is replaced by the current position only when the current position has a better fitness value.

$$\begin{aligned}
V_{id}^{k+1} &= w_k \cdot V_i^k + c_1 \cdot r_1 \cdot (P\_best_i - x_i^k) + c_2 \cdot r_2 \cdot (G\_best_i - x_i^k), \\
x_i^{k+1} &= x_i^k + V_i^k.
\end{aligned} \tag{10}$$

$P\_best_i$ is the best position experienced by the $i$th particle, and $G\_best_i$ is the best position experienced by all particles in the population. The inertia weight $w_k$ is significant for balancing the algorithm's capability of global and local searching, and the paper adopts the decreasing inertia weight as follows:

$$w_k = w_{\text{end}} + \frac{(w_{\text{start}} - w_{\text{end}})(\text{Gen} - k)}{\text{Gen}}. \tag{11}$$

$w_{\text{start}}$ and $w_{\text{end}}$ represent, respectively, the initial inertia weight and the inertia weight when the maximum number of iterations Gen is reached; $k$ is the current iteration. With this inertia weight, the algorithm has strong global search capability in the early stage of iteration and more accurate local search capability in the late stage.
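The update rule and the decreasing inertia weight can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the paper: the weight falls linearly from `w_start` at iteration 0 to `w_end` at iteration `gen`; the default values of `w_start`, `w_end`, `c1`, and `c2` are common textbook choices; the position step uses the freshly updated velocity, as in the standard PSO formulation; and positions are rounded and clamped to valid PE types 1..m.

```python
import random


def inertia_weight(k, gen, w_start=0.9, w_end=0.4):
    """Linearly decreasing weight: w_start at k = 0, w_end at k = gen."""
    return w_end + (w_start - w_end) * (gen - k) / gen


def update_particle(x, v, p_best, g_best, k, gen, m, c1=2.0, c2=2.0):
    """One velocity and position update for a single particle."""
    w = inertia_weight(k, gen)
    new_x, new_v = [], []
    for xd, vd, pd, gd in zip(x, v, p_best, g_best):
        r1, r2 = random.random(), random.random()
        vel = w * vd + c1 * r1 * (pd - xd) + c2 * r2 * (gd - xd)
        vel = max(-m, min(m, vel))           # keep -m <= velocity <= m
        pos = int(round(xd + vel))           # assumption: round to a PE type
        new_v.append(vel)
        new_x.append(max(1, min(m, pos)))    # assumption: clamp to 1..m
    return new_x, new_v


# Hypothetical particle with three subtasks and m = 3 PE types.
x, v = update_particle([3, 2, 1], [0.0, 0.0, 0.0],
                       p_best=[3, 2, 1], g_best=[1, 2, 3],
                       k=10, gen=200, m=3)
print(x, v)
```

A large weight early on lets the old velocity dominate (wide exploration), while a small weight near the end lets the pull toward the personal and global best positions dominate (local refinement).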

3.5. Flow of Algorithm

(1) Randomly initialize the position and velocity of the particle swarm based on the description in "Initialization and Fitness Function."

(2) Compute the velocity and position of every particle.

(3) Compute the fitness value of every particle and set $P\_best_i$ and $G\_best_i$.

(4) If $P\_best_i$ and $G\_best_i$ remain unchanged after many iterations or the algorithm has reached the maximum number of iterations, output the optimum solution, end the iteration, and go to step (6).

(5) Go to step (2).

(6) Assign a reasonable number of PEs to every type of PE in accordance with the processing ability and the total amount of tasks to be processed.
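Step (6) is stated only in general terms. One simple reading that reproduces Table 3 is to give each PE type a fixed number of subtasks per PE and to open a new PE whenever that capacity is exceeded. The function `divide_tasks` and the capacity values below are hypothetical, chosen only so that the output matches Table 3; the paper does not specify how processing ability is quantified.

```python
def divide_tasks(decoded, capacity):
    """Split the subtasks of each PE type over as many PEs as needed.

    decoded:  {pe_type: [subtask, ...]} as in Table 2
    capacity: {pe_type: max subtasks per PE} (hypothetical measure)
    Returns rows of (pe_type, pe_number, subtasks) as in Table 3.
    """
    rows = []
    pe_number = 0
    for pe_type in sorted(decoded):
        subtasks = decoded[pe_type]
        size = capacity[pe_type]
        for start in range(0, len(subtasks), size):
            pe_number += 1
            rows.append((pe_type, pe_number, subtasks[start:start + size]))
    return rows


decoded = {1: [3, 4, 7], 2: [2, 6, 8], 3: [1, 5, 9, 10]}   # Table 2
capacity = {1: 3, 2: 3, 3: 2}                              # assumed values
for row in divide_tasks(decoded, capacity):
    print(row)
```

With these assumed capacities, type 3 needs two PEs (numbers 3 and 4), which is the division shown in Table 3.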

4. IP Mapping

After task dividing and scheduling, the IP communication diagram is formed. In a NoC-based multicore system, the next requirement is to map these PEs onto NoC nodes reasonably and to minimize the network transmission delay during task execution, while occupying fewer resources and keeping energy consumption balanced. This is the IP mapping problem.

There are often two orientations in IP mapping: either to minimize the internal communication cost or to minimize the external communication cost [27, 28]. Both orientations have their pros and cons. The former might lead to increased competition among external resources and add more computation overhead later in mapping as the utilization of system resources increases. The latter tends to arrange surplus resources well and decreases competition for external resources with little change in computation overhead; however, as each local mapping area is incomplete, it produces only second-best mapping solutions, thus undermining the global mapping optimization. While designing an IP mapping algorithm, it is necessary to strike a careful balance between the two orientations above.

In the meantime, as described above, PEs of different types have different requirements on NoC communication capability. In order to save on-chip resources and decrease system consumption, various heterogeneous network topologies have been designed. Therefore, during IP mapping, the matching between the communication requirements and the on-chip communication capability requires comprehensive consideration.

Based on the properties of the PEs to be mapped and the distribution of transmission capability over the topology, the paper maps the PEs with high communication requirement to high-capability areas, balances the internal communication cost against the external one, and achieves on-chip communication with minimum transmission delay and less resource occupancy. The mapping algorithm consists of two parts: the expression of the network topology by a two-dimensional matrix and the IP mapping. They are detailed as follows.

4.1. IP Communication Diagram and NoC Topology. The communication diagram can be abstracted into a triple $\text{CDAG} = (P, E, C)$, where

(1) $P$ represents the set of PEs in the communication diagram; that is, $p_i \in P$ is a PE with tasks to execute;

(2) $E$ represents the edge set of the DAG application; that is, $e_{ij} \in E$ indicates that there exists data exchange between $p_i$ and $p_j$;

(3) $C$ represents the communication cost on the undirected edges, and $C_{ij}$ represents the total communication data between $p_i$ and $p_j$.
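One way to picture how the triple (P, E, C) arises from the scheduling result is sketched below. It assumes that the task graph is given as a dictionary of directed subtask edges with data volumes and that each subtask has already been assigned to a numbered PE; traffic between two subtasks on the same PE is treated as internal and dropped. The function `build_cdag`, the edge list, and the volumes are all hypothetical.

```python
from collections import defaultdict


def build_cdag(task_edges, pe_of):
    """Aggregate subtask traffic into PE-to-PE traffic.

    task_edges: {(src_subtask, dst_subtask): data volume}
    pe_of:      {subtask: PE number}
    Returns (P, E, C) with undirected edges keyed as (low PE, high PE).
    """
    c = defaultdict(int)
    for (src, dst), volume in task_edges.items():
        a, b = pe_of[src], pe_of[dst]
        if a != b:  # same-PE traffic never enters the network
            c[(min(a, b), max(a, b))] += volume
    p = sorted(set(pe_of.values()))
    e = sorted(c)
    return p, e, dict(c)


# PE assignment of Table 3; the edges and volumes are made up.
pe_of = {3: 1, 4: 1, 7: 1, 2: 2, 6: 2, 8: 2, 1: 3, 5: 3, 9: 4, 10: 4}
task_edges = {(1, 2): 10, (1, 3): 5, (2, 6): 7,
              (5, 9): 4, (3, 9): 6, (6, 10): 2}
P, E, C = build_cdag(task_edges, pe_of)
print(P, E, C)
```

Here the edge between subtasks 2 and 6 disappears from C because both run on PE 2, which is why grouping heavily communicating subtasks on one PE reduces network load.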

It is complicated to express a NoC topology directly, especially a three-dimensional NoC. Nevertheless, a two-dimensional matrix expresses the topology well, and many properties of matrices can also be applied to topology computation. Therefore, the paper expresses the topology by a two-dimensional matrix before IP mapping.

A three-dimensional mesh topology can be taken as an example. Figure 2(a) shows a 4 × 4 × 2 three-dimensional NoC topology; the red vertices represent bottom switching nodes and the black ones represent upper switching nodes. Figure 2(b) is its two-dimensional expansion diagram, which avoids the complexity of studying the three-dimensional topology directly. For convenience of expression and computation, the position of the nodes in the expansion diagram is expressed by a matrix; the position of the nodes in Figure 2(b) can be seen in Figure 2(c). There may exist areas whose communication transmission capability is higher than that of others, in order to fulfill the higher communication requirement of some PEs. As shown in Figure 2(c), the green areas represent areas in which there exist switching nodes with higher communication performance. For the integrity of the matrix expression, areas without switching nodes are shaded; in the later computation, nodes in these areas are assumed to be already assigned.

Through the approach above, a one-to-one correspondence is formed between the position of every node in the three-dimensional NoC topology and that of every element in the matrix. IP mapping then performs its optimization on the basis of the matrix.

4.2. IP Mapping. Before introducing the concrete algorithm, three parameters are given as follows.

Definition 1. Manhattan Distance $\text{MD}(i, j)$: in a plane, the Manhattan Distance between points $P_i(x_1, y_1)$ and $P_j(x_2, y_2)$ is defined as

$$\text{MD}(i, j) = |x_1 - x_2| + |y_1 - y_2|. \tag{12}$$

Definition 2. Euclidean Distance $\text{ED}(i, j)$: in a plane, the Euclidean Distance between points $P_i(x_1, y_1)$ and $P_j(x_2, y_2)$ is defined as

$$\text{ED}(i, j) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}. \tag{13}$$

Definition 3. Communication cost in the mapped area is obtained as follows:

$$\text{Com\_cost} = \sum_{\forall C_{ij} \in C} C_{ij} \cdot \text{MD}(L(p_i), L(p_j)), \tag{14}$$

in which $C_{ij}$ represents the total communication traffic between $p_i$ and $p_j$ in the communication diagram and $\text{MD}(L(p_i), L(p_j))$ represents the Manhattan Distance between the mapped positions of $p_i$ and $p_j$ on the topology.
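The three definitions translate directly into code. The sketch below assumes that a mapping is stored as a dictionary from PE number to an (x, y) matrix position and that traffic is stored per undirected PE pair; the names `manhattan`, `euclidean`, and `com_cost`, as well as the example traffic and placement, are hypothetical.

```python
import math


def manhattan(a, b):
    """Equation (12): Manhattan Distance between two (x, y) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def euclidean(a, b):
    """Equation (13): Euclidean Distance between two (x, y) positions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def com_cost(c, location):
    """Equation (14): traffic-weighted Manhattan Distance over all edges."""
    return sum(volume * manhattan(location[i], location[j])
               for (i, j), volume in c.items())


# Made-up traffic between four PEs and a made-up placement on a 2 x 2 grid.
c = {(2, 3): 10, (1, 3): 5, (3, 4): 4, (1, 4): 6, (2, 4): 2}
location = {1: (1, 1), 2: (2, 2), 3: (1, 2), 4: (2, 1)}
print(com_cost(c, location))  # 31
```

Only the pair (3, 4) sits two hops apart in this placement, so it contributes 4 × 2 = 8 and every other pair contributes its traffic once, giving 10 + 5 + 8 + 6 + 2 = 31.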

The target of the algorithm is to map PEs with high communication requirement to topology areas with high communication capability and to find the mapping scheme with the minimum Com_cost among the results.

The algorithm divides the communication diagrams into collections $H$ and $L$ according to whether or not the included PEs need to be mapped to an area with high capability. In the collection $H = \{h_1, h_2, \ldots, h_i\}$ with high communication requirement, the sequence is $|h_1| \ge |h_2| \ge \cdots \ge |h_i|$ according to the number of PEs with high communication requirement; in the collection $L = \{l_1, l_2, \ldots, l_i\}$ without high communication requirement, the sequence is $|l_1| \ge |l_2| \ge \cdots \ge |l_i|$ according to the number of PEs contained. The execution steps of the mapping algorithm are as follows.

(1) Start the mapping computation from collection $h_1$: choose, on the topology, the area with high communication capability that can contain the minimum set of PEs with high communication requirement in $h_1$ as the beginning area of mapping. Name the set of mapped PEs the assigned area, and name the area of occupied switching nodes on the topology the mapped area.

(2) Start from the PE with maximum communication traffic (sum of input and output) and map it to the switching node in the high-capability area whose number of available neighboring nodes is nearest to the node degree of the PE.

(3) Choose the node that exchanges the maximum communication data with the assigned area as the next PE to be mapped.

(4) Map the PE to the switching node that has the minimum Manhattan Distance to the mapped area. If more than one node meets the requirement, choose the node whose number of available neighboring nodes is nearest to the node degree of the PE; if there is still more than one node, choose the switching node that has the minimum Euclidean Distance from the center of the mapped area.

(5) Repeat steps (3) and (4) until all PEs are mapped, and then start the algorithm for another PE diagram to be mapped.

Figure 2: Topology and its expression by matrix: (a) three-dimensional topology, (b) two-dimensional expansion diagram, and (c) matrix expression.

Figure 3: Description of mapping process.

Figure 4: Comparison of algorithm velocity: execution time (ms) of GA, ACO, PSO, and OPSO for task scales of 9, 16, and 25 subtasks.
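Steps (2) to (4) of the mapping procedure can be sketched as a greedy loop. This is a simplified illustration, not the authors' implementation: it handles a single communication diagram, skips the partition into H and L and the area selection of step (1), and reads "Manhattan Distance to the mapped area" as the distance to the nearest already occupied node. The function `map_pes` and the example data are hypothetical.

```python
import math


def map_pes(c, nodes, high_nodes):
    """Greedy placement. c: {(i, j): traffic}; nodes, high_nodes: sets of (x, y)."""
    traffic, degree = {}, {}
    for (i, j), volume in c.items():
        for pe in (i, j):
            traffic[pe] = traffic.get(pe, 0) + volume
            degree[pe] = degree.get(pe, 0) + 1
    free, location = set(nodes), {}

    def fit(node, pe):
        # Gap between the free-neighbour count of a node and the PE degree.
        x, y = node
        around = ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
        return abs(sum(n in free for n in around) - degree[pe])

    # Step (2): the busiest PE goes to the best-fitting high-capability node.
    first = max(sorted(traffic), key=traffic.get)
    start = min(sorted(high_nodes & free), key=lambda n: fit(n, first))
    location[first] = start
    free.remove(start)

    while len(location) < len(traffic):
        # Step (3): unmapped PE with the most traffic to the assigned area.
        def link(pe):
            return sum(v for (i, j), v in c.items()
                       if (i == pe and j in location)
                       or (j == pe and i in location))
        nxt = max(sorted(p for p in traffic if p not in location), key=link)

        # Step (4): nearest free node, then degree fit, then centre distance.
        mapped = list(location.values())
        cx = sum(x for x, _ in mapped) / len(mapped)
        cy = sum(y for _, y in mapped) / len(mapped)

        def rank(n):
            md = min(abs(n[0] - x) + abs(n[1] - y) for x, y in mapped)
            return (md, fit(n, nxt), math.hypot(n[0] - cx, n[1] - cy), n)

        best = min(free, key=rank)
        location[nxt] = best
        free.remove(best)
    return location


# Made-up example: four PEs on a 2 x 2 grid with one high-capability node.
c = {(2, 3): 10, (1, 3): 5, (3, 4): 4, (1, 4): 6, (2, 4): 2}
nodes = {(1, 1), (1, 2), (2, 1), (2, 2)}
loc = map_pes(c, nodes, high_nodes={(1, 1)})
print(loc)
```

In this example PE 3 carries the most traffic, so it takes the high-capability node, and PE 2, its heaviest partner, lands on a directly adjacent node. The node coordinate is appended to the ranking tuple only to make remaining ties deterministic.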

Figure 3 gives a simple description of the mapping process. In the IP communication diagram, the red PEs represent PEs with high communication requirement and the blue area represents the assigned area; in the topology, the green area represents the area of switching nodes with high communication capability and the area encircled by the red line represents the mapped area.

The mapping algorithm places PEs with a direct communication relationship on neighboring nodes, so that the path between the source node and the destination node is as short as possible and does not conflict with other transmission paths, thus minimizing the delay in the whole mapping area.

5. Experiment and Simulation

The performance of the designed algorithm is compared and evaluated from two aspects. The first is the running speed of the task dividing and scheduling algorithm itself. Tasks of the same size are computed by GA, ACO, PSO, and the algorithm in this paper (labeled OPSO in the figures), and the running times are compared to assess the efficiency of the algorithm. This part is conducted in Matlab with 200 iterations; the comparison of the time required for running the algorithms is shown in Figure 4.

Figure 5: Comparison of mapping effect: (a) average packet delay (clock cycles) and (b) power consumption of GA, ACO, PSO, and OPSO for task scales of 9, 16, and 25 PEs.

The other is the comparison of the actual mapping effect (Figure 5). The scheduling results of the above algorithms are run in a NoC simulation environment, and the delay and power consumption of the system are computed for each of them, in order to show the advantage of the algorithm of this paper in scheduling.

6. Conclusion

In this paper, the task scheduling model is further improved, and the operating cost per time unit is employed as a uniform measurement for PEs of different types, which simplifies the algorithm. Task dividing and scheduling and IP mapping are handled separately so that the resulting scheduling is more efficient and realistic. The scheduling target considers not only the total time spent but also the time cost and resource cost during task running, so as to achieve comprehensive optimization of system performance.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] C Addo-Quaye ldquoThermal-aware mapping and placement for3-D NoC designsrdquo in Proceedings of the IEEE International SOCConference pp 25ndash28 September 2005

[2] A K SinghW Jigang A Prakash and T Srikanthan ldquoMappingalgorithms forNoC-based heterogeneousMPSoCplatformsrdquo inProceedings of the 12th Euromicro Conference on Digital SystemDesign ArchitecturesMethods and Tools (DSD rsquo09) pp 133ndash140August 2009

[3] K Ganeshpure and S Kundu ldquoOn runtime task graph extrac-tion in MPSoCrdquo in Proceedings of the IEEE Computer SocietyAnnual Symposium on VLSI pp 171ndash176 IEEE 2013

[4] Y Z Tei M N Marsono N Shaikh-Husin and Y W HauldquoNetwork partitioning and GA heuristic crossover for NoCapplication mappingrdquo in Proceedings of the IEEE InternationalSymposium on Circuits and Systems (ISCAS rsquo13) pp 1228ndash1231Beijing China May 2013

[5] HTopcuoglu SHariri andMWu ldquoPerformance-effective andlow-complexity task scheduling for heterogeneous computingrdquoIEEE Transactions on Parallel and Distributed Systems vol 13no 3 pp 260ndash274 2002

[6] M I Daoud and N Kharma ldquoEfficient compile-time taskscheduling for heterogeneous distributed computing systemsrdquoin Proceedings of the 12th International Conference on Paralleland Distributed Systems (ICPADS rsquo06) vol 1 pp 11ndash19 IEEEMinneapolis Minnesota July 2006

[7] M Wu and D D Gajski ldquoHypertool a programming aid formessage-passing systemsrdquo IEEE Transactions on Parallel andDistributed Systems vol 1 no 3 pp 330ndash343 1990

[8] T Yang and A Gerasoulis ldquoDSC scheduling parallel tasks onan unbounded number of processorsrdquo IEEE Transactions onParallel and Distributed Systems vol 5 no 9 pp 951ndash967 1994

[9] S J Kim and J C Browne ldquoA general approach to mappingof parallel computation upon multiprocessor architecturesrdquo inProceedings of the International Conference on Parallel Process-ing vol 2 pp 1ndash8 1988

[10] Y-C Chung and S Ranka ldquoApplications and performance anal-ysis of a compile-time optimization approach for list schedulingalgorithms on distributed memory multiprocessorsrdquo in Super-computing pp 512ndash521 1992

[11] I Ahmad and Y Kwok ldquoA new approach to scheduling parallelprograms using task duplicationrdquo in Proceedings of the Interna-tional Conference on Parallel Processing vol 2 pp 47ndash51 1994

[12] M Sayuti and L S Indrusiak ldquoReal-time low-power taskmapping in networks-on-chiprdquo in Proceedings of the IEEE

8 Mathematical Problems in Engineering

Computer Society Annual Symposium on VLSI (ISVLSI rsquo13) pp14ndash19 2013

[13] F Ferrandi P L Lanzi C Pilato D Sciuto and A TumeoldquoAnt colony heuristic for mapping and scheduling tasks andcommunications on heterogeneous embedded systemsrdquo IEEETransactions on Computer-Aided Design of Integrated Circuitsand Systems vol 29 no 6 pp 911ndash924 2010

[14] L S Junior N Nedjah and L de Macedo Mourelle ldquoCOapproach in static routing for network-on-chips with 3D meshtopologyrdquo in Proceedings of the IEEE Fourth Latin AmericanSymposium onCircuits and Systems (LASCAS rsquo13) pp 1ndash4 IEEECusco Peru February 2013

[15] RHoffmannA Prell andT Rauber ldquoDynamic task schedulingand load balancing on cell processorsrdquo in Proceedings of the18th Euromicro Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo10) pp 205ndash212 February 2010

[16] M B Abdelhalim ldquoTask assignment for heterogeneous mul-tiprocessors using re-excited particle swarm optimizationrdquo inProceedings of the International Conference on Computer andElectrical Engineering (ICCEE rsquo08) pp 23ndash27 PhuketThailandDecember 2008

[17] M S Sidhu P Thulasiraman and R K Thulasiram ldquoA load-rebalance PSO heuristic for task matching in heterogeneouscomputing systemsrdquo in Proceedings of the IEEE Symposium onSwarm Intelligence (SIS rsquo13) pp 180ndash187 IEEE Singapore April2013

[18] Y Wang and C Dang ldquoAn evolutionary algorithm for globaloptimization based on level-set evolution and latin squaresrdquoIEEE Transactions on Evolutionary Computation vol 11 no 5pp 579ndash595 2007

[19] Y-P Wang Y-C Jiao and H Li ldquoAn evolutionary algorithmfor solving nonlinear bilevel programming based on a newconstraint-handling schemerdquo IEEE Transactions on SystemsMan and Cybernetics C Applications and Reviews vol 35 no2 pp 221ndash232 2005

[20] O Arnold and G Fettweis ldquoPower aware heterogeneousMPSoCwith dynamic task scheduling and increased data local-ity for multiple applicationsrdquo in Proceedings of the InternationalConference on Embedded Computer Systems (SAMOS 10) pp110ndash117 2010

[21] G DeMicheli and L BeniniNetworks on Chips Technology andTools Academic Press 2006

[22] D A B Miller ldquoRationale and challenges for optical intercon-nects to electronic chipsrdquo Proceedings of the IEEE vol 88 no 6pp 728ndash749 2000

[23] D A B Miller ldquoDevice requirements for optical interconnectsto silicon chipsrdquo Proceedings of the IEEE vol 97 no 7 pp 1166ndash1185 2009

[24] M O Agyeman and A Ahmadinia ldquoOptimising heteroge-neous 3D networks-on-chiprdquo in Proceedings of the 6th IEEEInternational Symposium on Parallel Computing in ElectricalEngineering (PARELEC 11) pp 25ndash30 April 2011

[25] Y Ye J Xu X Wu W Zhang W Liu and M NikdastldquoA torus-based hierarchical optical-electronic network-on-chipfor multiprocessor system-on-chiprdquo ACM Journal on EmergingTechnologies in Computing Systems vol 8 no 1 article 5 2012

[26] HA Khouzani S Koohi and SHessabi ldquoFully contention-freeoptical NoC based on wavelenght routingrdquo in Proceedings of the16thCSI International SymposiumonComputer Architecture andDigital Systems (CADS rsquo12) pp 81ndash86 May 2012

[27] C Chou and R Marculescu ldquoUser-aware dynamic task allo-cation in networks-on-chiprdquo in Proceedings of the DesignAutomation and Test in Europe (DATE rsquo08) vol 1ndash3 pp 1074ndash1079 March 2008

[28] C Chou and R Marculescu ldquoRun-time task allocation con-sidering user behavior in embedded multiprocessor networks-on-chiprdquo IEEE Transactions on Computer-Aided Design of Inte-grated Circuits and Systems vol 29 no 1 pp 78ndash91 2010


Page 5: Research Article Effective Task Scheduling and IP Mapping ...downloads.hindawi.com/journals/mpe/2014/202748.pdf · Research Article Effective Task Scheduling and IP Mapping Algorithm

Mathematical Problems in Engineering 5

an IP mapping algorithm it is necessary to make a carefulbalance between the two orientations above

In the meantime as described above PEs of differenttypes would have different requirements on a NoC commu-nication capability In order to save on-chip resource anddecrease system consumption various heterogeneous net-work topologies are designedTherefore during IP mappingthe matching between the communication requirements andon-chip communication capability entails comprehensiveconsideration

The paper based on the property of PEs to be mappedand the characteristics of distribution of transmission capa-bility on topology maps the PEs of high communicationrequirement to high-capability area balances communica-tion cost internal with that external and achieves on-chipcommunication of system by minimum transmission delayand less resource occupancyThemapping algorithm consistsof two parts the expression of the network topology by two-dimensional matrix and the IP mappingThey are detailed asfollows

41 IP Communication Diagram and NoC Topology Thecommunication diagram can be abstracted into a tripleCDAG = (119875 119864 119862) where

(1) 119875 represents the set of PEs in the communicationdiagram that is 119901

119894isin 119875 is a PE with execution task

(2) 119864 represents frontier set in DAG application thatis 119890119894119895

isin 119864 indicates that there exits data exchangebetween 119901i and 119901

119895

(3) 119862 represents communication cost in undirected edgeand 119862

119894119895represents the total communication data

between 119901119894and 119901

119895

It is complicated to express NoC topology directly espe-cially three-dimensional NoC Nevertheless twodimension-al matrix expresses topology well and many properties ofmatrix could also be applied to topology computationTherefore the paper expresses topology by two-dimensionalmatrix before IP mapping

Three-dimensional mesh topology can be taken as anexample Shown in Figure 2(a) is a 4lowast4lowast2 three-dimensionalNoC topology the red vertices represent bottom switchingnodes and the black ones represent upper switching nodesFigure 2(b) is its two-dimensional expansion diagram bywhich we can be free of the complexity in studying the three-dimensional topology For the convenience of expression andcomputation the position of nodes in expansion diagram isexpressed by matrix The position of nodes in Figure 2(b)can be seen in Figure 2(c) There may exist areas wherecommunication transmission capability is higher than thatof others to fulfill the higher communication requirement ofsome PEs as shown in Figure 2(c) the green areas representareas in which there exist switching nodes with highercommunication performance For the integrity of matrixexpression areas without switching nodes are filled withshadow in the later computing nodes in these areas areassumed to be assigned out already

Through the approach above there forms one-to-onecorrespondence between the position of every node in three-dimensional NoC topology and that of every element inmatrix IP mapping conducts computing optimization on thebasis of matrix

42 IP Mapping Before introducing the concrete algorithmthree parameters are given as follows

Definition 1 Manhattan Distance MD(119894 119895) in a plane theManhattan Distance between point 119875

119894(1199091 1199101) and 119875

119895(1199092 1199102)

is defined as

MD (119894 119895) =10038161003816100381610038161199091 minus 119909

2

1003816100381610038161003816 +10038161003816100381610038161199101 minus 119910

2

1003816100381610038161003816 (12)

Definition 2 Euclidean Distance ED(119894 119895) in a plane theEuclidean Distance between point 119875

119894(1199091 1199101) and 119875

119895(1199092 1199102) is

defined as

ED (119894 119895) = radic(1199091minus 1199092)2+ (1199101minus 1199102)2 (13)

Definition 3 Communication cost in mapped area is ob-tained as follows

Com cost = sum

forall119862119894119895isin119862

119862119894119895sdotMD (119871 (119901

119894) 119871 (119875

119895)) (14)

in which 119862119894119895

represents the total communication traf-fic between 119875

119894and 119875

119895in communication diagram and

MD(119871(119901119894) 119871(119875119895)) represents Manhattan Distance of mapped

position on topology between 119875119894and 119875

119895

The target of the algorithm is to map PEs with highcommunication requirement to topology area with highcommunication capability and find out a mapping schemewhich has minimum Com cost in the results

The algorithm divides the communication diagram into collections $H$ and $L$ according to whether or not the included PEs need to be mapped in an area with high capability. In the collection $H = \{h_1, h_2, \ldots, h_i\}$ with high communication requirement, the sequence is $|h_1| \ge |h_2| \ge \cdots \ge |h_i|$ according to the number of PEs with high communication requirement; in the collection $L = \{l_1, l_2, \ldots, l_i\}$ without high communication requirement, the sequence is $|l_1| \ge |l_2| \ge \cdots \ge |l_i|$ according to the number of PEs contained. The execution steps of the mapping algorithm are as follows.

(1) Start the mapping computation from collection $h_1$: choose the area with high communication capability which could contain the minimum set of PEs with high communication requirement in $h_1$ on the topology as the beginning area of mapping. Name the mapped PEs the assigned area and name the area of occupied switching nodes on the topology the mapped area.

(2) Start from the PE with the maximum communication traffic (sum of input and output) and map it to the switching node in the area of high communication capability whose number of available neighboring nodes is nearest to the PE node degree.

Figure 2: Topology and its expression by matrix. (a) Three-dimensional NoC topology; (b) two-dimensional expansion diagram; (c) matrix positions of the nodes (axes X and Y).

Figure 3: Description of mapping process.

Figure 4: Comparison of algorithm velocity. Execution time (ms) of GA, ACO, PSO, and OPSO versus task scale (9, 16, and 25 subtasks).

(3) Choose the node which has the maximum communication data with the assigned area as the next PE to be mapped.

(4) Map the PE to the switching node which has the minimum Manhattan Distance to the mapped area. If more than one node meets the requirement, choose the node whose number of available neighboring nodes is nearest to the PE node degree; if there is still more than one node, choose the switching node which has the minimum Euclidean Distance from the center of the mapped area.

(5) Repeat steps (3) and (4) until all PEs are mapped, and then start the algorithm for another PE diagram to be mapped.
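Steps (2) to (5) can be sketched for a single PE diagram as follows. This is an illustrative simplification rather than the authors' implementation: it assumes a plain two-dimensional grid of switching nodes, treats the high-capability area only as the candidate set for the first PE, and takes the center of the mapped area to be the centroid of the occupied nodes. The name `greedy_map` and the example traffic are hypothetical.

```python
import math
from itertools import product


def neighbours(node):
    r, c = node
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]


def greedy_map(traffic, rows, cols, high_area=None):
    """Greedy sketch of steps (2)-(5) for one PE diagram on a rows x cols grid."""
    free = set(product(range(rows), range(cols)))
    pes = sorted({p for pair in traffic for p in pair})
    if len(pes) > len(free):
        raise ValueError("more PEs than switching nodes")

    def between(a, b):  # traffic between PEs a and b, both directions
        return traffic.get((a, b), 0) + traffic.get((b, a), 0)

    total = {p: sum(between(p, q) for q in pes if q != p) for p in pes}
    degree = {p: sum(1 for q in pes if q != p and between(p, q) > 0)
              for p in pes}

    def degree_gap(node, pe):  # |available neighbours - PE node degree|
        avail = sum(1 for n in neighbours(node) if n in free)
        return abs(avail - degree[pe])

    # Step (2): the PE with maximum traffic goes to the best-fitting node,
    # restricted to the high-capability area when one is given.
    first = max(pes, key=lambda p: total[p])
    start = sorted(high_area if high_area else free)
    node = min(start, key=lambda n: degree_gap(n, first))
    placed = {first: node}
    free.discard(node)

    while len(placed) < len(pes):
        # Step (3): unmapped PE with maximum traffic to the assigned area.
        pe = max((p for p in pes if p not in placed),
                 key=lambda p: sum(between(p, q) for q in placed))
        mapped = list(placed.values())
        centre = (sum(r for r, _ in mapped) / len(mapped),
                  sum(c for _, c in mapped) / len(mapped))

        # Step (4): minimum Manhattan Distance to the mapped area, then
        # degree fit, then Euclidean Distance to the centre of that area.
        def key(n):
            md = min(abs(n[0] - m[0]) + abs(n[1] - m[1]) for m in mapped)
            ed = math.hypot(n[0] - centre[0], n[1] - centre[1])
            return (md, degree_gap(n, pe), ed)

        node = min(sorted(free), key=key)
        placed[pe] = node
        free.discard(node)
    return placed  # Step (5): a further diagram would continue from here.


# Hypothetical communication diagram with four PEs on a 3 x 3 grid.
traffic = {("A", "B"): 8, ("A", "C"): 5, ("B", "D"): 3}
placement = greedy_map(traffic, 3, 3)
print(placement)
```

Sorting the candidate nodes before taking the minimum only makes tie-breaking deterministic; the text above does not specify how remaining ties are resolved.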

Figure 3 is a simple description of the mapping process. In the IP communication diagram, the red PEs represent PEs with high communication requirement and the blue area represents the assigned area; in the topology, the green area represents the area of switching nodes with high communication capability and the area encircled by the red line represents the mapped area.

The mapping algorithm arranges PEs with a direct communication relationship on neighboring nodes, so that the path between the source node and the destination node is the shortest and does not conflict with other transmission paths, thus minimizing the delay in the whole mapping area.

5. Experiment and Simulation

The performance of the designed algorithm is compared and evaluated from two aspects. The first is the running speed of the task dividing and scheduling algorithm itself. By computing tasks of the same size with GA, ACO, PSO, and the algorithm in this paper, respectively, and comparing the running times, we can demonstrate the efficiency of the algorithm. This part is conducted in Matlab with 200 iterations; the comparison of the time required for running the algorithms is shown in Figure 4.

Figure 5: Comparison of mapping effect. (a) Average packet delay (clock cycles) and (b) power consumption of GA, ACO, PSO, and OPSO versus task scale (9, 16, and 25 PEs).

The other is the comparison of the actual mapping effect (Figure 5). By running the different scheduling results of the above algorithms in an NoC simulation environment and computing the delay and the power consumption of the system, respectively, we can demonstrate the superiority of the algorithm of this paper in scheduling.

6. Conclusion

In this paper, the task scheduling model is further improved, and the operating cost per time unit is employed as a uniform measurement for PEs of different types, which simplifies the algorithm. Task dividing and scheduling and IP mapping are handled separately, so that the resultant scheduling is more efficient and realistic. The target of scheduling considers not only the total time spent but also the time cost and resource cost during task running, so as to achieve comprehensive optimization of system performance.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] C Addo-Quaye ldquoThermal-aware mapping and placement for3-D NoC designsrdquo in Proceedings of the IEEE International SOCConference pp 25ndash28 September 2005

[2] A K SinghW Jigang A Prakash and T Srikanthan ldquoMappingalgorithms forNoC-based heterogeneousMPSoCplatformsrdquo inProceedings of the 12th Euromicro Conference on Digital SystemDesign ArchitecturesMethods and Tools (DSD rsquo09) pp 133ndash140August 2009

[3] K Ganeshpure and S Kundu ldquoOn runtime task graph extrac-tion in MPSoCrdquo in Proceedings of the IEEE Computer SocietyAnnual Symposium on VLSI pp 171ndash176 IEEE 2013

[4] Y Z Tei M N Marsono N Shaikh-Husin and Y W HauldquoNetwork partitioning and GA heuristic crossover for NoCapplication mappingrdquo in Proceedings of the IEEE InternationalSymposium on Circuits and Systems (ISCAS rsquo13) pp 1228ndash1231Beijing China May 2013

[5] HTopcuoglu SHariri andMWu ldquoPerformance-effective andlow-complexity task scheduling for heterogeneous computingrdquoIEEE Transactions on Parallel and Distributed Systems vol 13no 3 pp 260ndash274 2002

[6] M I Daoud and N Kharma ldquoEfficient compile-time taskscheduling for heterogeneous distributed computing systemsrdquoin Proceedings of the 12th International Conference on Paralleland Distributed Systems (ICPADS rsquo06) vol 1 pp 11ndash19 IEEEMinneapolis Minnesota July 2006

[7] M Wu and D D Gajski ldquoHypertool a programming aid formessage-passing systemsrdquo IEEE Transactions on Parallel andDistributed Systems vol 1 no 3 pp 330ndash343 1990

[8] T Yang and A Gerasoulis ldquoDSC scheduling parallel tasks onan unbounded number of processorsrdquo IEEE Transactions onParallel and Distributed Systems vol 5 no 9 pp 951ndash967 1994

[9] S J Kim and J C Browne ldquoA general approach to mappingof parallel computation upon multiprocessor architecturesrdquo inProceedings of the International Conference on Parallel Process-ing vol 2 pp 1ndash8 1988

[10] Y-C Chung and S Ranka ldquoApplications and performance anal-ysis of a compile-time optimization approach for list schedulingalgorithms on distributed memory multiprocessorsrdquo in Super-computing pp 512ndash521 1992

[11] I Ahmad and Y Kwok ldquoA new approach to scheduling parallelprograms using task duplicationrdquo in Proceedings of the Interna-tional Conference on Parallel Processing vol 2 pp 47ndash51 1994

[12] M Sayuti and L S Indrusiak ldquoReal-time low-power taskmapping in networks-on-chiprdquo in Proceedings of the IEEE

8 Mathematical Problems in Engineering

Computer Society Annual Symposium on VLSI (ISVLSI rsquo13) pp14ndash19 2013

[13] F Ferrandi P L Lanzi C Pilato D Sciuto and A TumeoldquoAnt colony heuristic for mapping and scheduling tasks andcommunications on heterogeneous embedded systemsrdquo IEEETransactions on Computer-Aided Design of Integrated Circuitsand Systems vol 29 no 6 pp 911ndash924 2010

[14] L S Junior N Nedjah and L de Macedo Mourelle ldquoCOapproach in static routing for network-on-chips with 3D meshtopologyrdquo in Proceedings of the IEEE Fourth Latin AmericanSymposium onCircuits and Systems (LASCAS rsquo13) pp 1ndash4 IEEECusco Peru February 2013

[15] RHoffmannA Prell andT Rauber ldquoDynamic task schedulingand load balancing on cell processorsrdquo in Proceedings of the18th Euromicro Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo10) pp 205ndash212 February 2010

[16] M B Abdelhalim ldquoTask assignment for heterogeneous mul-tiprocessors using re-excited particle swarm optimizationrdquo inProceedings of the International Conference on Computer andElectrical Engineering (ICCEE rsquo08) pp 23ndash27 PhuketThailandDecember 2008

[17] M S Sidhu P Thulasiraman and R K Thulasiram ldquoA load-rebalance PSO heuristic for task matching in heterogeneouscomputing systemsrdquo in Proceedings of the IEEE Symposium onSwarm Intelligence (SIS rsquo13) pp 180ndash187 IEEE Singapore April2013

[18] Y Wang and C Dang ldquoAn evolutionary algorithm for globaloptimization based on level-set evolution and latin squaresrdquoIEEE Transactions on Evolutionary Computation vol 11 no 5pp 579ndash595 2007

[19] Y-P Wang Y-C Jiao and H Li ldquoAn evolutionary algorithmfor solving nonlinear bilevel programming based on a newconstraint-handling schemerdquo IEEE Transactions on SystemsMan and Cybernetics C Applications and Reviews vol 35 no2 pp 221ndash232 2005

[20] O Arnold and G Fettweis ldquoPower aware heterogeneousMPSoCwith dynamic task scheduling and increased data local-ity for multiple applicationsrdquo in Proceedings of the InternationalConference on Embedded Computer Systems (SAMOS 10) pp110ndash117 2010

[21] G DeMicheli and L BeniniNetworks on Chips Technology andTools Academic Press 2006

[22] D A B Miller ldquoRationale and challenges for optical intercon-nects to electronic chipsrdquo Proceedings of the IEEE vol 88 no 6pp 728ndash749 2000

[23] D A B Miller ldquoDevice requirements for optical interconnectsto silicon chipsrdquo Proceedings of the IEEE vol 97 no 7 pp 1166ndash1185 2009

[24] M O Agyeman and A Ahmadinia ldquoOptimising heteroge-neous 3D networks-on-chiprdquo in Proceedings of the 6th IEEEInternational Symposium on Parallel Computing in ElectricalEngineering (PARELEC 11) pp 25ndash30 April 2011

[25] Y Ye J Xu X Wu W Zhang W Liu and M NikdastldquoA torus-based hierarchical optical-electronic network-on-chipfor multiprocessor system-on-chiprdquo ACM Journal on EmergingTechnologies in Computing Systems vol 8 no 1 article 5 2012

[26] HA Khouzani S Koohi and SHessabi ldquoFully contention-freeoptical NoC based on wavelenght routingrdquo in Proceedings of the16thCSI International SymposiumonComputer Architecture andDigital Systems (CADS rsquo12) pp 81ndash86 May 2012

[27] C Chou and R Marculescu ldquoUser-aware dynamic task allo-cation in networks-on-chiprdquo in Proceedings of the DesignAutomation and Test in Europe (DATE rsquo08) vol 1ndash3 pp 1074ndash1079 March 2008

[28] C Chou and R Marculescu ldquoRun-time task allocation con-sidering user behavior in embedded multiprocessor networks-on-chiprdquo IEEE Transactions on Computer-Aided Design of Inte-grated Circuits and Systems vol 29 no 1 pp 78ndash91 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 6: Research Article Effective Task Scheduling and IP Mapping ...downloads.hindawi.com/journals/mpe/2014/202748.pdf · Research Article Effective Task Scheduling and IP Mapping Algorithm

6 Mathematical Problems in Engineering

1

21 3 4

21 3 4

41

4

5 6

2

1

3

4

5

6

7 8

5 6 7 8

9 10 11 12

1314 15

1613 1316 16

17

18 19

20 17

17

20

20

2122

2122

2324

2324

25262526 27282728

29 30 31 32

29 32

29 32

Y

X

(a) (b) (c)

Figure 2 Topology and its expression by matrix

21 3

2

223

3

4

2

4

5

4 5

6

2

1

3

4

5

6

7

6

7

6 7

8

X

Y

21

1

3 4 5 6

2

1

3

4

5

6

7 8

X

Y

21 3 4 5 6

2

1

3

4

5

6

7 8

X

Y

21 3

3 32 32

1

4

5

5

1

1

1

1

2

223

3

2

4

6 7

4

5

5

1

1

1

1

2

223

3

2

4

6 7

4

5

5

1

1

1

1

2

223

3

2

4

6 7

4

5

5

1

1

1

1

32

4 5 6

2

1

3

4

5

6

7 8

X

Y

middot middot middotP2 P1

Figure 3 Description of mapping process

Exec

utio

n tim

e (m

s)

9 subtasks 16 subtasks 25 subtasksTask scale

GAACO

PSOOPSO

10k8k6k4k2k

Figure 4 Comparison of algorithm velocity

(3) Choose the node which has maximum communica-tion data with assigned area as the next PE to bemapped

(4) Correspond the PE to switching node which hasminimum Manhattan Distance with mapped area Ifmore than one node meet requirement choose thenode whose available neighboring nodes number isnearest to PE node degree if there are still morethan one node then choose the switching node whichhas minimum Euclidean Distance from the center ofmapped area

(5) Repeat step 3 and step 4 until all PEs are mapped andstart algorithm of another PE diagram to be mapped

Figure 3 is the simple description of mapping process InIP communication diagram the red PEs represent PEs withhigh communication requirement and blue area representsassigned area in the topology the green area represents areaof switching nodes with high communication capability andarea encircled by red line represents mapped area

The mapping algorithm arranges PEs with direct com-munication relationship to neighboring nodes ensuring theroad between source node anddestination node to be shortestwithout any conflicts with other transmission roads thusminimizing the delay in the whole mapping area

5 Experiment and Simulation

The comparison and evaluation on the performance ofdesigned algorithm are given from two aspects The first oneis the velocity efficiency itself of task dividing and schedulingalgorithm By computing tasks of the same size according toGA ACO PSO and algorithm in this paper respectively andcomparing the running time we can prove the efficiency ofalgorithm This part is conducted in Matlab with iterationsbeing 200 times the comparison of time required for runningalgorithms is shown in Figure 4

Mathematical Problems in Engineering 7

GA

ACO

PSO

OPSO

8

4

Aver

age p

acke

t del

ay (c

lock

cycle

s)

9 PEs 16 PEs 25 PEsTask scale

(a)

40e + 007

30e + 007

20e + 007

10e + 007

Pow

er co

nsum

ptio

n

GA

ACO

PSO

OPSO

9 PEs 16 PEs 25 PEsTask scale

(b)

Figure 5 Comparison of mapping effect

The other one is the comparison on actual mapping effect(Figure 5) By comparing the operation of different schedul-ing results from the above algorithms in NoC simulationenvironment and computing the delay of power consumptionof system respectively we can prove the superiority of thealgorithm of this paper in scheduling

6 Conclusion

In this paper the task scheduling model is further improvedand the operating cost per time unit is employed as uni-form measurement for PEs of different types and simplifiesalgorithm task dividing and scheduling and IP mapping arehandled separately so that the resultant algorithm schedulingis more efficient and truthful The target of scheduling notonly considers the total time spent but also considers the timecost and resource cost during the task running so as to achievecomprehensive optimization of system performance

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] C Addo-Quaye ldquoThermal-aware mapping and placement for3-D NoC designsrdquo in Proceedings of the IEEE International SOCConference pp 25ndash28 September 2005

[2] A K SinghW Jigang A Prakash and T Srikanthan ldquoMappingalgorithms forNoC-based heterogeneousMPSoCplatformsrdquo inProceedings of the 12th Euromicro Conference on Digital SystemDesign ArchitecturesMethods and Tools (DSD rsquo09) pp 133ndash140August 2009

[3] K Ganeshpure and S Kundu ldquoOn runtime task graph extrac-tion in MPSoCrdquo in Proceedings of the IEEE Computer SocietyAnnual Symposium on VLSI pp 171ndash176 IEEE 2013

[4] Y Z Tei M N Marsono N Shaikh-Husin and Y W HauldquoNetwork partitioning and GA heuristic crossover for NoCapplication mappingrdquo in Proceedings of the IEEE InternationalSymposium on Circuits and Systems (ISCAS rsquo13) pp 1228ndash1231Beijing China May 2013

[5] HTopcuoglu SHariri andMWu ldquoPerformance-effective andlow-complexity task scheduling for heterogeneous computingrdquoIEEE Transactions on Parallel and Distributed Systems vol 13no 3 pp 260ndash274 2002

[6] M I Daoud and N Kharma ldquoEfficient compile-time taskscheduling for heterogeneous distributed computing systemsrdquoin Proceedings of the 12th International Conference on Paralleland Distributed Systems (ICPADS rsquo06) vol 1 pp 11ndash19 IEEEMinneapolis Minnesota July 2006

[7] M Wu and D D Gajski ldquoHypertool a programming aid formessage-passing systemsrdquo IEEE Transactions on Parallel andDistributed Systems vol 1 no 3 pp 330ndash343 1990

[8] T Yang and A Gerasoulis ldquoDSC scheduling parallel tasks onan unbounded number of processorsrdquo IEEE Transactions onParallel and Distributed Systems vol 5 no 9 pp 951ndash967 1994

[9] S J Kim and J C Browne ldquoA general approach to mappingof parallel computation upon multiprocessor architecturesrdquo inProceedings of the International Conference on Parallel Process-ing vol 2 pp 1ndash8 1988

[10] Y-C Chung and S Ranka ldquoApplications and performance anal-ysis of a compile-time optimization approach for list schedulingalgorithms on distributed memory multiprocessorsrdquo in Super-computing pp 512ndash521 1992

[11] I Ahmad and Y Kwok ldquoA new approach to scheduling parallelprograms using task duplicationrdquo in Proceedings of the Interna-tional Conference on Parallel Processing vol 2 pp 47ndash51 1994

[12] M Sayuti and L S Indrusiak ldquoReal-time low-power taskmapping in networks-on-chiprdquo in Proceedings of the IEEE

8 Mathematical Problems in Engineering

Computer Society Annual Symposium on VLSI (ISVLSI rsquo13) pp14ndash19 2013

[13] F Ferrandi P L Lanzi C Pilato D Sciuto and A TumeoldquoAnt colony heuristic for mapping and scheduling tasks andcommunications on heterogeneous embedded systemsrdquo IEEETransactions on Computer-Aided Design of Integrated Circuitsand Systems vol 29 no 6 pp 911ndash924 2010

[14] L S Junior N Nedjah and L de Macedo Mourelle ldquoCOapproach in static routing for network-on-chips with 3D meshtopologyrdquo in Proceedings of the IEEE Fourth Latin AmericanSymposium onCircuits and Systems (LASCAS rsquo13) pp 1ndash4 IEEECusco Peru February 2013

[15] RHoffmannA Prell andT Rauber ldquoDynamic task schedulingand load balancing on cell processorsrdquo in Proceedings of the18th Euromicro Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo10) pp 205ndash212 February 2010

[16] M B Abdelhalim ldquoTask assignment for heterogeneous mul-tiprocessors using re-excited particle swarm optimizationrdquo inProceedings of the International Conference on Computer andElectrical Engineering (ICCEE rsquo08) pp 23ndash27 PhuketThailandDecember 2008

[17] M S Sidhu P Thulasiraman and R K Thulasiram ldquoA load-rebalance PSO heuristic for task matching in heterogeneouscomputing systemsrdquo in Proceedings of the IEEE Symposium onSwarm Intelligence (SIS rsquo13) pp 180ndash187 IEEE Singapore April2013

[18] Y Wang and C Dang ldquoAn evolutionary algorithm for globaloptimization based on level-set evolution and latin squaresrdquoIEEE Transactions on Evolutionary Computation vol 11 no 5pp 579ndash595 2007

[19] Y-P Wang Y-C Jiao and H Li ldquoAn evolutionary algorithmfor solving nonlinear bilevel programming based on a newconstraint-handling schemerdquo IEEE Transactions on SystemsMan and Cybernetics C Applications and Reviews vol 35 no2 pp 221ndash232 2005

[20] O Arnold and G Fettweis ldquoPower aware heterogeneousMPSoCwith dynamic task scheduling and increased data local-ity for multiple applicationsrdquo in Proceedings of the InternationalConference on Embedded Computer Systems (SAMOS 10) pp110ndash117 2010

[21] G DeMicheli and L BeniniNetworks on Chips Technology andTools Academic Press 2006

[22] D A B Miller ldquoRationale and challenges for optical intercon-nects to electronic chipsrdquo Proceedings of the IEEE vol 88 no 6pp 728ndash749 2000

[23] D A B Miller ldquoDevice requirements for optical interconnectsto silicon chipsrdquo Proceedings of the IEEE vol 97 no 7 pp 1166ndash1185 2009

[24] M O Agyeman and A Ahmadinia ldquoOptimising heteroge-neous 3D networks-on-chiprdquo in Proceedings of the 6th IEEEInternational Symposium on Parallel Computing in ElectricalEngineering (PARELEC 11) pp 25ndash30 April 2011

[25] Y Ye J Xu X Wu W Zhang W Liu and M NikdastldquoA torus-based hierarchical optical-electronic network-on-chipfor multiprocessor system-on-chiprdquo ACM Journal on EmergingTechnologies in Computing Systems vol 8 no 1 article 5 2012

[26] HA Khouzani S Koohi and SHessabi ldquoFully contention-freeoptical NoC based on wavelenght routingrdquo in Proceedings of the16thCSI International SymposiumonComputer Architecture andDigital Systems (CADS rsquo12) pp 81ndash86 May 2012

[27] C Chou and R Marculescu ldquoUser-aware dynamic task allo-cation in networks-on-chiprdquo in Proceedings of the DesignAutomation and Test in Europe (DATE rsquo08) vol 1ndash3 pp 1074ndash1079 March 2008

[28] C Chou and R Marculescu ldquoRun-time task allocation con-sidering user behavior in embedded multiprocessor networks-on-chiprdquo IEEE Transactions on Computer-Aided Design of Inte-grated Circuits and Systems vol 29 no 1 pp 78ndash91 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: Research Article Effective Task Scheduling and IP Mapping ...downloads.hindawi.com/journals/mpe/2014/202748.pdf · Research Article Effective Task Scheduling and IP Mapping Algorithm

Mathematical Problems in Engineering 7

GA

ACO

PSO

OPSO

8

4

Aver

age p

acke

t del

ay (c

lock

cycle

s)

9 PEs 16 PEs 25 PEsTask scale

(a)

40e + 007

30e + 007

20e + 007

10e + 007

Pow

er co

nsum

ptio

n

GA

ACO

PSO

OPSO

9 PEs 16 PEs 25 PEsTask scale

(b)

Figure 5 Comparison of mapping effect

The other one is the comparison on actual mapping effect(Figure 5) By comparing the operation of different schedul-ing results from the above algorithms in NoC simulationenvironment and computing the delay of power consumptionof system respectively we can prove the superiority of thealgorithm of this paper in scheduling

6 Conclusion

In this paper the task scheduling model is further improvedand the operating cost per time unit is employed as uni-form measurement for PEs of different types and simplifiesalgorithm task dividing and scheduling and IP mapping arehandled separately so that the resultant algorithm schedulingis more efficient and truthful The target of scheduling notonly considers the total time spent but also considers the timecost and resource cost during the task running so as to achievecomprehensive optimization of system performance

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] C Addo-Quaye ldquoThermal-aware mapping and placement for3-D NoC designsrdquo in Proceedings of the IEEE International SOCConference pp 25ndash28 September 2005

[2] A K SinghW Jigang A Prakash and T Srikanthan ldquoMappingalgorithms forNoC-based heterogeneousMPSoCplatformsrdquo inProceedings of the 12th Euromicro Conference on Digital SystemDesign ArchitecturesMethods and Tools (DSD rsquo09) pp 133ndash140August 2009

[3] K Ganeshpure and S Kundu ldquoOn runtime task graph extrac-tion in MPSoCrdquo in Proceedings of the IEEE Computer SocietyAnnual Symposium on VLSI pp 171ndash176 IEEE 2013

[4] Y Z Tei M N Marsono N Shaikh-Husin and Y W HauldquoNetwork partitioning and GA heuristic crossover for NoCapplication mappingrdquo in Proceedings of the IEEE InternationalSymposium on Circuits and Systems (ISCAS rsquo13) pp 1228ndash1231Beijing China May 2013

[5] HTopcuoglu SHariri andMWu ldquoPerformance-effective andlow-complexity task scheduling for heterogeneous computingrdquoIEEE Transactions on Parallel and Distributed Systems vol 13no 3 pp 260ndash274 2002

[6] M I Daoud and N Kharma ldquoEfficient compile-time taskscheduling for heterogeneous distributed computing systemsrdquoin Proceedings of the 12th International Conference on Paralleland Distributed Systems (ICPADS rsquo06) vol 1 pp 11ndash19 IEEEMinneapolis Minnesota July 2006

[7] M Wu and D D Gajski ldquoHypertool a programming aid formessage-passing systemsrdquo IEEE Transactions on Parallel andDistributed Systems vol 1 no 3 pp 330ndash343 1990

[8] T Yang and A Gerasoulis ldquoDSC scheduling parallel tasks onan unbounded number of processorsrdquo IEEE Transactions onParallel and Distributed Systems vol 5 no 9 pp 951ndash967 1994

[9] S J Kim and J C Browne ldquoA general approach to mappingof parallel computation upon multiprocessor architecturesrdquo inProceedings of the International Conference on Parallel Process-ing vol 2 pp 1ndash8 1988

[10] Y-C Chung and S Ranka ldquoApplications and performance anal-ysis of a compile-time optimization approach for list schedulingalgorithms on distributed memory multiprocessorsrdquo in Super-computing pp 512ndash521 1992

[11] I Ahmad and Y Kwok ldquoA new approach to scheduling parallelprograms using task duplicationrdquo in Proceedings of the Interna-tional Conference on Parallel Processing vol 2 pp 47ndash51 1994

[12] M Sayuti and L S Indrusiak ldquoReal-time low-power taskmapping in networks-on-chiprdquo in Proceedings of the IEEE

8 Mathematical Problems in Engineering

Computer Society Annual Symposium on VLSI (ISVLSI rsquo13) pp14ndash19 2013

[13] F Ferrandi P L Lanzi C Pilato D Sciuto and A TumeoldquoAnt colony heuristic for mapping and scheduling tasks andcommunications on heterogeneous embedded systemsrdquo IEEETransactions on Computer-Aided Design of Integrated Circuitsand Systems vol 29 no 6 pp 911ndash924 2010

[14] L S Junior N Nedjah and L de Macedo Mourelle ldquoCOapproach in static routing for network-on-chips with 3D meshtopologyrdquo in Proceedings of the IEEE Fourth Latin AmericanSymposium onCircuits and Systems (LASCAS rsquo13) pp 1ndash4 IEEECusco Peru February 2013

[15] RHoffmannA Prell andT Rauber ldquoDynamic task schedulingand load balancing on cell processorsrdquo in Proceedings of the18th Euromicro Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo10) pp 205ndash212 February 2010

[16] M B Abdelhalim ldquoTask assignment for heterogeneous mul-tiprocessors using re-excited particle swarm optimizationrdquo inProceedings of the International Conference on Computer andElectrical Engineering (ICCEE rsquo08) pp 23ndash27 PhuketThailandDecember 2008

[17] M S Sidhu P Thulasiraman and R K Thulasiram ldquoA load-rebalance PSO heuristic for task matching in heterogeneouscomputing systemsrdquo in Proceedings of the IEEE Symposium onSwarm Intelligence (SIS rsquo13) pp 180ndash187 IEEE Singapore April2013

[18] Y Wang and C Dang ldquoAn evolutionary algorithm for globaloptimization based on level-set evolution and latin squaresrdquoIEEE Transactions on Evolutionary Computation vol 11 no 5pp 579ndash595 2007

[19] Y-P Wang Y-C Jiao and H Li ldquoAn evolutionary algorithmfor solving nonlinear bilevel programming based on a newconstraint-handling schemerdquo IEEE Transactions on SystemsMan and Cybernetics C Applications and Reviews vol 35 no2 pp 221ndash232 2005

[20] O Arnold and G Fettweis ldquoPower aware heterogeneousMPSoCwith dynamic task scheduling and increased data local-ity for multiple applicationsrdquo in Proceedings of the InternationalConference on Embedded Computer Systems (SAMOS 10) pp110ndash117 2010

[21] G DeMicheli and L BeniniNetworks on Chips Technology andTools Academic Press 2006

[22] D A B Miller ldquoRationale and challenges for optical intercon-nects to electronic chipsrdquo Proceedings of the IEEE vol 88 no 6pp 728ndash749 2000

[23] D A B Miller ldquoDevice requirements for optical interconnectsto silicon chipsrdquo Proceedings of the IEEE vol 97 no 7 pp 1166ndash1185 2009

[24] M O Agyeman and A Ahmadinia ldquoOptimising heteroge-neous 3D networks-on-chiprdquo in Proceedings of the 6th IEEEInternational Symposium on Parallel Computing in ElectricalEngineering (PARELEC 11) pp 25ndash30 April 2011

[25] Y Ye J Xu X Wu W Zhang W Liu and M NikdastldquoA torus-based hierarchical optical-electronic network-on-chipfor multiprocessor system-on-chiprdquo ACM Journal on EmergingTechnologies in Computing Systems vol 8 no 1 article 5 2012

[26] H. A. Khouzani, S. Koohi, and S. Hessabi, “Fully contention-free optical NoC based on wavelength routing,” in Proceedings of the 16th CSI International Symposium on Computer Architecture and Digital Systems (CADS ’12), pp. 81–86, May 2012.

[27] C. Chou and R. Marculescu, “User-aware dynamic task allocation in networks-on-chip,” in Proceedings of the Design, Automation and Test in Europe (DATE ’08), vol. 1–3, pp. 1074–1079, March 2008.

[28] C. Chou and R. Marculescu, “Run-time task allocation considering user behavior in embedded multiprocessor networks-on-chip,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 1, pp. 78–91, 2010.


