Research Article
Effective Task Scheduling and IP Mapping Algorithm for Heterogeneous NoC-Based MPSoC
Peng-Fei Yang and Quan Wang
School of Computer, Xidian University, Xi’an 710071, China
Correspondence should be addressed to Peng-Fei Yang; yangppf@163.com
Received 8 May 2014; Accepted 17 June 2014; Published 10 July 2014
Academic Editor: Yuping Wang
Copyright © 2014 P.-F. Yang and Q. Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The quality of task scheduling is critical to the network communication efficiency and the performance of the entire NoC- (Network-on-Chip-) based MPSoC (multiprocessor System-on-Chip). In this paper, the NoC-based MPSoC design process is divided into two steps: scheduling subtasks to processing elements (PEs) of appropriate type and quantity, and then mapping these PEs onto the switching nodes of the NoC topology. The task model is first improved so that it better reflects the real intertask relations; an optimized particle swarm optimization (PSO) is then used to carry out the first step with low expected task running and transfer cost as well as the least task execution time. By referring to the topology of the NoC and the communication diagram produced by the first step, the second step is carried out with minimal expected network transmission delay, low resource consumption, and balanced power consumption. Comparative experiments show that the algorithm achieves favorable resource and power consumption when it is adopted in a system design.
1. Introduction
The development of integrated circuits has provided strong support for integrating multiple processing elements (PEs) on a single chip, and on-chip communication between cores has developed from the bus-based approach to two-dimensional and three-dimensional Network-on-Chip (NoC). The network-based, highly parallel System-on-Chip (SoC) structure has become the inevitable choice for the next generation of complex computer architecture [1]. Nevertheless, the dramatic increase in the number of PEs that can be integrated and in the size of executable tasks has brought new problems and challenges to system design, among which the dividing and scheduling of tasks and IP mapping have become the focus of study.
Given the tasks, the types and number of available PEs, and the topology of the NoC, NoC-based task scheduling and IP mapping assign tasks to suitable PEs and map the PEs onto the network topology, improving system efficiency as much as possible while the whole system meets its power consumption and delay requirements. Its significance includes the following: (1) it serves as the bridge between applications and architecture and determines the implementation, processing performance, and efficiency of tasks on the architecture; (2) as a heterogeneous multicore architecture is usually associated with a particular field, efficient task scheduling can support applications in specific fields; and (3) as the size of tasks and of multicore system architectures increases, efficient division of mapping helps improve the quality and efficiency of exploring the mapping space and thereby improves the performance and efficiency of the entire SoC.
2. Related Work
Current research seldom distinguishes in detail between task scheduling and IP mapping, and modeling and analysis are conducted on the assumption that a PE performs only one subtask (in some algorithms, subtasks are simply treated as PEs). That is to say, the task is abstracted into a simple task model which only gives the calling relationship between subtasks; based on this information, the scheduling algorithm tries to allocate as little running time as possible [2–4]. The approach has several drawbacks: (1) the heterogeneous nature of NoCs and the communication delay between tasks are usually neglected; (2) although the interdependence among tasks is complex, the model abstracts only the calling relationship between subtasks, with the result that other factors cannot be fully reflected and transfer costs among different PEs are inadequately considered. The scheduling order produced by these models is not satisfactory in practical operation, so continuous recalculation and adjustment are required during system operation, which inevitably places an additional burden on the system and threatens operating efficiency.
Hindawi Publishing Corporation, Mathematical Problems in Engineering, Volume 2014, Article ID 202748, 8 pages, http://dx.doi.org/10.1155/2014/202748
In addition, in terms of the time of the scheduling decision, task scheduling can be divided into static scheduling and dynamic scheduling. Static scheduling means that the compiler makes scheduling decisions at compile time; examples are list-based algorithms [5, 6], clustering algorithms [7–9], and duplication-based algorithms [10, 11]. However, the static scheduling model has drawbacks: as the model is only an approximation of communication and execution time among processors, it may disagree with the actual execution of the program or even produce poor scheduling results.
Dynamic scheduling means that a scheduler assigns tasks to appropriate processors in real time according to their performance, so that the various requirements of the system can be met. Research in this area mainly employs heuristic algorithms, such as genetic algorithm (GA) [12] and ant-colony-based optimization (ACO) [13, 14] heuristic task scheduling, dynamic scheduling based on a task pool [15], particle swarm optimization (PSO) [16, 17], optimized evolutionary algorithms [18, 19], and dynamic scheduling based on real-time constraints [20]. Although good scheduling results can be attained when these approaches are applied to task partitioning and mapping, in practice their inherent defects easily cause problems during operation: for example, convergence is slow in the late stage of the genetic algorithm and in the early stage of the ant colony algorithm; inadequate coverage of the solution set leads to a disparity between the result and the optimum value; and particle swarm optimization is prone to becoming trapped in local optima.
Meanwhile, with respect to NoC topology, through-silicon via (TSV) technology [21] and optical interconnection technology [22, 23] have made possible higher IP core density, wider bandwidth, lower power consumption, and smaller size on integrated circuit chips. However, the resource occupancy and power consumption brought by the NoC must be considered. In order to reduce the NoC's occupancy of limited resources and further decrease power consumption, various kinds of heterogeneous NoC topology have been designed [24–26] to suit the differing network transmission delay and bandwidth needs of different types of PEs. Currently, most algorithms do not take the effect of heterogeneous topology on system performance into consideration. If PEs of different types are mapped to suitable areas according to their performance requirements, under the premise of balanced power consumption, and data transmission delay is minimized, system performance can be greatly improved.
Based on the analysis above, the whole design process is divided into two stages, as shown in Figure 1. The first stage is task dividing and scheduling. With an improved task model that faithfully reflects the real intertask relations and with the local-optimum problem of the particle swarm algorithm addressed, the optimized PSO algorithm is used to divide a big task into small tasks of proper granularity, featuring high cohesion and low coupling, according to traffic and calling relationships. There exists high parallelism among these small tasks. The small tasks are then assigned to the corresponding PEs according to the nature of each task, and a communication diagram is generated, completing the first step with low expected transfer cost as well as the least task execution time. The process then comes to the IP mapping stage. In this stage, by referring to the communication diagram and to the performance disparity and delay information of the NoC topology, the PEs are mapped onto switching nodes of the NoC so as to achieve the least network transmission delay with low resource occupancy, balanced power consumption, and fewer resource fragments, so that system performance does not fluctuate when new tasks need scheduling.
The rest of the paper is organized as follows. Section 3 gives the detailed description of task dividing and scheduling. Section 4 illustrates the process of IP mapping. Comparative experimental results are shown in Section 5. Section 6 concludes the paper.
3. Task Dividing and Scheduling
Although the types and quantities of PEs integrated in NoC-based heterogeneous multicore systems are expanding, the size of application tasks varies, and current task scheduling algorithms often assign and map a task in accordance with the number of usable PEs. For small tasks this may cause problems: on the one hand, as the tasks are divided into subtasks of extremely small size, communication among subtasks becomes overly frequent, which may prolong task execution time; on the other hand, inadequate utilization of the PEs may increase system power consumption and reduce overall system efficiency.
This paper superimposes tasks on a PE until the computing resource of the PE is occupied at an appropriate ratio (set according to the performance requirements of the system and of the PEs), and only then are new PEs added. The approach ensures not only that tasks are divided into subtasks of appropriate size but also that every PE invoked is efficiently used, which improves overall performance.
3.1. Task Model. A task can be divided into $N$ subtasks, among which there exists a certain execution sequence or control logic, and these subtasks are processed by $M$ PEs of $m$ types ($m \le M$). Assuming that the processing time of each of the $m$ types of PEs for every subtask, the communication overhead among PEs, and the amount of data transmitted among interdependent subtasks are known, the task on a heterogeneous multicore system can be abstracted into a quintuple:
$$\mathrm{DAG} = (V, E, \mathrm{Type}, \mathrm{PCU}, C). \tag{1}$$
Figure 1: Two stages of task scheduling and IP mapping.
(1) $V$: the task node set of the DAG application; that is, a vertex $v \in V$ is a subtask, and the number of subtasks in the DAG application is $N$.
(2) $E$: the edge set of the DAG application; that is, $e_{ij} \in E$ means that there exists data communication between $v_i$ and $v_j$, and the direction of the arrow indicates the direction of data transmission.
(3) $\mathrm{Type}(v)$: the type of the task. For instance, we can use $1, 2, 3, \ldots$ to represent different computing types. In addition, the type set of tasks corresponds with that of PEs, which means that a task can only be scheduled to a PE matching its type. This can be expressed by the matrix $D = \{d_{ij}\}$, where the rows represent the tasks and the columns represent the PEs; $d_{ij} = \infty$ means that task $v_i$ cannot be executed on $P_j$, and $d_{ij} = a$ means that task $v_i$ can be executed on $P_j$ with an execution time of $a$.
(4) $\mathrm{PCU}$: the running cost of every type of PE per unit time, in which the element $\mathrm{PCU}_r$ ($1 \le r \le m$) represents the running cost of the $r$th type of PE per unit time.
(5) $C$: the collection of communication overheads of the directed edges; $C_{ij}$ represents the transfer cost between subtasks $v_i$ and $v_j$ over the directed edge $e_{ij}$. When $v_i$ and $v_j$ are scheduled to the same PE, $C_{ij}$ equals zero.
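To make the quintuple concrete, the sketch below stores its five components in plain Python containers. The class name `TaskDAG`, the field names, the 0-based indexing, and every numeric value in the usage note are illustrative assumptions and do not come from the paper.

```python
from dataclasses import dataclass
from math import inf


@dataclass
class TaskDAG:
    """Hypothetical container for DAG = (V, E, Type, PCU, C)."""
    n_subtasks: int   # N = |V|; subtasks are indexed 0..N-1 here
    edges: dict       # E and C together: {(i, j): C_ij for directed edge e_ij}
    task_type: list   # Type(v_i) for every subtask
    exec_time: list   # matrix D: exec_time[i][j] = d_ij, inf if v_i cannot run on P_j
    pcu: list         # PCU_r, running cost per unit time of each PE type

    def runnable_on(self, i):
        """Indices j of the PEs on which subtask i can execute (d_ij finite)."""
        return [j for j, d in enumerate(self.exec_time[i]) if d != inf]

    def transfer_cost(self, i, j, pe_of):
        """C_ij, which is zero when both subtasks are scheduled to the same PE."""
        if pe_of[i] == pe_of[j]:
            return 0
        return self.edges.get((i, j), 0)
```

For example, `TaskDAG(3, {(0, 1): 4, (1, 2): 6}, [1, 2, 1], [[5, inf], [inf, 3], [2, inf]], [1.0, 2.0])` describes three subtasks in which only subtask 1 can run on the second PE.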
The target of task dividing and scheduling is to find an assigning and scheduling strategy that, while respecting the task processing sequence and resource limitations, assigns the $N$ subtasks to a proper number of PEs and orders the execution of every subtask reasonably, thus achieving the minimum completion time of the overall task with every task obeying the dependency graph. Based on the task model, an improved particle swarm algorithm is used to conduct the computation.
3.2. Coding and Decoding. The resource occupation of every subtask is encoded by indirect encoding. The encoding length depends on the number of subtasks. Every particle corresponds to a certain task assignment strategy.
Assume that there exist $N$ subtasks in a task, which are encoded sequentially, and $M$ available PEs, which are classified into $m$ types. For example, when $N = 10$ and $m = 3$, the particle $(3, 2, 1, 1, 3, 2, 1, 2, 3, 3)$ is a feasible scheduling scheme. The particle is encoded as shown in Table 1, and, as shown in Table 2, by decoding the particle we obtain the assignment of subtasks to every type of PE. Then, as shown in Table 3, after the subtasks are assigned, a reasonable number of PEs is allocated to every type of PE in accordance with the processing ability and the total amount of tasks to be processed.
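The decoding step can be illustrated with a few lines of Python. This is a minimal sketch; the function name `decode` is hypothetical, and subtasks are numbered from 1 to match the example particle in the text.

```python
from collections import defaultdict


def decode(particle):
    """Group subtask numbers (1-based) by the PE type stored at each position."""
    groups = defaultdict(list)
    for subtask, pe_type in enumerate(particle, start=1):
        groups[pe_type].append(subtask)
    return dict(groups)


assignment = decode((3, 2, 1, 1, 3, 2, 1, 2, 3, 3))
for pe_type in sorted(assignment):
    print(pe_type, assignment[pe_type])
```

For the example particle, the grouping agrees with the decoding given in the text: type 1 receives subtasks 3, 4, 7; type 2 receives 2, 6, 8; type 3 receives 1, 5, 9, 10. Splitting a type across several PEs (the task-dividing step) is not covered here, because the paper does not state the exact rule.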
It follows from the task model that the running time of every subtask on the different PEs is already known. The running time on every type of PE is defined as
$$\mathrm{Sub\_TFT}_r = \sum_{i=1}^{n} T_{ir}, \tag{2}$$
where $T_{ir}$ represents the running time of subtask $i$ on the $r$th type of PE and $n$ represents the number of subtasks assigned to the $r$th type of PE. The execution time of the entire task is obtained as follows:
$$\mathrm{TFT} = \max_{r=1}^{k} \mathrm{Sub\_TFT}_r. \tag{3}$$
The overall operation cost is given as
$$\mathrm{Run\_Cost} = \sum_{r=1}^{k} \mathrm{Sub\_TFT}_r \cdot \mathrm{PCU}_r. \tag{4}$$
Assuming that the task set assigned to the $m$th type of PE is $V_m$ and the task set assigned to the $n$th type of PE is $V_n$, the transfer cost between $\mathrm{PE}_m$ and $\mathrm{PE}_n$ is defined as
$$\mathrm{Tran\_Cost}_{mn} = \sum_{\forall i,j} C_{ij} \quad (v_i \in V_m,\ v_j \in V_n). \tag{5}$$
The overall transfer cost is obtained as follows:
$$\mathrm{Tran\_Cost} = \sum_{\forall m \ne n} \mathrm{Tran\_Cost}_{mn}. \tag{6}$$
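Equations (2) to (6) can be evaluated directly for a given particle. The following sketch assumes, as the equations do, that all subtasks of one PE type share one running time total; the function name `schedule_costs`, the argument layout, and the sample numbers in the usage note are illustrative assumptions.

```python
def schedule_costs(particle, exec_time, pcu, comm):
    """Return (TFT, Run_Cost, Tran_Cost) for one particle.

    particle[i]     : PE type (1-based) chosen for subtask i (0-based index)
    exec_time[i][r] : T_ir, running time of subtask i on PE type r + 1
    pcu[r]          : PCU of PE type r + 1
    comm            : {(i, j): C_ij} for every directed edge e_ij
    """
    used_types = sorted(set(particle))
    # Equation (2): total running time on each PE type in use.
    sub_tft = {
        r: sum(exec_time[i][r - 1] for i, t in enumerate(particle) if t == r)
        for r in used_types
    }
    tft = max(sub_tft.values())                                  # equation (3)
    run_cost = sum(sub_tft[r] * pcu[r - 1] for r in used_types)  # equation (4)
    # Equations (5)-(6): only edges whose endpoints sit on different types count.
    tran_cost = sum(c for (i, j), c in comm.items() if particle[i] != particle[j])
    return tft, run_cost, tran_cost
```

With `particle = (1, 2, 1)`, `exec_time = [[2, 9], [9, 3], [4, 9]]`, `pcu = [1.0, 2.0]`, and `comm = {(0, 1): 5, (1, 2): 7, (0, 2): 1}`, type 1 runs for 2 + 4 = 6 time units and type 2 for 3, so TFT is 6, Run_Cost is 6 · 1 + 3 · 2 = 12, and Tran_Cost is 5 + 7 = 12 (the edge between subtasks 0 and 2 stays on one type and costs nothing).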
Table 1: Example of particle coding.
Subtask number   1  2  3  4  5  6  7  8  9  10
Type of PE       3  2  1  1  3  2  1  2  3  3

Table 2: Example of decoding.
Type of PE   Subtask number
1            3, 4, 7
2            2, 6, 8
3            1, 5, 9, 10

Table 3: Task dividing.
Type of PE   Number of PE   Subtask number
1            1              3, 4, 7
2            2              2, 6, 8
3            3              1, 5
3            4              9, 10

3.3. Initialization and Fitness Function. Assuming that the population size is $s$, the number of subtasks is $N$, and the number of PE types is $m$, the initialization of the population can be described as follows: among the $s$ randomly generated particles, the position of the $i$th particle is represented by the vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ ($1 \le i \le s$, $1 \le n \le N$), in which $x_{ij}$ ($1 \le x_{ij} \le m$) indicates that, in the $i$th particle, task $j$ is assigned to a PE of type $x_{ij}$; the velocity is represented by the vector $v_i = (v_{i1}, v_{i2}, \ldots, v_{in})$ ($1 \le i \le s$, $1 \le n \le N$), in which $-m \le v_{ij} \le m$.
The fitness function of time is defined as
$$\mathrm{Fit\_Time}(i) = \frac{1}{\mathrm{TFT}_i} \quad (1 \le i \le s), \tag{7}$$
where $\mathrm{TFT}_i$ represents the overall completion time of the $i$th particle. The fitness function of cost is obtained as follows:
$$\mathrm{Fit\_Cost}(i) = \frac{1}{\mathrm{Run\_Cost}_i + \mathrm{Tran\_Cost}_i} \quad (1 \le i \le s). \tag{8}$$
The overall fitness function is obtained as follows:
$$\mathrm{Fitness} = \mathrm{Fit\_Time}(i) + \mathrm{Fit\_Cost}(i). \tag{9}$$
The algorithm selects particles with higher fitness values, which provide a good basis for generating the particles of the next generation.
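The fitness of one particle follows directly from its completion time and costs. A minimal sketch, assuming the three inputs have already been computed for that particle; the function name `fitness` is hypothetical.

```python
def fitness(tft, run_cost, tran_cost):
    """Sum of the time fitness (7) and the cost fitness (8), as in (9)."""
    fit_time = 1.0 / tft
    fit_cost = 1.0 / (run_cost + tran_cost)
    return fit_time + fit_cost
```

For instance, a particle with TFT = 6, Run_Cost = 12, and Tran_Cost = 12 scores 1/6 + 1/24 = 5/24, and a particle that finishes sooner at the same cost scores higher. Note that (9) adds the two terms without weights, so their relative influence depends on the units chosen for time and cost.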
3.4. Position and Velocity Updating. In every iteration, a particle updates its velocity and position by (10) in accordance with its own best historical position and the best position of the population. The historical best position is replaced by the current position only when the current position has a better fitness value.
$$v_i^{k+1} = w_k \cdot v_i^{k} + c_1 \cdot r_1 \cdot (P\_best_i - x_i^{k}) + c_2 \cdot r_2 \cdot (G\_best_i - x_i^{k}),$$
$$x_i^{k+1} = x_i^{k} + v_i^{k}. \tag{10}$$
$P\_best_i$ is the best position experienced by the $i$th particle, and $G\_best_i$ is the best position experienced by all particles in the population; as in standard PSO, $c_1$ and $c_2$ are acceleration coefficients and $r_1$ and $r_2$ are random numbers in $[0, 1]$. $w_k$ is significant for balancing the algorithm's capability of global and local searching, and the paper adopts the decreasing inertia weight as follows:
$$w_k = w_{\mathrm{end}} + \frac{(w_{\mathrm{start}} - w_{\mathrm{end}})(\mathrm{Gen} - k)}{\mathrm{Gen}}. \tag{11}$$
$w_{\mathrm{start}}$ and $w_{\mathrm{end}}$ represent, respectively, the initial inertia weight and the inertia weight when the maximum number of iterations $\mathrm{Gen}$ is reached; $k$ is the current iteration. With this inertia weight, the algorithm has strong global search capability in the early stage of iteration and more accurate local search capability in the late stage.
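One update step for a single particle might look as follows. This is a sketch under several assumptions that the paper does not specify: the default values `w_start=0.9`, `w_end=0.4`, `c1=c2=2.0` are common PSO choices rather than the authors' settings; the position is advanced with the freshly computed velocity, which is the usual PSO convention; and continuous positions are rounded and clamped to the integer PE types 1..m. The function names are hypothetical.

```python
import random


def inertia(k, gen, w_start=0.9, w_end=0.4):
    """Inertia weight falling linearly from w_start (k = 0) to w_end (k = gen)."""
    return w_end + (w_start - w_end) * (gen - k) / gen


def update_particle(x, v, p_best, g_best, k, gen, m, c1=2.0, c2=2.0, rng=random):
    """Return the new (position, velocity) lists of one particle."""
    w = inertia(k, gen)
    new_x, new_v = [], []
    for xj, vj, pj, gj in zip(x, v, p_best, g_best):
        r1, r2 = rng.random(), rng.random()
        vj = w * vj + c1 * r1 * (pj - xj) + c2 * r2 * (gj - xj)
        vj = max(-m, min(m, vj))             # keep -m <= v_ij <= m
        xj = min(m, max(1, round(xj + vj)))  # assumed discretisation to types 1..m
        new_v.append(vj)
        new_x.append(xj)
    return new_x, new_v
```

A component that already equals both its personal best and the global best and has zero velocity stays where it is, which is the expected fixed point of the update.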
3.5. Flow of Algorithm
(1) Randomly initialize the position and velocity of the particle swarm based on the description in “Initialization and Fitness Function.”
(2) Compute the velocity and position of every particle.
(3) Compute the fitness value of every particle and set $P\_best_i$ and $G\_best_i$.
(4) If $P\_best_i$ and $G\_best_i$ remain unchanged after many iterations or the algorithm has reached the maximum number of iterations, output the optimum solution, end the iteration, and go to step (6).
(5) Go to step (2).
(6) Assign a reasonable number of PEs to every type of PE in accordance with the processing ability and the total amount of tasks to be processed.
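Steps (1) to (5) can be sketched as a small driver loop. The code below is an illustration only: the function name `pso_schedule`, the parameter defaults, the `patience` stopping counter (which watches the global best only), and the rounding of positions to integer PE types are all assumptions. The fitness function is passed in by the caller, and step (6) is left out because the paper does not give its rule.

```python
import random


def pso_schedule(n_tasks, m, fitness, swarm=20, gen=200, patience=30,
                 w_start=0.9, w_end=0.4, c1=2.0, c2=2.0, seed=0):
    """Return (best particle, best fitness); higher fitness is better."""
    rng = random.Random(seed)
    # Step (1): random positions in 1..m and velocities in [-m, m].
    xs = [[rng.randint(1, m) for _ in range(n_tasks)] for _ in range(swarm)]
    vs = [[rng.uniform(-m, m) for _ in range(n_tasks)] for _ in range(swarm)]
    p_best = [x[:] for x in xs]
    p_fit = [fitness(x) for x in xs]
    g = max(range(swarm), key=p_fit.__getitem__)
    g_best, g_fit = p_best[g][:], p_fit[g]
    stale = 0
    for k in range(gen):
        w = w_end + (w_start - w_end) * (gen - k) / gen
        improved = False
        for i in range(swarm):
            # Step (2): update velocity and position of particle i.
            for j in range(n_tasks):
                v = (w * vs[i][j]
                     + c1 * rng.random() * (p_best[i][j] - xs[i][j])
                     + c2 * rng.random() * (g_best[j] - xs[i][j]))
                vs[i][j] = max(-m, min(m, v))
                xs[i][j] = min(m, max(1, round(xs[i][j] + vs[i][j])))
            # Step (3): evaluate and refresh the personal and global bests.
            f = fitness(xs[i])
            if f > p_fit[i]:
                p_best[i], p_fit[i] = xs[i][:], f
            if f > g_fit:
                g_best, g_fit, improved = xs[i][:], f, True
        # Step (4): stop early once the global best has stalled for a while.
        stale = 0 if improved else stale + 1
        if stale >= patience:
            break
    return g_best, g_fit
```

A caller would typically pass a function that decodes the particle, evaluates TFT, Run_Cost, and Tran_Cost, and returns the fitness of equation (9).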
4. IP Mapping
After task dividing and scheduling, the IP communication diagram is formed. In a NoC-based multicore system, the next requirement is to map these PEs onto NoC nodes reasonably and to minimize the network transmission delay during task execution, under the conditions that few resources are occupied and energy consumption is balanced. This is the IP mapping problem.
There are often two orientations in IP mapping: either to minimize the internal communication cost or to minimize the external communication cost [27, 28]. Both orientations have their pros and cons. The former may lead to increased competition for external resources and add computation overhead later in mapping as the utilization of system resources rises. The latter tends to arrange surplus resources well and decreases competition for external resources with little change in computation overhead; however, as each local mapping area is incomplete, it produces only second-best mapping solutions, thus undermining global mapping optimization. When designing an IP mapping algorithm, it is necessary to strike a careful balance between the two orientations.
In the meantime, as described above, PEs of different types have different requirements on the communication capability of the NoC. In order to save on-chip resources and decrease system power consumption, various heterogeneous network topologies have been designed. Therefore, during IP mapping, the matching between communication requirements and on-chip communication capability requires comprehensive consideration.
Based on the properties of the PEs to be mapped and the distribution of transmission capability over the topology, this paper maps the PEs with high communication requirements to high-capability areas, balances internal and external communication cost, and achieves on-chip communication with minimum transmission delay and low resource occupancy. The mapping algorithm consists of two parts: the expression of the network topology by a two-dimensional matrix and the IP mapping itself. They are detailed as follows.
4.1. IP Communication Diagram and NoC Topology. The communication diagram can be abstracted into a triple $\mathrm{CDAG} = (P, E, C)$, where
(1) $P$ represents the set of PEs in the communication diagram; that is, $p_i \in P$ is a PE with tasks to execute;
(2) $E$ represents the edge set of the diagram; that is, $e_{ij} \in E$ indicates that there exists data exchange between $p_i$ and $p_j$;
(3) $C$ represents the communication cost on the undirected edges, and $C_{ij}$ represents the total communication data between $p_i$ and $p_j$.
It is complicated to express a NoC topology directly, especially a three-dimensional NoC. Nevertheless, a two-dimensional matrix expresses topology well, and many properties of matrices can also be applied to topology computation. Therefore, the paper expresses the topology as a two-dimensional matrix before IP mapping.
A three-dimensional mesh topology can be taken as an example. Figure 2(a) shows a 4 × 4 × 2 three-dimensional NoC topology; the red vertices represent bottom switching nodes and the black ones represent upper switching nodes. Figure 2(b) is its two-dimensional expansion diagram, which spares us the complexity of studying the three-dimensional topology directly. For convenience of expression and computation, the positions of nodes in the expansion diagram are expressed by a matrix; the positions of the nodes in Figure 2(b) can be seen in Figure 2(c). There may exist areas whose communication transmission capability is higher than that of others, to fulfill the higher communication requirements of some PEs; as shown in Figure 2(c), the green areas represent areas containing switching nodes with higher communication performance. For the integrity of the matrix expression, areas without switching nodes are shaded; in the later computation, nodes in these areas are treated as already assigned.
Through the approach above, a one-to-one correspondence is formed between the position of every node in the three-dimensional NoC topology and that of every element in the matrix. IP mapping conducts its optimization on the basis of the matrix.
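One possible way to hold such a matrix in code is sketched below. The cell codes `ABSENT`, `NORMAL`, and `HIGH`, the function name `free_nodes`, and the tiny example matrix are hypothetical; the actual layout of Figure 2(c) is not reproduced here.

```python
ABSENT, NORMAL, HIGH = 0, 1, 2  # hypothetical codes for one matrix cell


def free_nodes(matrix, occupied=(), high_only=False):
    """List the (row, column) positions that can still receive a PE.

    Cells marked ABSENT hold no switching node and are never returned,
    which mirrors treating them as already assigned.
    """
    wanted = (HIGH,) if high_only else (NORMAL, HIGH)
    return [
        (r, c)
        for r, row in enumerate(matrix)
        for c, cell in enumerate(row)
        if cell in wanted and (r, c) not in occupied
    ]
```

For the matrix `[[1, 2, 0], [1, 2, 1]]`, the call `free_nodes(matrix, high_only=True)` returns the two high-capability positions `(0, 1)` and `(1, 1)`.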
4.2. IP Mapping. Before introducing the concrete algorithm, three definitions are given as follows.
Definition 1. Manhattan Distance $\mathrm{MD}(i, j)$: in a plane, the Manhattan Distance between points $P_i(x_1, y_1)$ and $P_j(x_2, y_2)$ is defined as
$$\mathrm{MD}(i, j) = |x_1 - x_2| + |y_1 - y_2|. \tag{12}$$
Definition 2. Euclidean Distance $\mathrm{ED}(i, j)$: in a plane, the Euclidean Distance between points $P_i(x_1, y_1)$ and $P_j(x_2, y_2)$ is defined as
$$\mathrm{ED}(i, j) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}. \tag{13}$$
Definition 3. The communication cost in the mapped area is obtained as follows:
$$\mathrm{Com\_cost} = \sum_{\forall C_{ij} \in C} C_{ij} \cdot \mathrm{MD}(L(p_i), L(p_j)), \tag{14}$$
in which $C_{ij}$ represents the total communication traffic between $p_i$ and $p_j$ in the communication diagram and $\mathrm{MD}(L(p_i), L(p_j))$ represents the Manhattan Distance between the positions on the topology to which $p_i$ and $p_j$ are mapped.
The target of the algorithm is to map PEs with high communication requirements to topology areas with high communication capability and to find, among the resulting schemes, the mapping with minimum $\mathrm{Com\_cost}$.
The algorithm divides the communication diagrams into two collections, $H$ and $L$, according to whether or not they include PEs that need to be mapped to a high-capability area. In the collection $H = \{h_1, h_2, \ldots, h_i\}$ with high communication requirements, the elements are ordered $|h_1| \ge |h_2| \ge \cdots \ge |h_i|$ according to the number of PEs with high communication requirements; in the collection $L = \{l_1, l_2, \ldots, l_i\}$ without high communication requirements, the elements are ordered $|l_1| \ge |l_2| \ge \cdots \ge |l_i|$ according to the number of PEs contained. The execution steps of the mapping algorithm are as follows.
(1) Start the mapping computation from collection $h_1$: on the topology, choose an area with high communication capability that can contain the minimum set of PEs with high communication requirements in $h_1$ as the beginning area of mapping. Call the mapped PEs the assigned area and call the occupied switching nodes on the topology the mapped area.
(2) Start from the PE with the maximum communication traffic (sum of input and output) and map it to the switching node in the high-capability area whose number of available neighboring nodes is nearest to the node degree of the PE.
(3) Choose the PE that has the maximum communication data with the assigned area as the next PE to be mapped.
(4) Map the PE to the switching node that has the minimum Manhattan Distance to the mapped area. If more than one node meets the requirement, choose the node whose number of available neighboring nodes is nearest to the node degree of the PE; if there is still more than one node, choose the switching node that has the minimum Euclidean Distance from the center of the mapped area.
(5) Repeat steps (3) and (4) until all PEs are mapped, and then start the algorithm for the next PE diagram to be mapped.
Figure 2: Topology and its expression by matrix: (a) three-dimensional topology, (b) two-dimensional expansion, (c) matrix expression.
Figure 3: Description of mapping process.
Figure 4: Comparison of algorithm velocity (execution time in ms versus task scale of 9, 16, and 25 subtasks for GA, ACO, PSO, and OPSO).
Figure 3 gives a simple description of the mapping process. In the IP communication diagram, the red PEs represent PEs with high communication requirements and the blue area represents the assigned area; in the topology, the green area represents the switching nodes with high communication capability and the area encircled by the red line represents the mapped area.
The mapping algorithm places PEs with a direct communication relationship on neighboring nodes, keeping the path between source node and destination node as short as possible and free of conflicts with other transmission paths, thus minimizing the delay in the whole mapping area.
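The greedy core of steps (2) to (5) can be sketched for a single communication diagram on a plain two-dimensional grid. This is a simplified illustration, not the authors' implementation: the selection of high-capability areas in step (1) and the collections $H$ and $L$ are omitted, and "minimum Manhattan Distance to the mapped area" is interpreted here as the traffic-weighted Manhattan distance to the PEs already placed, which is an assumption. The function name `greedy_map` and the argument layout are hypothetical.

```python
from math import hypot


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_map(traffic, nodes):
    """Map PEs to switching nodes; needs at least as many nodes as PEs.

    traffic : {(p, q): volume} per undirected PE pair
    nodes   : iterable of free (x, y) switching-node positions
    """
    pes = sorted({p for pair in traffic for p in pair})

    def vol(p, q):
        return traffic.get((p, q), 0) + traffic.get((q, p), 0)

    degree = {p: sum(vol(p, q) > 0 for q in pes if q != p) for p in pes}
    free, placed = set(nodes), {}

    def free_neighbours(n):
        return sum((n[0] + dx, n[1] + dy) in free
                   for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)))

    def place(pe, key):
        node = min(sorted(free), key=key)  # sorted() makes ties deterministic
        free.remove(node)
        placed[pe] = node

    # Step (2): seed with the PE that has the largest total traffic.
    first = max(pes, key=lambda p: sum(vol(p, q) for q in pes if q != p))
    place(first, lambda n: abs(free_neighbours(n) - degree[first]))

    while len(placed) < len(pes):
        # Step (3): next PE exchanges the most data with the assigned area.
        nxt = max((p for p in pes if p not in placed),
                  key=lambda p: sum(vol(p, q) for q in placed))
        cx = sum(x for x, _ in placed.values()) / len(placed)
        cy = sum(y for _, y in placed.values()) / len(placed)
        # Step (4): weighted distance, then neighbour/degree match, then centre.
        place(nxt, lambda n: (
            sum(vol(nxt, q) * manhattan(n, placed[q]) for q in placed),
            abs(free_neighbours(n) - degree[nxt]),
            hypot(n[0] - cx, n[1] - cy)))
    return placed
```

On a 3 × 3 grid with traffic `{("A", "B"): 10, ("B", "C"): 5}`, the sketch places `B` on the corner `(0, 0)` and puts `A` and `C` on its two free neighbours, so every communicating pair is one hop apart.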
5. Experiment and Simulation
The performance of the designed algorithm is compared and evaluated from two aspects. The first is the running speed of the task dividing and scheduling algorithm itself. Tasks of the same size are computed with GA, ACO, PSO, and the algorithm in this paper, respectively, and the running times are compared to assess the efficiency of the algorithm. This part is conducted in Matlab with 200 iterations; the comparison of the time required to run the algorithms is shown in Figure 4.
Figure 5: Comparison of mapping effect: (a) average packet delay (clock cycles) and (b) power consumption, versus task scale of 9, 16, and 25 PEs for GA, ACO, PSO, and OPSO.
The second is the comparison of actual mapping effect (Figure 5). The scheduling results of the above algorithms are run in a NoC simulation environment, and the delay and power consumption of the system are computed for each, in order to demonstrate the advantage of the algorithm of this paper in scheduling.
6. Conclusion
In this paper, the task scheduling model is further improved, and the operating cost per unit time is employed as a uniform measurement for PEs of different types, which simplifies the algorithm. Task dividing and scheduling and IP mapping are handled separately, so that the resulting scheduling is more efficient and realistic. The scheduling target considers not only the total time spent but also the time cost and resource cost during task running, so as to achieve comprehensive optimization of system performance.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] C Addo-Quaye ldquoThermal-aware mapping and placement for3-D NoC designsrdquo in Proceedings of the IEEE International SOCConference pp 25ndash28 September 2005
[2] A K SinghW Jigang A Prakash and T Srikanthan ldquoMappingalgorithms forNoC-based heterogeneousMPSoCplatformsrdquo inProceedings of the 12th Euromicro Conference on Digital SystemDesign ArchitecturesMethods and Tools (DSD rsquo09) pp 133ndash140August 2009
[3] K Ganeshpure and S Kundu ldquoOn runtime task graph extrac-tion in MPSoCrdquo in Proceedings of the IEEE Computer SocietyAnnual Symposium on VLSI pp 171ndash176 IEEE 2013
[4] Y Z Tei M N Marsono N Shaikh-Husin and Y W HauldquoNetwork partitioning and GA heuristic crossover for NoCapplication mappingrdquo in Proceedings of the IEEE InternationalSymposium on Circuits and Systems (ISCAS rsquo13) pp 1228ndash1231Beijing China May 2013
[5] HTopcuoglu SHariri andMWu ldquoPerformance-effective andlow-complexity task scheduling for heterogeneous computingrdquoIEEE Transactions on Parallel and Distributed Systems vol 13no 3 pp 260ndash274 2002
[6] M I Daoud and N Kharma ldquoEfficient compile-time taskscheduling for heterogeneous distributed computing systemsrdquoin Proceedings of the 12th International Conference on Paralleland Distributed Systems (ICPADS rsquo06) vol 1 pp 11ndash19 IEEEMinneapolis Minnesota July 2006
[7] M Wu and D D Gajski ldquoHypertool a programming aid formessage-passing systemsrdquo IEEE Transactions on Parallel andDistributed Systems vol 1 no 3 pp 330ndash343 1990
[8] T Yang and A Gerasoulis ldquoDSC scheduling parallel tasks onan unbounded number of processorsrdquo IEEE Transactions onParallel and Distributed Systems vol 5 no 9 pp 951ndash967 1994
[9] S J Kim and J C Browne ldquoA general approach to mappingof parallel computation upon multiprocessor architecturesrdquo inProceedings of the International Conference on Parallel Process-ing vol 2 pp 1ndash8 1988
[10] Y-C Chung and S Ranka ldquoApplications and performance anal-ysis of a compile-time optimization approach for list schedulingalgorithms on distributed memory multiprocessorsrdquo in Super-computing pp 512ndash521 1992
[11] I Ahmad and Y Kwok ldquoA new approach to scheduling parallelprograms using task duplicationrdquo in Proceedings of the Interna-tional Conference on Parallel Processing vol 2 pp 47ndash51 1994
[12] M Sayuti and L S Indrusiak ldquoReal-time low-power taskmapping in networks-on-chiprdquo in Proceedings of the IEEE
8 Mathematical Problems in Engineering
Computer Society Annual Symposium on VLSI (ISVLSI rsquo13) pp14ndash19 2013
[13] F Ferrandi P L Lanzi C Pilato D Sciuto and A TumeoldquoAnt colony heuristic for mapping and scheduling tasks andcommunications on heterogeneous embedded systemsrdquo IEEETransactions on Computer-Aided Design of Integrated Circuitsand Systems vol 29 no 6 pp 911ndash924 2010
[14] L S Junior N Nedjah and L de Macedo Mourelle ldquoCOapproach in static routing for network-on-chips with 3D meshtopologyrdquo in Proceedings of the IEEE Fourth Latin AmericanSymposium onCircuits and Systems (LASCAS rsquo13) pp 1ndash4 IEEECusco Peru February 2013
[15] RHoffmannA Prell andT Rauber ldquoDynamic task schedulingand load balancing on cell processorsrdquo in Proceedings of the18th Euromicro Conference on Parallel Distributed andNetwork-Based Processing (PDP rsquo10) pp 205ndash212 February 2010
[16] M B Abdelhalim ldquoTask assignment for heterogeneous mul-tiprocessors using re-excited particle swarm optimizationrdquo inProceedings of the International Conference on Computer andElectrical Engineering (ICCEE rsquo08) pp 23ndash27 PhuketThailandDecember 2008
[17] M S Sidhu P Thulasiraman and R K Thulasiram ldquoA load-rebalance PSO heuristic for task matching in heterogeneouscomputing systemsrdquo in Proceedings of the IEEE Symposium onSwarm Intelligence (SIS rsquo13) pp 180ndash187 IEEE Singapore April2013
[18] Y Wang and C Dang ldquoAn evolutionary algorithm for globaloptimization based on level-set evolution and latin squaresrdquoIEEE Transactions on Evolutionary Computation vol 11 no 5pp 579ndash595 2007
[19] Y-P Wang Y-C Jiao and H Li ldquoAn evolutionary algorithmfor solving nonlinear bilevel programming based on a newconstraint-handling schemerdquo IEEE Transactions on SystemsMan and Cybernetics C Applications and Reviews vol 35 no2 pp 221ndash232 2005
[20] O Arnold and G Fettweis ldquoPower aware heterogeneousMPSoCwith dynamic task scheduling and increased data local-ity for multiple applicationsrdquo in Proceedings of the InternationalConference on Embedded Computer Systems (SAMOS 10) pp110ndash117 2010
[21] G DeMicheli and L BeniniNetworks on Chips Technology andTools Academic Press 2006
[22] D A B Miller ldquoRationale and challenges for optical intercon-nects to electronic chipsrdquo Proceedings of the IEEE vol 88 no 6pp 728ndash749 2000
[23] D A B Miller ldquoDevice requirements for optical interconnectsto silicon chipsrdquo Proceedings of the IEEE vol 97 no 7 pp 1166ndash1185 2009
[24] M O Agyeman and A Ahmadinia ldquoOptimising heteroge-neous 3D networks-on-chiprdquo in Proceedings of the 6th IEEEInternational Symposium on Parallel Computing in ElectricalEngineering (PARELEC 11) pp 25ndash30 April 2011
[25] Y Ye J Xu X Wu W Zhang W Liu and M NikdastldquoA torus-based hierarchical optical-electronic network-on-chipfor multiprocessor system-on-chiprdquo ACM Journal on EmergingTechnologies in Computing Systems vol 8 no 1 article 5 2012
[26] H. A. Khouzani, S. Koohi, and S. Hessabi, “Fully contention-free optical NoC based on wavelength routing,” in Proceedings of the 16th CSI International Symposium on Computer Architecture and Digital Systems (CADS ’12), pp. 81–86, May 2012.
[27] C. Chou and R. Marculescu, “User-aware dynamic task allocation in networks-on-chip,” in Proceedings of the Design, Automation and Test in Europe (DATE ’08), vol. 1–3, pp. 1074–1079, March 2008.
[28] C. Chou and R. Marculescu, “Run-time task allocation considering user behavior in embedded multiprocessor networks-on-chip,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 1, pp. 78–91, 2010.
2 Mathematical Problems in Engineering
between tasks are usually neglected; (2) because the interdependence among tasks is complex, the models abstract only the calling relationship between subtasks, so other factors are not fully reflected and transfer costs among different PEs are inadequately considered. The scheduling order produced by these models is unsatisfactory in practice, so continuous recalculation and adjustment are required during system operation, which inevitably adds burden to the system and threatens operating efficiency.
In addition, in terms of when the scheduling decision is made, task scheduling can be divided into static scheduling and dynamic scheduling. Static scheduling means that the compiler makes the scheduling decision at compile time; examples include list-based algorithms [5, 6], clustering algorithms [7–9], and duplication-based algorithms [10, 11]. However, static scheduling has a drawback: because the model only approximates the communication and execution times among processors, it may disagree with the actual execution of the program or even produce poor scheduling results.
Dynamic scheduling means that a scheduler assigns tasks to appropriate processors in real time, according to their performance, so that the various requirements of the system can be met. Research in this area mainly employs heuristic algorithms, such as the genetic algorithm (GA) [12], ant-colony-based optimization (ACO) [13, 14], heuristic task scheduling, dynamic scheduling based on a task pool [15], particle swarm optimization (PSO) [16, 17], optimized evolutionary algorithms [18, 19], and dynamic scheduling based on real-time constraints [20]. Although these approaches can attain good scheduling results when applied to task partitioning and mapping, their inherent defects cause drawbacks in practice. For example, convergence is slow in the late stage of the genetic algorithm and in the early stage of the ant colony algorithm; inadequate coverage of the solution space leads to a gap between the result and the optimum value; and particle swarm optimization is prone to becoming trapped in local optima.
Meanwhile, with regard to NoC topology, through-silicon via (TSV) technology [21] and optical interconnection technology [22, 23] have made possible higher IP core density, wider bandwidth, lower power consumption, and smaller size on integrated circuit chips. However, the resource occupancy and power consumption introduced by the NoC itself must be considered. To reduce the NoC's occupancy of limited resources and to further decrease power consumption, various heterogeneous NoC topologies have been designed [24–26] to suit the differing network transmission delay and bandwidth needs of different types of PEs. Currently, most algorithms do not take the effect of heterogeneous topology on system performance into consideration. If PEs of different types are mapped to suitable areas according to their performance requirements, under the premise of balanced power consumption, and data transmission delay is minimized, the performance of the system could be greatly improved.
Based on the analysis above, the whole design process is divided into two stages, as shown in Figure 1. The first stage is task dividing and scheduling. With a task model improved to reflect the real intertask relations more faithfully, and with the local-optimum problem of the particle swarm algorithm addressed, the optimized PSO algorithm divides a big task into small tasks of appropriate granularity, featuring high cohesion and low coupling, according to traffic and calling relationships. There exists high parallelism among these small tasks. The small tasks are then assigned to the corresponding PEs according to their nature, and a communication diagram is generated, which completes the first stage with the expected lower transfer cost as well as the least task execution time. The process then moves to the IP mapping stage. In this stage, by referring to the communication diagram and to the performance disparity and delay information of the NoC topology, the PEs are mapped onto the switching nodes of the NoC so as to achieve the least network transmission delay with lower resource occupancy, even power consumption, and fewer resource fragments, so that system performance does not fluctuate when new tasks need scheduling.
The rest of the paper is organized as follows. Section 3 gives a detailed description of task dividing and scheduling. Section 4 illustrates the process of IP mapping. Comparative experimental results are shown in Section 5. Section 6 concludes the paper.
3 Task Dividing and Scheduling
Although the types and quantities of PEs integrated in NoC-based heterogeneous multicore systems are expanding, the size of application tasks varies, and current task scheduling algorithms often assign and map a task in accordance with the number of usable PEs. For tasks of small size this may cause problems: on one hand, because the task is divided into extremely small subtasks, communication among subtasks becomes overly frequent, which may prolong task execution time; on the other hand, inadequate utilization of the PEs may increase system power consumption and reduce overall system efficiency.
This paper superimposes tasks on a PE until the computing resource of the PE is occupied at an appropriate ratio (set according to the performance requirements of the system and of the PEs), and only then are new PEs added. The approach ensures not only that tasks are divided into subtasks of appropriate size but also that every PE invoked is used efficiently, thus improving overall performance.
3.1 Task Model. A task can be divided into $N$ subtasks, among which there exists a certain execution sequence or control logic, and these subtasks are processed by $M$ PEs of $m$ types ($m \le M$). Assuming that the processing time of the $m$ types of PEs for every subtask, the communication overhead among PEs, and the amount of data transmitted among interdependent subtasks are known, the task on a heterogeneous multicore system can be abstracted into a quintuple:

$$\mathrm{DAG} = (V, E, \mathrm{Type}, \mathrm{PCU}, C). \tag{1}$$
Figure 1: Two stages of task scheduling and IP mapping. (Labels in the figure: subtasks t1–t8; PE1–PE4; S1, S2.)
(1) $V$: the set of task nodes in the DAG application; that is, a vertex $v \in V$ is a subtask in $V$. The number of subtasks in the DAG application is $N$.

(2) $E$: the edge set of the DAG application; that is, $e_{ij} \in E$ means that there exists data communication between $v_i$ and $v_j$, and the direction of the arrow indicates the direction of data transmission.

(3) $\mathrm{Type}(v)$: the type of the task. For instance, we can use $\{1, 2, 3, \ldots\}$ to represent different computing types. In addition, the type set of tasks corresponds to that of PEs, which means that a task can only be scheduled to a PE matching its type. This can be expressed by the matrix $D = \{d_{ij}\}$, where the rows represent the tasks and the columns represent the PEs; $d_{ij} = \infty$ means that task $v_i$ cannot be executed on $P_j$, and $d_{ij} = a$ means that task $v_i$ can be executed on $P_j$ with execution time $a$.

(4) $\mathrm{PCU}$: the running cost of every type of PE per unit time, in which element $\mathrm{PCU}_r$ ($1 \le r \le m$) represents the running cost of the $r$th type of PE per unit time.

(5) $C$: the collection of communication overheads of the directed edges; $C_{ij}$ represents the transfer cost between subtasks $v_i$ and $v_j$ when data pass along the directed edge $e_{ij}$. When $v_i$ and $v_j$ are scheduled to the same PE, $C_{ij}$ equals zero.
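The quintuple can be held in a small data structure. The following is a minimal sketch only: the class name `TaskDAG`, its field names, and the example values are hypothetical, and the columns of $D$ are indexed here by PE type (an assumption made for brevity, whereas the text indexes them by PE).

```python
import math
from dataclasses import dataclass

INF = math.inf  # d_ij = infinity: subtask i cannot run on that PE type


@dataclass
class TaskDAG:
    """Illustrative container for the quintuple (V, E, Type, PCU, C)."""
    n_tasks: int     # |V| = N; subtasks are numbered 0..N-1
    edges: set       # E: directed edges (i, j)
    exec_time: list  # D: exec_time[i][r] = time of subtask i on PE type r
    pcu: list        # PCU[r]: running cost of PE type r per unit time
    comm: dict       # C: comm[(i, j)] = transfer cost on edge (i, j)

    def allowed_types(self, i):
        """PE types on which subtask i can execute (finite entries of D)."""
        return [r for r, t in enumerate(self.exec_time[i]) if t != INF]

    def transfer_cost(self, i, j, same_pe):
        """C_ij, which is zero when both subtasks share a PE."""
        return 0 if same_pe else self.comm.get((i, j), 0)


# Hypothetical three-subtask example with two PE types.
dag = TaskDAG(
    n_tasks=3,
    edges={(0, 1), (0, 2)},
    exec_time=[[4, INF], [INF, 6], [5, 7]],
    pcu=[2, 3],
    comm={(0, 1): 8, (0, 2): 5},
)
print(dag.allowed_types(2))            # [0, 1]
print(dag.transfer_cost(0, 1, False))  # 8
```

Folding $\mathrm{Type}$ into the matrix $D$ keeps the type-matching rule and the execution times in one place, which is how the text itself describes the constraint.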
The target of task dividing and scheduling is to find an assignment and scheduling strategy that, while respecting the task processing sequence and resource limitations, assigns the $N$ subtasks to an appropriate number of PEs and orders the execution of every subtask reasonably, thus minimizing the completion time of the overall task while every subtask obeys the dependency graph. Based on the task model, an improved particle swarm algorithm is used for the computation.
3.2 Coding and Decoding. The resource occupation of every subtask is encoded by indirect encoding. The encoding length depends on the number of subtasks. Every particle corresponds to a certain task assignment strategy.

Assume that a task contains $N$ subtasks, numbered sequentially, and that $M$ PEs are available, classified into $m$ types. For example, when $N = 10$ and $m = 3$, the particle $(3, 2, 1, 1, 3, 2, 1, 2, 3, 3)$ is a feasible scheduling scheme. The particle is encoded as shown in Table 1, and, as shown in Table 2, decoding the particle yields the subtasks assigned to every type of PE. Then, as shown in Table 3, after the subtasks are assigned, a reasonable number of PEs is allocated to every PE type in accordance with the processing ability and the total amount of tasks to be processed.
It follows from the task model that the running time of every subtask on the different PEs is already known. The running time on every type of PE is defined as

$$\mathrm{Sub\_TFT}_r = \sum_{i=1}^{n} T_{ir}, \tag{2}$$

where $T_{ir}$ represents the running time of subtask $i$ on the $r$th type of PE and $n$ represents the number of subtasks assigned to the $r$th type of PE. The execution time of the entire task is obtained as follows:

$$\mathrm{TFT} = \max_{r=1}^{k} \mathrm{Sub\_TFT}_r. \tag{3}$$
The overall operation cost is given as

$$\mathrm{Run\_Cost} = \sum_{r=1}^{k} \mathrm{Sub\_TFT}_r \cdot \mathrm{PCU}_r. \tag{4}$$
Assuming that the task set assigned to the $m$th type of PE is $V_m$ and the task set assigned to the $n$th type of PE is $V_n$, the transfer cost between $\mathrm{PE}_m$ and $\mathrm{PE}_n$ is defined as

$$\mathrm{Tran\_Cost}_{mn} = \sum_{\forall i, j} C_{ij} \quad (v_i \in V_m,\ v_j \in V_n). \tag{5}$$

The overall transfer cost is obtained as follows:

$$\mathrm{Tran\_Cost} = \sum_{\forall m \ne n} \mathrm{Tran\_Cost}_{mn}. \tag{6}$$
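As a worked illustration of decoding and of (2)–(6), the sketch below evaluates the particle of Table 1. It is a minimal sketch under stated assumptions: the function names, the running times, the PCU values, and the edge costs are all hypothetical, and transfer cost is counted at PE-type granularity (an edge costs nothing when both endpoints carry the same type).

```python
def decode(particle):
    """Group 1-based subtask numbers by the PE type stored in the particle."""
    groups = {}
    for task, pe_type in enumerate(particle, start=1):
        groups.setdefault(pe_type, []).append(task)
    return groups


def schedule_costs(particle, run_time, pcu, comm):
    """Return (TFT, Run_Cost, Tran_Cost) for one particle, following (2)-(6).

    run_time[i][r]: running time T_ir of subtask i on PE type r
    pcu[r]:         running cost of PE type r per unit time
    comm[(i, j)]:   transfer cost C_ij of the directed edge (i, j)
    """
    groups = decode(particle)
    # (2): total running time on each PE type
    sub_tft = {r: sum(run_time[i][r] for i in tasks)
               for r, tasks in groups.items()}
    tft = max(sub_tft.values())                           # (3)
    run_cost = sum(sub_tft[r] * pcu[r] for r in sub_tft)  # (4)
    # (5)-(6): only edges whose endpoints sit on different PE types count
    tran_cost = sum(c for (i, j), c in comm.items()
                    if particle[i - 1] != particle[j - 1])
    return tft, run_cost, tran_cost


particle = (3, 2, 1, 1, 3, 2, 1, 2, 3, 3)                 # particle of Table 1
run_time = {i: {1: 2, 2: 2, 3: 2} for i in range(1, 11)}  # hypothetical T_ir
pcu = {1: 1, 2: 2, 3: 3}                                  # hypothetical PCU_r
comm = {(1, 2): 5, (1, 5): 4, (3, 4): 7}                  # hypothetical C_ij

print(decode(particle))   # {3: [1, 5, 9, 10], 2: [2, 6, 8], 1: [3, 4, 7]}
print(schedule_costs(particle, run_time, pcu, comm))      # (8, 42, 5)
```

With these made-up inputs, type 3 carries four subtasks and therefore determines TFT, and only the edge between subtasks 1 and 2 crosses PE types, so it alone contributes to the transfer cost.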
Table 1: Example of particle coding.

Subtask number   1  2  3  4  5  6  7  8  9  10
Type of PE       3  2  1  1  3  2  1  2  3  3

Table 2: Example of decoding.

Type of PE   Subtask number
1            3, 4, 7
2            2, 6, 8
3            1, 5, 9, 10

Table 3: Task dividing.

Type of PE   PE number   Subtask number
1            1           3, 4, 7
2            2           2, 6, 8
3            3           1, 5
3            4           9, 10

3.3 Initialization and Fitness Function. Assume that the population size is $s$, the number of subtasks is $N$, and the number of PE types is $m$. The population is initialized as follows: among the $s$ randomly generated particles, the position of the $i$th particle is represented by the vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ ($1 \le i \le s$, $1 \le n \le N$), in which $x_{ij}$ ($1 \le x_{ij} \le m$) means that, in the $i$th particle, task $j$ is assigned to a PE of type $x_{ij}$ for execution; the velocity is represented by the vector $v_i = (v_{i1}, v_{i2}, \ldots, v_{in})$ ($1 \le i \le s$, $1 \le n \le N$), in which $-m \le v_{ij} \le m$.
The fitness function of time is defined as

$$\mathrm{Fit\_Time}(i) = \frac{1}{\mathrm{TFT}_i} \quad (1 \le i \le s), \tag{7}$$

where $\mathrm{TFT}_i$ represents the overall completion time of the $i$th particle. The fitness function of cost is obtained as follows:

$$\mathrm{Fit\_Cost}(i) = \frac{1}{\mathrm{Run\_Cost}_i + \mathrm{Tran\_Cost}_i} \quad (1 \le i \le s). \tag{8}$$

The overall fitness function is obtained as follows:

$$\mathrm{Fitness} = \mathrm{Fit\_Time}(i) + \mathrm{Fit\_Cost}(i). \tag{9}$$
The algorithm selects particles with higher fitness values, which provide a good basis for generating the particles of the next generation.
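The fitness in (7)–(9) is a direct sum of two reciprocals, as the following minimal sketch shows. The function name `fitness` and the two candidate schedules are hypothetical and serve only to show that, with equal cost, the schedule with the shorter completion time scores higher.

```python
def fitness(tft, run_cost, tran_cost):
    """Overall fitness (9) = Fit_Time (7) + Fit_Cost (8); higher is better."""
    fit_time = 1.0 / tft
    fit_cost = 1.0 / (run_cost + tran_cost)
    return fit_time + fit_cost


# Two hypothetical candidate schedules given as (TFT, Run_Cost, Tran_Cost).
fast = fitness(8, 42, 5)
slow = fitness(10, 42, 5)
print(fast > slow)   # True
```

Because the two terms are reciprocals of quantities with different units and magnitudes, one of them may dominate the sum in practice; whether any rescaling is applied is not stated in the text, so none is assumed here.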
3.4 Position and Velocity Updating. In every iteration, each particle updates its velocity and position by (10) in accordance with its own best historical position and the best position of the population. The historical best position is replaced by the current position only when the current position has a better fitness value.
$$\begin{aligned}
v_{i}^{k+1} &= w_k \cdot v_{i}^{k} + c_1 \cdot r_1 \cdot (P\_best_i - x_{i}^{k}) + c_2 \cdot r_2 \cdot (G\_best_i - x_{i}^{k}), \\
x_{i}^{k+1} &= x_{i}^{k} + v_{i}^{k}.
\end{aligned} \tag{10}$$
$P\_best_i$ is the best position experienced by the $i$th particle, and $G\_best_i$ is the best position experienced by all particles in the population. As in standard PSO, $c_1$ and $c_2$ are acceleration coefficients and $r_1$ and $r_2$ are random numbers drawn uniformly from $[0, 1]$. $w_k$ is significant for balancing the algorithm's capability of global and local searching, and the paper adopts the following decreasing inertia weight:

$$w_k = w_{\mathrm{end}} + \frac{(w_{\mathrm{start}} - w_{\mathrm{end}})(\mathrm{Gen} - k)}{\mathrm{Gen}}. \tag{11}$$

$w_{\mathrm{start}}$ and $w_{\mathrm{end}}$ represent, respectively, the initial inertia weight and the inertia weight when the maximum number of iterations $\mathrm{Gen}$ is reached, so that $w_0 = w_{\mathrm{start}}$ and $w_{\mathrm{Gen}} = w_{\mathrm{end}}$; $k$ is the current iteration. With this inertia weight, the algorithm has strong global search capability in the early stage of iteration and more accurate local search capability in the late stage.
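One update step for a single particle can be sketched as follows. This is a minimal sketch under several assumptions that the text does not state: the inertia weight is taken to fall linearly from `w_start` at iteration 0 to `w_end` at iteration `gen`; the default values 0.9, 0.4, and 2.0 are commonly used PSO settings chosen for illustration; the particle moves with the freshly updated velocity (the usual PSO convention); and positions are rounded and clamped to the PE types $1, \ldots, m$.

```python
import random


def inertia(k, gen, w_start=0.9, w_end=0.4):
    """Linearly decreasing inertia weight: w_start at k = 0, w_end at k = gen."""
    return w_end + (w_start - w_end) * (gen - k) / gen


def update_particle(x, v, p_best, g_best, k, gen, m,
                    c1=2.0, c2=2.0, rng=random):
    """One velocity/position update for a particle whose entries are PE types 1..m."""
    w = inertia(k, gen)
    new_x, new_v = [], []
    for xj, vj, pj, gj in zip(x, v, p_best, g_best):
        vj = (w * vj
              + c1 * rng.random() * (pj - xj)
              + c2 * rng.random() * (gj - xj))
        vj = max(-m, min(m, vj))   # keep the velocity in [-m, m]
        xj = int(round(xj + vj))   # assumption: round to the nearest PE type
        xj = max(1, min(m, xj))    # keep the position in [1, m]
        new_v.append(vj)
        new_x.append(xj)
    return new_x, new_v


rng = random.Random(0)             # seeded generator for a repeatable run
x, v = update_particle([3, 2, 1, 1], [0.0, 0.0, 0.0, 0.0],
                       p_best=[3, 1, 1, 2], g_best=[2, 1, 1, 3],
                       k=10, gen=200, m=3, rng=rng)
print(x, v)
```

A particle that already sits on both its personal best and the global best with zero velocity stays where it is, which is the fixed point one would expect from the update rule.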
3.5 Flow of Algorithm

(1) Randomly initialize the position and velocity of the particle swarm based on the description in “Initialization and Fitness Function.”

(2) Compute the velocity and position of every particle.

(3) Compute the fitness value of every particle and set $P\_best_i$ and $G\_best_i$.

(4) If $P\_best_i$ and $G\_best_i$ remain unchanged after many iterations, or the algorithm has reached the maximum number of iterations, output the optimum solution, end the algorithm, and go to step 6.

(5) Go to step 2.

(6) Assign a reasonable number of PEs to every type of PE in accordance with the processing ability and the total amount of tasks to be processed.
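Step 6 can be illustrated by a simple fill-then-open rule, in line with the idea of loading a PE up to an appropriate occupancy ratio before adding another. The sketch below is one possible reading, not the paper's stated procedure: the function name `allocate_pes`, the per-subtask loads, the capacities, and the ratio of 0.8 are all hypothetical.

```python
def allocate_pes(groups, load, capacity, ratio=0.8):
    """Pack the subtasks of each PE type onto as few PEs as the ratio allows.

    groups:   {pe_type: [subtask, ...]} as produced by decoding a particle
    load:     {subtask: computing demand}
    capacity: {pe_type: computing capacity of one PE of that type}
    ratio:    target occupancy of a PE before a new one is opened
    """
    allocation = []  # rows of (pe_type, pe_number, [subtasks])
    pe_number = 0
    for pe_type in sorted(groups):
        budget = capacity[pe_type] * ratio
        current, used = [], 0.0
        for task in groups[pe_type]:
            # Open a new PE when the next subtask would exceed the budget.
            if current and used + load[task] > budget:
                pe_number += 1
                allocation.append((pe_type, pe_number, current))
                current, used = [], 0.0
            current.append(task)
            used += load[task]
        if current:
            pe_number += 1
            allocation.append((pe_type, pe_number, current))
    return allocation


groups = {1: [3, 4, 7], 2: [2, 6, 8], 3: [1, 5, 9, 10]}  # decoding of Table 2
load = {t: 1 for t in range(1, 11)}                      # hypothetical demands
capacity = {1: 4, 2: 4, 3: 3}                            # hypothetical capacities
for row in allocate_pes(groups, load, capacity):
    print(row)
# (1, 1, [3, 4, 7])
# (2, 2, [2, 6, 8])
# (3, 3, [1, 5])
# (3, 4, [9, 10])
```

With these made-up loads and capacities the result has the same shape as Table 3: types 1 and 2 each fit on one PE, while type 3 needs two.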
4 IP Mapping
After task dividing and scheduling, the IP communication diagram is formed. In a NoC-based multicore system, the next question is how to map these PEs onto NoC nodes reasonably and minimize the network transmission delay during task execution, under the conditions that few resources are occupied and energy consumption is balanced. This is the IP mapping problem.
There are often two orientations in IP mapping: either to minimize the internal communication cost or to minimize the external communication cost [27, 28]. Both orientations have their pros and cons. The former may increase contention for external resources and add computation overhead later in mapping as the utilization ratio of system resources increases. The latter tends to arrange surplus resources well and decreases contention for external resources with little change in computation overhead; however, as each local mapping area is incomplete, it produces only second-best mapping solutions, thus undermining global mapping optimization. When designing an IP mapping algorithm, it is necessary to balance the two orientations carefully.
In the meantime, as described above, PEs of different types have different requirements on the NoC's communication capability. To save on-chip resources and decrease system power consumption, various heterogeneous network topologies have been designed. Therefore, during IP mapping, the match between communication requirements and on-chip communication capability needs comprehensive consideration.
Based on the properties of the PEs to be mapped and on the distribution of transmission capability over the topology, this paper maps PEs with high communication requirements to high-capability areas, balances internal and external communication cost, and achieves on-chip communication with minimum transmission delay and lower resource occupancy. The mapping algorithm consists of two parts: the expression of the network topology by a two-dimensional matrix, and the IP mapping itself. They are detailed as follows.
4.1 IP Communication Diagram and NoC Topology. The communication diagram can be abstracted into a triple $\mathrm{CDAG} = (P, E, C)$, where

(1) $P$ represents the set of PEs in the communication diagram; that is, $p_i \in P$ is a PE with tasks to execute;

(2) $E$ represents the edge set of the communication diagram; that is, $e_{ij} \in E$ indicates that there exists data exchange between $p_i$ and $p_j$;

(3) $C$ represents the communication cost of the undirected edges, and $C_{ij}$ represents the total communication data between $p_i$ and $p_j$.
It is complicated to express a NoC topology directly, especially a three-dimensional NoC. Nevertheless, a two-dimensional matrix expresses the topology well, and many properties of matrices can also be applied to topology computation. Therefore, the paper expresses the topology as a two-dimensional matrix before IP mapping.
A three-dimensional mesh topology can be taken as an example. Figure 2(a) shows a 4 × 4 × 2 three-dimensional NoC topology; the red vertices represent bottom switching nodes and the black ones represent upper switching nodes. Figure 2(b) is its two-dimensional expansion diagram, which frees us from the complexity of studying the three-dimensional topology. For convenience of expression and computation, the positions of the nodes in the expansion diagram are expressed by a matrix; the positions of the nodes in Figure 2(b) can be seen in Figure 2(c). There may exist areas whose communication transmission capability is higher than that of others, to fulfill the higher communication requirements of some PEs; as shown in Figure 2(c), the green areas are areas in which there exist switching nodes with higher communication performance. For the integrity of the matrix expression, areas without switching nodes are shaded; in later computation, nodes in these areas are assumed to be already assigned.
Through the approach above, a one-to-one correspondence is formed between the position of every node in the three-dimensional NoC topology and that of every element in the matrix. IP mapping performs its optimization on the basis of this matrix.
4.2 IP Mapping. Before introducing the concrete algorithm, three definitions are given as follows.
Definition 1. Manhattan Distance $\mathrm{MD}(i, j)$: in a plane, the Manhattan Distance between points $P_i(x_1, y_1)$ and $P_j(x_2, y_2)$ is defined as

$$\mathrm{MD}(i, j) = |x_1 - x_2| + |y_1 - y_2|. \tag{12}$$

Definition 2. Euclidean Distance $\mathrm{ED}(i, j)$: in a plane, the Euclidean Distance between points $P_i(x_1, y_1)$ and $P_j(x_2, y_2)$ is defined as

$$\mathrm{ED}(i, j) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}. \tag{13}$$

Definition 3. The communication cost in the mapped area is obtained as follows:

$$\mathrm{Com\_cost} = \sum_{\forall C_{ij} \in C} C_{ij} \cdot \mathrm{MD}(L(p_i), L(p_j)), \tag{14}$$

in which $C_{ij}$ represents the total communication traffic between $p_i$ and $p_j$ in the communication diagram and $\mathrm{MD}(L(p_i), L(p_j))$ represents the Manhattan Distance between the mapped positions of $p_i$ and $p_j$ on the topology.
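The three definitions translate directly into code. In the sketch below the function names, the PE names, the traffic values, and the two candidate placements are hypothetical; the placements only illustrate that keeping heavily communicating PEs adjacent lowers the communication cost of Definition 3.

```python
import math


def manhattan(a, b):
    """Definition 1: |x1 - x2| + |y1 - y2| for matrix positions a and b."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def euclidean(a, b):
    """Definition 2: straight-line distance between positions a and b."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def com_cost(traffic, location):
    """Definition 3: sum of C_ij * MD(L(p_i), L(p_j)) over communicating pairs."""
    return sum(c * manhattan(location[i], location[j])
               for (i, j), c in traffic.items())


traffic = {("p1", "p2"): 10, ("p2", "p3"): 4}          # hypothetical C_ij
near = {"p1": (1, 1), "p2": (1, 2), "p3": (2, 2)}      # neighbors kept adjacent
far = {"p1": (1, 1), "p2": (3, 3), "p3": (2, 2)}       # p2 placed far from p1

print(com_cost(traffic, near))    # 14
print(com_cost(traffic, far))     # 48
print(euclidean((0, 0), (3, 4)))  # 5.0
```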
The target of the algorithm is to map PEs with high communication requirements to topology areas with high communication capability and to find the mapping scheme with minimum $\mathrm{Com\_cost}$.
The algorithm divides the communication diagrams into collections $H$ and $L$ according to whether or not the PEs they include need to be mapped to an area with high capability. In the collection $H = \{h_1, h_2, \ldots, h_i\}$ with high communication requirement, the elements are ordered so that $|h_1| \ge |h_2| \ge \cdots \ge |h_i|$ according to the number of PEs with high communication requirement; in the collection $L = \{l_1, l_2, \ldots, l_i\}$ without high communication requirement, the elements are ordered so that $|l_1| \ge |l_2| \ge \cdots \ge |l_i|$ according to the number of PEs contained. The execution steps of the mapping algorithm are as follows.
(1) Start the mapping computation from collection $h_1$: on the topology, choose the area with high communication capability that can contain the minimum set of PEs with high communication requirement in $h_1$ as the starting area of mapping. Call the mapped PEs the assigned area, and call the occupied switching nodes on the topology the mapped area.

(2) Start from the PE with maximum communication traffic (the sum of input and output) and map it to the switching node in the high-capability area whose number of available neighboring nodes is nearest to the PE's node degree.

(3) Choose the PE that has maximum communication data with the assigned area as the next PE to be mapped.

(4) Map the PE to the switching node that has minimum Manhattan Distance to the mapped area. If more than one node meets the requirement, choose the node whose number of available neighboring nodes is nearest to the PE's node degree; if there is still more than one node, choose the switching node that has minimum Euclidean Distance from the center of the mapped area.

(5) Repeat step 3 and step 4 until all PEs are mapped, and then start the algorithm on the next PE diagram to be mapped.

Figure 2: Topology and its expression by matrix; panels (a), (b), and (c), with axes X and Y.

Figure 3: Description of mapping process.

Figure 4: Comparison of algorithm velocity. Execution time (ms) versus task scale (9, 16, and 25 subtasks) for GA, ACO, PSO, and OPSO.
Figure 3 gives a simple description of the mapping process. In the IP communication diagram, the red PEs are PEs with high communication requirement and the blue area is the assigned area; in the topology, the green area is the area of switching nodes with high communication capability and the area encircled by the red line is the mapped area.
The mapping algorithm places PEs with a direct communication relationship on neighboring nodes, so that the path between source node and destination node is the shortest and does not conflict with other transmission paths, thus minimizing the delay in the whole mapping area.
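The greedy core of the mapping steps can be sketched for a single communication diagram on a plain two-dimensional mesh. This is a simplified illustration and not the paper's implementation: it ignores high-capability areas and shaded matrix entries, treats every mesh node as initially free, and breaks any remaining tie by node coordinates; the function name `map_pes` and the traffic values are hypothetical.

```python
import math


def map_pes(traffic, rows, cols):
    """Greedy placement of PEs on a rows x cols mesh, loosely following steps (2)-(5)."""
    pes = sorted({p for pair in traffic for p in pair})
    nbrs = {p: {} for p in pes}  # undirected traffic between PE pairs
    for (a, b), c in traffic.items():
        nbrs[a][b] = nbrs[a].get(b, 0) + c
        nbrs[b][a] = nbrs[b].get(a, 0) + c
    free = {(r, c) for r in range(rows) for c in range(cols)}
    placed = {}

    def free_neighbors(node):
        r, c = node
        return sum(n in free
                   for n in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)))

    def place(pe):
        degree = len(nbrs[pe])
        used = list(placed.values())
        center = None
        if used:
            center = (sum(r for r, _ in used) / len(used),
                      sum(c for _, c in used) / len(used))

        def key(node):
            # Step (4): Manhattan Distance to the mapped area, then the match
            # between free neighbors and node degree, then Euclidean Distance
            # to the center of the mapped area, then coordinates.
            md = min((abs(node[0] - u[0]) + abs(node[1] - u[1]) for u in used),
                     default=0)
            ed = math.dist(node, center) if used else 0.0
            return (md, abs(free_neighbors(node) - degree), ed, node)

        best = min(free, key=key)
        free.remove(best)
        placed[pe] = best

    # Step (2): begin with the PE carrying the most traffic.
    place(min(pes, key=lambda p: (-sum(nbrs[p].values()), p)))
    # Steps (3)-(5): repeatedly take the PE with most traffic to the placed PEs.
    while len(placed) < len(pes):
        rest = [p for p in pes if p not in placed]
        nxt = min(rest, key=lambda p: (
            -sum(c for q, c in nbrs[p].items() if q in placed), p))
        place(nxt)
    return placed


traffic = {("A", "B"): 10, ("B", "C"): 6, ("A", "D"): 2}  # hypothetical C_ij
print(map_pes(traffic, 2, 2))
# {'B': (0, 0), 'A': (0, 1), 'C': (1, 0), 'D': (1, 1)}
```

In this small example every communicating pair ends up on adjacent nodes, so each edge contributes its traffic times a Manhattan Distance of 1 to the communication cost.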
5 Experiment and Simulation
The performance of the designed algorithm is compared and evaluated from two aspects. The first is the running speed of the task dividing and scheduling algorithm itself. By computing tasks of the same size with GA, ACO, PSO, and the algorithm in this paper, respectively, and comparing the running times, we can demonstrate the efficiency of the algorithm. This part is conducted in Matlab with 200 iterations; the comparison of the time required to run the algorithms is shown in Figure 4.
Figure 5: Comparison of mapping effect. (a) Average packet delay (clock cycles) versus task scale (9, 16, and 25 PEs) for GA, ACO, PSO, and OPSO. (b) Power consumption versus task scale (9, 16, and 25 PEs) for GA, ACO, PSO, and OPSO.
The other is the comparison of the actual mapping effect (Figure 5). By running the scheduling results of the above algorithms in a NoC simulation environment and computing the delay and power consumption of the system, respectively, we can demonstrate the superiority of the algorithm of this paper in scheduling.
6 Conclusion
In this paper, the task scheduling model is further improved, and the operating cost per unit time is employed as a uniform measurement for PEs of different types, which simplifies the algorithm. Task dividing and scheduling and IP mapping are handled separately, so that the resulting scheduling is more efficient and realistic. The scheduling target considers not only the total time spent but also the time cost and resource cost during task execution, so as to achieve comprehensive optimization of system performance.
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References
[1] C. Addo-Quaye, “Thermal-aware mapping and placement for 3-D NoC designs,” in Proceedings of the IEEE International SOC Conference, pp. 25–28, September 2005.
[2] A. K. Singh, W. Jigang, A. Prakash, and T. Srikanthan, “Mapping algorithms for NoC-based heterogeneous MPSoC platforms,” in Proceedings of the 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools (DSD ’09), pp. 133–140, August 2009.
[3] K. Ganeshpure and S. Kundu, “On runtime task graph extraction in MPSoC,” in Proceedings of the IEEE Computer Society Annual Symposium on VLSI, pp. 171–176, IEEE, 2013.
[4] Y. Z. Tei, M. N. Marsono, N. Shaikh-Husin, and Y. W. Hau, “Network partitioning and GA heuristic crossover for NoC application mapping,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS ’13), pp. 1228–1231, Beijing, China, May 2013.
[5] H. Topcuoglu, S. Hariri, and M. Wu, “Performance-effective and low-complexity task scheduling for heterogeneous computing,” IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 3, pp. 260–274, 2002.
[6] M. I. Daoud and N. Kharma, “Efficient compile-time task scheduling for heterogeneous distributed computing systems,” in Proceedings of the 12th International Conference on Parallel and Distributed Systems (ICPADS ’06), vol. 1, pp. 11–19, IEEE, Minneapolis, Minnesota, July 2006.
[7] M. Wu and D. D. Gajski, “Hypertool: a programming aid for message-passing systems,” IEEE Transactions on Parallel and Distributed Systems, vol. 1, no. 3, pp. 330–343, 1990.
[8] T. Yang and A. Gerasoulis, “DSC: scheduling parallel tasks on an unbounded number of processors,” IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 9, pp. 951–967, 1994.
[9] S. J. Kim and J. C. Browne, “A general approach to mapping of parallel computation upon multiprocessor architectures,” in Proceedings of the International Conference on Parallel Processing, vol. 2, pp. 1–8, 1988.
[10] Y.-C. Chung and S. Ranka, “Applications and performance analysis of a compile-time optimization approach for list scheduling algorithms on distributed memory multiprocessors,” in Supercomputing, pp. 512–521, 1992.
[11] I. Ahmad and Y. Kwok, “A new approach to scheduling parallel programs using task duplication,” in Proceedings of the International Conference on Parallel Processing, vol. 2, pp. 47–51, 1994.
[12] M. Sayuti and L. S. Indrusiak, “Real-time low-power task mapping in networks-on-chip,” in Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI ’13), pp. 14–19, 2013.
[13] F. Ferrandi, P. L. Lanzi, C. Pilato, D. Sciuto, and A. Tumeo, “Ant colony heuristic for mapping and scheduling tasks and communications on heterogeneous embedded systems,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 6, pp. 911–924, 2010.
[14] L. S. Junior, N. Nedjah, and L. de Macedo Mourelle, “CO approach in static routing for network-on-chips with 3D mesh topology,” in Proceedings of the IEEE Fourth Latin American Symposium on Circuits and Systems (LASCAS ’13), pp. 1–4, IEEE, Cusco, Peru, February 2013.
[15] R. Hoffmann, A. Prell, and T. Rauber, “Dynamic task scheduling and load balancing on cell processors,” in Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP ’10), pp. 205–212, February 2010.
[16] M. B. Abdelhalim, “Task assignment for heterogeneous multiprocessors using re-excited particle swarm optimization,” in Proceedings of the International Conference on Computer and Electrical Engineering (ICCEE ’08), pp. 23–27, Phuket, Thailand, December 2008.
[17] M. S. Sidhu, P. Thulasiraman, and R. K. Thulasiram, “A load-rebalance PSO heuristic for task matching in heterogeneous computing systems,” in Proceedings of the IEEE Symposium on Swarm Intelligence (SIS ’13), pp. 180–187, IEEE, Singapore, April 2013.
[18] Y. Wang and C. Dang, “An evolutionary algorithm for global optimization based on level-set evolution and latin squares,” IEEE Transactions on Evolutionary Computation, vol. 11, no. 5, pp. 579–595, 2007.
[19] Y.-P. Wang, Y.-C. Jiao, and H. Li, “An evolutionary algorithm for solving nonlinear bilevel programming based on a new constraint-handling scheme,” IEEE Transactions on Systems, Man and Cybernetics C: Applications and Reviews, vol. 35, no. 2, pp. 221–232, 2005.
[20] O. Arnold and G. Fettweis, “Power aware heterogeneous MPSoC with dynamic task scheduling and increased data locality for multiple applications,” in Proceedings of the International Conference on Embedded Computer Systems (SAMOS ’10), pp. 110–117, 2010.
[21] G. De Micheli and L. Benini, Networks on Chips: Technology and Tools, Academic Press, 2006.
[22] D. A. B. Miller, “Rationale and challenges for optical interconnects to electronic chips,” Proceedings of the IEEE, vol. 88, no. 6, pp. 728–749, 2000.
[23] D. A. B. Miller, “Device requirements for optical interconnects to silicon chips,” Proceedings of the IEEE, vol. 97, no. 7, pp. 1166–1185, 2009.
[24] M. O. Agyeman and A. Ahmadinia, “Optimising heterogeneous 3D networks-on-chip,” in Proceedings of the 6th IEEE International Symposium on Parallel Computing in Electrical Engineering (PARELEC ’11), pp. 25–30, April 2011.
[25] Y. Ye, J. Xu, X. Wu, W. Zhang, W. Liu, and M. Nikdast, “A torus-based hierarchical optical-electronic network-on-chip for multiprocessor system-on-chip,” ACM Journal on Emerging Technologies in Computing Systems, vol. 8, no. 1, article 5, 2012.
[26] HA Khouzani S Koohi and SHessabi ldquoFully contention-freeoptical NoC based on wavelenght routingrdquo in Proceedings of the16thCSI International SymposiumonComputer Architecture andDigital Systems (CADS rsquo12) pp 81ndash86 May 2012
[27] C Chou and R Marculescu ldquoUser-aware dynamic task allo-cation in networks-on-chiprdquo in Proceedings of the DesignAutomation and Test in Europe (DATE rsquo08) vol 1ndash3 pp 1074ndash1079 March 2008
[28] C Chou and R Marculescu ldquoRun-time task allocation con-sidering user behavior in embedded multiprocessor networks-on-chiprdquo IEEE Transactions on Computer-Aided Design of Inte-grated Circuits and Systems vol 29 no 1 pp 78ndash91 2010
Figure 1: Two stages of task scheduling and IP mapping (subtasks t1–t8, processing elements PE1–PE4, switching nodes S1 and S2).
(1) $V$: the task node set of the DAG application; a vertex $v \in V$ is a subtask, and the number of subtasks in the DAG application is $N$.

(2) $E$: the edge set of the DAG application; $e_{ij} \in E$ means that there exists data communication between $v_i$ and $v_j$, and the direction of the arrow indicates the direction of data transmission.

(3) $\mathrm{Type}(v)$: the type of the task. For instance, we can use 1, 2, 3 to represent different computing types. In addition, the type set of tasks corresponds to that of PEs, which means that a task can only be scheduled to a PE matching its type. This can be expressed by the matrix $D = \{d_{ij}\}$, where the rows represent the tasks and the columns represent the PEs; $d_{ij} = \infty$ means that task $v_i$ cannot be executed on $P_j$, and $d_{ij} = a$ means that task $v_i$ can be executed on $P_j$ with an execution time of $a$.

(4) PCU: the running cost of every type of PE per unit time, in which element $\mathrm{PCU}_r$ $(1 \le r \le m)$ represents the running cost of the $r$th type of PE per unit time.

(5) $C$: the collection of the communication overheads of the directed edges; $C_{ij}$ represents the transfer cost between subtasks $v_i$ and $v_j$ over the directed edge $e_{ij}$. When $v_i$ and $v_j$ are scheduled to the same PE, $C_{ij}$ equals zero.
The target of task dividing and scheduling is to find a proper assignment and scheduling strategy that, while respecting the task processing sequence and the resource limitation, assigns the $N$ subtasks to a proper number of PEs and orders the execution of every subtask in a reasonable manner, thus achieving the minimum completion time of the overall task with every subtask obeying the dependency graph. Based on the task model, an improved particle swarm algorithm is used to conduct the computation.
3.2 Coding and Decoding. The resource occupation of every subtask is encoded by indirect encoding. The encoding length depends on the number of subtasks. Every particle corresponds to a certain task assignment strategy.

Assume that there exist $N$ subtasks, numbered sequentially within a task, and $M$ available PEs, which are classified into $m$ types. For example, when $N = 10$ and $m = 3$, the particle (3, 2, 1, 1, 3, 2, 1, 2, 3, 3) is a feasible scheduling scheme. The particle is encoded as shown in Table 1, and, as shown in Table 2, decoding the particle yields the assignment of subtasks to every type of PE. Then, as shown in Table 3, after the subtasks are assigned, a reasonable number of PEs is allocated to every type of PE in accordance with the processing ability and the total amount of tasks to be processed.
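As a minimal sketch of the decoding step, assume the particle is stored as a Python tuple whose $j$th entry is the PE type of subtask $j$. The function name `decode` is hypothetical and does not come from the paper; the example particle is the one used in the text.

```python
from collections import defaultdict

def decode(particle):
    """Group 1-based subtask numbers by the PE type stored at each position."""
    by_type = defaultdict(list)
    for subtask, pe_type in enumerate(particle, start=1):
        by_type[pe_type].append(subtask)
    # Sort by PE type so the result reads like the decoding table.
    return dict(sorted(by_type.items()))

particle = (3, 2, 1, 1, 3, 2, 1, 2, 3, 3)
assignment = decode(particle)
print(assignment)  # {1: [3, 4, 7], 2: [2, 6, 8], 3: [1, 5, 9, 10]}
```

The printed grouping agrees with the decoding example of Table 2. Splitting the subtasks of one type over several PEs of that type, as in Table 3, is not covered by this sketch.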
It follows from the task model that the running time of every subtask on different PEs is already known. The running time on every type of PE is defined as

$$\mathrm{Sub\_TFT}_r = \sum_{i=1}^{n} T_{ir}, \tag{2}$$

where $T_{ir}$ represents the running time of subtask $i$ on the $r$th type of PE and $n$ represents the number of subtasks assigned to the $r$th type of PE. The execution time of the entire task is obtained as follows:

$$\mathrm{TFT} = \max_{r=1}^{k} \mathrm{Sub\_TFT}_r. \tag{3}$$

The overall operation cost is given as

$$\mathrm{Run\_Cost} = \sum_{r=1}^{k} \mathrm{Sub\_TFT}_r \cdot \mathrm{PCU}_r. \tag{4}$$

Assuming that the task set assigned to the $m$th type of PE is $V_m$ and the task set assigned to the $n$th type of PE is $V_n$, the transfer cost between $\mathrm{PE}_m$ and $\mathrm{PE}_n$ is defined as

$$\mathrm{Tran\_Cost}_{mn} = \sum_{\forall i,j} C_{ij} \quad (v_i \in V_m,\ v_j \in V_n). \tag{5}$$

The overall transfer cost is obtained as follows:

$$\mathrm{Tran\_Cost} = \sum_{\forall m \neq n} \mathrm{Tran\_Cost}_{mn}. \tag{6}$$
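The cost terms of equations (2) to (6) can be illustrated with a short sketch. The function name `schedule_costs`, the argument layout, and all numbers in the example are assumptions made for illustration; they are not taken from the paper. The sketch sums over all $m$ types (unused types contribute zero) and charges a transfer cost whenever the two endpoints of an edge are assigned to different PE types.

```python
def schedule_costs(particle, exec_time, pcu, comm):
    """Evaluate equations (2)-(6) for one particle.

    particle  : particle[j] is the PE type (1-based) of subtask j (0-based index)
    exec_time : exec_time[j][r - 1] is the running time of subtask j on type r
    pcu       : pcu[r - 1] is the running cost of type r per unit time
    comm      : {(i, j): C_ij} for the directed edges between subtasks i and j
    """
    sub_tft = [0.0] * len(pcu)
    for j, r in enumerate(particle):
        sub_tft[r - 1] += exec_time[j][r - 1]             # equation (2)
    tft = max(sub_tft)                                     # equation (3)
    run_cost = sum(t * c for t, c in zip(sub_tft, pcu))    # equation (4)
    tran_cost = sum(c for (i, j), c in comm.items()
                    if particle[i] != particle[j])         # equations (5) and (6)
    return tft, run_cost, tran_cost

# Hypothetical example: 4 subtasks, 2 PE types.
particle = (1, 2, 1, 2)
exec_time = [[2, 9], [9, 3], [4, 9], [9, 5]]
pcu = [1.0, 2.0]
comm = {(0, 1): 5, (0, 2): 7, (1, 3): 1, (2, 3): 4}

print(schedule_costs(particle, exec_time, pcu, comm))  # (8.0, 22.0, 9)
```

Here type 1 runs for 2 + 4 = 6 time units and type 2 for 3 + 5 = 8, so the completion time is 8; the running cost is 6 · 1 + 8 · 2 = 22; only the edges (0, 1) and (2, 3) cross types, giving a transfer cost of 5 + 4 = 9.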
Table 1: Example of particle coding.

Subtask number   1  2  3  4  5  6  7  8  9  10
Type of PE       3  2  1  1  3  2  1  2  3  3

Table 2: Example of decoding.

Type of PE   Subtask number
1            3, 4, 7
2            2, 6, 8
3            1, 5, 9, 10

Table 3: Task dividing.

Type of PE   Number of PE   Subtask number
1            1              3, 4, 7
2            2              2, 6, 8
3            3              1, 5
3            4              9, 10

3.3 Initialization and Fitness Function. Assuming that the population size is $s$, the number of subtasks is $N$, and the number of PE types is $m$, the initialization of the population can be described as follows. Among the $s$ randomly generated particles, the position of the $i$th particle is represented by the vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ $(1 \le i \le s,\ 1 \le n \le N)$, in which $x_{ij}$ $(1 \le x_{ij} \le m)$ indicates that, in the $i$th particle, task $j$ is assigned to a PE of type $x_{ij}$ for operation. The velocity is represented by the vector $v_i = (v_{i1}, v_{i2}, \ldots, v_{in})$ $(1 \le i \le s,\ 1 \le n \le N)$, in which $-m \le v_{ij} \le m$.
The fitness function of time is defined as

$$\mathrm{Fit\_Time}(i) = \frac{1}{\mathrm{TFT}_i} \quad (1 \le i \le s), \tag{7}$$

where $\mathrm{TFT}_i$ represents the overall completion time of the $i$th particle. The fitness function of cost is obtained as follows:

$$\mathrm{Fit\_Cost}(i) = \frac{1}{\mathrm{Run\_Cost}_i + \mathrm{Tran\_Cost}_i} \quad (1 \le i \le s). \tag{8}$$

The overall fitness function is obtained as follows:

$$\mathrm{Fitness}(i) = \mathrm{Fit\_Time}(i) + \mathrm{Fit\_Cost}(i). \tag{9}$$

The algorithm selects particles with higher fitness values, which provide a good basis for generating the particles of the next generation.
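A minimal sketch of the fitness evaluation in equations (7) to (9) is given below. The function name `fitness` and the input numbers are hypothetical; the three arguments are assumed to have been computed beforehand for one particle.

```python
def fitness(tft, run_cost, tran_cost):
    """Overall fitness of one particle: Fit_Time + Fit_Cost."""
    fit_time = 1.0 / tft                       # equation (7)
    fit_cost = 1.0 / (run_cost + tran_cost)    # equation (8)
    return fit_time + fit_cost                 # equation (9)

# Two hypothetical candidate schedules: the first finishes earlier and is cheaper.
fast_and_cheap = fitness(tft=8.0, run_cost=22.0, tran_cost=9.0)
slow_and_costly = fitness(tft=10.0, run_cost=20.0, tran_cost=20.0)
print(fast_and_cheap > slow_and_costly)  # True
```

Because both terms are reciprocals, a shorter completion time or a lower total cost raises the fitness, so the swarm maximizes this value.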
3.4 Position and Velocity Updating. In every iteration, the particle updates its velocity and position by (10) in accordance with its own best historical position and the best position of the population. The historical best position is replaced by the current position only when the current position has a better fitness value than the historical best position:

$$v_i^{k+1} = w_k \cdot v_i^k + c_1 \cdot r_1 \cdot (\mathrm{P\_best}_i - x_i^k) + c_2 \cdot r_2 \cdot (\mathrm{G\_best}_i - x_i^k),$$
$$x_i^{k+1} = x_i^k + v_i^k. \tag{10}$$

$\mathrm{P\_best}_i$ is the best position experienced by the $i$th particle, and $\mathrm{G\_best}_i$ is the best position experienced by all particles in the population. As in standard PSO, $c_1$ and $c_2$ are acceleration coefficients and $r_1$ and $r_2$ are random numbers in $[0, 1]$. The inertia weight $w_k$ is significant for balancing the algorithm's capability of global and local searching, and the paper adopts the decreasing inertia weight as follows:
$$w_k = w_{\mathrm{end}} + \frac{(w_{\mathrm{start}} - w_{\mathrm{end}})(\mathrm{Gen} - k)}{\mathrm{Gen}}. \tag{11}$$

$w_{\mathrm{start}}$ and $w_{\mathrm{end}}$ represent, respectively, the initial inertia weight and the inertia weight when the maximum number of iterations Gen is reached; $k$ is the current iteration. The weight therefore equals $w_{\mathrm{start}}$ at $k = 0$ and decreases linearly to $w_{\mathrm{end}}$ at $k = \mathrm{Gen}$. By adopting this inertia weight, the algorithm obtains strong global search capability in the early stage of iteration and more accurate local search capability in the late stage.
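The update rule and the decreasing inertia weight can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the default values 0.9, 0.4, and 2.0 are commonly used PSO settings rather than values given in the paper, the weight is taken to fall linearly from `w_start` at `k = 0` to `w_end` at `k = gen`, and the position update uses the freshly computed velocity as in the standard PSO formulation.

```python
import random

def inertia_weight(k, gen, w_start=0.9, w_end=0.4):
    """Linearly decreasing inertia weight: w_start at k = 0, w_end at k = gen."""
    return w_end + (w_start - w_end) * (gen - k) / gen

def update_particle(x, v, p_best, g_best, w, c1=2.0, c2=2.0, rng=random):
    """Apply one velocity and position update to every dimension of a particle."""
    new_v = []
    for xd, vd, pd, gd in zip(x, v, p_best, g_best):
        r1, r2 = rng.random(), rng.random()   # fresh random factors per dimension
        new_v.append(w * vd + c1 * r1 * (pd - xd) + c2 * r2 * (gd - xd))
    new_x = [xd + vd for xd, vd in zip(x, new_v)]
    return new_x, new_v

# A particle that already sits on both best positions with zero velocity stays put.
x, v = update_particle([1.0, 2.0], [0.0, 0.0], [1.0, 2.0], [1.0, 2.0],
                       w=inertia_weight(0, 200))
```

A large weight early on keeps more of the previous velocity and favors exploration; as the weight shrinks, the attraction terms toward the personal and global best positions dominate.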
3.5 Flow of Algorithm

(1) Randomly initialize the position and velocity of the particle swarm based on the description in "Initialization and Fitness Function."

(2) Compute the velocity and position of every particle.

(3) Compute the fitness value of every particle and set $\mathrm{P\_best}_i$ and $\mathrm{G\_best}_i$.

(4) If $\mathrm{P\_best}_i$ and $\mathrm{G\_best}_i$ remain unchanged after many iterations or the algorithm has reached the maximum number of iterations, output the optimum solution, end the algorithm, and go to step 6.

(5) Go to step 2.

(6) Assign a reasonable number of PEs to every type of PE in accordance with the processing ability and the total amount of tasks to be processed.
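Steps (1) to (5) above can be sketched as a compact loop. Several details are assumptions made only for this illustration: the paper does not state how continuous positions are turned into integer PE types, so the sketch rounds and clips them; `patience` stands in for "many iterations"; the function and parameter names and the toy fitness are hypothetical. Step (6), the allocation of PEs within each type, is not covered.

```python
import random

def pso_schedule(fitness, n_tasks, n_types, swarm=20, gen=200,
                 w_start=0.9, w_end=0.4, c1=2.0, c2=2.0, patience=30, seed=0):
    rng = random.Random(seed)
    lo, hi = 1.0, float(n_types)

    def to_types(x):
        # Assumption: round each coordinate and clip it to a valid type 1..n_types.
        return tuple(int(min(hi, max(lo, round(xd)))) for xd in x)

    # Step 1: random positions in [1, m] and velocities in [-m, m].
    xs = [[rng.uniform(lo, hi) for _ in range(n_tasks)] for _ in range(swarm)]
    vs = [[rng.uniform(-hi, hi) for _ in range(n_tasks)] for _ in range(swarm)]
    p_best = [list(x) for x in xs]
    p_fit = [fitness(to_types(x)) for x in xs]
    g = max(range(swarm), key=p_fit.__getitem__)
    g_best, g_fit = list(p_best[g]), p_fit[g]

    stale = 0
    for k in range(gen):
        w = w_end + (w_start - w_end) * (gen - k) / gen
        improved = False
        for i in range(swarm):
            # Step 2: update velocity and position, keeping both within bounds.
            for d in range(n_tasks):
                r1, r2 = rng.random(), rng.random()
                vel = (w * vs[i][d] + c1 * r1 * (p_best[i][d] - xs[i][d])
                       + c2 * r2 * (g_best[d] - xs[i][d]))
                vs[i][d] = max(-hi, min(hi, vel))
                xs[i][d] = max(lo, min(hi, xs[i][d] + vs[i][d]))
            # Step 3: evaluate fitness and refresh the personal and global bests.
            f = fitness(to_types(xs[i]))
            if f > p_fit[i]:
                p_best[i], p_fit[i], improved = list(xs[i]), f, True
            if f > g_fit:
                g_best, g_fit = list(xs[i]), f
        # Step 4: stop once no best position has changed for `patience` iterations.
        stale = 0 if improved else stale + 1
        if stale >= patience:
            break
    return to_types(g_best), g_fit

# Toy fitness for illustration only: prefer assigning every subtask to type 2.
def toy_fitness(types):
    return 1.0 / (1 + sum(t != 2 for t in types))

best_types, best_fit = pso_schedule(toy_fitness, n_tasks=5, n_types=3, gen=50)
```

In an actual scheduler the `fitness` argument would combine the completion time and the cost terms of the task model; the toy function is used here only so that the sketch runs on its own.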
4 IP Mapping
After task dividing and scheduling, the IP communication diagram is formed. In a multicore system based on NoC, the next question is how to map these PEs onto NoC nodes reasonably and minimize the network transmission delay during task execution, under the conditions that fewer resources are occupied and energy consumption is balanced. This is the question of IP mapping.

There are often two orientations in IP mapping: either to minimize the internal communication cost or to minimize the external communication cost [27, 28]. Both orientations have their pros and cons. The former might lead to increased competition for external resources and add more computation overhead later in mapping as the utilization of system resources increases; the latter tends to arrange surplus resources well and decreases competition for external resources with little change in computation overhead. However, as each local mapping area is incomplete, it produces only second-best mapping solutions, thus undermining the global mapping optimization. While designing an IP mapping algorithm, it is necessary to strike a careful balance between the two orientations above.

In the meantime, as described above, PEs of different types have different requirements on the communication capability of a NoC. In order to save on-chip resources and decrease system consumption, various heterogeneous network topologies have been designed. Therefore, during IP mapping, the matching between the communication requirements and the on-chip communication capability entails comprehensive consideration.

Based on the properties of the PEs to be mapped and the distribution of transmission capability over the topology, the paper maps the PEs with high communication requirement to the high-capability area, balances the internal communication cost against the external one, and achieves on-chip communication with minimum transmission delay and less resource occupancy. The mapping algorithm consists of two parts: the expression of the network topology by a two-dimensional matrix and the IP mapping. They are detailed as follows.
4.1 IP Communication Diagram and NoC Topology. The communication diagram can be abstracted into a triple $\mathrm{CDAG} = (P, E, C)$, where

(1) $P$ represents the set of PEs in the communication diagram; that is, $p_i \in P$ is a PE with an execution task;

(2) $E$ represents the edge set of the DAG application; that is, $e_{ij} \in E$ indicates that there exists data exchange between $p_i$ and $p_j$;

(3) $C$ represents the communication cost on the undirected edges, and $C_{ij}$ represents the total communication data between $p_i$ and $p_j$.
It is complicated to express a NoC topology directly, especially a three-dimensional NoC. Nevertheless, a two-dimensional matrix expresses the topology well, and many properties of matrices can also be applied to topology computation. Therefore, the paper expresses the topology by a two-dimensional matrix before IP mapping.

A three-dimensional mesh topology can be taken as an example. Figure 2(a) shows a 4 × 4 × 2 three-dimensional NoC topology; the red vertices represent the bottom switching nodes and the black ones represent the upper switching nodes. Figure 2(b) is its two-dimensional expansion diagram, which frees us from the complexity of studying the three-dimensional topology. For the convenience of expression and computation, the positions of the nodes in the expansion diagram are expressed by a matrix; the positions of the nodes in Figure 2(b) can be seen in Figure 2(c). There may exist areas whose communication transmission capability is higher than that of others in order to fulfill the higher communication requirement of some PEs. As shown in Figure 2(c), the green areas represent areas in which there exist switching nodes with higher communication performance. For the integrity of the matrix expression, areas without switching nodes are shaded; in the later computation, nodes in these areas are assumed to be already assigned.

Through the approach above, a one-to-one correspondence is formed between the position of every node in the three-dimensional NoC topology and that of every element in the matrix. IP mapping conducts the computing optimization on the basis of the matrix.
4.2 IP Mapping. Before introducing the concrete algorithm, three parameters are given as follows.

Definition 1. Manhattan Distance $\mathrm{MD}(i, j)$: in a plane, the Manhattan Distance between points $P_i(x_1, y_1)$ and $P_j(x_2, y_2)$ is defined as

$$\mathrm{MD}(i, j) = |x_1 - x_2| + |y_1 - y_2|. \tag{12}$$

Definition 2. Euclidean Distance $\mathrm{ED}(i, j)$: in a plane, the Euclidean Distance between points $P_i(x_1, y_1)$ and $P_j(x_2, y_2)$ is defined as

$$\mathrm{ED}(i, j) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}. \tag{13}$$

Definition 3. The communication cost in the mapped area is obtained as follows:

$$\mathrm{Com\_cost} = \sum_{\forall C_{ij} \in C} C_{ij} \cdot \mathrm{MD}(L(p_i), L(p_j)), \tag{14}$$

in which $C_{ij}$ represents the total communication traffic between $p_i$ and $p_j$ in the communication diagram and $\mathrm{MD}(L(p_i), L(p_j))$ represents the Manhattan Distance between the mapped positions of $p_i$ and $p_j$ on the topology.
The target of the algorithm is to map the PEs with high communication requirement to the topology area with high communication capability and to find, among the results, a mapping scheme with minimum $\mathrm{Com\_cost}$.

The algorithm divides the communication diagrams into two collections, $H$ and $L$, according to whether or not the included PEs need to be mapped in an area with high capability. In the collection $H = \{h_1, h_2, \ldots, h_i\}$ with high communication requirement, the elements are ordered as $|h_1| \ge |h_2| \ge \cdots \ge |h_i|$ according to the number of PEs with high communication requirement; in the collection $L = \{l_1, l_2, \ldots, l_i\}$ without high communication requirement, the elements are ordered as $|l_1| \ge |l_2| \ge \cdots \ge |l_i|$ according to the number of PEs contained. The execution steps of the mapping algorithm are as follows.

(1) Start the mapping computation from collection $h_1$; on the topology, choose the area with high communication capability that could contain the minimum set of PEs with high communication requirement in $h_1$ as the beginning area of mapping. Name the mapped PEs the assigned area and name the occupied switching nodes on the topology the mapped area.

(2) Start from the PE with the maximum communication traffic (sum of input and output) and map it to the switching node in the area of high communication capability whose number of available neighboring nodes is nearest to the node degree of the PE.

(3) Choose the node that has the maximum communication data with the assigned area as the next PE to be mapped.

(4) Map the PE to the switching node that has the minimum Manhattan Distance to the mapped area. If more than one node meets the requirement, choose the node whose number of available neighboring nodes is nearest to the node degree of the PE; if there is still more than one node, choose the switching node that has the minimum Euclidean Distance from the center of the mapped area.

(5) Repeat steps 3 and 4 until all PEs are mapped, and then start the algorithm on another PE diagram to be mapped.

Figure 2: Topology and its expression by matrix (panels (a), (b), and (c); axes X and Y).

Figure 3: Description of mapping process.

Figure 4: Comparison of algorithm velocity (execution time in ms versus task scale of 9, 16, and 25 subtasks for GA, ACO, PSO, and OPSO).
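Steps (2) to (5) above can be sketched as a greedy placement on a two-dimensional grid of switching nodes. This is a simplified illustration, not the authors' implementation: it handles a single communication diagram, treats the optional `start_area` as the high-capability area for the first PE only, takes the "center of the mapped area" to be the centroid of the occupied nodes, breaks any remaining ties by coordinate order, and assumes there are at least as many free nodes as PEs. All names and the example data are hypothetical.

```python
import math

def map_pes(traffic, free_nodes, start_area=None):
    """Greedy placement of PEs onto grid nodes; returns {pe: (x, y)}."""
    free = set(free_nodes)
    pes = sorted({p for pair in traffic for p in pair})

    def vol(a, b):
        # Undirected traffic between two PEs.
        return traffic.get((a, b), 0) + traffic.get((b, a), 0)

    degree = {p: sum(1 for q in pes if q != p and vol(p, q) > 0) for p in pes}

    def open_neighbours(node):
        x, y = node
        return sum(n in free for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)))

    placed = {}

    def place(pe, node):
        placed[pe] = node
        free.remove(node)

    # Step 2: seed with the PE that carries the most traffic.
    first = max(pes, key=lambda p: sum(vol(p, q) for q in pes))
    candidates = sorted(set(start_area) & free) if start_area else sorted(free)
    place(first, min(candidates,
                     key=lambda n: (abs(open_neighbours(n) - degree[first]), n)))

    while len(placed) < len(pes):
        # Step 3: next PE is the one exchanging the most data with the assigned area.
        rest = [p for p in pes if p not in placed]
        pe = max(rest, key=lambda p: sum(vol(p, q) for q in placed))

        # Step 4: nearest free node, then degree match, then distance to the center.
        nodes = list(placed.values())
        cx = sum(x for x, _ in nodes) / len(nodes)
        cy = sum(y for _, y in nodes) / len(nodes)

        def rank(n):
            md = min(abs(n[0] - x) + abs(n[1] - y) for x, y in nodes)
            return (md, abs(open_neighbours(n) - degree[pe]),
                    math.hypot(n[0] - cx, n[1] - cy), n)

        place(pe, min(free, key=rank))
    return placed

# Hypothetical 3 x 3 grid and a four-PE communication diagram.
grid = [(x, y) for x in range(1, 4) for y in range(1, 4)]
traffic = {("A", "B"): 10, ("A", "C"): 5, ("B", "D"): 2}
placement = map_pes(traffic, grid)
```

In this example the busiest PE, A, is seeded on a corner node because its degree (2) matches the number of free neighbors there, and B, its heaviest partner, lands on an adjacent node.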
Figure 3 is a simple description of the mapping process. In the IP communication diagram, the red PEs represent PEs with high communication requirement and the blue area represents the assigned area; in the topology, the green area represents the area of switching nodes with high communication capability and the area encircled by the red line represents the mapped area.

The mapping algorithm places PEs with a direct communication relationship on neighboring nodes, ensuring that the path between the source node and the destination node is the shortest and does not conflict with other transmission paths, thus minimizing the delay in the whole mapping area.
5 Experiment and Simulation
The performance of the designed algorithm is compared and evaluated from two aspects. The first is the running speed of the task dividing and scheduling algorithm itself. By computing tasks of the same size with GA, ACO, PSO, and the algorithm in this paper, respectively, and comparing the running times, we can demonstrate the efficiency of the algorithm. This part is conducted in Matlab with 200 iterations; the comparison of the time required for running the algorithms is shown in Figure 4.
Figure 5: Comparison of mapping effect for GA, ACO, PSO, and OPSO at task scales of 9, 16, and 25 PEs. (a) Average packet delay (clock cycles). (b) Power consumption.
The second is the comparison of the actual mapping effect (Figure 5). By running the scheduling results of the above algorithms in a NoC simulation environment and computing the delay and the power consumption of the system, respectively, we can demonstrate the advantage of the algorithm of this paper in scheduling.
6 Conclusion
In this paper, the task scheduling model is further improved: the operating cost per time unit is employed as a uniform measurement for PEs of different types, which simplifies the algorithm. Task dividing and scheduling and IP mapping are handled separately, so that the resulting scheduling is more efficient and realistic. The scheduling target considers not only the total time spent but also the time cost and the resource cost during task running, so as to achieve comprehensive optimization of system performance.
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References
[1] C. Addo-Quaye, “Thermal-aware mapping and placement for 3-D NoC designs,” in Proceedings of the IEEE International SOC Conference, pp. 25–28, September 2005.
[2] A. K. Singh, W. Jigang, A. Prakash, and T. Srikanthan, “Mapping algorithms for NoC-based heterogeneous MPSoC platforms,” in Proceedings of the 12th Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD ’09), pp. 133–140, August 2009.
[3] K. Ganeshpure and S. Kundu, “On runtime task graph extraction in MPSoC,” in Proceedings of the IEEE Computer Society Annual Symposium on VLSI, pp. 171–176, IEEE, 2013.
[4] Y. Z. Tei, M. N. Marsono, N. Shaikh-Husin, and Y. W. Hau, “Network partitioning and GA heuristic crossover for NoC application mapping,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS ’13), pp. 1228–1231, Beijing, China, May 2013.
[5] H. Topcuoglu, S. Hariri, and M. Wu, “Performance-effective and low-complexity task scheduling for heterogeneous computing,” IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 3, pp. 260–274, 2002.
[6] M. I. Daoud and N. Kharma, “Efficient compile-time task scheduling for heterogeneous distributed computing systems,” in Proceedings of the 12th International Conference on Parallel and Distributed Systems (ICPADS ’06), vol. 1, pp. 11–19, IEEE, Minneapolis, Minnesota, July 2006.
[7] M. Wu and D. D. Gajski, “Hypertool: a programming aid for message-passing systems,” IEEE Transactions on Parallel and Distributed Systems, vol. 1, no. 3, pp. 330–343, 1990.
[8] T. Yang and A. Gerasoulis, “DSC: scheduling parallel tasks on an unbounded number of processors,” IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 9, pp. 951–967, 1994.
[9] S. J. Kim and J. C. Browne, “A general approach to mapping of parallel computation upon multiprocessor architectures,” in Proceedings of the International Conference on Parallel Processing, vol. 2, pp. 1–8, 1988.
[10] Y.-C. Chung and S. Ranka, “Applications and performance analysis of a compile-time optimization approach for list scheduling algorithms on distributed memory multiprocessors,” in Supercomputing, pp. 512–521, 1992.
[11] I. Ahmad and Y. Kwok, “A new approach to scheduling parallel programs using task duplication,” in Proceedings of the International Conference on Parallel Processing, vol. 2, pp. 47–51, 1994.
[12] M. Sayuti and L. S. Indrusiak, “Real-time low-power task mapping in networks-on-chip,” in Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI ’13), pp. 14–19, 2013.
[13] F. Ferrandi, P. L. Lanzi, C. Pilato, D. Sciuto, and A. Tumeo, “Ant colony heuristic for mapping and scheduling tasks and communications on heterogeneous embedded systems,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 6, pp. 911–924, 2010.
[14] L. S. Junior, N. Nedjah, and L. de Macedo Mourelle, “CO approach in static routing for network-on-chips with 3D mesh topology,” in Proceedings of the IEEE Fourth Latin American Symposium on Circuits and Systems (LASCAS ’13), pp. 1–4, IEEE, Cusco, Peru, February 2013.
[15] R. Hoffmann, A. Prell, and T. Rauber, “Dynamic task scheduling and load balancing on cell processors,” in Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP ’10), pp. 205–212, February 2010.
[16] M. B. Abdelhalim, “Task assignment for heterogeneous multiprocessors using re-excited particle swarm optimization,” in Proceedings of the International Conference on Computer and Electrical Engineering (ICCEE ’08), pp. 23–27, Phuket, Thailand, December 2008.
[17] M. S. Sidhu, P. Thulasiraman, and R. K. Thulasiram, “A load-rebalance PSO heuristic for task matching in heterogeneous computing systems,” in Proceedings of the IEEE Symposium on Swarm Intelligence (SIS ’13), pp. 180–187, IEEE, Singapore, April 2013.
[18] Y. Wang and C. Dang, “An evolutionary algorithm for global optimization based on level-set evolution and Latin squares,” IEEE Transactions on Evolutionary Computation, vol. 11, no. 5, pp. 579–595, 2007.
[19] Y.-P. Wang, Y.-C. Jiao, and H. Li, “An evolutionary algorithm for solving nonlinear bilevel programming based on a new constraint-handling scheme,” IEEE Transactions on Systems, Man and Cybernetics C: Applications and Reviews, vol. 35, no. 2, pp. 221–232, 2005.
[20] O. Arnold and G. Fettweis, “Power aware heterogeneous MPSoC with dynamic task scheduling and increased data locality for multiple applications,” in Proceedings of the International Conference on Embedded Computer Systems (SAMOS ’10), pp. 110–117, 2010.
[21] G. De Micheli and L. Benini, Networks on Chips: Technology and Tools, Academic Press, 2006.
[22] D. A. B. Miller, “Rationale and challenges for optical interconnects to electronic chips,” Proceedings of the IEEE, vol. 88, no. 6, pp. 728–749, 2000.
[23] D. A. B. Miller, “Device requirements for optical interconnects to silicon chips,” Proceedings of the IEEE, vol. 97, no. 7, pp. 1166–1185, 2009.
[24] M. O. Agyeman and A. Ahmadinia, “Optimising heterogeneous 3D networks-on-chip,” in Proceedings of the 6th IEEE International Symposium on Parallel Computing in Electrical Engineering (PARELEC ’11), pp. 25–30, April 2011.
[25] Y. Ye, J. Xu, X. Wu, W. Zhang, W. Liu, and M. Nikdast, “A torus-based hierarchical optical-electronic network-on-chip for multiprocessor system-on-chip,” ACM Journal on Emerging Technologies in Computing Systems, vol. 8, no. 1, article 5, 2012.
[26] H. A. Khouzani, S. Koohi, and S. Hessabi, “Fully contention-free optical NoC based on wavelength routing,” in Proceedings of the 16th CSI International Symposium on Computer Architecture and Digital Systems (CADS ’12), pp. 81–86, May 2012.
[27] C. Chou and R. Marculescu, “User-aware dynamic task allocation in networks-on-chip,” in Proceedings of the Design, Automation and Test in Europe (DATE ’08), vol. 1–3, pp. 1074–1079, March 2008.
[28] C. Chou and R. Marculescu, “Run-time task allocation considering user behavior in embedded multiprocessor networks-on-chip,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 1, pp. 78–91, 2010.
4 Mathematical Problems in Engineering
Table 1 Example of particle coding
Subtask number 1 2 3 4 5 6 7 8 9 10Type of PE 3 2 1 1 3 2 1 2 3 3
Table 2 Example of decoding
Type of PE Subtask number1 3 4 72 2 6 83 1 5 9 10
Table 3 Task dividing
Type of PE Number of PE Subtask number1 1 3 4 72 2 2 6 83 3 1 53 4 9 10
by vector V119894= (V1198941 V1198942 V
119894119899) (1 le 119894 le 119904 1 le 119899 le 119873) in
which minus119898 le V119894119895le 119898
The fitness function of time is defined as
Fit Time (119894) =1
TFT119894
(1 le 119894 le 119904) (7)
where TFT119894represents the overall completion time of the 119894th
particle the fitness function of cost is obtained as follows
Fit Cost (119894) =1
Run Cost119894+ Tran Cost
119894
(1 le 119894 le 119904) (8)
The overall fitness function is obtained as follows
Fitness = Fit Time (119894) + Fit Cost (119894) (9)
The algorithm will select particles with higher fitnessvalue so that it could provide excellent basis for generatingexcellent particles of the next generation
3.4 Position and Velocity Updating

In every iteration, the particle updates its velocity and position by (10), in accordance with its own best historical position and the best position of the population. The historical best position is replaced by the current position only when the current position has a better fitness value.

$$\begin{aligned} v_{id}^{k+1} &= w_k \cdot v_i^k + c_1 \cdot r_1 \cdot (P\_best_i - x_i^k) + c_2 \cdot r_2 \cdot (G\_best_i - x_i^k), \\ x_i^{k+1} &= x_i^k + v_i^k. \end{aligned} \tag{10}$$
$P\_best_i$ is the best position experienced by the $i$th particle, and $G\_best_i$ is the best position experienced by all particles in the population. $w_k$ is significant for balancing the algorithm's capability of global and local searching, and the paper adopts the decreasing inertia weight as follows:
$$w_k = w_{\mathrm{end}} + \frac{(w_{\mathrm{start}} - w_{\mathrm{end}})(\mathrm{Gen} - k)}{\mathrm{Gen}}. \tag{11}$$
$w_{\mathrm{start}}$ and $w_{\mathrm{end}}$ represent, respectively, the initial inertia weight and the inertia weight when the maximum number of iterations Gen is reached; $k$ is the current iteration. With this inertia weight, the algorithm has strong global search capability in the early stage of iteration and more accurate local search capability in the late stage.
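The update rule can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the inertia weight is taken to fall linearly from $w_{\mathrm{start}}$ at $k = 0$ to $w_{\mathrm{end}}$ at $k = \mathrm{Gen}$, as the text describes; the defaults 0.9, 0.4, and $c_1 = c_2 = 2.0$ are hypothetical values; and the position step uses the freshly updated velocity, which is the common PSO convention, whereas equation (10) as printed adds $v_i^k$. Positions are left continuous here; how they are rounded to PE types is not shown.

```python
import random


def inertia_weight(k, gen, w_start=0.9, w_end=0.4):
    """Linearly decreasing weight: w_start at k = 0, w_end at k = gen."""
    return w_end + (w_start - w_end) * (gen - k) / gen


def update_particle(x, v, p_best, g_best, k, gen, m, c1=2.0, c2=2.0, rng=random):
    """Return the new (position, velocity) of one particle."""
    w = inertia_weight(k, gen)
    new_x, new_v = [], []
    for xj, vj, pj, gj in zip(x, v, p_best, g_best):
        r1, r2 = rng.random(), rng.random()
        vj_next = w * vj + c1 * r1 * (pj - xj) + c2 * r2 * (gj - xj)
        vj_next = max(-m, min(m, vj_next))  # keep -m <= v_ij <= m
        new_v.append(vj_next)
        new_x.append(xj + vj_next)          # assumption: uses the updated velocity
    return new_x, new_v


x1, v1 = update_particle([1.0, 2.0], [0.5, -0.5], [2.0, 2.0], [3.0, 1.0],
                         k=10, gen=200, m=1.0, rng=random.Random(1))
print(x1, v1)
```

A large weight early on lets particles keep most of their momentum and explore; as $k$ grows, the attraction terms toward $P\_best_i$ and $G\_best_i$ dominate.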
3.5 Flow of Algorithm

(1) Randomly initialize the position and velocity of the particle swarm based on the description in "Initialization and Fitness Function."
(2) Compute the velocity and position of every particle.
(3) Compute the fitness value of every particle and set $P\_best_i$ and $G\_best_i$.
(4) If $P\_best_i$ and $G\_best_i$ remain unchanged after many iterations or the algorithm has reached the maximum number of iterations, output the optimum solution, end the algorithm, and go to step 6.
(5) Go to step 2.
(6) Assign a reasonable number of PEs to every type of PE in accordance with the processing ability and the total amount of tasks to be processed.
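Steps 1 to 5 above can be sketched as a compact loop. Several choices here are assumptions made only for illustration: positions are rounded and clipped to a valid PE type before evaluation, stagnation is detected on the global best alone through a hypothetical `patience` counter, all numeric defaults are made up, and `toy_fitness` is a stand-in for the fitness of equations (7)-(9). Step 6 (assigning the number of PEs per type) is not covered.

```python
import random


def run_pso(fitness, n_subtasks, n_types, swarm=20, gen=200, patience=30,
            m=2.0, c1=2.0, c2=2.0, w_start=0.9, w_end=0.4, seed=0):
    rng = random.Random(seed)

    def decode(x):
        # Assumption: round each coordinate to the nearest valid PE type.
        return [min(n_types, max(1, round(xj))) for xj in x]

    # Step 1: random positions and velocities.
    xs = [[rng.uniform(1, n_types) for _ in range(n_subtasks)] for _ in range(swarm)]
    vs = [[rng.uniform(-m, m) for _ in range(n_subtasks)] for _ in range(swarm)]
    p_best = [x[:] for x in xs]
    p_fit = [fitness(decode(x)) for x in xs]
    g = max(range(swarm), key=p_fit.__getitem__)
    g_best, g_fit = p_best[g][:], p_fit[g]
    stale = 0
    for k in range(gen):
        w = w_end + (w_start - w_end) * (gen - k) / gen
        improved = False
        for i in range(swarm):
            # Step 2: velocity and position update.
            for j in range(n_subtasks):
                r1, r2 = rng.random(), rng.random()
                v = (w * vs[i][j] + c1 * r1 * (p_best[i][j] - xs[i][j])
                     + c2 * r2 * (g_best[j] - xs[i][j]))
                vs[i][j] = max(-m, min(m, v))
                xs[i][j] += vs[i][j]
            # Step 3: evaluate, then refresh personal and global bests.
            f = fitness(decode(xs[i]))
            if f > p_fit[i]:
                p_best[i], p_fit[i] = xs[i][:], f
            if f > g_fit:
                g_best, g_fit, improved = xs[i][:], f, True
        # Step 4: stop on stagnation; otherwise step 5 loops back to step 2.
        stale = 0 if improved else stale + 1
        if stale >= patience:
            break
    return decode(g_best), g_fit


def toy_fitness(types):
    # Hypothetical objective: PE type 2 is cheapest for every subtask.
    return 1.0 / sum(abs(t - 2) + 1 for t in types)


best_types, best_fit = run_pso(toy_fitness, n_subtasks=5, n_types=3)
print(best_types, best_fit)
```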
4 IP Mapping
After task dividing and scheduling, the IP communication diagram is formed. In a multicore system based on NoC, the next requirement is to map these PEs reasonably onto NoC nodes and to minimize the network transmission delay during task execution, under the conditions that fewer resources are occupied and energy consumption is balanced. This is the problem of IP mapping.
There are often two orientations in IP mapping: either to minimize the internal communication cost or to minimize the external communication cost [27, 28]. Both orientations have their pros and cons. The former might lead to increased competition among external resources and add more computation overhead later in mapping, when the usage ratio of system resources increases. The latter tends to arrange surplus resources well and successfully decreases competition for external resources with little change in computation overhead. However, as each local mapping area is incomplete, it produces only second-best mapping solutions, thus undermining the global mapping optimization. While designing an IP mapping algorithm, it is necessary to strike a careful balance between the two orientations above.
In the meantime, as described above, PEs of different types have different requirements on NoC communication capability. In order to save on-chip resources and decrease system consumption, various heterogeneous network topologies have been designed. Therefore, during IP mapping, the matching between the communication requirements and the on-chip communication capability entails comprehensive consideration.
Based on the properties of the PEs to be mapped and the distribution of transmission capability on the topology, the paper maps the PEs with high communication requirements to the high-capability area, balances the internal communication cost against the external one, and achieves on-chip communication with minimum transmission delay and less resource occupancy. The mapping algorithm consists of two parts: the expression of the network topology by a two-dimensional matrix and the IP mapping. They are detailed as follows.
4.1 IP Communication Diagram and NoC Topology

The communication diagram can be abstracted into a triple $\mathrm{CDAG} = (P, E, C)$, where

(1) $P$ represents the set of PEs in the communication diagram; that is, $p_i \in P$ is a PE with an execution task;
(2) $E$ represents the edge set in the DAG application; that is, $e_{ij} \in E$ indicates that there exists data exchange between $p_i$ and $p_j$;
(3) $C$ represents the communication cost on the undirected edges, and $C_{ij}$ represents the total communication data between $p_i$ and $p_j$.
It is complicated to express NoC topology directly, especially three-dimensional NoC. Nevertheless, a two-dimensional matrix expresses topology well, and many properties of matrices can also be applied to topology computation. Therefore, the paper expresses topology by a two-dimensional matrix before IP mapping.
Three-dimensional mesh topology can be taken as an example. Figure 2(a) shows a 4 × 4 × 2 three-dimensional NoC topology; the red vertices represent bottom switching nodes and the black ones represent upper switching nodes. Figure 2(b) is its two-dimensional expansion diagram, which frees us from the complexity of studying the three-dimensional topology. For convenience of expression and computation, the position of nodes in the expansion diagram is expressed by a matrix; the position of the nodes in Figure 2(b) can be seen in Figure 2(c). There may exist areas where communication transmission capability is higher than that of others, in order to fulfill the higher communication requirement of some PEs. As shown in Figure 2(c), the green areas represent areas in which there exist switching nodes with higher communication performance. For the integrity of the matrix expression, areas without switching nodes are filled with shadow; in the later computation, nodes in these areas are assumed to be already assigned.
Through the approach above, a one-to-one correspondence is formed between the position of every node in the three-dimensional NoC topology and that of every element in the matrix. IP mapping conducts its optimization on the basis of the matrix.
4.2 IP Mapping

Before introducing the concrete algorithm, three parameters are given as follows.

Definition 1. Manhattan Distance $\mathrm{MD}(i, j)$: in a plane, the Manhattan Distance between points $P_i(x_1, y_1)$ and $P_j(x_2, y_2)$ is defined as

$$\mathrm{MD}(i, j) = |x_1 - x_2| + |y_1 - y_2|. \tag{12}$$

Definition 2. Euclidean Distance $\mathrm{ED}(i, j)$: in a plane, the Euclidean Distance between points $P_i(x_1, y_1)$ and $P_j(x_2, y_2)$ is defined as

$$\mathrm{ED}(i, j) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}. \tag{13}$$
Definition 3. Communication cost in the mapped area is obtained as follows:

$$\mathrm{Com\_cost} = \sum_{\forall C_{ij} \in C} C_{ij} \cdot \mathrm{MD}(L(p_i), L(p_j)), \tag{14}$$

in which $C_{ij}$ represents the total communication traffic between $p_i$ and $p_j$ in the communication diagram and $\mathrm{MD}(L(p_i), L(p_j))$ represents the Manhattan Distance between the mapped positions of $p_i$ and $p_j$ on the topology.
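The three definitions can be made concrete with a small sketch. The data layout is an assumption chosen for illustration: the traffic $C$ is a dictionary keyed by PE pairs and the mapping $L$ is a dictionary from PE name to matrix position. The PE names, traffic volumes, and positions are hypothetical.

```python
import math


def manhattan(a, b):
    """Manhattan Distance of equation (12) between two (x, y) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def euclidean(a, b):
    """Euclidean Distance of equation (13) between two (x, y) positions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def com_cost(traffic, location):
    """Com_cost of equation (14): traffic-weighted Manhattan distances."""
    return sum(c * manhattan(location[i], location[j])
               for (i, j), c in traffic.items())


# Hypothetical communication diagram and mapping.
traffic = {("p1", "p2"): 40, ("p2", "p3"): 10, ("p1", "p3"): 5}
location = {"p1": (1, 1), "p2": (1, 2), "p3": (2, 2)}
total = com_cost(traffic, location)
print(total)  # 40*1 + 10*1 + 5*2 = 60
```

Placing the heaviest pair (p1, p2) on adjacent nodes keeps its large volume multiplied by the smallest possible distance, which is the effect the mapping algorithm aims for.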
The target of the algorithm is to map PEs with high communication requirements to the topology area with high communication capability and to find a mapping scheme with minimum $\mathrm{Com\_cost}$.
The algorithm divides the communication diagram into collections $H$ and $L$ according to whether or not the included PEs need to be mapped in an area with high capability. In the collection $H = \{h_1, h_2, \ldots, h_i\}$ with high communication requirement, the elements are ordered as $|h_1| \ge |h_2| \ge \cdots \ge |h_i|$ according to the number of PEs with high communication requirement; in the collection $L = \{l_1, l_2, \ldots, l_i\}$ without high communication requirement, the elements are ordered as $|l_1| \ge |l_2| \ge \cdots \ge |l_i|$ according to the number of PEs contained. The execution steps of the mapping algorithm are as follows.
(1) Start the mapping computation from collection $h_1$. On the topology, choose the area with high communication capability that can contain the minimum set of PEs with high communication requirement in $h_1$ as the beginning area of mapping. Name the mapped PEs the assigned area and name the occupied switching-node area on the topology the mapped area.
(2) Start from the PE with maximum communication traffic (sum of input and output) and map it to the switching node in the area of high communication capability whose number of available neighboring nodes is nearest to the PE node degree.
(3) Choose the node that has the maximum communication data with the assigned area as the next PE to be mapped.
(4) Map the PE to the switching node that has the minimum Manhattan Distance to the mapped area. If more than one node meets the requirement, choose the node whose number of available neighboring nodes is nearest to the PE node degree; if there is still more than one node, choose the switching node that has the minimum Euclidean Distance from the center of the mapped area.
(5) Repeat step 3 and step 4 until all PEs are mapped, and then start the algorithm on another PE diagram to be mapped.

Figure 2: Topology and its expression by matrix, panels (a), (b), and (c).

Figure 3: Description of mapping process.

Figure 4: Comparison of algorithm velocity (execution time in ms versus task scale of 9, 16, and 25 subtasks, for GA, ACO, PSO, and OPSO).
Figure 3 gives a simple description of the mapping process. In the IP communication diagram, the red PEs represent PEs with high communication requirement and the blue area represents the assigned area. In the topology, the green area represents the area of switching nodes with high communication capability, and the area encircled by the red line represents the mapped area.
The mapping algorithm arranges PEs with a direct communication relationship on neighboring nodes, ensuring that the route between source node and destination node is shortest, without any conflicts with other transmission routes, thus minimizing the delay in the whole mapping area.
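Steps 2 to 4 of the mapping procedure in Section 4.2 can be sketched for a single communication diagram as follows. This is a simplified illustration, not the authors' implementation: the split into collections $H$ and $L$ is omitted, the "Manhattan Distance to the mapped area" is interpreted as the distance to the nearest already occupied node, the center of the mapped area is taken as the mean of the occupied positions, and remaining ties fall back to coordinate order. The function name `map_pes` and the example data are hypothetical, and the sketch assumes there are enough free nodes and at least one free high-capability node.

```python
import math


def map_pes(traffic, nodes, high_cap):
    """Greedy placement of one communication diagram onto free matrix cells.

    traffic:  {(pe_a, pe_b): volume} for undirected edges
    nodes:    set of free (x, y) switching-node positions
    high_cap: subset of nodes with high communication capability
    """
    free = set(nodes)
    adj = {}  # symmetric adjacency: adj[p][q] = traffic between p and q
    for (a, b), c in traffic.items():
        adj.setdefault(a, {})[b] = c
        adj.setdefault(b, {})[a] = c

    def free_neighbors(n):
        x, y = n
        return sum((x + dx, y + dy) in free
                   for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)))

    def degree_gap(pe, n):
        return abs(free_neighbors(n) - len(adj[pe]))

    placed = {}
    # Step 2: the PE with the largest total traffic goes to the high-capability area.
    first = max(adj, key=lambda p: sum(adj[p].values()))
    start = min(sorted(free & set(high_cap)), key=lambda n: degree_gap(first, n))
    placed[first] = start
    free.discard(start)

    while len(placed) < len(adj):
        # Step 3: next PE exchanges the most data with the assigned area.
        pe = max((p for p in adj if p not in placed),
                 key=lambda p: sum(c for q, c in adj[p].items() if q in placed))
        used = list(placed.values())
        cx = sum(x for x, _ in used) / len(used)
        cy = sum(y for _, y in used) / len(used)
        # Step 4: Manhattan distance, then degree gap, then Euclidean distance to center.
        node = min(sorted(free), key=lambda n: (
            min(abs(n[0] - u[0]) + abs(n[1] - u[1]) for u in used),
            degree_gap(pe, n),
            math.hypot(n[0] - cx, n[1] - cy)))
        placed[pe] = node
        free.discard(node)
    return placed


# Hypothetical 3 x 3 matrix with one high-capability node in the middle.
grid = {(x, y) for x in range(1, 4) for y in range(1, 4)}
traffic = {("p1", "p2"): 40, ("p2", "p3"): 10, ("p1", "p3"): 5, ("p3", "p4"): 20}
placed = map_pes(traffic, grid, high_cap={(2, 2)})
print(placed)
```

The three-level key in step 4 mirrors the tie-breaking order of the text: proximity to the mapped area first, then the match between free neighbors and PE degree, then closeness to the center.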
5 Experiment and Simulation
The performance of the designed algorithm is compared and evaluated from two aspects. The first is the speed of the task dividing and scheduling algorithm itself. By computing tasks of the same size with GA, ACO, PSO, and the algorithm in this paper, respectively, and comparing the running time, we can demonstrate the efficiency of the algorithm. This part is conducted in Matlab with 200 iterations; the comparison of the time required for running the algorithms is shown in Figure 4.
Figure 5: Comparison of mapping effect: (a) average packet delay (clock cycles) versus task scale; (b) power consumption versus task scale. Task scales are 9, 16, and 25 PEs; the algorithms compared are GA, ACO, PSO, and OPSO.
The other is the comparison of actual mapping effect (Figure 5). By running the different scheduling results from the above algorithms in a NoC simulation environment and computing the delay and power consumption of the system, respectively, we can demonstrate the superiority of the algorithm of this paper in scheduling.
6 Conclusion
In this paper, the task scheduling model is further improved, and the operating cost per time unit is employed as a uniform measurement for PEs of different types, which simplifies the algorithm. Task dividing and scheduling and IP mapping are handled separately, so that the resulting scheduling is more efficient and realistic. The target of scheduling considers not only the total time spent but also the time cost and resource cost during task running, so as to achieve comprehensive optimization of system performance.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] C. Addo-Quaye, "Thermal-aware mapping and placement for 3-D NoC designs," in Proceedings of the IEEE International SOC Conference, pp. 25–28, September 2005.
[2] A. K. Singh, W. Jigang, A. Prakash, and T. Srikanthan, "Mapping algorithms for NoC-based heterogeneous MPSoC platforms," in Proceedings of the 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools (DSD '09), pp. 133–140, August 2009.
[3] K. Ganeshpure and S. Kundu, "On runtime task graph extraction in MPSoC," in Proceedings of the IEEE Computer Society Annual Symposium on VLSI, pp. 171–176, IEEE, 2013.
[4] Y. Z. Tei, M. N. Marsono, N. Shaikh-Husin, and Y. W. Hau, "Network partitioning and GA heuristic crossover for NoC application mapping," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '13), pp. 1228–1231, Beijing, China, May 2013.
[5] H. Topcuoglu, S. Hariri, and M. Wu, "Performance-effective and low-complexity task scheduling for heterogeneous computing," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 3, pp. 260–274, 2002.
[6] M. I. Daoud and N. Kharma, "Efficient compile-time task scheduling for heterogeneous distributed computing systems," in Proceedings of the 12th International Conference on Parallel and Distributed Systems (ICPADS '06), vol. 1, pp. 11–19, IEEE, Minneapolis, Minnesota, July 2006.
[7] M. Wu and D. D. Gajski, "Hypertool: a programming aid for message-passing systems," IEEE Transactions on Parallel and Distributed Systems, vol. 1, no. 3, pp. 330–343, 1990.
[8] T. Yang and A. Gerasoulis, "DSC: scheduling parallel tasks on an unbounded number of processors," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 9, pp. 951–967, 1994.
[9] S. J. Kim and J. C. Browne, "A general approach to mapping of parallel computation upon multiprocessor architectures," in Proceedings of the International Conference on Parallel Processing, vol. 2, pp. 1–8, 1988.
[10] Y.-C. Chung and S. Ranka, "Applications and performance analysis of a compile-time optimization approach for list scheduling algorithms on distributed memory multiprocessors," in Supercomputing, pp. 512–521, 1992.
[11] I. Ahmad and Y. Kwok, "A new approach to scheduling parallel programs using task duplication," in Proceedings of the International Conference on Parallel Processing, vol. 2, pp. 47–51, 1994.
[12] M. Sayuti and L. S. Indrusiak, "Real-time low-power task mapping in networks-on-chip," in Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI '13), pp. 14–19, 2013.
[13] F. Ferrandi, P. L. Lanzi, C. Pilato, D. Sciuto, and A. Tumeo, "Ant colony heuristic for mapping and scheduling tasks and communications on heterogeneous embedded systems," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 6, pp. 911–924, 2010.
[14] L. S. Junior, N. Nedjah, and L. de Macedo Mourelle, "CO approach in static routing for network-on-chips with 3D mesh topology," in Proceedings of the IEEE Fourth Latin American Symposium on Circuits and Systems (LASCAS '13), pp. 1–4, IEEE, Cusco, Peru, February 2013.
[15] R. Hoffmann, A. Prell, and T. Rauber, "Dynamic task scheduling and load balancing on cell processors," in Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '10), pp. 205–212, February 2010.
[16] M. B. Abdelhalim, "Task assignment for heterogeneous multiprocessors using re-excited particle swarm optimization," in Proceedings of the International Conference on Computer and Electrical Engineering (ICCEE '08), pp. 23–27, Phuket, Thailand, December 2008.
[17] M. S. Sidhu, P. Thulasiraman, and R. K. Thulasiram, "A load-rebalance PSO heuristic for task matching in heterogeneous computing systems," in Proceedings of the IEEE Symposium on Swarm Intelligence (SIS '13), pp. 180–187, IEEE, Singapore, April 2013.
[18] Y. Wang and C. Dang, "An evolutionary algorithm for global optimization based on level-set evolution and latin squares," IEEE Transactions on Evolutionary Computation, vol. 11, no. 5, pp. 579–595, 2007.
[19] Y.-P. Wang, Y.-C. Jiao, and H. Li, "An evolutionary algorithm for solving nonlinear bilevel programming based on a new constraint-handling scheme," IEEE Transactions on Systems, Man and Cybernetics C: Applications and Reviews, vol. 35, no. 2, pp. 221–232, 2005.
[20] O. Arnold and G. Fettweis, "Power aware heterogeneous MPSoC with dynamic task scheduling and increased data locality for multiple applications," in Proceedings of the International Conference on Embedded Computer Systems (SAMOS '10), pp. 110–117, 2010.
[21] G. De Micheli and L. Benini, Networks on Chips: Technology and Tools, Academic Press, 2006.
[22] D. A. B. Miller, "Rationale and challenges for optical interconnects to electronic chips," Proceedings of the IEEE, vol. 88, no. 6, pp. 728–749, 2000.
[23] D. A. B. Miller, "Device requirements for optical interconnects to silicon chips," Proceedings of the IEEE, vol. 97, no. 7, pp. 1166–1185, 2009.
[24] M. O. Agyeman and A. Ahmadinia, "Optimising heterogeneous 3D networks-on-chip," in Proceedings of the 6th IEEE International Symposium on Parallel Computing in Electrical Engineering (PARELEC '11), pp. 25–30, April 2011.
[25] Y. Ye, J. Xu, X. Wu, W. Zhang, W. Liu, and M. Nikdast, "A torus-based hierarchical optical-electronic network-on-chip for multiprocessor system-on-chip," ACM Journal on Emerging Technologies in Computing Systems, vol. 8, no. 1, article 5, 2012.
[26] H. A. Khouzani, S. Koohi, and S. Hessabi, "Fully contention-free optical NoC based on wavelenght routing," in Proceedings of the 16th CSI International Symposium on Computer Architecture and Digital Systems (CADS '12), pp. 81–86, May 2012.
[27] C. Chou and R. Marculescu, "User-aware dynamic task allocation in networks-on-chip," in Proceedings of the Design, Automation and Test in Europe (DATE '08), vol. 1–3, pp. 1074–1079, March 2008.
[28] C. Chou and R. Marculescu, "Run-time task allocation considering user behavior in embedded multiprocessor networks-on-chip," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 1, pp. 78–91, 2010.
[20] O Arnold and G Fettweis ldquoPower aware heterogeneousMPSoCwith dynamic task scheduling and increased data local-ity for multiple applicationsrdquo in Proceedings of the InternationalConference on Embedded Computer Systems (SAMOS 10) pp110ndash117 2010
[21] G DeMicheli and L BeniniNetworks on Chips Technology andTools Academic Press 2006
[22] D A B Miller ldquoRationale and challenges for optical intercon-nects to electronic chipsrdquo Proceedings of the IEEE vol 88 no 6pp 728ndash749 2000
[23] D A B Miller ldquoDevice requirements for optical interconnectsto silicon chipsrdquo Proceedings of the IEEE vol 97 no 7 pp 1166ndash1185 2009
[24] M O Agyeman and A Ahmadinia ldquoOptimising heteroge-neous 3D networks-on-chiprdquo in Proceedings of the 6th IEEEInternational Symposium on Parallel Computing in ElectricalEngineering (PARELEC 11) pp 25ndash30 April 2011
[25] Y Ye J Xu X Wu W Zhang W Liu and M NikdastldquoA torus-based hierarchical optical-electronic network-on-chipfor multiprocessor system-on-chiprdquo ACM Journal on EmergingTechnologies in Computing Systems vol 8 no 1 article 5 2012
[26] HA Khouzani S Koohi and SHessabi ldquoFully contention-freeoptical NoC based on wavelenght routingrdquo in Proceedings of the16thCSI International SymposiumonComputer Architecture andDigital Systems (CADS rsquo12) pp 81ndash86 May 2012
[27] C Chou and R Marculescu ldquoUser-aware dynamic task allo-cation in networks-on-chiprdquo in Proceedings of the DesignAutomation and Test in Europe (DATE rsquo08) vol 1ndash3 pp 1074ndash1079 March 2008
[28] C Chou and R Marculescu ldquoRun-time task allocation con-sidering user behavior in embedded multiprocessor networks-on-chiprdquo IEEE Transactions on Computer-Aided Design of Inte-grated Circuits and Systems vol 29 no 1 pp 78ndash91 2010
Figure 2: Topology and its expression by matrix (panels (a), (b), and (c); axes X and Y).
Figure 3: Description of mapping process.
Figure 4: Comparison of algorithm velocity (execution time in ms versus task scale of 9, 16, and 25 subtasks for GA, ACO, PSO, and OPSO).
(3) Choose the node that exchanges the most communication data with the assigned area as the next PE to be mapped.
(4) Map the PE to the switching node that has the minimum Manhattan distance to the mapped area. If more than one node meets this requirement, choose the node whose number of available neighboring nodes is nearest to the node degree of the PE; if more than one node still remains, choose the switching node that has the minimum Euclidean distance from the center of the mapped area.
(5) Repeat steps 3 and 4 until all PEs are mapped, and then start the algorithm on the next PE diagram to be mapped.
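Steps (3) to (5) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the topology is taken to be a 2D mesh, the seed PE (the one with the largest total traffic) is placed at a caller-supplied `start` node, the distance from a candidate node to the mapped area is taken as the Manhattan distance to the nearest occupied node, and a final tie-break on node coordinates keeps the result deterministic. The names `map_pes` and `comm` are hypothetical.

```python
import math


def map_pes(comm, rows, cols, start=(0, 0)):
    """Greedily place PEs on a rows x cols mesh.

    comm[i][j] is the data volume exchanged between PE i and PE j
    (symmetric, zero diagonal). Returns {PE index: (row, col)}.
    """
    n = len(comm)
    if n > rows * cols:
        raise ValueError("more PEs than switching nodes")

    free = {(r, c) for r in range(rows) for c in range(cols)}
    # Node degree of a PE: number of PEs it communicates with.
    degree = [sum(1 for v in row if v > 0) for row in comm]

    # Seed (assumption): the PE with the largest total traffic goes to `start`.
    first = max(range(n), key=lambda i: sum(comm[i]))
    placed = {first: start}
    free.remove(start)

    def manhattan(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    def free_neighbours(node):
        r, c = node
        steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
        return sum((r + dr, c + dc) in free for dr, dc in steps)

    while len(placed) < n:
        # Step 3: unmapped PE with the most traffic to the mapped PEs.
        pe = max((i for i in range(n) if i not in placed),
                 key=lambda i: sum(comm[i][j] for j in placed))

        used = list(placed.values())
        centre_r = sum(r for r, _ in used) / len(used)
        centre_c = sum(c for _, c in used) / len(used)

        # Step 4: compare candidates lexicographically on the three criteria.
        def key(node):
            return (
                min(manhattan(node, u) for u in used),       # Manhattan distance
                abs(free_neighbours(node) - degree[pe]),     # neighbour count vs degree
                math.hypot(node[0] - centre_r, node[1] - centre_c),  # Euclidean
                node,                                        # deterministic tie-break
            )

        best = min(free, key=key)
        placed[pe] = best
        free.remove(best)   # Step 5: repeat until every PE is mapped.

    return placed


if __name__ == "__main__":
    # Four PEs communicating in a chain, mapped onto a 2 x 2 mesh.
    chain = [[0, 5, 0, 0], [5, 0, 3, 0], [0, 3, 0, 1], [0, 0, 1, 0]]
    print(map_pes(chain, 2, 2))
```

Expressing the three criteria of step (4) as a tuple lets Python's tuple ordering apply them in priority order, so a later criterion is consulted only when all earlier ones tie.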
Figure 3 gives a simple description of the mapping process. In the IP communication diagram, the red PEs represent PEs with high communication requirements. In the topology, the blue area represents the assigned area, the green area represents the switching nodes with high communication capability, and the area encircled by the red line represents the mapped area.
The mapping algorithm places PEs that communicate directly with each other on neighboring nodes, so that the path between the source node and the destination node is the shortest possible and does not conflict with other transmission paths, thus minimizing the delay in the whole mapping area.
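One simple way to check how well a placement keeps communicating PEs close together is to sum, over all PE pairs, the data volume multiplied by the hop count between their nodes. The sketch below is an illustrative proxy and not a metric defined in the paper: it assumes a 2D mesh with minimal (for example, XY) routing, so that the hop count equals the Manhattan distance, and it does not model path conflicts. The names `weighted_hops`, `comm`, and `placement` are hypothetical.

```python
def weighted_hops(comm, placement):
    """Sum of data volume times Manhattan hop count over communicating PE pairs.

    comm[i][j] is the data volume between PE i and PE j (symmetric);
    placement maps each PE index to its (row, col) switching node.
    """
    total = 0
    n = len(comm)
    for i in range(n):
        for j in range(i + 1, n):
            if comm[i][j] > 0:
                (r1, c1), (r2, c2) = placement[i], placement[j]
                total += comm[i][j] * (abs(r1 - r2) + abs(c1 - c2))
    return total


if __name__ == "__main__":
    # Four PEs communicating in a chain on a 2 x 2 mesh.
    chain = [[0, 5, 0, 0], [5, 0, 3, 0], [0, 3, 0, 1], [0, 0, 1, 0]]
    adjacent = {0: (0, 1), 1: (0, 0), 2: (1, 0), 3: (1, 1)}
    scattered = {0: (0, 0), 1: (1, 1), 2: (0, 1), 3: (1, 0)}
    print(weighted_hops(chain, adjacent), weighted_hops(chain, scattered))
```

A lower value means that heavily communicating PEs sit closer together, which is the property the mapping steps aim for.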
5 Experiment and Simulation
The performance of the designed algorithm is compared and evaluated from two aspects. The first is the running speed of the task dividing and scheduling algorithm itself. Tasks of the same size are computed with GA, ACO, PSO, and the algorithm in this paper, and the running times are compared to demonstrate the efficiency of the algorithm. This part is conducted in Matlab with 200 iterations; the time required to run each algorithm is shown in Figure 4.
Figure 5: Comparison of mapping effect. (a) Average packet delay (clock cycles) versus task scale (9, 16, and 25 PEs); (b) power consumption versus task scale (9, 16, and 25 PEs), for GA, ACO, PSO, and OPSO.
The second is the comparison of the actual mapping effect (Figure 5). The scheduling results produced by the above algorithms are run in an NoC simulation environment, and the delay and power consumption of the system are computed for each, which demonstrates the superiority of the algorithm of this paper in scheduling.
6 Conclusion
In this paper, the task scheduling model is further improved, and the operating cost per time unit is employed as a uniform measurement for PEs of different types, which simplifies the algorithm. Task dividing and scheduling and IP mapping are handled separately, so that the resulting scheduling is more efficient and realistic. The scheduling target considers not only the total time spent but also the time cost and resource cost incurred while the task runs, so as to achieve comprehensive optimization of system performance.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] C. Addo-Quaye, "Thermal-aware mapping and placement for 3-D NoC designs," in Proceedings of the IEEE International SOC Conference, pp. 25–28, September 2005.
[2] A. K. Singh, W. Jigang, A. Prakash, and T. Srikanthan, "Mapping algorithms for NoC-based heterogeneous MPSoC platforms," in Proceedings of the 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools (DSD '09), pp. 133–140, August 2009.
[3] K. Ganeshpure and S. Kundu, "On runtime task graph extraction in MPSoC," in Proceedings of the IEEE Computer Society Annual Symposium on VLSI, pp. 171–176, IEEE, 2013.
[4] Y. Z. Tei, M. N. Marsono, N. Shaikh-Husin, and Y. W. Hau, "Network partitioning and GA heuristic crossover for NoC application mapping," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '13), pp. 1228–1231, Beijing, China, May 2013.
[5] H. Topcuoglu, S. Hariri, and M. Wu, "Performance-effective and low-complexity task scheduling for heterogeneous computing," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 3, pp. 260–274, 2002.
[6] M. I. Daoud and N. Kharma, "Efficient compile-time task scheduling for heterogeneous distributed computing systems," in Proceedings of the 12th International Conference on Parallel and Distributed Systems (ICPADS '06), vol. 1, pp. 11–19, IEEE, Minneapolis, Minnesota, July 2006.
[7] M. Wu and D. D. Gajski, "Hypertool: a programming aid for message-passing systems," IEEE Transactions on Parallel and Distributed Systems, vol. 1, no. 3, pp. 330–343, 1990.
[8] T. Yang and A. Gerasoulis, "DSC: scheduling parallel tasks on an unbounded number of processors," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 9, pp. 951–967, 1994.
[9] S. J. Kim and J. C. Browne, "A general approach to mapping of parallel computation upon multiprocessor architectures," in Proceedings of the International Conference on Parallel Processing, vol. 2, pp. 1–8, 1988.
[10] Y.-C. Chung and S. Ranka, "Applications and performance analysis of a compile-time optimization approach for list scheduling algorithms on distributed memory multiprocessors," in Supercomputing, pp. 512–521, 1992.
[11] I. Ahmad and Y. Kwok, "A new approach to scheduling parallel programs using task duplication," in Proceedings of the International Conference on Parallel Processing, vol. 2, pp. 47–51, 1994.
[12] M. Sayuti and L. S. Indrusiak, "Real-time low-power task mapping in networks-on-chip," in Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI '13), pp. 14–19, 2013.
[13] F. Ferrandi, P. L. Lanzi, C. Pilato, D. Sciuto, and A. Tumeo, "Ant colony heuristic for mapping and scheduling tasks and communications on heterogeneous embedded systems," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 6, pp. 911–924, 2010.
[14] L. S. Junior, N. Nedjah, and L. de Macedo Mourelle, "CO approach in static routing for network-on-chips with 3D mesh topology," in Proceedings of the IEEE Fourth Latin American Symposium on Circuits and Systems (LASCAS '13), pp. 1–4, IEEE, Cusco, Peru, February 2013.
[15] R. Hoffmann, A. Prell, and T. Rauber, "Dynamic task scheduling and load balancing on cell processors," in Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '10), pp. 205–212, February 2010.
[16] M. B. Abdelhalim, "Task assignment for heterogeneous multiprocessors using re-excited particle swarm optimization," in Proceedings of the International Conference on Computer and Electrical Engineering (ICCEE '08), pp. 23–27, Phuket, Thailand, December 2008.
[17] M. S. Sidhu, P. Thulasiraman, and R. K. Thulasiram, "A load-rebalance PSO heuristic for task matching in heterogeneous computing systems," in Proceedings of the IEEE Symposium on Swarm Intelligence (SIS '13), pp. 180–187, IEEE, Singapore, April 2013.
[18] Y. Wang and C. Dang, "An evolutionary algorithm for global optimization based on level-set evolution and latin squares," IEEE Transactions on Evolutionary Computation, vol. 11, no. 5, pp. 579–595, 2007.
[19] Y.-P. Wang, Y.-C. Jiao, and H. Li, "An evolutionary algorithm for solving nonlinear bilevel programming based on a new constraint-handling scheme," IEEE Transactions on Systems, Man and Cybernetics C: Applications and Reviews, vol. 35, no. 2, pp. 221–232, 2005.
[20] O. Arnold and G. Fettweis, "Power aware heterogeneous MPSoC with dynamic task scheduling and increased data locality for multiple applications," in Proceedings of the International Conference on Embedded Computer Systems (SAMOS '10), pp. 110–117, 2010.
[21] G. De Micheli and L. Benini, Networks on Chips: Technology and Tools, Academic Press, 2006.
[22] D. A. B. Miller, "Rationale and challenges for optical interconnects to electronic chips," Proceedings of the IEEE, vol. 88, no. 6, pp. 728–749, 2000.
[23] D. A. B. Miller, "Device requirements for optical interconnects to silicon chips," Proceedings of the IEEE, vol. 97, no. 7, pp. 1166–1185, 2009.
[24] M. O. Agyeman and A. Ahmadinia, "Optimising heterogeneous 3D networks-on-chip," in Proceedings of the 6th IEEE International Symposium on Parallel Computing in Electrical Engineering (PARELEC '11), pp. 25–30, April 2011.
[25] Y. Ye, J. Xu, X. Wu, W. Zhang, W. Liu, and M. Nikdast, "A torus-based hierarchical optical-electronic network-on-chip for multiprocessor system-on-chip," ACM Journal on Emerging Technologies in Computing Systems, vol. 8, no. 1, article 5, 2012.
[26] H. A. Khouzani, S. Koohi, and S. Hessabi, "Fully contention-free optical NoC based on wavelength routing," in Proceedings of the 16th CSI International Symposium on Computer Architecture and Digital Systems (CADS '12), pp. 81–86, May 2012.
[27] C. Chou and R. Marculescu, "User-aware dynamic task allocation in networks-on-chip," in Proceedings of the Design, Automation and Test in Europe (DATE '08), vol. 1–3, pp. 1074–1079, March 2008.
[28] C. Chou and R. Marculescu, "Run-time task allocation considering user behavior in embedded multiprocessor networks-on-chip," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 1, pp. 78–91, 2010.