Research Article Implementation of Membrane Algorithms on GPUdownloads.hindawi.com/journals/jam/2014/307617.pdf · Membrane computing is an emergent branch of natural computing initiated

Research ArticleImplementation of Membrane Algorithms on GPU

Xingyi Zhang1 Bangju Wang1 Zhuanlian Ding1 Jin Tang1 and Juanjuan He2

1 Key Lab of Intelligent Computing and Signal Processing of Ministry of Education School of Computer Science and TechnologyAnhui University Hefei 230039 China

2 Key Laboratory of Image Processing and Intelligent Control School of Automation Huazhong University of Science and TechnologyWuhan Hubei 430074 China

Correspondence should be addressed to Jin Tang ahhftanggmailcom

Received 7 May 2014 Accepted 25 June 2014 Published 10 July 2014

Academic Editor Quanke Pan

Copyright copy 2014 Xingyi Zhang et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Membrane algorithms are a new class of parallel algorithms which attempt to incorporate some components of membranecomputing models for designing efficient optimization algorithms such as the structure of the models and the way ofcommunication between cells Although the importance of the parallelism of such algorithms has been well recognized membranealgorithms were usually implemented on the serial computing device central processing unit (CPU) which makes the algorithmsunable to work in an efficient way In this work we consider the implementation ofmembrane algorithms on the parallel computingdevice graphics processing unit (GPU) In such implementation all cells of membrane algorithms can work simultaneouslyExperimental results on two classical intractable problems the point set matching problem and TSP show that the GPUimplementation of membrane algorithms is much more efficient than CPU implementation in terms of runtime especially forsolving problems with a high complexity

1 Introduction

Membrane computing is an emergent branch of naturalcomputing initiated by Paun in 2000 [1] with the aimof abstracting innovative computing models or ideas fromthe living cells and higher order structures of living cellssuch as tissues and organs The obtained models called119875 systems are distributed and parallel computing devicesMost variants of 119875 systems were proved to be computa-tionally complete (equivalent to Turing machines or otherequivalent computing devices we also say that 119875 systemsare universal) as number computing devices [2ndash4] languagegenerators [5 6] and function computing devices [7 8]For general information on this area please refer to theHandbook of Membrane Computing [9] and for the up-to-date information refer to the membrane computing websitehttpppagepsystemseu119875 systems have been proved to be a rich framework for

handling many problems related to computing Such systemscan theoretically solve presumably intractable problems ina feasible time (solving NP-complete problems [10ndash12] or

even PSPACE-complete problems [13]) Actually 119875 systemscan also provide some new ideas for designing optimizationalgorithms to obtain approximate solutions to the intractableproblems [14ndash22] The optimization algorithms inspired by119875 systems are usually called membrane algorithms (someresearchers also call membrane algorithms 119875 systems basedoptimization algorithms) The first membrane algorithmwas proposed by Nishida in 2004 [14] where the nestedstructure and the communication mechanism between cellswere brought from 119875 systems Experimental results showedthat such an algorithm is effective and efficient for solving theintractable problem traveling salesman problem (TSP) [14]Since then many membrane algorithms have been proposedfor solving various optimization problems such as knapsackproblem [15] point set matching problem [16] numericaloptimization problem [17]multiobjective optimization prob-lem [18 19] DNA sequence design problem [20] and manypractical problems [21 22] The dynamic behavior analysis ofmembrane algorithms indicated that a membrane algorithmhas a stronger capacity of balancing exploration and exploita-tion than its counterpart evolutionary algorithm in order to

Hindawi Publishing CorporationJournal of Applied MathematicsVolume 2014 Article ID 307617 7 pageshttpdxdoiorg1011552014307617

2 Journal of Applied Mathematics

prevent premature convergence that might occur [23] Weshould stress that all membrane algorithmsmentioned abovecan work in a parallel way in the sense that the evolutionof each cell can be performed simultaneously Although theimportance of the parallelism of such algorithms has beenwell recognized membrane algorithms were usually imple-mented on the serial computing deviceCPUwhichmakes thealgorithms unable towork as expected in amore efficient wayTo achieve this aim this paper considers the implementationof membrane algorithms on parallel computing devices

The graphics processing units (GPUs) are a kind ofcomputing devices with high parallelism on numericaloperations where massively parallel processors can supportseveral thousands of concurrent threads The computationalpower of GPUs has turned them into attractive platformsfor general-purpose scientific and engineering applicationsespecially for tackling large scale numerical computing prob-lems [24] In this work we will consider the implementationofmembrane algorithms onGPUwith an attempt tomake themembrane algorithms work in a parallel way Under the GPUimplementation of membrane algorithms presented here thework of a cell ofmembrane algorithms is achieved by a threadof GPU by which all the cells of a membrane algorithmcan evolve simultaneously through the concurrent threadsof GPU The GPU implementation ensures that membranealgorithms can work in a more efficient way in the sensethat each operation of membrane algorithms is performedin parallel as much as possible Experimental results ontwo classical intractable problems the point set matchingproblem and TSP show that the GPU implementation ofmembrane algorithms is effective and efficient Comparedwith the CPU implementation the GPU implementation ofmembrane algorithms consumes much less runtime for deal-ing with the intractable problems especially for large scalenumerical intractable problems Software with a friendlyinterface is also developed for GPU implementation ofmembrane algorithms which can provide a convenient toolfor the users to solve the intractable problems We shouldstress that the GPU implementation of P systems also existsfor example in Sevilla and Spain [25 26] A parallel simulatorfor membrane computing models on GPU called PMCGPUcan be found in the website [27]

The rest of the paper is organized as follows In Section 2we present the GPU implementation procedure ofmembranealgorithms Experimental results on the point set match-ing problem and TSP are presented in Section 3 Section 4presents software for GPU implementation of membranealgorithms Finally conclusions and remarks are presented inSection 5

2 Implementation of MembraneAlgorithms on GPU

In this section we will present a GPU implementationprocedure of membrane algorithms We first review its CPUimplementation procedure

Let us recall that a membrane algorithm with a nestedstructure mainly consists of the following four operations

Begin

Yes

Update solutions

From cell 1 tocell m alternatively

Output the best solution in region 1

End

No


Termination conditions


Yes

Yes

Yes

No

No

No

Exchange solutions

Select solutions

Initialize parameters and solutions


Yes

No

Figure 1 The CPU implementation procedure of membrane algo-rithms

(1) initialize solutions in each of the119898 cells(2) update solutions in each of the119898 cells(3) exchange solutions between adjacent cells(4) select good solutions in each of the119898 cells

Figure 1 illustrates the CPU implementation procedureof membrane algorithms with a nested structure Under theCPU implementation ofmembrane algorithms one cell of thealgorithms starts to work only after the work of another cell isfinished during the four operations This means that the cellswill evolve one by one during each of the four operations

Therefore each of the four operations is achieved in asequential manner under the CPU implementation In fact itis not difficult to find that the four operations can be achievedin a parallel manner in the sense that the 119898 cells evolvesimultaneously if a parallel computing device is used The

Journal of Applied Mathematics 3

GPU implementation of membrane algorithms can achievesuch a parallelism of the four operations

Briefly a GPU consists of hundreds of blocks and eachblock can support several thousands of concurrent threadsA GPU should work with the help of host CPU Data canbe transferred between threads in the same block throughthe shared memory in the block But the transferred datashould be very little due to the small size of the sharedmemory Data cannot be directly transferred between threadsin different blocks but only through the host Figure 2 depictsthe structure of a GPU The proposed GPU implementationprocedure of membrane algorithms with a nested structure isshown in Figure 3Under theGPU implementation presentedhere the 119898 cells will work in parallel instead of one by oneduring the four operations The parallelism of the 119898 cells isachieved based on an idea as follows a thread of a GPU doesthe work of a cell of membrane algorithms so the 119898 cellscan work in parallel through the concurrent threads Thismeans that GPU has to create the same number of threadsas that of cells used by membrane algorithms during theimplementation on GPU Note that in the GPU implementa-tion procedure of membrane algorithms a synchronizationfunction has been used after each of the four operationsUnder the implementation on GPU the work of each threadmay not be finished at the same time so this function willensure that the next operation ofmembrane algorithms startsonly after the work of each of the 119898 cells is finished forthe four operations The parallel work of the cells under theimplementation on GPU enables membrane algorithms towork in a more efficient way in terms of runtime whichwill be illustrated by simulation experiments in the followingsection

3 Experimental Results and Analysis

In this section we will evaluate the performance of GPUimplementation of membrane algorithms through two classi-cal intractable problems the point set matching problem andTSP It is useful for readers to have some familiarity with suchtwo problems so we here briefly recall them

A point set matching problem can be formulated asfollows Given two sets119875 = 119901

1 119901

119898 and119876 = 119902

1 119902

119899

find a map 119891 1198751015840 rarr 1198761015840 (1198751015840 sube 119875 1198761015840 sube 119876) such that thefollowing matching

Gobj

= sum

119901119894119901119895isin119875

10038161003816100381610038161003816119889 (119901119894 119901119895) minus 119889 (119891 (119901

119894) 119891 (119901

119895))10038161003816100381610038161003816+ 119896 (119899

119901minus 1198991199011015840)

(1)

objective value Gobj is minimized where 119889(119901 119902) is theEuclidean distance between points 119901 and 119902 119899

119901 1198991199011015840 are the

sizes of 119875 and 1198751015840 and 119896 is a penalty factorThe TSP raises the following question Given a list of 119899

cities 119901119894 1 le 119894 le 119899 find a route 119901

11989411199011198942sdot sdot sdot 1199011198941198991199011198941(ie this route

visits each city exactly once and returns to the origin city)such that the total distance Dist of this route is minimized

Dist =119899minus1

sum

119896=1

119889 (119901119894119896 119901119894119896+1) + 119889 (119901

119894119899 1199011198941) (2)

where119889(119901119894 119901119895) is the Euclidean distance between cities119901

119894and

119901119895The membrane algorithm proposed in [16] will be

adopted to test the performance of GPU implementationThenumber of generations is set as 100 the updating number ofeach cell is 8 in one generation and the other parameters arethe same as those suggested in [16] Each test is executed 5times All simulations reported in this work are conductedon a PC with a 340GHz Intel Core i7-2600K CPUWindowsXP Professional SP3 64 bit operating system and NVIDIAGeForce GTX 560 Ti graphics card (it uses the GF114 GPUwhich offers a maximum of 384 cores) Note that the runtimeof GPU implementation reported in the following subsec-tions contains all times that implementing a membranealgorithm onGPU takes including data reorganization host-to-GPU data transfer and GPU-to-host data transfer

31 Experiments on the Point Set Matching Problem In theexperiments the point set matching problem is created asfollows a point set 119875 consisting of 119899 points is generatedrandomly in the region (119909 119910) | 0 le 119909 119910 le 256radic11989910and the observed119876 is generated by adding theGaussian noisewith a variance 120590 = 10 to set 119875

Figure 4 presents the runtime of CPU and GPU imple-mentation of the membrane algorithm having 200 cellson the point set matching problem with different sizes ofpoint set As shown in Figure 4 the GPU implementationof membrane algorithm is effective and efficient The GPUimplementation outperforms the CPU implementation onthe point set matching problem in the sense that it consumesmuch less runtime than CPU implementation especially forpoint set with a large size The runtime of CPU and thatof GPU implementation will both increase as the size ofpoint set increasesThe runtime of CPU implementation willdramatically increase with an increment of the size of pointset while the increment on runtime of GPU implementationis quite slight with an increment of the size of point setNote that membrane algorithm will perform initializingupdating exchanging and selecting operations for the pointset matching problem in each cell and the cells used by themembrane algorithm will be performed in parallel underthe GPU implementation instead of one by one under theCPU implementation So the runtime will not increasedramatically as the size of point set increases

Table 1 shows the runtime of CPU and GPU implemen-tation of the membrane algorithm with different numbersof cells on the point set matching problem having 200points The GPU implementation of membrane algorithmoutperforms the CPU implementation in the sense that it canmake membrane algorithmwith a large number of cells workin a more efficient way The runtime of CPU implementationwill dramatically increase as the number of cells used by themembrane algorithm increases while the implementation


Global memory

Constant memory

Texture memory

Host

Shared memory

Registers

Localmemory

middot middot middot

Block

GPU

Registers

Thread

Localmemory

Thread middot middot middot

Shared memory

Registers

Localmemory


Block

Registers

Thread

Localmemory

Thread

Figure 2 The structure of GPU

Begin


Update solutions simultaneously



End

Yes

Exchange solutions simultaneously

Select solutions simultaneously

Cell 1 Cell mmiddot middot middot

middot middot middot Cell mCell 1

middot middot middot Cell m

Execute synchronization function



No

Cell 1

Cell 1



Figure 3 The GPU implementation procedure of membranealgorithms

Table 1 Runtime (s) of CPU and GPU implementation of themembrane algorithm with different numbers of cells on point setmatching problem having 200 points

Number of cells PlatformCPU GPU

100 107844 10388150 159706 10470200 208239 10530250 268992 10655300 320236 10741350 359905 10938400 415670 11113450 447167 11564500 498842 11566

16000

14000

12000

10000

8000

6000

4000

2000

0

50 100 150 200 250 300 350 400 450 500

Tim

e (s)

Size of point set

CPUGPU

Figure 4 Runtime (s) of CPU and GPU implementation of themembrane algorithm having 200 cells on the point set matchingproblem with different sizes of point set

on GPU almost has the same runtime as the number ofcells increases The slight increment on runtime of GPUimplementation is partially caused by the increasing overheadgenerated by the control of the synchronization of cells as thenumber of cells increases

32 Experiments on the TSP In the experiments each TSPis chosen from the TSP benchmark problems in TSPLIB pro-posed by Reinelt [28] These benchmark problems have beenwidely used for testing the performance of an optimizationalgorithm

Figure 5 shows the runtime of CPU and GPU implemen-tation of the membrane algorithm having 200 cells on theTSP with different numbers of cities It is not difficult to findthat on TSP there is a similar result as that of the point setmatching problem The GPU implementation of membranealgorithms outperforms the CPU implementation on the TSPin terms of runtime The GPU implementation will consumemuch less runtime than CPU implementation especially onTSP with a large number of cities


CPUGPU

30

25

20

15

10

5

0

51 100 150 200 318 493 666 1000 1291

Tim

e (s)

The number of cities

Figure 5 Runtime (s) of CPU and GPU implementation ofthe membrane algorithm having 200 cells on TSP with differentnumbers of cities

Table 2 Runtime (s) of CPU and GPU implementation of themembrane algorithm with different numbers of cells on TSPbenchmark problem with 493 cities


100 513 224150 766 234200 1017 241250 1275 250300 1531 256350 1775 263400 2044 269450 2283 277500 2556 283

Table 2 shows the runtime of CPU and GPU implemen-tation of the membrane algorithm with different numbersof cells on the TSP benchmark problem with 493 cities Asshown in Table 2 on TSP the runtime of CPU implementa-tion of membrane algorithms will increase with an incrementof the number of cells while the implementation on GPUwill almost consume the same runtime as the number ofcells increases Different from the case of point set matchingproblems the runtime of CPU implementation of membranealgorithms on TSP does not dramatically increase with anincrement of the number of cellsThis result is mainly causedby the fact that the time consumed by each cell is quitelittle since the operations in each cell hold a low complexityfor the TSP with 200 cities So the total runtime of CPUimplementation will only increase slightly with an incrementof the number of cells

Compared with the simulation results on the point setmatching problem we can find that the GPU implementationof membrane algorithms is quite efficient for tackling largescale intractable problems while it is not good at dealing with

Figure 6 An interface of the software for GPU implementation ofmembrane algorithms with a nested structure

intractable problems with a small size The reason for thisis that the cells of membrane algorithms working in parallelcan save only a little runtime under GPU implementationsince the computational complexity in each cell is very lowfor intractable problems with a small size At the same timeunder GPU implementation the data should be transferredrepeatedly between GPU and host which will consume someadditional runtime Therefore the GPU implementation ofmembrane algorithms will not work in a more efficient waythan CPU implementation for solving intractable problemswith a small size

4 Software with a Friendly Interface

For the GPU implementation of membrane algorithmswe have to write a new programming code for differentintractable problems which is quite inconvenient for theusers Therefore software with a friendly interface (termedMAGPU) has been developed which can provide the userswith a convenient tool to implement membrane algorithmswith a nested structure on GPU This software can be foundon the website httpsgithubcomwarmheart0magpu

This software mainly contains the following functions(1) the setting of a few parameters including the generationnumber number of cells and updating number of each cellin one generation (2) the choice of experimental data andupdating strategy of each cell (it also allows the users to definetheir own updating strategy) (3) the saving and showingof experimental results (4) the saving and comparing ofruntimes of several experiments (5) the calculation of a fewmeasurement indexes such as mean and variance Figure 6presents an interface of the software for GPU implementationof membrane algorithms with a nested structure

5 Conclusions and Remarks

In this paper the implementation of membrane algorithmson a parallel computing device GPU was carried out TheGPU implementation can achieve the parallelism of mem-brane algorithms which makes membrane algorithms workin a more efficient way in terms of runtime Experimental


results on the point set matching problem and TSP showthat the GPU implementation of membrane algorithmsoutperforms the CPU implementation in the sense that itconsumes much less runtime

Although the GPU implementation of membrane algo-rithms has shown a good performance in terms of runtimemany problems remain to be solved for the GPU imple-mentation presented in this work Among these problemsan interesting one is to further reduce the runtime of GPUimplementation of membrane algorithms In the presentedGPU implementation the data transfer between host andGPU will be performed a large number of times which takesa lot of runtimeTherefore a possible solution is to reduce thenumber of data transfer times between host CPU andGPU Itis conjectured that the runtime of GPU implementation canbe greatly reduced by improving the GPU implementationprocedure such that only a small number of date transfertimes are performed between host and GPU

In order to provide the users with a convenient tool toimplement membrane algorithms on GPU software with afriendly interface has been developed This software is onlya simple version and many functions need to be improved oraddedwhich is an interestingwork that deserves to be furtherinvestigated

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the National Natural Sci-ence Foundation of China (61033003 91130034 6127215261202011 and 61320106005) and the Natural Science Foun-dation of Anhui Higher Education Institutions of China(KJ2012A010 and KJ2013A007)

References

[1] G Paun ldquoComputing with membranesrdquo Journal of Computerand System Sciences vol 61 no 1 pp 108ndash143 2000

[2] L Pan and G Paun ldquoSpiking neural P systems with anti-spikesrdquo International Journal of Computers Communicationsand Control vol 4 no 3 pp 273ndash282 2009

[3] L Pan and G Paun ldquoSpiking neural P systems an improvednormal formrdquo Theoretical Computer Science vol 411 no 6 pp906ndash918 2010

[4] J Wang H J Hoogeboom L Pan G Paun and M J Perez-Jimenez ldquoSpiking neural P systems with weightsrdquo NeuralComputation vol 22 no 10 pp 2615ndash2646 2010

[5] X Zhang X Zeng and L Pan ldquoOn string languages generatedby spiking neural P systems with exhaustive use of rulesrdquoNatural Computing vol 7 no 4 pp 535ndash549 2008

[6] H Chen M Ionescu T Ishdorj and G Paun ldquoSpiking neuralP systems with extended rules universality and languagesrdquoNatural Computing vol 7 no 2 pp 147ndash166 2008

[7] A Paun and G Paun ldquoSmall universal spiking neural Psystemsrdquo BioSystems vol 90 no 1 pp 48ndash60 2007

[8] X Zhang X Zeng andL Pan ldquoSmaller universal spiking neuralP systemsrdquo Fundamenta Informaticae vol 87 no 1 pp 117ndash1362008

[9] G Paun G Rozenberg and A Salomaa Handbook of Mem-brane Computing Oxford University Press Oxford UK 2010

[10] X Zhang S Wang Y Niu and L Pan ldquoTissue P systems withcell separation Attacking the partition problemrdquo Science ChinaInformation Sciences vol 54 no 2 pp 293ndash304 2011

[11] T Song X Wang and H Zheng ldquoTime-free solution toHamilton path problems using P systems with 119889-divisionrdquoJournal of Applied Mathematics vol 2013 Article ID 975798 7pages 2013

[12] X Liu Z Li J Suo Y a Ju and X Zeng ldquoSolvingmultidimen-sional 0-1 knapsack problem with time-free tissue P systemsrdquoJournal of Applied Mathematics vol 2014 Article ID 372768 6pages 2014

[13] T Ishdorj A Leporati L Pan X Zeng and X ZhangldquoDeterministic solutions toQSAT andQ3SAT by spiking neuralP systems with pre-computed resourcesrdquoTheoretical ComputerScience vol 411 no 25 pp 2345ndash2358 2010

[14] T Y Nishida ldquoAn approximate algorithm for NP-completeoptimization problems exploiting P systemsrdquo in Proceedingsof the Brainstorming Workshop on Uncertainty in MembraneComputing pp 185ndash192 Palma Spain 2004

[15] G Zhang M Gheorghe and C Wu ldquoA quantum-inspired evo-lutionary algorithm based on P systems for knapsack problemrdquoFundamenta Informaticae vol 87 no 1 pp 93ndash116 2008

[16] J Tang Z Ding X Zhang and B Luo ldquoMembrane computingmodel based algorithm for point set matchingrdquo Infrared andLaser Engineering vol 42 no 5 pp 1388ndash1394 2013

[17] J Cheng G Zhang and X Zeng ldquoA novel membrane algorithmbased on differential evolution for numerical optimizationrdquoInternational Journal of Unconventional Computing vol 7 no3 pp 159ndash183 2011

[18] H Liang R He N Wang and Y Xie ldquoP systems based multi-objective optimization algorithmrdquo Progress in Natural ScienceEnglish Edition vol 17 no 4 pp 458ndash465 2007

[19] LHuang IH Suh andAAbraham ldquoDynamicmulti-objectiveoptimization based on membrane computing for control oftime-varying unstable plantsrdquo Information Sciences vol 181 no11 pp 2370ndash2391 2011

[20] J H Xiao X Y Zhang and J Xu ldquoA membrane evolutionaryalgorithm for DNA sequence design in DNA computingrdquoChinese Science Bulletin vol 57 no 6 pp 698ndash706 2012

[21] G Zhang C Liu and H Rong ldquoAnalyzing radar emitter sig-nals with membrane algorithmsrdquo Mathematical and ComputerModelling vol 52 no 11-12 pp 1997ndash2010 2010

[22] G Zhang J Cheng M Gheorghe and Q Meng ldquoA hybridapproach based on differential evolution and tissue membranesystems for solving constrained manufacturing parameter opti-mization problemsrdquoApplied Soft Computing Journal vol 13 no3 pp 1528ndash1542 2013

[23] G Zhang J Cheng and M Gheorghe ldquoDynamic behavioranalysis of membrane-inspired evolutionary algorithmsrdquo Inter-national Journal of Computers Communications amp Control vol9 no 9 pp 227ndash242 2014

[24] A Ruiz M Ujaldon J A Andrades et al ldquoThe GPU onbiomedical image processing for color and phenotype analysisrdquoin Proceedings of the 7th IEEE International Conference onBioinformatics andBioengineering pp 1124ndash1128NewYorkNYUSA January 2007


[25] F G C Cabarle H N Adorna M A Martinez-Del-Amor andM J Perez-Jimenez ldquoImproving GPU simulations of spikingneural P systemsrdquo Romanian Journal of Information Science andTechnology vol 15 no 1 pp 5ndash20 2012

[26] J M Cecilia J M Garcıa G D Guerrero M A Martınez-del-Amor I Perez-Hurtado and M J Perez-Jimenez ldquoSimulatinga P system based efficient solution to SAT by using GPUsrdquoTheJournal of Logic and Algebraic Programming vol 79 no 6 pp317ndash325 2010

[27] The parallel simulators PMCGPU for membrane computing onthe GPU httpsourceforgenetprojectspmcgpu

[28] G Reinelt TSPLIB httpwwwiwruni-heidelbergdegroupscomoptsoftwareTSPLIB95tsp

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of


Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of


Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of


Mathematical PhysicsAdvances in

Complex AnalysisJournal of


OptimizationJournal of


CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of


Operations ResearchAdvances in

Journal of


Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences


The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014


Algebra

Discrete Dynamics in Nature and Society



Decision SciencesAdvances in

Discrete MathematicsJournal of


Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of


prevent premature convergence that might occur [23] Weshould stress that all membrane algorithmsmentioned abovecan work in a parallel way in the sense that the evolutionof each cell can be performed simultaneously Although theimportance of the parallelism of such algorithms has beenwell recognized membrane algorithms were usually imple-mented on the serial computing deviceCPUwhichmakes thealgorithms unable towork as expected in amore efficient wayTo achieve this aim this paper considers the implementationof membrane algorithms on parallel computing devices

The graphics processing units (GPUs) are a kind ofcomputing devices with high parallelism on numericaloperations where massively parallel processors can supportseveral thousands of concurrent threads The computationalpower of GPUs has turned them into attractive platformsfor general-purpose scientific and engineering applicationsespecially for tackling large scale numerical computing prob-lems [24] In this work we will consider the implementationofmembrane algorithms onGPUwith an attempt tomake themembrane algorithms work in a parallel way Under the GPUimplementation of membrane algorithms presented here thework of a cell ofmembrane algorithms is achieved by a threadof GPU by which all the cells of a membrane algorithmcan evolve simultaneously through the concurrent threadsof GPU The GPU implementation ensures that membranealgorithms can work in a more efficient way in the sensethat each operation of membrane algorithms is performedin parallel as much as possible Experimental results ontwo classical intractable problems the point set matchingproblem and TSP show that the GPU implementation ofmembrane algorithms is effective and efficient Comparedwith the CPU implementation the GPU implementation ofmembrane algorithms consumes much less runtime for deal-ing with the intractable problems especially for large scalenumerical intractable problems Software with a friendlyinterface is also developed for GPU implementation ofmembrane algorithms which can provide a convenient toolfor the users to solve the intractable problems We shouldstress that the GPU implementation of P systems also existsfor example in Sevilla and Spain [25 26] A parallel simulatorfor membrane computing models on GPU called PMCGPUcan be found in the website [27]

The rest of the paper is organized as follows In Section 2we present the GPU implementation procedure ofmembranealgorithms Experimental results on the point set match-ing problem and TSP are presented in Section 3 Section 4presents software for GPU implementation of membranealgorithms Finally conclusions and remarks are presented inSection 5

2 Implementation of MembraneAlgorithms on GPU

In this section we will present a GPU implementationprocedure of membrane algorithms We first review its CPUimplementation procedure

Let us recall that a membrane algorithm with a nestedstructure mainly consists of the following four operations

Begin

Yes

Update solutions



End

No




Yes

Yes

Yes

No

No

No

Exchange solutions

Select solutions



Yes

No

Figure 1 The CPU implementation procedure of membrane algo-rithms

(1) initialize solutions in each of the119898 cells(2) update solutions in each of the119898 cells(3) exchange solutions between adjacent cells(4) select good solutions in each of the119898 cells

Figure 1 illustrates the CPU implementation procedureof membrane algorithms with a nested structure Under theCPU implementation ofmembrane algorithms one cell of thealgorithms starts to work only after the work of another cell isfinished during the four operations This means that the cellswill evolve one by one during each of the four operations

Therefore each of the four operations is achieved in asequential manner under the CPU implementation In fact itis not difficult to find that the four operations can be achievedin a parallel manner in the sense that the 119898 cells evolvesimultaneously if a parallel computing device is used The







1 119901

119898 and119876 = 119902

1 119902

119899


Gobj

= sum

119901119894119901119895isin119875

10038161003816100381610038161003816119889 (119901119894 119901119895) minus 119889 (119891 (119901

119894) 119891 (119901

119895))10038161003816100381610038161003816+ 119896 (119899

119901minus 1198991199011015840)

(1)


119901 1198991199011015840 are the





Dist =119899minus1

sum

119896=1

119889 (119901119894119896 119901119894119896+1) + 119889 (119901

119894119899 1199011198941) (2)


119894and







Global memory

Constant memory

Texture memory

Host

Shared memory

Registers

Localmemory


Block

GPU

Registers

Thread

Localmemory


Shared memory

Registers

Localmemory


Block

Registers

Thread

Localmemory

Thread


Begin





End

Yes









No

Cell 1

Cell 1






100 107844 10388150 159706 10470200 208239 10530250 268992 10655300 320236 10741350 359905 10938400 415670 11113450 447167 11564500 498842 11566

16000

14000

12000

10000

8000

6000

4000

2000

0

50 100 150 200 250 300 350 400 450 500

Tim

e (s)

Size of point set

CPUGPU






CPUGPU

30

25

20

15

10

5

0

51 100 150 200 318 493 666 1000 1291

Tim

e (s)





100 513 224150 766 234200 1017 241250 1275 250300 1531 256350 1775 263400 2044 269450 2283 277500 2556 283
















Acknowledgments


References





































Volume 2014




Journal of











Journal of


Function Spaces






Algebra















1 119901

119898 and119876 = 119902

1 119902

119899


Gobj

= sum

119901119894119901119895isin119875

10038161003816100381610038161003816119889 (119901119894 119901119895) minus 119889 (119891 (119901

119894) 119891 (119901

119895))10038161003816100381610038161003816+ 119896 (119899

119901minus 1198991199011015840)

(1)


119901 1198991199011015840 are the





Dist =119899minus1

sum

119896=1

119889 (119901119894119896 119901119894119896+1) + 119889 (119901

119894119899 1199011198941) (2)


119894and







Global memory

Constant memory

Texture memory

Host

Shared memory

Registers

Localmemory


Block

GPU

Registers

Thread

Localmemory


Shared memory

Registers

Localmemory


Block

Registers

Thread

Localmemory

Thread


Begin





End

Yes









No

Cell 1

Cell 1






100 107844 10388150 159706 10470200 208239 10530250 268992 10655300 320236 10741350 359905 10938400 415670 11113450 447167 11564500 498842 11566

16000

14000

12000

10000

8000

6000

4000

2000

0

50 100 150 200 250 300 350 400 450 500

Tim

e (s)

Size of point set

CPUGPU






CPUGPU

30

25

20

15

10

5

0

51 100 150 200 318 493 666 1000 1291

Tim

e (s)





100 513 224150 766 234200 1017 241250 1275 250300 1531 256350 1775 263400 2044 269450 2283 277500 2556 283
















Acknowledgments


References





































Volume 2014




Journal of











Journal of


Function Spaces






Algebra










Global memory

Constant memory

Texture memory

Host

Shared memory

Registers

Localmemory


Block

GPU

Registers

Thread

Localmemory


Shared memory

Registers

Localmemory


Block

Registers

Thread

Localmemory

Thread


Begin





End

Yes









No

Cell 1

Cell 1






100 107844 10388150 159706 10470200 208239 10530250 268992 10655300 320236 10741350 359905 10938400 415670 11113450 447167 11564500 498842 11566

16000

14000

12000

10000

8000

6000

4000

2000

0

50 100 150 200 250 300 350 400 450 500

Tim

e (s)

Size of point set

CPUGPU






CPUGPU

30

25

20

15

10

5

0

51 100 150 200 318 493 666 1000 1291

Tim

e (s)





100 513 224150 766 234200 1017 241250 1275 250300 1531 256350 1775 263400 2044 269450 2283 277500 2556 283
















Acknowledgments


References





































Volume 2014




Journal of











Journal of


Function Spaces






Algebra










CPUGPU

30

25

20

15

10

5

0

51 100 150 200 318 493 666 1000 1291

Tim

e (s)





100 513 224150 766 234200 1017 241250 1275 250300 1531 256350 1775 263400 2044 269450 2283 277500 2556 283
















Acknowledgments


References





































Volume 2014




Journal of











Journal of


Function Spaces






Algebra















Acknowledgments


References





































Volume 2014




Journal of











Journal of


Function Spaces






Algebra





















Volume 2014




Journal of











Journal of


Function Spaces






Algebra
















Volume 2014




Journal of











Journal of


Function Spaces






Algebra









Documents

Research Article Implementation of Membrane Algorithms on GPUdownloads.hindawi.com/journals/jam/2014/307617.pdf · Membrane computing is an emergent branch of natural computing initiated