
[IEEE 2012 Seventh International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC) - Victoria, BC, Canada (2012.11.12-2012.11.14)] 2012 Seventh International


Paralleled Genetic Algorithm for Solving the Knapsack Problem in the Cloud

Javid Taheri, Shaghayegh Sharif, Xing Penju, Albert Y. Zomaya

Centre for Distributed and High Performance Computing

School of Information Technologies, The University of Sydney

Sydney, NSW 2006, Australia

Email: [email protected]

Abstract—This paper proposes a Parallel Genetic Algorithm (PGA) framework to solve the Knapsack problem in the Cloud. Our PGA consists of several independent workers that cooperatively run in parallel to find optimal solutions for a Knapsack problem; we chose the Knapsack problem because it is known to be NP-complete and can motivate cloud-based solutions for other combinatorial problems. Although this problem has been extensively studied in the literature, to the best of our knowledge no cloud-based solution has been presented for it. We used several benchmarks to validate our solutions against the optimal solution computed by dynamic programming. The results are promising and show reasonable scalability as well as speed-up for our implementation on Microsoft Azure.

Keywords-Knapsack Problem; Genetic Algorithm; Platform as a Service; Microsoft Windows Azure;

I. INTRODUCTION

Combinatorial optimization is concerned with finding an optimal set of objects from a finite set of available objects. Its techniques have been applied to a wide range of areas including computer science, operations research, complexity theory, cryptography and applied mathematics. Minimum spanning tree, vehicle routing, traveling salesman, the eight queens puzzle and Knapsack are among the well-known problems of combinatorial optimization. In particular, the Knapsack problem, from the perspective of computer science, also commonly arises in decision-making problems. For instance, it is extensively studied for resource allocation algorithms in Cloud environments in the presence of several QoS (Quality of Service) and financial constraints [5], [11].

The Knapsack problem is also extensively used in many other optimization problems including, but not limited to, network planning, network routing, parallel scheduling, and budgeting [7]. Because this problem is known to be NP-complete [4], finding the optimal solution becomes extremely difficult as the number of objects increases. Due to the importance of the potential applications and the general difficulty of solving large Knapsack problems, several algorithms have been developed to provide either approximate or exact solutions; dynamic programming, backtracking, branch and bound, and greedy approaches are among the best known [13], [2].

Despite their acceptable performance in many cases, these algorithms usually share two major drawbacks: (1) they are usually very time consuming, and/or (2) they sometimes fail to converge to solutions reasonably close to the optimal answer. This motivates researchers to seek other, hopefully more efficient, algorithms to solve this NP-complete problem.

Genetic Algorithm (GA) is a robust algorithm that was introduced as an alternative for solving the Knapsack problem. To date, many GA-based methods have been presented. Spillman used a basic GA to produce high-quality solutions, i.e., either the optimal or fairly close to it, for large cases (10,000 elements or more) [13]. Ku and Lee reported “A Set-Oriented Genetic Algorithm” for solving the Knapsack problem [8]. GA has also been applied to the 0-1 Knapsack problem [2] and to the 0-1 multiple Knapsack problem [6]. Another study used a GA based on a greedy strategy for the 0-1 Knapsack problem and reported that the greedy algorithm can only obtain approximate solutions within a certain range of the optimal solution [15].

To overcome the barriers of real-time environments in handling large-scale Knapsack problems, Parallel Genetic Algorithms (PGAs) have also been considered as an appropriate alternative. Recent studies have shown that PGAs have better performance [14], [12], although they might be very challenging to implement using current infrastructure. This also means that only large enterprises and/or organizations can afford the cost of the IT infrastructure needed to implement such solutions.

In this study, we attempt to solve the Knapsack problem with our own PGA design in the Cloud, a novel technology that moves computing and data away from desktop and portable PCs into large data centers. Some works have already deployed GA in the Cloud environment. In [16], a simple parallel genetic algorithm in the Cloud, called SMRPGA, is presented and implemented on Hadoop. Moreover, the same authors also suggested a Master-slave PGA based on MapReduce (MMRPGA) on a Cloud computing platform in [15]. However, none of these works attempts to solve a combinatorial optimization problem with a PGA in the Cloud environment. Hence, our main contribution in this paper is to present a scalable framework that deploys a PGA to solve the formidable Knapsack problem in the Cloud.

2012 Seventh International Conference on P2P, Parallel, Grid, Cloud and Internet Computing

978-0-7695-4841-8/12 $26.00 © 2012 IEEE

DOI 10.1109/3PGCIC.2012.54


Figure 1. Flowchart of a generic GA

This paper is organized as follows. Section II introduces the preliminaries of this work, including the problem statement, the basics of a generic GA, and the basics of the Windows Azure PaaS (Platform as a Service) as our Cloud environment. We present our approach in Section III. Section IV reports the results of our study; Section V presents the discussion, followed by the conclusion in Section VI.

II. PRELIMINARIES

A. Problem statement: the Knapsack Problem

The Knapsack problem, a well-known NP-complete problem, can be defined as follows. Assume a Knapsack with a maximum capacity of $C$ and a set of $N$ items/objects ($j = 1, \ldots, N$), each with a profit value $p_j > 0$ and a weight $w_j > 0$. The objective is to select a subset of the $N$ items such that the total profit of the selected items is maximized while their cumulative weight does not exceed $C$. The Knapsack problem can be mathematically formulated as follows, where $x_j \in \{0, 1\}$ [5]:

\[
\max \sum_{j=1}^{N} p_j x_j \quad \text{s.t.} \quad \sum_{j=1}^{N} w_j x_j \le C
\]
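To make the formulation concrete, the following minimal sketch evaluates one candidate selection against the objective and the capacity constraint. The function name `evaluate` and the sample numbers are illustrative assumptions, not part of the paper.

```python
def evaluate(x, profits, weights, capacity):
    """Evaluate a 0/1 selection vector x against the Knapsack formulation.

    Returns (total_profit, total_weight, feasible), where feasible is True
    when the total weight does not exceed the capacity C.
    """
    total_profit = sum(p * xj for p, xj in zip(profits, x))
    total_weight = sum(w * xj for w, xj in zip(weights, x))
    return total_profit, total_weight, total_weight <= capacity


# Hypothetical instance: three items, capacity C = 10.
print(evaluate([1, 0, 1], profits=[10, 20, 30], weights=[4, 5, 6], capacity=10))
```

A selection is only a valid answer when the third returned value is true; the optimization then looks for the feasible selection with the largest first value.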

B. Genetic Algorithms

Genetic Algorithm (GA) is based on the evolutionary process of biological organisms in nature [9]. A GA-based solution always starts with a population composed of candidate solutions (chromosomes). After initialization, elitism, crossover and mutation operators are invoked to combine chromosomes of the current population and produce the next one. These tasks are repeated until a termination condition is met. Figure 1 shows a sample flowchart of a generic GA-based solution.

A GA-based solution requires a proper definition of the following aspects: defining a chromosome to represent a possible solution, generating an initial population, defining a fitness function to gauge the quality of each solution, selecting chromosomes for recombination based on their fitness, defining a mechanism for elitism, defining crossover and mutation operators to generate offspring chromosomes, and finally recording the history of solutions for better analysis of the results [1].
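The aspects listed above can be illustrated with a small serial GA for the 0-1 Knapsack problem. This is a sketch under stated assumptions, not the authors' implementation: the bit-list chromosome, the zero fitness for overweight selections, tournament selection, one-point crossover and single bit-flip mutation are all choices made here for illustration. Only the elitism, crossover and mutation rates are taken from the values the paper reports in Table I.

```python
import random


def fitness(chrom, profits, weights, capacity):
    """Total value of the selected items; 0 if the selection is overweight (assumption)."""
    weight = sum(w for w, bit in zip(weights, chrom) if bit)
    if weight > capacity:
        return 0
    return sum(p for p, bit in zip(profits, chrom) if bit)


def run_ga(profits, weights, capacity, pop_size=20, iterations=50,
           elitism_rate=0.1, crossover_rate=0.4, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(profits)

    def score(chrom):
        return fitness(chrom, profits, weights, capacity)

    # Initial population: random bit strings, one bit per item.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each iteration

    for _ in range(iterations):
        pop.sort(key=score, reverse=True)
        history.append(score(pop[0]))

        # Elitism: copy the best chromosomes unchanged into the next population.
        n_elite = max(1, int(elitism_rate * pop_size))
        next_pop = [c[:] for c in pop[:n_elite]]

        while len(next_pop) < pop_size:
            # Tournament selection of two parents (assumed selection scheme).
            a = max(rng.sample(pop, 2), key=score)
            b = max(rng.sample(pop, 2), key=score)
            child = a[:]
            if n > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n)       # one-point crossover
                child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:
                i = rng.randrange(n)            # flip a single bit
                child[i] = 1 - child[i]
            next_pop.append(child)
        pop = next_pop

    best = max(pop, key=score)
    return best, score(best), history
```

Because the elite chromosomes are carried over unchanged, the recorded best fitness can never decrease from one iteration to the next, which is the saturating behaviour one would expect from a history plot of the best chromosome.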

C. Microsoft Windows Azure

Microsoft Windows Azure (Win-Azure) provides a platform to run applications as well as to store data in Microsoft data centers. Software developers can use the “.NET” framework to create their applications and deploy them on the Cloud platform without buying, installing, operating and/or maintaining their own system infrastructure [10]. Win-Azure mainly consists of two parts: (1) running an application, and (2) storing the required data. In Win-Azure’s PaaS, multiple instances of an application can run concurrently to provide parallelism to properly designed Cloud-based solutions. Here, each instance runs a copy of all or part of the application code on a Windows Virtual Machine (VM) [3], according to its system design. Software developers do not need to maintain these instances, as Win-Azure’s platform manages them automatically. In Win-Azure’s PaaS solutions, software developers can also use Web Role instances, Worker Role instances, blobs, tables, and queues to build their applications more efficiently.

A Web Role works under the Internet Information Service (IIS) and can be implemented using ASP.NET or other technologies. Through IIS, HTTP/HTTPS requests can be received and processed by Web Role instances. Web Roles usually do not engage in the computational parts of a solution; they usually just act as a portal so that Cloud clients can submit their jobs and download their solutions. Computational tasks are usually handled by Worker Roles. Unlike a Web Role, a Worker Role does not need IIS configuration. Application data is stored and managed in Win-Azure storage, where several services are automatically provided. Blobs store objects such as files and documents, similar to desktop PCs. Queues provide communication channels among applications in Win-Azure. For instance, a Web Role can add messages, usually job descriptions, to a queue while Worker Roles are listening to it. Once a Worker Role receives a message, it processes it and then deletes it from the queue upon completion. Using queues, Worker Roles can also communicate with each other if needed. Tables are used to store a set of structured entities with relevant properties. They also provide advanced functionalities such as adding, updating, deleting, saving changes, querying objects, and getting the number of objects.
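The Web Role / Worker Role interaction through a queue can be mimicked locally with Python's standard library. The sketch below is only an in-process analogy and does not use any Azure API: `job_queue`, `results`, `web_role_submit` and `worker_role` are hypothetical names, a `queue.Queue` stands in for a Win-Azure queue, and a dictionary stands in for blob/table storage. One simplification to keep in mind is that `Queue.get` removes a message as soon as it is received, so `task_done` is used here to play the part of "delete the message upon completion".

```python
import queue
import threading

job_queue = queue.Queue()   # stand-in for a Win-Azure queue (assumption)
results = {}                # stand-in for blob/table storage (assumption)


def web_role_submit(job_id, description):
    """Web Role side: add a job description to the queue."""
    job_queue.put((job_id, description))


def worker_role(stop_event):
    """Worker Role side: listen to the queue, process a message, then acknowledge it."""
    while not stop_event.is_set():
        try:
            job_id, description = job_queue.get(timeout=0.1)
        except queue.Empty:
            continue                              # nothing to do yet, keep listening
        results[job_id] = description.upper()     # placeholder for the real computation
        job_queue.task_done()                     # analogous to deleting the processed message


def demo(n_workers=3, n_jobs=5):
    stop = threading.Event()
    workers = [threading.Thread(target=worker_role, args=(stop,)) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for i in range(n_jobs):
        web_role_submit(i, f"job-{i}")
    job_queue.join()    # wait until every submitted job has been acknowledged
    stop.set()
    for w in workers:
        w.join()
    return dict(results)
```

Calling `demo()` submits five jobs that are picked up by whichever worker is free, which is the same load-sharing effect the queue provides between Worker Role instances.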

III. PGA IN SOLVING THE KNAPSACK PROBLEM

As mentioned, we used Win-Azure’s PaaS to solve the Knapsack problem using our Cloud-based GA solution. We will refer to our approach as PGA-KS throughout this paper.

A. System Architecture

The architecture of our project is composed of four layers: Access Layer, Application Layer, Storage Layer, and Platform Layer. The Access layer provides access to Internet users. A web page and several graphical interfaces are designed for clients to upload files containing the items of their Knapsack problems as well as to download their solutions. The Application layer provides basic functionalities based on client demands. The software application is designed to support high availability, scalability, reliability and fault tolerance. The Storage layer stores the application data; this layer also provides a tool for the software to manage it. The Platform layer includes the development tools, hardware and other facilities, and maintains the environment to deploy the project in the Cloud.

Figure 2. PGA-KS’ Structure

B. PGA-KS’ structure

PGA-KS uses one Web Role, several Worker Roles, two blobs, two queues and four tables to perform its computational tasks. Figure 2 shows its structure.

1) Web Roles: PGA-KS’ only Web Role interacts with end users through a web page and provides the following functionalities: file uploading, invoking a solving algorithm, and showing the final solution. During the file uploading stage, a file containing a Knapsack’s capacity as well as a list of available items is uploaded via HTTP/HTTPS to Win-Azure’s storage as a blob. Once a file is uploaded, four different algorithms can be invoked to solve the uploaded Knapsack problem: (1) DP-KS, which uses the classical dynamic programming approach; (2) GA-KS, which uses a classical serial GA, i.e., it does not use queues or tables in the Cloud; (3) GA-Cloud-KS, which uses GA on the Cloud platform by deploying queues and tables with only one Worker Role; and (4) PGA-KS, which is similar to GA-Cloud-KS but uses multiple Worker Roles to provide parallelism. Each reported solution includes the chosen items, the used capacity, the total value of the items in the Knapsack, and the total execution time of the chosen algorithm.

2) Worker Roles: Worker Roles can process the uploaded file, send messages, listen to the Job queue and the Chromosomes queue, and query objects from tables. They also generate chromosomes in parallel and store the final solution in a blob. Each worker is able to solve a Knapsack problem using any of the four possible algorithms; it picks the solving algorithm according to the message it receives from the Job queue.

3) Blob: A blob stores the file uploaded by users and also stores the solution file.

4) Queues: When the Web Role has uploaded a file, it sends a message to the Job queue to alert Worker Roles that a task needs to be accomplished. When one of the Worker Roles receives such a message, it sends other messages, also through the Job queue, to inform the other Worker Roles. It then handles the task and assigns subtasks to other workers through the Chromosomes queue; other Worker Roles always listen to this queue to perform tasks.

5) Tables: Several tables are used in our framework. The Item Table stores the items of the uploaded file, so that Worker Roles can read items from this table. The Chromosomes Table stores chromosomes created by Worker Roles. The information of a chromosome contains the iteration number, the chromosome ID, the total weight of the chromosome, and the total value of the chromosome. Worker Roles work in parallel while listening to queue messages to generate chromosomes for this table. The File Table stores the URL of each submitted file and of its solution file, along with the time taken to complete the solution. The History Solution Table records the history of solutions observed over multiple generations of PGA-KS; it is used to analyze the performance of PGA-KS.
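As an illustration of the Chromosomes Table entry described above, the record below carries the four listed fields. The class name, field names, types and the helper `make_record` are hypothetical; the paper does not give the actual table schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChromosomeRecord:
    """One hypothetical row of the Chromosomes Table."""
    iteration: int       # iteration (generation) number
    chromosome_id: int   # identifier of the chromosome within its population
    total_weight: int    # summed weight of the selected items
    total_value: int     # summed value of the selected items


def make_record(iteration, chromosome_id, chromosome, profits, weights):
    """Build a table row from a 0/1 chromosome and the item lists."""
    return ChromosomeRecord(
        iteration=iteration,
        chromosome_id=chromosome_id,
        total_weight=sum(w for w, bit in zip(weights, chromosome) if bit),
        total_value=sum(p for p, bit in zip(profits, chromosome) if bit),
    )
```

Storing the precomputed weight and value alongside the identifiers lets a reader of the table rank chromosomes without re-reading the item list.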

C. PGA-KS’ Algorithm

In our algorithm, a submitted file is received by the Web Role and stored in a blob. The Web Role then sends a message to the Job queue to inform Worker Roles of the arrival of a new task. All Worker Roles listen to the Job queue until one of them receives the message and becomes the job handler for the given task. In order to process the task in parallel, the job handler sends messages to the other workers to inform them that they are normal workers and that their job is to process subtasks only. The other Worker Roles then start listening to another queue, called the Chromosomes queue, and process the assigned subtasks. The job handler sends messages to the Chromosomes queue to instruct the workers to create chromosomes. Each message carries the information of the chromosomes from the last population, to be used to generate chromosomes for the new population. Algorithm 1 and Figure 3 illustrate the overall procedure of PGA-KS and how its workers run in parallel to create chromosomes for each population. In Algorithm 1, ChrQueue represents the Chromosomes queue.
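The division of labour between the job handler and the normal workers can be sketched with a thread pool, where each subtask carries the previous population and produces one new chromosome. This is a local analogy under assumptions, not the PGA-KS code: the thread pool replaces the Chromosomes queue and Worker Roles, `make_child` and `next_population` are hypothetical names, parents are picked uniformly at random, and elitism is left out for brevity.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def make_child(subtask):
    """Worker side: build one chromosome from the previous population."""
    parents, seed, mutation_rate = subtask
    rng = random.Random(seed)            # per-subtask seed keeps the sketch reproducible
    a, b = rng.sample(parents, 2)        # assumes at least two parent chromosomes
    cut = rng.randrange(1, len(a))       # one-point crossover; assumes at least two items
    child = a[:cut] + b[cut:]
    if rng.random() < mutation_rate:
        i = rng.randrange(len(child))
        child[i] = 1 - child[i]          # flip a single bit
    return child


def next_population(cur_pop, n_workers=4, mutation_rate=0.2, generation=0):
    """Job-handler side: emit one subtask per chromosome and collect the results."""
    subtasks = [(cur_pop, generation * len(cur_pop) + k, mutation_rate)
                for k in range(len(cur_pop))]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in subtask order, whichever worker ran them.
        return list(pool.map(make_child, subtasks))
```

Because every subtask is self-contained (it carries the parents it needs), the result does not depend on how many workers share the load, only the elapsed time does.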

IV. EXPERIMENTAL RESULT

To evaluate PGA-KS, as well as the other three algorithms, we used the Windows Azure Cloud with the assistance of the Windows Azure Platform Management Tool (MMC). MMC enables users to easily manage their Windows Azure hosted services and storage accounts. The status of Web Roles, Worker Roles, tables, blobs, and queues, including their messages, is displayed in the MMC portal. Messages can also be easily added to queues through MMC.

input : Submitted File (InFile)
output: Solution File (OutFile)

begin
    - WebRole stores InFile to a blob;
    - WebRole sends a message to JobQueue;
    - One of the WorkerRoles picks the message from JobQueue and becomes JobHandler;
    - JobHandler sends other messages into JobQueue to inform other WorkerRoles;
    - JobHandler initializes the HistoryTable;
    - JobHandler reads InFile, produces subtasks and sends them to ChrQueue to generate the initial population;
    - Let CurPop be the initial population;
    - Let Itr ← 0;
    while (Itr < MaxNumberOfIterations) do
        JobHandler produces subtasks from CurPop and sends them to ChrQueue. These subtasks involve elitism, crossover and mutation operators to combine chromosomes of the current population and produce the next one;
        while (ChrQueue is not empty) do
            WorkerRoles perform subtasks in ChrQueue;
        end
        JobHandler collects results of all subtasks and updates solutions in CurPop;
        JobHandler updates the HistoryTable;
        Itr ← Itr + 1;
    end
    JobHandler stores results into OutFile;
end

Algorithm 1: PGA-KS’ optimization procedure

A. Testing Scenarios

DP-KS was the first algorithm we implemented to find the optimal solution for each Knapsack case. The execution time of such a DP approach depends on the number of items in the set as well as the Knapsack capacity. Therefore, for cases with very small Knapsack capacities, DP-KS converges very fast; for larger Knapsacks, however, it took a relatively long time to find the optimal solution. As another drawback, if several optimal solutions exist for a given problem, DP-KS can find only one of them.

After DP-KS, the other two algorithms and PGA-KS were implemented to solve Knapsack problems. We solved many Knapsack problems to find the best combination of crossover probability, mutation probability, and initial population size. Table I presents the best combination of parameters we found empirically. Figure 4 shows a sample solution, based on the parameters in Table I, for a Knapsack problem with 100 items and a Knapsack capacity of 100; the number of iterations was set to 100, which is constant for all test cases. This figure shows the fitness value (equal to the total value of the items put into the Knapsack) of the best chromosome, the average fitness value of all chromosomes in a population, and the fitness value of the worst chromosome in a population across multiple generations of PGA-KS. As shown in this figure, the fitness value of the best chromosome improves significantly over the first few iterations and then almost saturates, or at least improves at a much slower rate. In this example, PGA-KS converges to a solution with a total value of 742 after only 1m:04s of execution, almost 1/3 of the time DP-KS took to find the optimal solution with a total value of 752. It is also worth noting that our GA approach reached the optimal solution with a total value of 752 after 2m:05s.

We also designed several Knapsack cases with different numbers of items and item properties. These test files are all XML-based documents; the number of items in the XML files varies from 20 to 1000. The processing time and final solution for four sample Knapsack cases, among the many we generated but do not report here, are shown in Table II. Furthermore, to check the scalability of our approaches, we also used different numbers of cores (1, 2, 4, 8 and 16) to find the optimal solution; in our implementation each core is assigned to one Worker Role.

Figure 3. PGA-KS’ Parallelism

Table I
PGA-KS’ EMPIRICALLY SET PARAMETERS

Name              Value
Population Size   200 if Number of Items ≤ 300
                  500 if 300 < Number of Items < 1000
                  1000 if 1000 ≤ Number of Items
Elitism Rate      10% of population
Crossover Rate    0.4
Mutation Rate     0.2
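For reference, the classical dynamic-programming approach that DP-KS is described as using can be sketched as follows. This is the textbook table-filling method, assuming integer weights and capacity; it is not the authors' code, and the function name `dp_knapsack` is hypothetical. The two nested loops make the running time proportional to the number of items times the capacity, which is consistent with the observation that small capacities are solved quickly. The backtracking step recovers a single optimal selection even when several exist.

```python
def dp_knapsack(profits, weights, capacity):
    """Return (optimal_value, chosen_item_indices) for a 0-1 Knapsack instance.

    Assumes non-negative integer weights and an integer capacity.
    """
    n = len(profits)
    # best[j][c] = best value using only the first j items with capacity c.
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        p, w = profits[j - 1], weights[j - 1]
        for c in range(capacity + 1):
            best[j][c] = best[j - 1][c]                      # skip item j
            if w <= c:
                best[j][c] = max(best[j][c], best[j - 1][c - w] + p)  # take item j

    # Backtrack through the table to recover one optimal selection.
    chosen, c = [], capacity
    for j in range(n, 0, -1):
        if best[j][c] != best[j - 1][c]:
            chosen.append(j - 1)
            c -= weights[j - 1]
    chosen.reverse()
    return best[n][capacity], chosen


# Hypothetical instance used only for illustration.
print(dp_knapsack([10, 40, 30, 50], [5, 4, 6, 3], 10))
```

An exact baseline like this is what makes it possible to report how far a GA result is from the optimum.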

V. DISCUSSION AND ANALYSIS

Our close examination of the results revealed that PGA-KS’ performance can be greatly influenced by several factors. Besides the obvious number of Worker Roles, which directly reduces the execution time of PGA-KS, hidden factors, mostly related to the Win-Azure Cloud infrastructure, could equally increase or decrease its performance. We highlight our observations in the following subsections.

A. Accuracy of PGA-KS

Table II shows that PGA-KS was able to converge to a very close approximation of the optimal solution in all cases, regardless of the problem size. This table also shows the expected behavior of PGA-KS in handling larger cases: as can be seen, the larger the problem, the less accurate PGA-KS is.

B. Execution Time of PGA-KS

Other important columns in Table II relate to the execution time of PGA-KS for different problem sizes and numbers of Worker Roles (cores). As expected, PGA-KS took longer when using fewer cores; it also took less time for smaller problems. It is however worth noting that in many of these cases PGA-KS found the optimal answer in much less time but did not stop its iterations, hoping to find better solutions. Figure 5 shows the quality of the best chromosome (PGA-KS’ solution) for the second row of Table II, i.e., 50 items with a Knapsack capacity of 100, when 4 (Best-W4), 8 (Best-W8), or 16 (Best-W16) cores are used. Clearly, all these PGA-KS instances found their solutions much sooner than the termination of their iterations. The results of Table II can also be used to find the speed-up factor of PGA-KS when it uses a larger number of cores. Table III lists the speed-up factors and reveals that PGA-KS’ speed-up is much better when 4 or 8 cores are used; when PGA-KS used 16 cores, it could only speed up its computation by a factor of 8.0 to 9.5. On further examination of our implementation, we found the main cause of this speed-up degradation for larger numbers of cores, i.e., the parallelism factor, which is detailed in the next subsection.

Figure 4. A Sample PGA-KS’ Solving History

Figure 5. Execution time for PGA-KS for the second row of Table II

Table III
PGA-KS’ SPEED-UP

Knapsack            Number of Cores
Items   Capacity    1     2     4     8     16
20      50          1.0   1.8   3.9   6.5   9.5
50      100         1.0   1.7   3.3   5.4   8.9
100     150         1.0   1.9   3.4   5.8   8.7
500     200         1.0   1.7   2.3   4.2   8.0
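The speed-up factors in Table III follow from the Table II timings as the single-core time divided by the multi-core time. The short sketch below reproduces that arithmetic; the helper names are hypothetical, and the parallel efficiency (speed-up divided by the number of cores) is added here as a standard derived quantity rather than something the paper tabulates.

```python
def to_seconds(t):
    """Convert 'm:ss' or 'h:mm:ss' (the Table II time format) to seconds."""
    seconds = 0
    for part in t.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def speed_up(t_one_core, t_n_cores):
    """Speed-up factor: single-core time divided by multi-core time."""
    return to_seconds(t_one_core) / to_seconds(t_n_cores)


def efficiency(t_one_core, t_n_cores, cores):
    """Parallel efficiency: speed-up per core (1.0 would be ideal scaling)."""
    return speed_up(t_one_core, t_n_cores) / cores


# Timings of the 500-item case as listed in Table II.
times_500 = {1: "3:14:56", 2: "1:55:26", 4: "1:23:44", 8: "46:43", 16: "24:17"}
for cores, t in times_500.items():
    print(cores, round(speed_up(times_500[1], t), 1),
          round(efficiency(times_500[1], t, cores), 2))
```

For the 500-item case on 16 cores this gives a speed-up of about 8.0, i.e., roughly half of the ideal factor of 16.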

C. Win-Azure PaaS influence on PGA-KS

PGA-KS inherits some of its characteristics from Win-Azure, the platform used to implement it. For example, PGA-KS is very fault tolerant in the sense that a component failure does not stop its execution: every time one of its major components (Web Role, Worker Roles, blobs, tables, and queues) fails, Win-Azure automatically replaces it with a healthy one. Furthermore, the storage facility of Win-Azure protects submitted files as well as their results through automatic replication. Besides these benefits, Win-Azure also introduces several limitations that are beyond software developers’ power to optimize and/or avoid. For example, accessing blobs, for reading or writing, can easily become an implementation’s bottleneck if done frequently.

Table II
RESULT OF PGA-KS FOR DIFFERENT KNAPSACK CASES

                   1 Core             2 Cores            4 Cores            8 Cores            16 Cores           Best
Items  Capacity    Time     Solution  Time     Solution  Time     Solution  Time     Solution  Time     Solution  Solution
20     50          3:01     244       1:42     244       0:47     244       0:28     244       0:19     244       244
50     100         13:24    619       7:47     621       4:02     620       2:29     624       1:30     626       626
100    150         1:35:34  1128      0:49:48  1119      28:01    1201      16:32    1231      11:02    1233      1236
500    200         3:14:56  2298      1:55:26  2318      1:23:44  2389      46:43    2492      24:17    2501      2545

PGA-KS was no exception. After close examination of the trace files produced during the execution of our case studies, we realized that tables in Win-Azure could be slow for our implementation. In fact, for large Knapsack cases, we noticed that our Worker Roles spend almost 30% of their time reading from or writing to the tables designed for our solution. We also observed that, although we used two queues in our implementation, they rarely delayed responding to requests. However, because PGA-KS’ parallelism depends heavily on frequently fetching sub-tasks from these queues, its internal mechanism for fetching sub-tasks could also contribute greatly to its speed-up degradation. In PGA-KS, each Worker Role seeks a sub-task from the Chromosomes queue at intervals of 1s. Therefore, if no sub-task exists in the queue at that particular moment, a worker could sit idle for up to 1s. Such small defects could easily degrade PGA-KS when it is run with larger degrees of parallelism; this is the main reason that PGA-KS with 16 cores could solve a problem only about 8 times faster than with a single core.
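The idle time caused by a fixed 1 s polling interval can be traded against queue traffic by adapting the wait between polls. The sketch below shows one common pattern, an exponentially growing wait capped at 1 s; it is an illustrative assumption and not something the paper reports implementing. `fetch_subtask` and its parameters are hypothetical, and a local `queue.Queue` stands in for the Chromosomes queue.

```python
import queue
import time


def fetch_subtask(q, min_wait=0.05, max_wait=1.0, give_up_after=3.0, sleep=time.sleep):
    """Poll q for a sub-task, doubling the wait after every empty poll.

    Returns the sub-task, or None if roughly give_up_after seconds of
    waiting have accumulated without any sub-task arriving.
    """
    wait, waited = min_wait, 0.0
    while waited < give_up_after:
        try:
            return q.get_nowait()            # a sub-task is available: no idle time
        except queue.Empty:
            sleep(wait)                      # idle, but only briefly at first
            waited += wait
            wait = min(max_wait, wait * 2)   # back off toward the 1 s ceiling
    return None
```

With these defaults a worker that just missed a sub-task checks again after 0.05 s rather than after a full second, while a worker facing a queue that stays empty quickly settles back to one poll per second.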

VI. CONCLUSION AND FUTURE WORK

The Knapsack problem, a well-known combinatorial optimization problem, is used to find the optimal configuration in many real-world problems. In this paper, we presented a framework to solve the Knapsack problem in the Cloud environment; our solution is a scalable Parallel Genetic Algorithm (PGA-KS) designed and implemented in Win-Azure’s PaaS environment. Our experimental results illustrate that PGA-KS can exploit the parallelism of the Cloud and achieve significant speed-ups. PGA-KS was also accurate in the sense that it found the optimal solution, or one fairly close to it for larger instances, for many Knapsack cases; some are reported in this paper and others were used only to check the overall performance of our algorithm.

Several drawbacks are imposed on our algorithm by Win-Azure. For example, we realized that our very frequent access to Win-Azure tables slows down many of our parallelized processes and resulted in almost 50% of the speed-up we expected for our 16-core executions. We also noticed that queues could become bottlenecks if they are not programmed to be accessed efficiently. Based on these results, we would like to redesign several parts of PGA-KS to overcome many of these preventable issues, e.g., designs with minimal access to tables and less frequent access to queues. Another direction for our future work is to use more efficient database handlers, such as SQL Azure, to store our large tables.

VII. ACKNOWLEDGMENT

The authors would like to acknowledge Azure resources donated by the Microsoft Cloud Research Engagements Project for developing PGA-KS.

REFERENCES

[1] P. Chu and J. Beasley. A genetic algorithm for the multidimensional knapsack problem. Journal of Heuristics, 4(1):63–86, 1998.

[2] M. Hristakeva and D. Shrestha. Solving the 0/1 knapsack problem with genetic algorithms. In Science & Math Undergraduate Research Symposium, 2004.

[3] R. Jennings. Cloud Computing with the Windows Azure Platform. Wrox, 2010.

[4] D. Johnson and M. Garey. Computers and intractability: A guide to the theory of NP-completeness. Freeman & Co, San Francisco, 1979.

[5] H. Kellerer, U. Pferschy, and D. Pisinger. Knapsack problems. Springer Verlag, 2004.

[6] S. Khuri, T. Back, and J. Heitkotter. The zero/one multiple knapsack problem and genetic algorithms. In Proceedings of the 1994 ACM symposium on Applied computing, SAC ’94, pages 188–193, New York, NY, USA, 1994. ACM.

[7] H. Kita, Y. Yabumoto, N. Mori, and Y. Nishikawa. Multi-objective optimization by means of the thermodynamical genetic algorithm. Springer Berlin / Heidelberg, 1996.

[8] S. Ku and B. Lee. A set-oriented genetic algorithm and the knapsack problem. In Evolutionary Computation, 2001. Proceedings of the 2001 Congress on, volume 1, pages 650–654. IEEE, 2001.

[9] K. Li, G. Dai, and Q. Li. A genetic algorithm for the unbounded knapsack problem. In Machine Learning and Cybernetics, 2003 International Conference on, volume 3, pages 1586–1590. IEEE, 2003.

[10] T. Redkar and T. Guidici. Windows Azure Platform. Apress, 2011.

[11] A. Schrijver. Combinatorial optimization. Springer, 2003.

[12] X. Shengjun, G. Shaoyong, and B. Dongling. The analysis and research of parallel genetic algorithm. In Wireless Communications, Networking and Mobile Computing, 2008. WiCOM’08. 4th International Conference on, pages 1–4. IEEE, 2008.

[13] R. Spillman. Solving large knapsack problems with a genetic algorithm. In Systems, Man and Cybernetics, 1995. Intelligent Systems for the 21st Century., IEEE International Conference on, volume 1, pages 632–637. IEEE, 1995.

[14] Z. Wang, T. Ju, D. Cui, and X. Hei. A study of hybrid parallel genetic algorithm model. In Natural Computation (ICNC), 2011 Seventh International Conference on, volume 2, pages 1038–1042. IEEE, 2011.

[15] J. Zhao, T. Huang, F. Pang, and Y. Liu. Genetic algorithm based on greedy strategy in the 0-1 knapsack problem. In Genetic and Evolutionary Computing, 2009. WGEC’09. 3rd International Conference on, pages 105–107. IEEE, 2009.

[16] J. Zhao, W. Zeng, G. Li, and M. Liu. Simple parallel genetic algorithm using cloud computing. Applied Mechanics and Materials, 121:4151–4155, 2012.
