4
Distributed Query Processing Plans Generation using Teacher Learner based Optimization Vikash Mishra a , Vikram Singh a a Department of Computer Engineering, National Institute of Technology, Kurukshetra, Haryana 136119, India, E-mail: [email protected], [email protected] With the growing popularity, the number of data sources and the amount of data has been growing very fast in recent years. The distribution of operational data on disperse data sources impose a challenge on processing user queries. In such database systems, the database relations required by a query to answer may be stored at multiple sites. This leads to an exponential increase in the number of possible equivalent or alternatives of a user query. Though it is not computationally reasonable to explore exhaustively all possible query plans in a large search space, thus a strategy is requisite to produce optimal query plans in distributed database systems. The query plan with most cost-effective option for query processing is measured necessary and must be generated for a given query. This paper attempts to generate such optimal query plans using a parameter less optimization technique Teaching- Learner Based Optimization (TLBO). The TLBO algorithm was experiential to go one better than the other optimization algorithms for the multi-objective unconstrained and constrained benchmark problems. Experimental comparisons of TLBO based optimal plan generation with the multi-objective genetic algorithm based distributed query plan generation algorithm shows that for higher number of relations, the TLBO based algorithm is able to generate comparatively better quality Top-K query plans. Keywords : Aggregation based Genetic Algorithm, Distributed Query Processing, Query Plans, Query Optimization, Teacher Learner Based Optimization, Top-K, Vector Evaluated Genetic Algorithm 1. INTRODUCTION Database systems of different type use vari- ous techniques to identify optimal query plans. In many application domains, end-users are more interested in the most important (top-k) query answers in potentially huge answer space [1]. A distributed database encompasses co- herent data, spread across various operational autonomous sites of a computer network [2]. With the increase in the number of users or expanding organization requirement, the size of database networks also expands. In recent years, it is observed that, in any distributed database the number of data sources and the amount of data has been growing very fast to meet the challenge of growing popularity or business expansion. A Distributed Database Management Sys- tem (DDBMS) deals with managing such dis- tributed databases. DDBMS provides access to user via a simple and unified interface over dis- perse databases through different transparency mechanism, due to which a user feel as if they were not distributed [3]. The query process- ing is a prime activity in database and it is controlled by DBMS. The performance of a DBMS is determined by its ability to process queries in an effective and efficient manner [4]. The query processing in distributed databases is more intricate, as there various parameters or constraints involve and they are affecting performance [1]. Query processing connects to many database research areas, including query optimization, indexing methods, and query languages [5], [6]. As a consequence, the impact of efficient pro- cessing of query is becoming apparent in an in- creasing number of applications. One common way to identify the top-k objects is scoring all 46 International Journal of Information Processing, 9(4), 46-60, 2015 ISSN : 0973-8215 IK International Publishing House Pvt. Ltd., New Delhi, India

Distributed Query Processing Plans Generation …4)/p5.pdfDistributed Query Processing Plans Generation using Teacher Learner based Optimization Vikash Mishra a, Vikram Singh aDepartment

  • Upload
    others

  • View
    24

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Distributed Query Processing Plans Generation …4)/p5.pdfDistributed Query Processing Plans Generation using Teacher Learner based Optimization Vikash Mishra a, Vikram Singh aDepartment

Distributed Query Processing Plans Generation using

Teacher Learner based Optimization

Vikash Mishraa, Vikram Singha

aDepartment of Computer Engineering, National Institute of Technology, Kurukshetra, Haryana136119, India, E-mail: [email protected], [email protected]

With the growing popularity, the number of data sources and the amount of data has been growing veryfast in recent years. The distribution of operational data on disperse data sources impose a challenge onprocessing user queries. In such database systems, the database relations required by a query to answermay be stored at multiple sites. This leads to an exponential increase in the number of possible equivalentor alternatives of a user query. Though it is not computationally reasonable to explore exhaustively allpossible query plans in a large search space, thus a strategy is requisite to produce optimal query plansin distributed database systems. The query plan with most cost-effective option for query processingis measured necessary and must be generated for a given query. This paper attempts to generate suchoptimal query plans using a parameter less optimization technique Teaching- Learner Based Optimization(TLBO). The TLBO algorithm was experiential to go one better than the other optimization algorithmsfor the multi-objective unconstrained and constrained benchmark problems. Experimental comparisons ofTLBO based optimal plan generation with the multi-objective genetic algorithm based distributed queryplan generation algorithm shows that for higher number of relations, the TLBO based algorithm is able togenerate comparatively better quality Top-K query plans.

Keywords : Aggregation based Genetic Algorithm, Distributed Query Processing, Query Plans, QueryOptimization, Teacher Learner Based Optimization, Top-K, Vector Evaluated Genetic Algorithm

1. INTRODUCTION

Database systems of different type use vari-ous techniques to identify optimal query plans.In many application domains, end-users aremore interested in the most important (top-k)query answers in potentially huge answer space[1]. A distributed database encompasses co-herent data, spread across various operationalautonomous sites of a computer network [2].With the increase in the number of users orexpanding organization requirement, the sizeof database networks also expands. In recentyears, it is observed that, in any distributeddatabase the number of data sources and theamount of data has been growing very fast tomeet the challenge of growing popularity orbusiness expansion.

A Distributed Database Management Sys-tem (DDBMS) deals with managing such dis-

tributed databases. DDBMS provides access touser via a simple and unified interface over dis-perse databases through different transparencymechanism, due to which a user feel as if theywere not distributed [3]. The query process-ing is a prime activity in database and it iscontrolled by DBMS. The performance of aDBMS is determined by its ability to processqueries in an effective and efficient manner [4].The query processing in distributed databasesis more intricate, as there various parametersor constraints involve and they are affectingperformance [1].

Query processing connects to many databaseresearch areas, including query optimization,indexing methods, and query languages [5], [6].As a consequence, the impact of efficient pro-cessing of query is becoming apparent in an in-creasing number of applications. One commonway to identify the top-k objects is scoring all

46

International Journal of Information Processing, 9(4), 46-60, 2015ISSN : 0973-8215IK International Publishing House Pvt. Ltd., New Delhi, India

Page 2: Distributed Query Processing Plans Generation …4)/p5.pdfDistributed Query Processing Plans Generation using Teacher Learner based Optimization Vikash Mishra a, Vikram Singh aDepartment

58 Vikash Mishra and Vikram Singh

Figure 4. Generation of Optimal Top-K query plans with QueryCost (K=5 and QC=0.6, K=10and QC=0.4, K=15 and 0.2) for different Number of Sites (Ns) and relations sizes (Nr) in num-ber of iterations or generations of Algorithm (TLBO, Multi-Objective VEGA and Aggregationbased GA with, crossover probability (Pc)= 0.8, mutation probability (Pm)=0.2, WeightQAC=0.2,WeightQLC=0.5, WeightLPC=0.3)

various criteria are used to illustrate the per-formance for optimal query plans generation.

5. CONCLUSIONS

In distributed database system data is dis-persed over the multiple sites, this distribu-tion of data is based on partition or replica-tion based due to which a given relation canbe found in more than one sites. Query pro-cessing in such environment is difficult task forquery processor, as multiple equivalent alterna-tives. Query processing in such environment,major objective design are CPU, I/O and thesite-to-site communication cost, among these,the site-to-site communication cost is the domi-nant cost. This requires an optimization mech-anism to generate optimal set of query plans toretrieve results for user query. Multi-objectiveoptimization is a very important research areain engineering studies, because real-world de-sign problems require the optimization of agroup of objectives.

Multiple, often conflicting, objectives arise nat-urally in most real-world optimization scenar-

ios. Adding more than one objective to an op-timization problem adds complexity. In thispaper, TLBO algorithm is employed to gener-ate optimal top-k query plans for proposed de-sign objectives and in unconstrained functionand its performance is compared with nature-inspired optimization technique, genetic algo-rithm (GA). The experimental results showthat the TLBO performs competitively bet-ter on the generation top-k query plans withother optimization methods. Therefore, theTLBO algorithm is effective and robust andhas a great potential for solving similar multi-objective problems. The optimality on genera-tion of query plans by other swarm based opti-mization techniques is part of our future workbased on the similar design objectives.

REFERENCES

1. Alom B M, Henskens F, Hannaford M. QueryProcessing and Optimization in DistributedDatabase Systems, in International Journalof Computer Science and Network Security,9(9):143-152, September 2009.

2. Stefano Ceri, Giuseppe Pelagatti. Dis-

Page 3: Distributed Query Processing Plans Generation …4)/p5.pdfDistributed Query Processing Plans Generation using Teacher Learner based Optimization Vikash Mishra a, Vikram Singh aDepartment

Distributed Query Processing Plans Generation using Teacher Learner based Optimization 59

tributed Databases Principles and Systems, inMcGraw-Hill Inc., 1984.

3. Tamer Ozsu M, Patrick Valduriez. Principles ofDistributed Database System, Prentice-Hall,Inc., 1999.

4. Rho S, March S T. Optimizing DistributedJoin Queries: A Genetic Algorithmic Ap-proach, Annals of Operations Research, 71:199-228, 1997.

5. Ioannidis Y E, Younkyung Kang. RandomizedAlgorithms for Optimizing Large Join Queries,ACM SIGMOD Record, 19(2):312-321, June1990.

6. Ioannidis Y E, Younkyung Kang. Left-deep vs.Bushy Trees: An Analysis of Strategy Spacesand its Implications for Query Optimization,In ACM SIGMOD ICMD Proceedings, pages168-177 May 1991.

7. Singh Vikram, Mishra Vikash. DistributedQuery Plan Generation using Aggregationbased Multi-Objective Genetic Algorithm. InACM ICTCS Proceedings, pages 1-8, Novem-ber 2014.

8. Hongbin Dong, Yiwen Liang. Genetic Algo-rithms for Large Join Query Optimization, In9th ICGEC Proceedings, pages 07-11, 2007.

9. Liu C, Yu C. Performance Issues in DistributedQuery Processing, in IEEE Transactions onParallel and Distributed Systems, 4(8):889-905,August 1993.

10. Chengwen Liu, Hao Chen, Warren Krueger. ADistributed Query Processing Strategy usingPlacement Dependency, In IEEE 12th ICDEProceedings, pages 477-484, February 1996.

11. Vijay Kumar T V, Singh V, Verma, A K.Distributed Query Processing Plans Genera-tion using Genetic Algorithm. in InternationalJournal of Computer Theory and Engineering,9(31):38-45, 2011.

12. Rao R V, Savsani V J, Vakharia D P. TeachingLearning based Optimization: An Optimiza-tion Method for Continuous Non-Linear LargeScale Problems, Information Science, 183: 1-15, 2012.

13. Rao R V, Patel V. An Elitist TeachingLearn-ing based Optimization Algorithm for SolvingComplex Constrained Optimization Problems,Int. J. Ind. Eng. Computation, 3(4):535-560,2012.

14. Niknam T, Fard A K, Baziar A. Multi-Objective Stochastic Distribution Feeder Re-configuration Problem Considering Hydrogenand Thermal Energy Production by Fuel Cell

Power Plants, Energy, pages 563-573, 2012.15. Gregory M. Genetic algorithm optimization of

Distributed Database Queries, In IEEE WorldCongress on Computational Intelligence Pro-ceedings, pages 271-276, 1998.

16. Matthias Jarke , Jurgen Koch. Query Opti-mization in Database Systems, ACM Com-puting Surveys (CSUR), 16(2):111-152, June1984.

17. Ioannidis Y E, Eugene Wong. Query Opti-mization by Simulated Annealing, In ACMSIGMOD ICMD Proceedings, pages 9-22, May1987.

18. Michael Stillger, Myra Spiliopoulou. GeneticProgramming in Database Query optimization,In First Annual Conference on Genetic Pro-gramming Proceedings, pages 28-31, July 1996.

19. Arun Swami , Anoop Gupta. Optimization ofLarge Join Queries, In ACM SIGMOD ICMDProceedings, pages 8-17, June 1988.

20. Vijay Kumar T V, Singh V, Verma, A K.Generating Distributed Query Plans using Ge-netic Algorithm, In IEEE ICDSDE Proceed-ings, pages 173-177, March 2010.

21. Vijay Kumar T V, Panikar Shina. GeneratingQuery Plans for Distributed Query Processingusing Genetic Algorithm, In Springer ICICAProceedings, pages 765-772, October 2011.

22. Mishra Vikash, Singh Vikram. Generating Op-timal Query Plans for Distributed Query Pro-cessing using Teacher-Learner Based Opti-mization, In 11th ELSEVIR ICDMW Proceed-ings, pages 181-290, August 2014.

23. Rao R V, Savsani V J, Vakharia D P. Teaching-Learning based Optimization: A Novel Methodfor Constrained Design optimization Problems.Computer-Aided Design, 43(3): 303-315, 2011.

24. Niknam T, Golestaneh F, Sadeghi M S. Multi-Objective TeachingLearning-based Optimiza-tion for Dynamic Economic Emission Dispatch,in IEEE Syst. J., 6(2):341-352, 2012.

25. Togan V. Design of Planar Steel Frames usingTeaching-Learning Based Optimization, Engi-neering Structures, 34:225-232, 2012.

26. Repinsek M C, Liu S H, Mernik L. A Noteon TeachingLearning-based Optimization Al-gorithm, Information Science, 212(1):79-93,2012.

27. Peter Bodorik, Spruce Riordon J. DistributedQuery Processing Optimization Objectives, InIEEE ICDE Proceedings, pages 320-329,1988.

28. Lin S, Kernighan B W. An Effective Heuristic

Page 4: Distributed Query Processing Plans Generation …4)/p5.pdfDistributed Query Processing Plans Generation using Teacher Learner based Optimization Vikash Mishra a, Vikram Singh aDepartment

60 Vikash Mishra and Vikram Singh

Algorithm for the Travelling-Salesman Prob-lem, Operations Research, pages 498-516, 1973.

29. Mendes R, Kennedy J, Neves J. Watch ThyNeighbour or How the Swarm can Learn fromits Environment, In IEEE Swarm IntelligenceSymposium Proceedings, pages 88-94, 2003.

Vikash Mishra has completedmasters degree in ComputerEngineering with specializationof Database Systems and appli-cations from National Instituteof Technology Kurukshetra,Haryana, (NIT) in, June 2015.He is currently working as Jun

-ior Associate Software Developer in Nagarro Soft-ware Pvt. Ltd., Gurgaon, Haryana, India. Hisareas of interest are mostly on Database Systems,Big Data Analytics, and Optimization Techniques.

Vikram Singh is AssistantProfessor in the Computer En-gineering Department, NationalInstitute of Technology, Ku-rukshetra, Haryana, India sinceyear 2012. His research interestsare in Advanced DatabaseSystems, Big Data Analytics,

Query Processing and Optimization, DataSapceSystems, Personal Information Management Sys-tesm. He is a Member of the IEEE and ACMSociety.