K-means Clustering





Group 15
Swathi Gurram
Prajakta Purohit

Goal
To program K-means on Twister (iterative MapReduce) and Hadoop (MapReduce) and see how the change of framework affects the implementation time.

Survey
Twister
- Configurable long-running (cacheable) map/reduce tasks
- Pub/sub messaging-based communication and data transfers
- Efficient support for iterative MapReduce computation
- Combine phase to collect all reduce outputs
- Data access via local disks

Survey
Hadoop: a software framework that supports data-intensive distributed applications
- Uses the MapReduce programming model
- Has its own filesystem, HDFS (Hadoop Distributed File System, based on the Google File System), which is specifically tailored for dealing with large files
- Manages the distribution of processing and files, breaking the files down into more manageable chunks for processing

Survey
HaLoop: a modified version of the Hadoop MapReduce framework
- Provides caching options for loop-invariant data access
- Lets users reuse major building blocks from their applications' Hadoop implementations
- Has intra-job fault-tolerance mechanisms similar to Hadoop's
- Reduces query runtimes by a factor of 1.85 compared with Hadoop
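The caching idea that Twister and HaLoop share can be illustrated with a small, framework-free Python sketch. It is not Twister, Hadoop, or HaLoop code: `CountingSource`, `run_reloading`, and `run_cached` are hypothetical names invented for this illustration, and the data is one-dimensional to keep it short. The sketch assumes that an iterative job on plain Hadoop typically re-reads its static input in every iteration, while a cacheable long-running task reads it once.

```python
class CountingSource:
    """Hypothetical stand-in for a disk or HDFS read; counts how often data is loaded."""

    def __init__(self, points):
        self._points = list(points)
        self.loads = 0

    def load(self):
        self.loads += 1
        return list(self._points)


def update_centroids(points, centroids):
    """One K-means step on 1-D data: assign each point to its nearest centroid, then average."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for x in points:
        i = min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
        sums[i] += x
        counts[i] += 1
    # An empty cluster keeps its previous centroid.
    return [s / c if c else old for s, c, old in zip(sums, counts, centroids)]


def run_reloading(source, centroids, iterations):
    """Each iteration behaves like a fresh job: the static points are re-read."""
    for _ in range(iterations):
        centroids = update_centroids(source.load(), centroids)
    return centroids


def run_cached(source, centroids, iterations):
    """The loop-invariant points are loaded once; only the centroids change."""
    points = source.load()
    for _ in range(iterations):
        centroids = update_centroids(points, centroids)
    return centroids
```

Both drivers compute the same centroids; they differ only in how many times the loop-invariant points are read, which is the cost the caching options above are meant to avoid.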

K-means Clustering
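As a reminder of the algorithm being ported, here is a minimal sketch of how K-means is commonly split into map and reduce steps. It is plain Python rather than the group's Twister or Hadoop code, and the function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`, `kmeans`) and the `tol` and `max_iters` parameters are assumptions made for illustration.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available from Python 3.8


def kmeans_map(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def kmeans_reduce(points):
    """Reduce step: the new centroid is the coordinate-wise mean of its points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """One map/shuffle/reduce round; an empty cluster keeps its old centroid."""
    groups = defaultdict(list)
    for p in points:  # map, then group by cluster id (the shuffle)
        cluster_id, value = kmeans_map(p, centroids)
        groups[cluster_id].append(value)
    return [kmeans_reduce(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))]


def kmeans(points, centroids, max_iters=100, tol=1e-9):
    """Driver loop: repeat rounds until no centroid moves by more than tol."""
    for _ in range(max_iters):
        new_centroids = kmeans_iteration(points, centroids)
        moved = max(dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if moved <= tol:
            break
    return centroids
```

The driver loop is the part an iterative framework runs repeatedly: the points stay fixed across rounds and only the small list of centroids is passed from one round to the next.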


Twister K-means

Hadoop K-means

Implementation Timeline

Week                 Task                                                    Team member
Oct 24th - Oct 31st  Understand K-means algorithm and design                 Prajakta, Swathi
Nov 1st - Nov 7th    Implement K-means                                       Prajakta, Swathi
Nov 8th - Nov 21st   Implement K-means on Twister and performance analysis   Prajakta, Swathi
Nov 21st - Nov 28th  Optimized validation method for K-means algorithm       Prajakta, Swathi
Nov 29th - Dec 3rd   Implement K-means on Hadoop                             Prajakta, Swathi
Dec 4th - Dec 5th    Performance analysis and presentation                   Prajakta, Swathi
Dec 6th - Dec 12th   Final technical report                                  Prajakta, Swathi

Validation methods

Conclusion
Twister framework is faster than Hadoop for iterative MapReduce applications.

References
http://salsahpc.indiana.edu
http://www.iterativemapreduce.org/samples.html
http://hadoop.apache.org/
http://en.wikipedia.org/wiki/Apache_Hadoop
http://clue.cs.washington.edu/node/14
http://code.google.com/p/haloop/
http://www.cs.washington.edu/homes/billhowe/pubs/HaLoop.pdf

Demo

Thank you