Parallelizing a Co-Clustering Application with a Reduction-Based Framework on Multi-Core Clusters

Venkatram Ramanathan


Page 1: Venkatram Ramanathan

Parallelizing a Co-Clustering Application with a Reduction-Based Framework on Multi-Core Clusters

Venkatram Ramanathan

Page 2: Venkatram Ramanathan

Outline

Motivation

Evolution of Multi-Core Machines and the challenges

Background: MapReduce and FREERIDE

Co-clustering on FREERIDE

Experimental Evaluation

Conclusion

Page 3: Venkatram Ramanathan

Motivation - Evolution Of Multi-Core Machines

Performance increase: increased number of cores with lower clock frequencies

Cost-effective scalability of performance

HPC environments – clusters of multi-cores

Page 4: Venkatram Ramanathan

Challenges

Multi-level parallelism (illustrated in the sketch below):

Within the cores of a node – shared memory parallelism (Pthreads, OpenMP)

Across nodes – distributed memory parallelism (MPI)

Achieving programmability and performance – a major challenge
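The slides stop at the challenge itself; as a minimal, generic illustration of the two levels described above (not FREERIDE and not the co-clustering code), a hybrid program might combine OpenMP within a node with MPI across nodes:

// Minimal hybrid-parallel sketch: OpenMP inside a node, MPI across nodes.
// Generic illustration only; not FREERIDE and not the co-clustering application.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Each node (MPI process) owns a local chunk of data; dummy values here.
    std::vector<double> local(1000000, 1.0);

    // Shared-memory parallelism within the node.
    double local_sum = 0.0;
    #pragma omp parallel for reduction(+ : local_sum)
    for (long i = 0; i < static_cast<long>(local.size()); ++i)
        local_sum += local[i];

    // Distributed-memory parallelism across nodes.
    double global_sum = 0.0;
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0) std::printf("global sum = %f\n", global_sum);
    MPI_Finalize();
    return 0;
}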

Page 5: Venkatram Ramanathan

Challenges

Possible solution: use higher-level/restricted APIs – reduction-based APIs

Map-Reduce: a higher-level API that allows a cluster of multi-cores to be programmed with one API, but its expressive power is considered limited

Expressing computations using reduction-based APIs

Page 6: Venkatram Ramanathan

Background

MapReduce:
Map(in_key, in_value) -> list(out_key, intermediate_value)
Reduce(out_key, list(intermediate_value)) -> list(out_value)

FREERIDE:
Users explicitly declare a Reduction Object and update it
Map and Reduce steps are combined
Each data element is processed and reduced before the next element is processed (see the sketch below)
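The slides do not show the FREERIDE interface itself; the following sketch only illustrates the reduction-object idea, with hypothetical class and function names rather than the real FREERIDE API.

// Illustrative sketch of the reduction-object programming model described above.
// Class and function names are hypothetical; this is not the actual FREERIDE API.
#include <vector>
#include <cstddef>

// User-declared reduction object, e.g. per-cluster accumulators.
struct ReductionObject {
    std::vector<double> sum;   // one accumulator per cluster
    std::vector<long>   count;
    explicit ReductionObject(int k) : sum(k, 0.0), count(k, 0) {}
};

// Map and Reduce combined: each data element is folded into the reduction
// object before the next element is touched (no intermediate key/value lists).
void process_element(double value, int cluster_id, ReductionObject& robj) {
    robj.sum[cluster_id]   += value;
    robj.count[cluster_id] += 1;
}

// The runtime would then combine per-thread/per-node reduction objects,
// e.g. by element-wise addition, to form the global result.
void combine(ReductionObject& into, const ReductionObject& from) {
    for (std::size_t i = 0; i < into.sum.size(); ++i) {
        into.sum[i]   += from.sum[i];
        into.count[i] += from.count[i];
    }
}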

Page 7: Venkatram Ramanathan

MapReduce and FREERIDE: Comparison


Page 8: Venkatram Ramanathan

Co-clustering

Involves simultaneous clustering of rows to row clusters and columns to column clusters

Maximizes mutual information

Uses the Kullback-Leibler divergence:

KL(p, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}
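For concreteness, a direct implementation of this divergence could look like the following sketch; the zero-probability handling is one common convention and is an assumption, not something stated on the slide.

// Kullback-Leibler divergence: KL(p, q) = sum over x of p(x) * log(p(x) / q(x)).
// Sketch only: p and q are assumed to be same-length discrete distributions;
// terms with p(x) == 0 contribute zero (a common convention).
#include <vector>
#include <cmath>
#include <cstddef>

double kl_divergence(const std::vector<double>& p, const std::vector<double>& q) {
    double kl = 0.0;
    for (std::size_t x = 0; x < p.size(); ++x) {
        if (p[x] > 0.0 && q[x] > 0.0)
            kl += p[x] * std::log(p[x] / q[x]);
    }
    return kl;
}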

Page 9: Venkatram Ramanathan

Overview of Co-clustering Algorithm – Preprocessing


Page 10: Venkatram Ramanathan

Overview of Co-clustering Algorithm – Iterative Procedure


Page 11: Venkatram Ramanathan

Parallelizing Co-clustering on FREERIDE

The input matrix and its transpose are pre-computed

The input matrix and its transpose are divided into files and distributed among the nodes; each node holds the same amount of row and column data

rowCL and colCL (the row and column cluster assignments) are replicated on all nodes

Initial clustering is done in a round-robin fashion for consistency across nodes (see the sketch below)
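The slides do not show the assignment code; a round-robin initialization that depends only on the global row or column index, and is therefore identical on every node without any communication, might look like this (the function name and parameters are assumptions; rowCL and colCL come from the slide).

// Round-robin initial clustering: the cluster id depends only on the global
// index, so every node computes exactly the same rowCL/colCL without
// communication. Sketch with assumed names, not the original code.
#include <vector>

std::vector<int> round_robin_assignment(long num_global_rows, int num_clusters) {
    std::vector<int> rowCL(num_global_rows);
    for (long r = 0; r < num_global_rows; ++r)
        rowCL[r] = static_cast<int>(r % num_clusters);   // row r -> cluster (r mod k)
    return rowCL;
}
// colCL is initialized the same way over the global column indices.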

Page 12: Venkatram Ramanathan

Parallelizing Preprocess Step

In preprocessing, pX and pY are normalized by the total sum, so normalization must wait until all nodes have processed their data

Each node calculates pX and pY from its local data

The reduction object is updated with the partial sum and the pX and pY values

Accumulating the partial sums gives the total sum, and pX and pY are normalized

xnorm and ynorm are calculated in a second iteration, as they need the total sum (see the two-pass sketch below)
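A minimal sketch of that two-pass structure is given below; the struct and function names are assumptions, and the actual FREERIDE kernel is not reproduced.

// Preprocessing sketch: pass 1 folds local contributions to pX, pY and the
// total sum into the reduction object; after the runtime accumulates the
// per-node objects, pX and pY are normalized. xnorm and ynorm need the global
// total, so they are computed in a second pass (not shown).
// Names are illustrative, not the original code.
#include <vector>

struct PreprocessRObj {
    std::vector<double> pX, pY;   // unnormalized row and column marginals
    double total = 0.0;           // partial sum of all matrix entries
    PreprocessRObj(long rows, long cols) : pX(rows, 0.0), pY(cols, 0.0) {}
};

// Pass 1: called for every locally stored entry a(i, j).
void local_update(PreprocessRObj& robj, long i, long j, double a) {
    robj.pX[i] += a;
    robj.pY[j] += a;
    robj.total += a;
}

// After global accumulation of the reduction objects:
void normalize(PreprocessRObj& robj) {
    for (double& v : robj.pX) v /= robj.total;
    for (double& v : robj.pY) v /= robj.total;
}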

Page 13: Venkatram Ramanathan

Parallelizing Preprocess Step

A compressed matrix of size #rowclusters x #colclusters is calculated from local data: the sum of the values of each row cluster across each column cluster

The final compressed matrix is the sum of the local compressed matrices

Local compressed matrices are updated in the reduction object, which produces the final compressed matrix on accumulation

Cluster centroids are then calculated (see the sketch below)
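A sketch of the local compressed-matrix computation follows, assuming the local data is held in coordinate form; all names and the data layout are assumptions.

// Local compressed matrix: a (#row clusters x #col clusters) matrix whose
// (rc, cc) entry sums the values of every row in row cluster rc across every
// column in column cluster cc, over this node's local entries only.
// The element-wise sum of these local matrices (via the reduction object)
// yields the final compressed matrix. Sketch with assumed names.
#include <vector>
#include <cstddef>

std::vector<double> local_compressed_matrix(
    const std::vector<long>& row_idx,   // row index of each local entry
    const std::vector<long>& col_idx,   // column index of each local entry
    const std::vector<double>& val,     // value of each local entry
    const std::vector<int>& rowCL,      // row -> row-cluster assignment
    const std::vector<int>& colCL,      // column -> column-cluster assignment
    int k_row, int k_col) {
    std::vector<double> C(static_cast<std::size_t>(k_row) * k_col, 0.0);
    for (std::size_t n = 0; n < val.size(); ++n)
        C[rowCL[row_idx[n]] * k_col + colCL[col_idx[n]]] += val[n];
    return C;
}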

Page 14: Venkatram Ramanathan

Parallelizing Iterative Procedure

Row clustering is reassigned, with the new cluster for each row determined by the Kullback-Leibler divergence; the reduction object is updated

The compressed matrix is recomputed and the reduction object updated

Column clustering proceeds in a similar way

The objective function is finalized, then the next iteration begins (row reassignment is sketched below)
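The row-reassignment step could be sketched as follows, reusing the kl_divergence function from the earlier sketch; every name here is an assumption, not the original code.

// Row reassignment: compare the row's distribution against each row-cluster
// centroid with the KL divergence and pick the closest cluster. The chosen
// cluster id is then recorded via the reduction object. Sketch only.
#include <vector>
#include <limits>
#include <cstddef>

// Defined in the earlier KL-divergence sketch.
double kl_divergence(const std::vector<double>& p, const std::vector<double>& q);

int reassign_row(const std::vector<double>& row_dist,
                 const std::vector<std::vector<double>>& centroids) {
    int best = 0;
    double best_div = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        double d = kl_divergence(row_dist, centroids[c]);
        if (d < best_div) { best_div = d; best = static_cast<int>(c); }
    }
    return best;
}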

Page 15: Venkatram Ramanathan

Parallelizing Co-clustering on FREERIDE


Page 16: Venkatram Ramanathan

Parallelizing Iterative Procedure


Page 17: Venkatram Ramanathan

Experimental Results

The algorithm is the same for shared memory, distributed memory and hybrid parallelization

Experiments were conducted on 2 clusters:

env1: Intel Xeon E5345, quad core, 2.33 GHz clock frequency, 6 GB main memory, 8 nodes

env2: AMD Opteron 8350, 8 cores, 16 GB main memory, 4 nodes

Page 18: Venkatram Ramanathan

Experimental Results

2 datasets:

1 GB dataset – matrix dimensions 16k x 16k

4 GB dataset – matrix dimensions 32k x 32k

Each dataset and its transpose are split into 32 files (row partitioning) and distributed among the nodes

Number of row and column clusters: 4

Page 19: Venkatram Ramanathan

Experimental Results


Page 20: Venkatram Ramanathan

Experimental Results


Page 21: Venkatram Ramanathan

Experimental Results


Page 22: Venkatram Ramanathan

Experimental Results

The preprocessing stage is the bottleneck for the smaller dataset, as it is not compute intensive

Speedup with preprocessing: 12.17; speedup without preprocessing: 18.75

The preprocessing stage scales well for the larger dataset, which involves more computation, so the speedup is the same with and without preprocessing

Speedup for the larger dataset: 20.7

Page 23: Venkatram Ramanathan

Conclusion

FREERIDE offers the following advantages: no need to load data into custom file systems, a C/C++ based framework, and much better performance (based on comparisons for other algorithms)

Co-clustering can be viewed as a generalized reduction

Implementing it on FREERIDE gives a speedup of 21 on 32 cores.