
Explorations into Internet Distributed Computing

Kunal Agrawal, Ang Huey Ting, Li Guoliang, and Kevin Chu

Project Overview

- Design and implement a simple Internet distributed computing framework
- Compare application development in this environment with that in a traditional parallel computing environment

Grapevine

An Internet Distributed Computing Framework - Kunal Agrawal, Kevin Chu

What is Internet Distributed Computing?

Motivation

- Supercomputers are very expensive
- Large numbers of personal computers and workstations around the world are naturally networked via the Internet
- Huge amounts of computational resources are wasted because many computers spend most of their time idle
- Growing interest in grid computing technologies

Other Distributed Computing Efforts

Internet Distributed Computing Issues

- Node reliability
- Network quality
- Scalability
- Security
- Cross-platform portability of object code
- Computing paradigm shift

Overview Of Grapevine

[Architecture diagram: a Client Application submits tasks to the Grapevine Server, which distributes them among multiple Grapevine Volunteers.]

Grapevine Features

- Written in Java
- Parameterized tasks
- Inter-task communication
- Result reporting
- Status reporting
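The slides do not show the Grapevine API itself, so here is a minimal sketch of how a parameterized task with result reporting might look. Every name below (GrapevineTask, GrapevineClient, their methods) is an illustrative assumption, not the framework's actual interface.

```java
// Hypothetical sketch of the Grapevine API -- every name here is an
// illustrative assumption, not the framework's actual interface.
import java.io.Serializable;

// A parameterized task: the client submits one instance per parameter set,
// the server ships it to a volunteer, and the return value is the result
// reported back to the client.
interface GrapevineTask extends Serializable {
    Serializable run(Serializable parameter) throws Exception;
}

// Client-side handle to the Grapevine server (again, an assumed interface).
interface GrapevineClient {
    void submit(GrapevineTask task, Serializable parameter) throws Exception;
    Serializable collectResult() throws Exception;  // blocks until some task finishes
}
```

Writing the framework in Java means the compiled task bytecode runs unchanged on any volunteer platform with a JVM, which is one answer to the cross-platform portability issue listed earlier.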

Unaddressed Issues

- Node reliability
- Load balancing
- Unintrusive operation
- Interruption semantics
- Deadlock

Meta Classifier

- Ang Huey Ting, Li Guoliang

Classifier

- Function(instance) = {True, False}
- Machine learning approach:
  - Build a model on the training set
  - Use the model to classify new instances
- Publicly available packages: WEKA (in Java), MLC++
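To make the build-then-classify workflow concrete, here is a minimal sketch against WEKA's classic Java API; the package paths follow WEKA 3.x and the ARFF file name is a placeholder.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instances;
import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;

public class ClassifierDemo {
    public static void main(String[] args) throws Exception {
        // Load a training set in WEKA's ARFF format ("train.arff" is a placeholder).
        Instances train = new Instances(new BufferedReader(new FileReader("train.arff")));
        train.setClassIndex(train.numAttributes() - 1);  // last attribute is the class

        // Build a model on the training set...
        Classifier model = new NaiveBayes();
        model.buildClassifier(train);

        // ...then use the model to classify a (here, previously seen) instance.
        double predicted = model.classifyInstance(train.instance(0));
        System.out.println("Predicted class index: " + predicted);
    }
}
```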

Meta Classifier

- An assembly of classifiers
- Gives better performance than a single classifier
- Two ways of generating an assembly of classifiers:
  - Different training data sets
  - Different algorithms

Voting: the predictions of the individual classifiers are combined by majority vote.
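A minimal sketch of majority voting over an assembly of trained WEKA classifiers; the MajorityVote helper is hypothetical, but classifyInstance is WEKA's standard prediction call.

```java
import weka.classifiers.Classifier;
import weka.core.Instance;

public class MajorityVote {
    // Hypothetical helper: classify one instance with every model and
    // return the class index that receives the most votes.
    public static double vote(Classifier[] models, Instance inst, int numClasses)
            throws Exception {
        int[] votes = new int[numClasses];
        for (Classifier m : models) {
            votes[(int) m.classifyInstance(inst)]++;
        }
        int best = 0;
        for (int c = 1; c < numClasses; c++) {
            if (votes[c] > votes[best]) best = c;
        }
        return best;
    }
}
```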

Building Meta Classifier

- Different training data sets - bagging:
  - Randomly generated 'bags'
  - Selection with replacement
  - Creates different 'flavors' of the training set
- Different algorithms:
  - E.g. Naïve Bayesian, neural net, SVM
  - Different algorithms work well on different training sets
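Here is a minimal sketch of generating one bag by selection with replacement over a WEKA Instances training set; the makeBag helper is hypothetical (WEKA also ships its own bagging meta-classifier, which the project may have used instead).

```java
import java.util.Random;
import weka.core.Instances;

public class Bagging {
    // Build one 'bag': a training set of the same size as the original,
    // drawn from it with replacement, so each bag is a different
    // 'flavor' of the training data.
    public static Instances makeBag(Instances train, Random rng) {
        Instances bag = new Instances(train, train.numInstances()); // empty copy of the header
        for (int i = 0; i < train.numInstances(); i++) {
            bag.add(train.instance(rng.nextInt(train.numInstances())));
        }
        return bag;
    }
}
```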

Why Parallelise?

- Computationally intensive:
  - One classifier = 0.5 hr
  - Meta classifier (assembly of 10 classifiers) = 10 * 0.5 = 5 hr
- Distributed environment - Grapevine:
  - Build classifiers in parallel and independently; with 10 volunteers the wall-clock time drops back toward 0.5 hr
  - Little communication required

Distributed Meta Classifiers

- WEKA: machine learning package from the University of Waikato, New Zealand (http://www.cs.waikato.ac.nz/~ml/weka/)
- Implemented in Java
- Includes the most popular machine learning tools

Distributed Meta Classifiers on Grapevine

Distributed bagging:
- Generate different bags
- Define the bag and algorithm for each task
- Submit the tasks to Grapevine
- Volunteer nodes build the classifiers
- Receive the results
- Perform voting
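Putting the pieces together, here is a sketch of the distributed bagging driver, using the hypothetical GrapevineTask/GrapevineClient interfaces and the makeBag helper from the earlier sketches; the real submission and result-collection calls are not shown in the slides.

```java
// Sketch of a distributed bagging driver -- GrapevineTask/GrapevineClient
// are the illustrative interfaces sketched earlier, not the real framework.
import java.io.Serializable;
import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;

public class DistributedBaggingDriver {
    // Runs on a volunteer node: train one classifier on one bag.
    static class TrainTask implements GrapevineTask {
        public Serializable run(Serializable parameter) throws Exception {
            Instances bag = (Instances) parameter;  // each task is parameterized by its bag
            Classifier model = new NaiveBayes();    // the algorithm could be a parameter too
            model.buildClassifier(bag);
            return model;                           // reported back as the task result
        }
    }

    public static Classifier[] buildAssembly(GrapevineClient client,
                                             Instances train, int bags) throws Exception {
        Random rng = new Random(42);
        for (int i = 0; i < bags; i++) {
            client.submit(new TrainTask(), Bagging.makeBag(train, rng)); // one task per bag
        }
        Classifier[] models = new Classifier[bags];
        for (int i = 0; i < bags; i++) {
            models[i] = (Classifier) client.collectResult(); // gather models as they finish
        }
        return models;  // combine their predictions with MajorityVote.vote(...)
    }
}
```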

Preliminary Study

- Bagging on Quick Propagation in OpenMP
- Implemented in C

Trial Domain

- Benchmark corpus: Reuters-21578 for text categorization
  - 9000+ training documents
  - 3000+ test documents
  - 90+ categories
- Perform feature selection
- Preprocess documents into feature vectors
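As one illustration of the preprocessing step, here is a minimal sketch that turns a document into a term-frequency feature vector over a vocabulary chosen by the (unspecified) feature-selection step; the tokenization is deliberately naive.

```java
// Minimal sketch: map a document to a term-frequency vector over a
// pre-selected vocabulary (the feature-selection step is assumed to
// have produced `vocabulary`).
import java.util.HashMap;
import java.util.Map;

public class FeatureVectors {
    public static double[] toVector(String document, String[] vocabulary) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < vocabulary.length; i++) index.put(vocabulary[i], i);

        double[] vector = new double[vocabulary.length];
        for (String token : document.toLowerCase().split("\\W+")) {
            Integer i = index.get(token);
            if (i != null) vector[i]++;  // count occurrences of selected features
        }
        return vector;
    }
}
```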

Summary

- Successful Internet distributed computing requires addressing many issues outside of traditional computer science
- Distributed computing is not for everyone