
A Study on the Viability of Hadoop Usage on the Umfort Cluster for the Processing and Storage of CReSIS Polar Data

Mentors: Je’aime Powell, Dr. Mohammad Hasan

Members: JerNettie Burney, Jean Bevins, Cedric Hall, Glenn M. Koch

Abstract

The primary focus of this research was to explore the capabilities of Hadoop as a software package to process, store, and manage CReSIS polar data in a clustered environment. The investigation examined Hadoop functionality and usage through reviewed publications. The team’s research was aimed at determining whether Hadoop was a viable software package to implement on the Elizabeth City State University (ECSU) Umfort computing cluster. Using case studies, the team compared processing, storage, management, and job distribution methods. A final determination of the benefits of Hadoop for storing and processing data on the Umfort cluster was then made.

INTRODUCTION

• Hadoop is a set of open source technologies

• Hadoop originated from the open source web search engine, Apache Nutch.

• Hadoop was adopted by over 100 different companies

Hadoop Functionality

• Hadoop is broken down into different parts

• Some of the more important components of Hadoop include MapReduce, Zookeeper, HDFS, Hive, JobTracker, NameNode, and HBase.

• Hadoop’s adaptive functionalities allow various organizations’ needs to be met.

Functionality

[Diagram: Hadoop components MapReduce, Zookeeper, JobTracker, and NameNode]

MapReduce

• Framework that processes large datasets

• MapReduce is broken down into two steps

• Maps the operation out to servers and reduces the results into a single result set
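The two MapReduce steps can be sketched in plain Python. This is only an illustration of the idea, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the word-count task are assumptions made for the example, and each string in `chunks` stands in for the slice of data one server would process.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_pairs):
    """Group the intermediate values by key before reducing."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: collapse each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(chunks):
    """Run map over every chunk, then reduce into one result set."""
    mapped = []
    for chunk in chunks:  # in a cluster, each chunk would go to a different server
        mapped.extend(map_phase(chunk))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical input: three chunks of text, one per simulated server.
    print(run_job(["ice radar ice", "radar snow", "ice"]))
```

In a real cluster the map calls would run in parallel on the nodes that hold the data; here they run in a loop so the flow from many partial results to one result set is easy to follow.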

Hive

• Data warehouse infrastructure

• Goal is to provide acceptable wait times for data browsing, and queries over small data sets or test queries

Zookeeper

• Used to maintain configuration information, manage computer naming schemes, provide distributed synchronization, and provide group services

HDFS

• Distributed storage system used by Hadoop

• Designed to work and run on low-cost hardware

• Works on operations even when the system

NameNode

• Essential piece of the HDFS file system

• Keeps a directory tree of all files in the file system

• NameNode was considered a single point of failure for an HDFS cluster; when the NameNode fails, the file system goes offline
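The NameNode's role can be illustrated with a toy model. The sketch below is a simplification and not Hadoop code: `ToyNameNode`, `NameNodeOffline`, the file path, and the node names are all hypothetical. It keeps a directory tree that maps each file to the machines assumed to hold its data, and it shows why a single NameNode is a single point of failure: once it is offline, no lookup can be answered.

```python
class NameNodeOffline(Exception):
    """Raised when the toy NameNode cannot answer a lookup."""


class ToyNameNode:
    def __init__(self):
        self.tree = {}       # nested dicts: directory -> subtree, file -> locations
        self.online = True

    def add_file(self, path, locations):
        """Record a file in the directory tree with the nodes that store it."""
        parts = path.strip("/").split("/")
        node = self.tree
        for directory in parts[:-1]:
            node = node.setdefault(directory, {})
        node[parts[-1]] = list(locations)

    def locate(self, path):
        """Walk the directory tree and return where the file's data lives."""
        if not self.online:
            raise NameNodeOffline("file system is offline")
        node = self.tree
        for part in path.strip("/").split("/"):
            node = node[part]
        return node


if __name__ == "__main__":
    namenode = ToyNameNode()
    namenode.add_file("/cresis/2009/flight1.dat", ["node1", "node3"])
    print(namenode.locate("/cresis/2009/flight1.dat"))

    namenode.online = False  # simulate a NameNode failure
    try:
        namenode.locate("/cresis/2009/flight1.dat")
    except NameNodeOffline as error:
        print("lookup failed:", error)
```

The data itself would still sit on the storage nodes after the failure; what is lost is the only index that says where it is.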

Hadoop Process

[Diagram: Application, JobTracker, NameNode, HDFS, TaskTracker]

HBase

• Hadoop Base (HBase) is the Hadoop database

• The goal of HBase is to host very large tables, with billions of rows by millions of columns

• In order to accomplish this, HBase uses tables, including Cascading, Hive, and Pig source modules
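One way to see how a table with billions of rows by millions of columns can be practical is that only the cells that actually hold a value need to be stored. The sketch below illustrates that sparse layout in plain Python. It is not the HBase API: `SparseTable`, its methods, and the row and column names are assumptions made for the example.

```python
from collections import defaultdict


class SparseTable:
    """Toy sparse table: stores only the cells that have been written."""

    def __init__(self):
        self.rows = defaultdict(dict)  # row key -> {column name: value}

    def put(self, row_key, column, value):
        self.rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        # Reading a missing cell does not allocate any storage for it.
        return self.rows.get(row_key, {}).get(column, default)

    def cell_count(self):
        """Number of cells actually stored, regardless of table dimensions."""
        return sum(len(columns) for columns in self.rows.values())


if __name__ == "__main__":
    table = SparseTable()
    # Hypothetical row keys and column names for radar measurements.
    table.put("flight1:frame0001", "radar:thickness_m", 1520.5)
    table.put("flight1:frame0002", "gps:latitude", -75.1)
    print(table.get("flight1:frame0001", "radar:thickness_m"))
    print(table.get("flight1:frame0001", "gps:latitude"))  # empty cell
    print(table.cell_count())
```

Two rows and two distinct columns give four possible cells, but only the two written cells occupy memory, which is the property that lets very wide tables stay manageable.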

Case Studies

• Many institutions and companies utilize Hadoop

• Using the services: Facebook, eBay, Google, San Diego Supercomputer Center

Google

• Google first created MapReduce

• Distributed File System

Facebook

• Hadoop Hive system

[Diagram: Fair Scheduler, NameNode, Zookeeper, JobTracker]

The San Diego Supercomputer Center

• MapReduce

Conclusion

Umfort current

• xCAT - Management

• Linux ext3 over NFS - Storage

• TORQUE - Job Distribution

• MATLAB - Processing

Umfort proposed using Hadoop

• Hadoop NameNode and Zookeeper - Management

• Hadoop Distributed File System (HDFS) – Storage

• Hadoop JobTracker – Job Distribution

• MapReduce - Processing

Conclusion (cont.)

• Benefits:

– Homogeneous product

– Support

– Cost efficient

Future Work

• Installation

• Implementation

• Testing

– Repeat of past summer 2009 Polar Grid team’s project using Hadoop

– Convert CReSIS data into GIS database
