

Calhoun: The NPS Institutional Archive

Faculty and Researcher Publications Collection

2016

Hadoop MapReduce for Mobile Clouds

George, Johnu

http://hdl.handle.net/10945/52393


2168-7161 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCC.2016.2603474, IEEE Transactions on Cloud Computing.


Hadoop MapReduce for Mobile Clouds

Johnu George, Chien-An Chen, Radu Stoleru, Member, IEEE, and Geoffrey G. Xie, Member, IEEE

Abstract—The new generations of mobile devices have high processing power and storage, but they lag behind in terms of software systems for big data storage and processing. Hadoop is a scalable platform that provides distributed storage and computational capabilities on clusters of commodity hardware. Building Hadoop on a mobile network enables the devices to run data-intensive computing applications without direct knowledge of underlying distributed systems complexities. However, these applications have severe energy and reliability constraints (e.g., caused by unexpected device failures or topology changes in a dynamic network). As mobile devices are more susceptible to unauthorized access, when compared to traditional servers, security is also a concern for sensitive data. Hence, it is paramount to consider reliability, energy efficiency and security for such applications. The MDFS (Mobile Distributed File System) [1] addresses these issues for big data processing in mobile clouds. We have developed the Hadoop MapReduce framework over MDFS and have studied its performance by varying input workloads in a real heterogeneous mobile cluster. Our evaluation shows that the implementation addresses all constraints in processing large amounts of data in mobile clouds. Thus, our system is a viable solution to meet the growing demands of data processing in a mobile environment.

Index Terms—Mobile Computing, Hadoop MapReduce, Cloud Computing, Mobile Cloud, Energy-Efficient Computing, Fault-Tolerant Computing

1 Introduction

With advances in technology, mobile devices are slowly replacing traditional personal computers. The new generations of mobile devices are powerful, with gigabytes of memory and multi-core processors. These devices have high-end computing hardware and complex software applications that generate large amounts of data, on the order of hundreds of megabytes. This data can range from application raw data to images, audio, video or text files. With the rapid increase in the number of mobile devices, big data processing on mobile devices has become a key emerging necessity for providing capabilities similar to those provided by traditional servers [2].

Current mobile applications that perform massive computing tasks (big data processing) offload data and tasks to data centers or powerful servers in the cloud [3]. There are several cloud services that offer computing infrastructure to end users for processing large datasets. Hadoop MapReduce is a popular open source programming framework for cloud computing [4]. The framework splits the user job into smaller tasks and runs these tasks in parallel on different nodes, thus reducing the overall execution time when compared with a sequential execution on a single node.

• Johnu George, Chien-An Chen, and Radu Stoleru are with the Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77840. E-mail: {stoleru,jaychen}@cse.tamu.edu, [email protected]

• Geoffrey G. Xie is with the Department of Computer Science, Naval Postgraduate School, Monterey, CA 93943. E-mail: [email protected]

Manuscript received 29 Aug. 2013; revised 24 Jan. 2014; revised 28 Mar. 2014.

This architecture, however, fails in the absence of external network connectivity, as is the case in military or disaster response operations. This architecture is also avoided in emergency response scenarios where there is limited connectivity to the cloud, leading to expensive data upload and download operations. In such situations, wireless mobile ad-hoc networks are typically deployed [5]. The limitations of traditional cloud computing motivate us to study the data processing problem in an infrastructureless and mobile environment in which the Internet is unavailable and all jobs are performed on mobile devices. We assume that mobile devices in the vicinity are willing to share each other's computational resources.

There are many challenges in bringing big data capabilities to the mobile environment: a) mobile devices are resource-constrained in terms of memory, processing power and energy. Since most mobile devices are battery powered, energy consumption during job execution must be minimized. Overall energy consumption depends on the nodes selected for the job execution. The nodes have to be selected based on each node's remaining energy, job retrieval time, and energy profile. As the jobs are retrieved wirelessly, shorter job retrieval time indicates lower transmission energy and shorter job completion time. Compared to traditional cloud computing, transmission time is the bottleneck for the job makespan, and wireless transmission is the major source of energy consumption; b) reliability of data is a major challenge in dynamic networks with unpredictable topology changes. Connection failures could cause mobile devices to go out of network reach after limited participation. Device failures may also happen due to energy depletion or application-specific failures. Hence, a reliability model stronger than those used by traditional static networks is essential; c) security is also a major concern, as the stored data often contains sensitive user information [6] [7]. Traditional security mechanisms tailored for static networks are inadequate for dynamic networks. Devices can be captured by unauthorized users and data can be compromised easily if necessary security measures are not provided. To address the aforementioned issues of energy efficiency, reliability and security in dynamic network topologies, the k-out-of-n computing framework was introduced [8] [9]. An overview of the previous MDFS is given in Section 2.2. What remains as an open challenge is to bring the cloud computing framework to a k-out-of-n environment such that it solves the bottlenecks involved in the processing and storage of big data in a mobile cloud.

The Hadoop MapReduce cloud computing framework meets our processing requirements for several reasons: 1) in the MapReduce framework, as the tasks are run in parallel, no single mobile device becomes a bottleneck for overall system performance; 2) the MapReduce framework handles resource management, task scheduling and task execution in an efficient fault-tolerant manner. It also considers the available disk space and memory of each node before tasks are assigned to any node; 3) Hadoop MapReduce has been extensively tested and used by a large number of organizations for big data processing over many years. However, the default file system of Hadoop, HDFS (Hadoop Distributed File System) [10], is tuned for static networks and is unsuitable for mobile environments. HDFS is not suitable for dynamic network topologies because: 1) it ignores energy efficiency. Mobile devices have limited battery power and can easily fail due to energy depletion; 2) HDFS needs better reliability schemes for data in the mobile environment. In HDFS, each file block is replicated to multiple devices, a scheme designed for heavy I/O-bound jobs with strong requirements on backend network connections. Instead, we need lightweight processes which react well to slow and varying network connections. Consequently, we considered the k-out-of-n based MDFS [8], instead of HDFS, as our underlying file system for the MapReduce framework.

In this paper, we implement the Hadoop MapReduce framework over MDFS and evaluate its performance on a general heterogeneous cluster of devices. We implement the generic file system interface of Hadoop for MDFS, which makes our system interoperable with other Hadoop frameworks like HBase. No changes are required for existing HDFS applications to be deployed over MDFS. To the best of our knowledge, this is the first work to bring the Hadoop MapReduce framework to the mobile cloud in a way that truly addresses the challenges of the dynamic network environment. Our system provides a distributed computing model for processing of large datasets in a mobile environment while ensuring strong guarantees for energy efficiency, data reliability and security.

2 Related Work & Background

There have been several research studies that attempted to bring the MapReduce framework to heterogeneous clusters of devices, due to its simplicity and powerful abstractions [11].

Marinelli [12] introduced the Hadoop-based platform Hyrax for cloud computing on smartphones. Hadoop TaskTracker and DataNode processes were ported to Android mobile phones, while a single instance of the NameNode and JobTracker ran on a traditional server. Porting Hadoop processes directly onto mobile devices doesn't mitigate the problems faced in the mobile environment. As presented earlier, HDFS is not well suited for dynamic network scenarios. There is a need for a more lightweight file system which can adequately address dynamic network topology concerns. Another MapReduce framework based on Python, Misco [13], was implemented on Nokia mobile phones. It has a similar server-client model where the server keeps track of various user jobs and assigns them to workers on demand. Yet another server-client model based MapReduce system was proposed over a cluster of mobile devices [14], where the mobile client implements MapReduce logic to retrieve work and produce results for the master node. The above solutions, however, do not solve the issues involved in data storage and processing of large datasets in the dynamic network.

P2P-MapReduce [15] describes a prototype implementation of a MapReduce framework which uses a peer-to-peer model for parallel data processing in dynamic cloud topologies. It describes mechanisms for managing node and job failures in a decentralized manner.

The previous research focused only on the parallel processing of tasks on mobile devices using the MapReduce framework, without addressing the real challenges that occur when these devices are deployed in the mobile environment. Huchton et al. [1] proposed a k-Resilient Mobile Distributed File System (MDFS) for mobile devices targeted primarily at military operations. Chen et al. [16] proposed a new resource allocation scheme based on the k-out-of-n framework and implemented a more reliable and energy-efficient Mobile Distributed File System for Mobile Ad Hoc Networks (MANETs), with significant improvements in energy consumption over the traditional MDFS architecture.

In the remaining part of this section we present background material on Hadoop and MDFS.

2.1 Hadoop Overview

The two primary components of Apache Hadoop are the MapReduce framework and HDFS, as shown in Figure 1. MapReduce is a scalable parallel processing framework that runs on HDFS. It refers to two distinct tasks that Hadoop jobs perform: the Map Task and the Reduce Task. The Map Task takes the input data set and produces a set of intermediate <key,value> pairs, which are sorted and partitioned per reducer.


[Figure: a client node running the MapReduce program and Job Client submits a job to the Hadoop JobTracker; the NameNode maps File.txt to blocks A and B and their DataNode locations (DataNodes 1, 2); TaskTracker/DataNode pairs run Map (M) and Reduce (R) tasks against locally stored blocks, with metadata operations going to the NameNode and data reads/writes going to the DataNodes.]

Fig. 1. Hadoop architecture

The map output is then passed to the Reducers to produce the final output. User applications implement the mapper and reducer interfaces to provide the map and reduce functions, as in the sketch below. In the MapReduce framework, computation is always moved closer to the nodes where the data is located, instead of moving data to the compute node. In the ideal case, the compute node is also the storage node, minimizing network congestion and thereby maximizing the overall throughput.
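For concreteness, the mapper/reducer contract can be illustrated with a minimal word-count sketch against the standard org.apache.hadoop.mapreduce API. This example is ours, not from the paper; the class names are arbitrary.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// The map task emits intermediate <word, 1> pairs; the framework sorts and
// partitions them per reducer, and the reduce task sums the counts.
public class WordCountExample {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);            // intermediate <key,value> pair
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      context.write(key, new IntWritable(sum)); // final output pair
    }
  }
}
```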

Two important modules in MapReduce are the JobTracker and the TaskTracker. The JobTracker is the MapReduce master daemon that accepts user jobs and splits them into multiple tasks. It then assigns these tasks to MapReduce slave nodes in the cluster called TaskTrackers. TaskTrackers are the processing nodes in the cluster that run the tasks: Map and Reduce. The JobTracker is responsible for scheduling tasks on the TaskTrackers and re-executing failed tasks. TaskTrackers report to the JobTracker at regular intervals through heartbeat messages which carry information regarding the status of running tasks and the number of available slots.

HDFS is a reliable, fault-tolerant distributed file system designed to store very large datasets. Its key features include load balancing for maximum efficiency, configurable block replication strategies for data protection, recovery mechanisms for fault tolerance, and auto scalability. In HDFS, each file is split into blocks and each block is replicated to several devices across the cluster.

The two modules in the HDFS layer are the NameNode and the DataNode. The NameNode is the file system master daemon that holds the metadata information about the stored files. It stores the inode records of files and directories, which contain various attributes like name, size, permissions and last modified time. DataNodes are the file system slave nodes, which are the storage nodes in the cluster. They store the file blocks and serve read/write requests from the client. The NameNode maps a file to the list of its blocks and the blocks to the list of DataNodes that store them. DataNodes report to the NameNode at regular intervals through heartbeat messages which contain information regarding their stored blocks.

[Figure: a plain file is encrypted with AES; the encrypted file is then partitioned by erasure coding, and the AES key is split by secret sharing.]

Fig. 2. Existing MDFS architecture

The NameNode builds its metadata from these block reports and always stays in sync with the DataNodes in the cluster.

When the HDFS client initiates a file read operation, it fetches the list of blocks and their corresponding DataNode locations from the NameNode. The locations are ordered by their distance from the reader. It then tries to read the content of the block directly from the first location. If this read operation fails, it chooses the next location in the sequence. As the client retrieves data directly from the DataNodes, the network traffic is distributed across all the DataNodes in the HDFS cluster.

When the HDFS client is writing data to a file, it initiates a pipelined write to a list of DataNodes which are retrieved from the NameNode. The NameNode chooses the list of DataNodes based on the pluggable block placement strategy. Each DataNode receives data from its predecessor in the pipeline and forwards it to its successor. The DataNodes report to the NameNode once the block is received.

2.2 MDFS Overview

The traditional MDFS was primarily targeted at military operations where front line troops are provided with mobile devices. A collection of mobile devices forms a mobile ad-hoc network where each node can enter or move out of the network freely. MDFS is built on a k-out-of-n framework which provides strong guarantees for data security and reliability. k-out-of-n enabled MDFS finds n storage nodes such that the total expected transmission cost to the k closest storage nodes is minimal. Instead of relying on conventional schemes which encrypt the stored data per device, MDFS uses a group secret sharing scheme.

As shown in Figure 2, every file is encrypted using a secret key and partitioned into n1 file fragments using erasure encoding (the Reed-Solomon algorithm). Unlike conventional schemes, the secret key is not stored locally. The key is split into n2 fragments using Shamir's secret key sharing algorithm. File creation is complete when all the key and file fragments are distributed across the cluster. For file retrieval, a node has to retrieve at least k1 (<n1) file fragments and k2 (<n2) key fragments to reconstruct the original file.

The MDFS architecture provides high security by ensuring that data cannot be decrypted unless an authorized user obtains k2 distinct key fragments. It also ensures resiliency by allowing authorized users to reconstruct the data even after losing n1-k1 fragments of data. The reliability of a file increases as the ratio k1/n1 decreases, but a lower ratio also incurs higher data redundancy. The data fragments are placed on a set of selected storage nodes considering each node's failure probability and its distance to the potential clients. A node's failure probability is estimated based on its remaining energy, network connectivity, and application-dependent factors. Data fragments are then allocated to the network such that the expected data transfer energy for all clients to retrieve/recover the file is minimized.
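The reliability trade-off can be made concrete: if each storage node survives independently with probability p, a file stored as n fragments, any k of which suffice, is recoverable with the binomial tail probability computed below. This simple model is our own illustration; the paper's failure estimator also weighs connectivity and application-dependent factors.

```java
// Probability that a file stored as n fragments, any k of which suffice,
// can be reconstructed when each node independently survives with
// probability p: sum over i = k..n of C(n,i) * p^i * (1-p)^(n-i).
public final class KOutOfNReliability {

  static double recoveryProbability(int n, int k, double p) {
    double total = 0.0;
    for (int i = k; i <= n; i++) {
      total += binomial(n, i) * Math.pow(p, i) * Math.pow(1 - p, n - i);
    }
    return total;
  }

  // Multiplicative form of the binomial coefficient C(n, i).
  static double binomial(int n, int i) {
    double c = 1.0;
    for (int j = 1; j <= i; j++) c = c * (n - i + j) / j;
    return c;
  }

  public static void main(String[] args) {
    // With p = 0.9 and n = 10: lowering k raises recoverability, at the
    // price of higher redundancy (each fragment carries size/k data).
    System.out.printf("n=10, k=7: %.4f%n", recoveryProbability(10, 7, 0.9));
    System.out.printf("n=10, k=3: %.4f%n", recoveryProbability(10, 3, 0.9));
  }
}
```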

MDFS has a fully distributed directory service in which each device maintains information regarding the list of available files and their corresponding key and file fragments. Each node in the network periodically synchronizes the directory with other nodes, ensuring that the directories of all devices are always up to date.

3 Challenges

This section describes the challenges involved in the implementation of the MapReduce framework over MDFS.

1. The traditional MDFS architecture only supports a flat hierarchy. All files are stored at the same level in the file system, without the use of folders or directories. But the MapReduce framework relies on fully qualified path names for all operations. Fully qualified paths are added to MDFS, as described in Section 4.4.6.

2. The capabilities of traditional MDFS are very limited. It supports only a few functions, such as read(), write() and list(). A user calls the write() function to store a file across the nodes in the network and the read() function to read the contents of a file from the network. The list() function provides the full listing of the available files in the network.

The MapReduce framework, however, needs a fairly generic file system that implements a wide range of functions. It has to be compatible with available HDFS applications without any code modification or extra configuration. The implementation details are described in Section 4.4.

3. The MapReduce framework needs streaming access to its data, but MDFS reads and writes are not streaming operations. How data is streamed during read/write operations is described in Section 4.4.

4. During the job initialization phase of Hadoop, the JobTracker queries the NameNode to retrieve the information about all the blocks of the input file (blocks and the list of DataNodes that store them) in order to select the best nodes for task execution. The JobTracker prioritizes data locality for TaskTracker selection. It first looks for an empty slot on any node that contains the block. If no slots are available, it looks for an empty slot on a different node but in the same rack. In MDFS, as no node in the network has a complete block for processing, the challenge is to determine the best locations for each task execution. Section 4.5 proposes an energy-aware scheduling algorithm that overrides the default one.

5. The MapReduce and HDFS components are rack-aware. They use the network topology for obtaining rack awareness knowledge. As discussed, if the node that contains the block is not available for task execution, the default task scheduling algorithm selects a different node in the same rack using the rack awareness knowledge. This scheme leverages the single hop and high bandwidth of in-rack switching. The challenge is to define rack awareness in the context of a mobile ad-hoc network, as the topology is likely to change at any time during job execution. This challenge is also addressed by the new scheduling algorithm described in Section 4.5.

4 System Design

In this section, we present the details of our proposed architectures, the system components, and the interactions among the components that occur during file system operations.

4.1 Requirements

For the design of our system, the following requirements had to be met:

• Since the Hadoop JobTracker is a single entity common to all nodes across the cluster, there should be at least one node in the cluster which always operates within the network range and remains alive throughout the job execution phase. The system must tolerate node failures.

• Data stored across the cluster may be sensitive. Unauthorized access to sensitive information must be prohibited.

• The system is tuned and designed for handling large amounts of data, on the order of hundreds of megabytes, but it must also support small files.

• Though we primarily focus on mobile devices, the system must support a heterogeneous cluster of devices, which can be a combination of traditional personal computers, servers, laptops, mobile phones and tablets, depending on the working environment of the user.

• Like HDFS, the system must support sequential writes. Bytes written by a client may not be visible immediately to other clients in the network unless the file is closed or the flush operation is called. Append mode must be supported to append data to an already existing file. The flush operation guarantees that bytes up to that given point in the stream are persisted and changes are visible to all other clients in the system.

Page 6: Hadoop MapReduce for Mobile Clouds · 2017-04-29 · Hadoop is a scalable platform that provides distributed storage and computational capabilities on clusters of commodity hardware

2168-7161 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for moreinformation.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCC.2016.2603474,IEEE Transactions on Cloud Computing

IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 3, NO. 1, JANUARY 2014 5

• Like HDFS, the system must support streaming reads. It must also support random reads, where a user reads bytes starting from an arbitrary offset in the file.

4.2 System Components

In the traditional MDFS architecture, a file to be stored is encrypted and split into n fragments such that any k (<n) fragments are sufficient to reconstruct the original file. In this architecture, parallel file processing is not possible, as even a single byte of the file cannot be read without retrieving the required number of fragments. Moreover, the MapReduce framework assumes that the input file is split into blocks which are distributed across the cluster. Hence, we propose the notion of blocks, which was missing in the traditional MDFS architecture. In our approach, files are split into blocks based on the block size. These blocks are then split into fragments that are stored across the cluster. Each block is a normal Unix file with configurable block size. Block size has a direct impact on performance as it affects the read and write sizes.

The file system functionality of each cluster node is split across three layers: the MDFS Client, the Data Processing layer, and the Network Communication layer.

4.2.1 MDFS Client

User applications invoke file system operations using the MDFS client, a built-in library that implements the MDFS file system interface. The MDFS client provides a file system abstraction to the upper layers. The user does not need to be aware of file metadata or the storage locations of file fragments. Instead, the user references each file by paths in the namespace using the MDFS client. Files and directories can be created, deleted, moved and renamed as in traditional file systems. All file system commands take path arguments in URI format (scheme://authority/path). The scheme decides the file system to be instantiated. For MDFS, the scheme is mdfs and the authority is the Name Server address.
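Because the MDFS client plugs into Hadoop's generic file system interface, applications can reach it through the ordinary FileSystem API. A hedged sketch follows, assuming a Name Server at mdfs-master:9000 and that the fs.mdfs.impl configuration key is mapped to the MDFS client class (the hostname, port and path are made up for illustration):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MdfsClientExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // scheme://authority/path -- the "mdfs" scheme selects the MDFS
    // implementation (assumes fs.mdfs.impl is set accordingly);
    // the authority is the Name Server address.
    FileSystem fs = FileSystem.get(URI.create("mdfs://mdfs-master:9000/"), conf);

    Path file = new Path("/user/squad1/report.txt");
    try (FSDataOutputStream out = fs.create(file)) {
      out.writeBytes("field report...\n");  // buffered locally per block
    }
    // Rename is a pure namespace update handled by the Name Server.
    fs.rename(file, new Path("/user/squad1/report-final.txt"));
  }
}
```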

4.2.2 Data processing layer

The Data Processing layer manages the data and control flow of file system operations. The functionality of this layer is split across two daemons: the Name Server and the Data Server.

4.2.2.1 Name Server: The MDFS Name Server is a lightweight MDFS daemon that stores the hierarchical file organization, or namespace, of the file system. All file system metadata, including the mapping of a file to its list of blocks, is also stored in the MDFS Name Server. The Name Server has the same functionality as the Hadoop NameNode. The Name Server is always updated with any change in the file system namespace. On startup, it starts a global RPC server at a port defined by mdfs.nameservice.rpc-port in the configuration file.

The client connects to the RPC server and talks to it using the MDFS Name Protocol. The MDFS client and MDFS Name Server are completely unaware of the fragment distribution, which is handled by the Data Server. We kept namespace management and data management totally independent for better scalability and design simplicity.

4.2.2.2 Data Server: The MDFS Data Server is a lightweight MDFS daemon instantiated on each node in the cluster. It coordinates with other MDFS Data Server daemons to handle MDFS communication tasks like neighbor discovery, file creation, file retrieval and file deletion. On startup, it starts a local RPC server listening on the port defined by mdfs.dataservice.rpc-port in the configuration file. When the user invokes any file system operation, the MDFS client connects to the local Data Server at the specified port and talks to it using the MDFS Data Protocol. Unlike the Hadoop DataNode, the Data Server has to be instantiated on all nodes in the network where data flow operations (reads and writes) are invoked. This is because the Data Server prepares the data for these operations, and they are always executed in the local file system of the client. The architecture is explained in detail in the subsequent sections.
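The two RPC ports named above are ordinary Hadoop configuration keys; a small sketch of setting them programmatically (the key names come from the text, the port values are arbitrary examples):

```java
import org.apache.hadoop.conf.Configuration;

public class MdfsPortsExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Global RPC server of the Name Server (key from the paper; value assumed).
    conf.setInt("mdfs.nameservice.rpc-port", 8200);
    // Local RPC server each Data Server listens on (value assumed).
    conf.setInt("mdfs.dataservice.rpc-port", 8201);
    System.out.println(conf.getInt("mdfs.nameservice.rpc-port", -1));
  }
}
```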

4.2.3 Network communication layer

This layer handles the communication between the nodes in the network. It exchanges control and data packets for various file operations. This layer abstracts the network interactions and hides the complexities involved in routing packets to various nodes in dynamic topologies like MANETs.

4.2.3.1 Fragment Mapper: The Fragment Mapper stores information about file and key fragments, including the fragment identifiers and the locations of fragments. It stores the mapping of a block to its list of key and file fragments.

4.2.3.2 Communication Server: The Communication Server interacts with every other node and is responsible for energy-efficient packet routing. It must support broadcast and unicast, the two basic communication modes required by MDFS. To allow more flexible routing mechanisms in different environments, it is implemented as a pluggable component which can be extended to support multiple routing protocols.

4.2.3.3 Topology Discovery & Maintenance Framework: This component stores the network topology information and the failure probabilities of participating nodes. When the network topology changes, this framework detects the change through a distributed topology monitoring algorithm and updates the Fragment Mapper. All the nodes in the network are thus promptly updated about network topology changes.

There are two types of system metadata. The file system namespace, which includes the mapping of files to blocks, is stored in the Name Server, while the mapping of blocks to fragments is stored in the Fragment Mapper.


[Figure: in the distributed architecture, every node runs its own Name Server and Fragment Mapper alongside the MDFS client, Data Server and Communication Server; the client node's JobTracker assigns Map (M) and Reduce (R) tasks to TaskTracker nodes, which perform metadata, fragment and data read/write operations over the network.]

Fig. 3. Distributed Architecture of MDFS

It was our design decision to separate the Fragment Mapper functionality from the Name Server, for two reasons: 1) the memory usage of the Fragment Mapper can grow tremendously based on the configured value of n for each file. For example, consider a system with n set to 10. For a 1 MB file with a 4 MB block size, only one block is required, but 10 fragments are created by the k-out-of-n framework. Hence, there are 10 fragment entries in the Fragment Mapper and only 1 block entry in the Name Server for this particular file. Since the memory requirements of the Name Server and Fragment Mapper are different, this design gives the flexibility to run them independently in different modes. 2) The Fragment Mapper is invoked only during network operations (reads/writes), while the Name Server is accessed for every file system operation. Since the Name Server is a lightweight daemon that handles only the file system namespace, directory operations are fast.

4.3 System Architecture

We propose two approaches for our MDFS architecture: a Distributed architecture, where there is no central entity to manage the cluster, and a Master-slave architecture, as in HDFS. Although the tradeoffs between distributed and centralized architectures in a distributed system are well studied, this paper is the first to implement and compare the Hadoop framework on these two architectures. Some interesting observations are also discovered, as described in Section 6.6. The user can configure the architecture during cluster startup based on the working environment. It has to be configured during cluster startup and cannot be changed later.

4.3.1 Distributed Architecture

In this architecture, depicted in Figure 3, every participating node runs a Name Server and a Fragment Mapper. After every file system operation, the update is broadcast in the network so that the local caches of all nodes are synchronized. Moreover, each node periodically syncs with other nodes by sending broadcast messages. Any new node entering the network receives these broadcast messages and creates a local cache for further operations. In the hardware implementation, the updates are broadcast using UDP packets, as sketched below. We assume all functional nodes can successfully receive the broadcast messages. A more comprehensive and robust distributed directory service is left as future work.
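A minimal sketch of such a UDP metadata broadcast follows. The port number and plain-text message format are our assumptions; the paper does not specify the wire format.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class MetadataBroadcastExample {
  private static final int SYNC_PORT = 8300;   // assumed port

  // Broadcast a namespace update so peers can refresh their local caches.
  static void broadcastUpdate(String update) throws Exception {
    byte[] payload = update.getBytes(StandardCharsets.UTF_8);
    try (DatagramSocket socket = new DatagramSocket()) {
      socket.setBroadcast(true);
      socket.send(new DatagramPacket(payload, payload.length,
          InetAddress.getByName("255.255.255.255"), SYNC_PORT));
    }
  }

  public static void main(String[] args) throws Exception {
    broadcastUpdate("CREATE /user/squad1/report.txt blocks=2");
  }
}
```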

This architecture has no single point of failure, and no constraint is imposed on the network topology. Each node can operate independently, as each node stores a separate copy of the namespace and fragment mapping. The load is evenly distributed across the cluster in terms of metadata storage when compared to the centralized architecture. However, network bandwidth is wasted due to the messages broadcast by each node for updating the local cache of every other node in the network. As the number of nodes involved in processing increases, this problem becomes more severe, leading to higher response time for each user operation.

4.3.2 Master-Slave Architecture

In this architecture, depicted in Figure 4, the Name Server and the Fragment Mapper are singleton instances across the complete cluster. These daemons can be run on any of the nodes in the cluster. The node that runs these daemons is called the master node. MDFS stores metadata on the master node, similar to other distributed systems like HDFS, GFS [17] and PVFS [18].

The centralized architecture has many advantages. 1) Since a single node in the cluster stores the complete metadata, no device memory is wasted by storing the same metadata on all nodes, as in the distributed approach. 2) When a file is created, modified or deleted, there is no need to broadcast any message across the network to inform other nodes to update their metadata. This saves overall network bandwidth and reduces transmission cost. Lower transmission cost leads to higher energy efficiency of the system. 3) Since our system is assumed to have at least one node that always operates within the network range, the Name Server and the Fragment Mapper can run on the same node that hosts the Hadoop JobTracker. This can be a static server or any participating mobile device. Thus, this approach doesn't violate any of our initial assumptions.

The major disadvantage of the centralized approach is that the master node is a single point of failure. However, this problem can be solved by configuring a standby node in the configuration file. The standby node is updated by the master node whenever there is a change in the file system metadata. The master node signals success to client operations only when the metadata change is reflected in both the master and standby nodes. Hence, the data structures of the master and standby nodes always remain in sync, ensuring smooth failover.


[Figure: in the centralized architecture, a single client node hosts the JobTracker, Name Server and Fragment Mapper; TaskTracker nodes run only the MDFS client, Data Server and Communication Server, and all metadata operations go to the master node while fragment operations and data reads/writes flow over the network.]

Fig. 4. Centralized Architecture of MDFS

[Figure: numbered control and data flows (steps 1-20, described below) between the MDFS client, Name Server, Data Server with its File Retrieval module, Fragment Mapper, k-out-of-n framework, Topology Discovery/Maintenance unit, and the Data Servers and Communication Servers of remote nodes.]

Fig. 5. Data Flow of a read operation

The master node can become overloaded when a large number of mobile devices are involved in processing. There are several distributed systems, like Ceph [19] and Lustre [20], that support more than one instance of metadata server for managing the file system metadata evenly. Multiple metadata servers are deployed to avoid the scalability bottlenecks of a single metadata server. MDFS can efficiently handle hundreds of megabytes with a single metadata server, so there is no need for multiple metadata servers in our environment. For the rest of the discussion, we use the centralized approach for simplicity.

4.4 System Operations

This section describes how file read, file write, file append, file delete, file rename, and directory operations are performed in the centralized architecture.

4.4.1 File Read Operation

The HDFS read design is not applicable in MDFS. For any block read operation, the required number of fragments has to be retrieved, then combined and decrypted to recover the original block. Unlike HDFS, an MDFS block read operation is always local to the reader, as the block to be read is first reconstructed locally. For security purposes, the retrieved key/data fragments and the decoded blocks are deleted as soon as the plain data is loaded into the TaskTracker's memory. Although there is a short period of time during which a plain data block may be compromised, we leave the more secure data processing problem as future work.

The overall transmission cost of the read operation varies across nodes based on the locations of the fragments and the reader. As the read operation is handled locally, random reads are supported in MDFS, where the user can seek to any position in the file. Figure 5 illustrates the control flow of a read operation through the following numbered steps.

Step 1: The user issues a read request for file blocks of length L at a byte offset O.

Steps 2-3: As in HDFS, the MDFS client queries the Name Server to return all blocks of the file that span the byte offset range from O to O+L. The Name Server searches its local cache for the mapping from the file to the list of blocks. It returns the list of blocks that contain the requested bytes.

Step 4: For each block in the list returned by the Name Server, the client issues a retrieval request to the Data Server. Each file system operation is identified by a specific opcode in the request.

Step 5: The Data Server identifies the opcode and instantiates the File Retriever module to handle the block retrieval.

Steps 6-7: The Data Server requests the Fragment Mapper to provide information regarding the key and file fragments of the file. The Fragment Mapper replies with the identities of the fragments and their locations in the network.

Steps 8-15: The Data Server requests the Communication Server to fetch the required number of fragments from the locations previously returned by the Fragment Mapper. Fragments are fetched in parallel and stored in the local file system of the requesting client. After fetching each fragment, the Communication Server acknowledges the Data Server with the location where the fragment is stored in the local file system.

Step 16: The above operations are repeated to fetch the key fragments. These details are not included in the diagram for brevity. The secret key is constructed from the key fragments.

Step 17: Once the required file fragments are downloaded into the local file system, they are decoded and then decrypted using the secret key to get the original block.

Step 18: The key and file fragments which were downloaded into the local file system during the retrieval process are deleted for security reasons.

Step 19: The Data Server acknowledges the client with the location of the block in the local file system.

Step 20: The MDFS client reads the requested number of bytes of the block. Steps 4-19 are repeated if there are multiple blocks to be read. Once the read operation is completed, the block is deleted for security reasons to restore the original state of the cluster.


[Figure: numbered control and data flows (steps 1-20, described in Section 4.4.2) between the MDFS client, Name Server, Data Server with its File Creator module, k-out-of-n framework, Fragment Mapper, Topology Discovery/Maintenance unit, and the Data Servers and Communication Servers of remote nodes.]

Fig. 6. Data Flow of a write operation

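To make steps 6 and 17 concrete, here is a minimal, runnable sketch of the block encryption and decryption around which the fragment machinery operates. Figure 2 names AES as the cipher; the CBC mode, padding and key handling below are our assumptions, not the paper's specification.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

// A block is encrypted before erasure encoding (write, step 6) and only
// becomes readable again after the secret key has been reconstructed
// from k2 key fragments (read, steps 16-17).
public class BlockCryptoExample {
  public static void main(String[] args) throws Exception {
    SecretKey key = KeyGenerator.getInstance("AES").generateKey();
    byte[] iv = new byte[16];
    new SecureRandom().nextBytes(iv);

    byte[] block = "plain block contents".getBytes(StandardCharsets.UTF_8);

    Cipher enc = Cipher.getInstance("AES/CBC/PKCS5Padding");
    enc.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
    byte[] encrypted = enc.doFinal(block);      // this is what gets erasure-coded

    Cipher dec = Cipher.getInstance("AES/CBC/PKCS5Padding");
    dec.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
    byte[] recovered = dec.doFinal(encrypted);  // step 17: decode, then decrypt
    System.out.println(Arrays.equals(block, recovered)); // true
  }
}
```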

If many clients access the same file, the mobile nodes that store its fragments may become hot spots. This problem can be fixed by enabling file caching. Caching is disabled by default, and each node deletes the file fragments after the file retrieval. If caching is enabled, the reader node caches the file fragments in its local file system so that it does not fetch the fragments from the network during subsequent read operations.

If the file fragments are available locally, the reader client verifies the length of the cached file against the actual length stored in the Name Server. This avoids the problem of reading an outdated version of the file. If data has been appended to the file after caching, the file fragments are re-fetched from the network, overwriting the existing ones. Fragment availability increases due to caching, which leads to a fairer distribution of load in the cluster without consuming extra energy. However, caching affects system security due to the higher availability of file fragments in the network.

4.4.2 File Write Operation

The HDFS write design is not applicable for MDFS, as data cannot be written unless the block is decrypted and decoded. Hence, in the MDFS architecture, when a write operation is called, bytes are appended to the current block until the block boundary is reached or the file is closed. The block is then encrypted, split into fragments, and redistributed across the cluster.

Our MDFS architecture does not support random writes. Random writes make the design more complicated when writes span multiple blocks. This feature is not considered in the present design, as it is not required by the MapReduce framework. Figure 6 illustrates the control flow of a write operation through the following numbered steps.

Step 1: The user issues a write request for a file of length L. The file is split into ⌈L/B⌉ blocks, where B is the user-configured block size. The last block might not be complete, depending on the file length. The user request can also be a streaming write, where the user writes to the file system byte by byte. Once the block boundary is reached or the file is closed, the block is written to the network. In both scenarios, the data to be written is assumed to be present in the local file system.

Step 2: Similar to the HDFS block allocation scheme, for each block to be written, the MDFS client requests the Name Server to allocate a new block id, a unique identifier for each block. As all identifiers are generated by a single Name Server in the centralized architecture, there will be no identifier collisions. In the distributed architecture, however, an appropriate hashing function is required to generate a globally unique identifier. In our implementation, the absolute path of each file is used as the hash key.

Step 3: The Name Server returns a new block id based on the allocation algorithm and adds the block identifier to its local cache. The mapping of the file to its list of blocks is stored in the Name Server.

Steps 4-5: The MDFS client issues a creation request to the Data Server, which contains a specific opcode in the request message. The Data Server identifies the opcode and instantiates the File Creator module to handle the block creation.

Step 6: The block stored in the local file system is encrypted using the secret key. The encrypted block is partitioned into n fragments using erasure encoding.

Step 7: The key is also split into fragments using Shamir's secret key sharing algorithm (see the sketch after these steps).

Steps 8-9: The Data Server requests the k-out-of-n framework to provide n storage nodes such that the total expected transmission cost from any node to the k closest storage nodes is minimal.

Step 10: The Data Server requests the Fragment Mapper to add the fragment information of each file, which includes the fragment identifiers with the new locations returned by the k-out-of-n framework. If the network topology changes after the initial computation, the k-out-of-n framework recomputes the storage nodes for every file stored in the network and updates the Fragment Mapper. This ensures that the Fragment Mapper is always in sync with the current network topology.

Steps 11-18: The file fragments are distributed in parallel across the cluster. The key fragments are also stored in the same manner. These details are not included in the diagram for brevity.

Steps 19-20: Once the file and key fragments are distributed across the cluster, the Data Server informs the client that the file has been successfully written to the nodes. For security purposes, the original block stored in the local file system of the writer is deleted after the write operation, as it is no longer needed.
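Step 7 (and the matching key reconstruction in step 16 of the read path) can be illustrated with a toy Shamir (k2, n2) secret sharing implementation over a prime field. This is our own minimal sketch: real MDFS splits an AES key, whereas here the secret is a small integer, and the prime and parameters are arbitrary.

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

public class ShamirDemo {
  static final BigInteger P = BigInteger.valueOf(2147483647); // 2^31 - 1, prime
  static final SecureRandom RNG = new SecureRandom();

  // Split the secret into n shares, any k of which reconstruct it:
  // shares are points (x, f(x)) of a random degree-(k-1) polynomial
  // with f(0) = secret, evaluated mod P.
  static Map<Integer, BigInteger> split(BigInteger secret, int k, int n) {
    BigInteger[] coeff = new BigInteger[k];
    coeff[0] = secret;
    for (int i = 1; i < k; i++) coeff[i] = new BigInteger(30, RNG);
    Map<Integer, BigInteger> shares = new HashMap<>();
    for (int x = 1; x <= n; x++) {
      BigInteger y = BigInteger.ZERO;
      for (int i = k - 1; i >= 0; i--)            // Horner evaluation mod P
        y = y.multiply(BigInteger.valueOf(x)).add(coeff[i]).mod(P);
      shares.put(x, y);
    }
    return shares;
  }

  // Lagrange interpolation at x = 0 recovers the secret from any k shares.
  static BigInteger combine(Map<Integer, BigInteger> shares) {
    BigInteger secret = BigInteger.ZERO;
    for (Map.Entry<Integer, BigInteger> i : shares.entrySet()) {
      BigInteger num = BigInteger.ONE, den = BigInteger.ONE;
      for (Integer j : shares.keySet()) {
        if (j.equals(i.getKey())) continue;
        num = num.multiply(BigInteger.valueOf(-j)).mod(P);
        den = den.multiply(BigInteger.valueOf(i.getKey() - j)).mod(P);
      }
      secret = secret.add(i.getValue().multiply(num).multiply(den.modInverse(P))).mod(P);
    }
    return secret;
  }

  public static void main(String[] args) {
    BigInteger secret = BigInteger.valueOf(123456789);
    Map<Integer, BigInteger> shares = split(secret, 3, 5);   // k2=3, n2=5
    Map<Integer, BigInteger> any3 = new HashMap<>();
    for (int x : new int[]{1, 3, 5}) any3.put(x, shares.get(x));
    System.out.println(combine(any3).equals(secret));        // true
  }
}
```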


4.4.3 File Append Operation

MDFS supports the append operation, which was introduced in Hadoop 0.19. If a user needs to write to an existing file, the file has to be opened in append mode. If the user appends data to the file, bytes are added to the last block of the file. Hence, for block append mode, the last block is read into the local file system of the writer and the file pointer is updated appropriately to the last written byte. Then, writes are executed in a similar way as described in the previous section.

4.4.4 File Delete Operation

For a file to be deleted, all file fragments of every block of the file have to be deleted. When the user issues a file delete request, the MDFS client queries the Name Server for all the blocks of the file. It then requests the Data Server to delete these blocks from the network. The Data Server gathers information about the file fragments from the Fragment Mapper and requests the Communication Server to send delete requests to all the locations returned by the Fragment Mapper. Once the delete request has been successfully executed, the corresponding entry in the Fragment Mapper is removed. In the distributed architecture, the update has to be broadcast to the network so that the entry is deleted from all nodes in the network.

4.4.5 File Rename Operation

The file rename operation requires only an update in the namespace, where the file is referenced with the new path name instead of the old path. When the user issues a file rename request, the MDFS client requests the Name Server to update its namespace. The Name Server updates the current inode structure of the file based on the renamed path.

4.4.6 Directory Create/Delete/Rename Operations

When the user issues file commands to create, delete or rename any directory, the MDFS client requests the Name Server to update the namespace. The namespace keeps a mapping of each file to its parent directory, where the topmost level is the root directory ('/'). All paths from the root node to the leaf nodes are unique. Recursive operations are also allowed for delete and rename.

4.5 Energy-aware task scheduling

The Hadoop MapReduce framework relies on data locality for boosting overall system throughput. Computation is moved closer to the nodes where the data resides. The JobTracker first tries to assign tasks to TaskTrackers on which the data is locally present (local). If this is not possible (no free map slots, or tasks have already failed on the specific node), it then tries to assign tasks to other TaskTrackers in the same rack (non-local). This reduces cross-switch network traffic, thereby reducing the overall execution time of Map tasks. In the case of non-local task processing, data has to be fetched from the corresponding nodes, adding latency.

Algorithm 1: Task Scheduling

Input: Sb, Fb, d, k
Output: X, C*_i            // the index b in C*_i(b) is omitted

C*_i = 0
X <- 1xN array initialized to 0
D <- 1xN array
for j = 1 to N do
    D[j].node = j
    D[j].cost = (Sb/k) x Fb(j) x d_ij
    if D[j].cost == 0 then
        D[j].cost = N^2    // just assign a big number
    end
end
D <- sort D in increasing order of D.cost
for i = 1 to k do
    X[D[i].node] = 1
    C*_i += D[i].cost
end
return X, C*_i

In a mobile environment, higher network traffic leads to increased energy consumption, which is a major concern. Therefore, fetching data for non-local processing results in higher energy consumption and increased transmission cost. Hence, it is important to bring Map Task processing nearer to the nodes that store the data, for minimum latency and maximum energy efficiency.

There are many challenges in bringing data locality to MDFS. Unlike native Hadoop, no single node running MDFS has a complete data block; each node has at most one fragment of a block, for security reasons. Consequently, the default MapReduce scheduling algorithm, which allocates processing tasks close to data blocks, does not apply. When MDFS performs a read operation to retrieve a file, it finds the k fragments that can be retrieved with the lowest data transfer energy. Specifically, the k fragments that are closest to the file requester in terms of hop count are retrieved. As a result, knowing the network topology (from the topology maintenance component in MDFS) and the locations of each fragment (from the Fragment Mapper), we can estimate the total hop count for each node to retrieve the closest k fragments of a block. A smaller total hop count indicates lower transmission time, lower transmission energy, and shorter job completion time. Although this estimation adds a slight overhead, and is repeated when MDFS actually retrieves/reads the data block, we leave this engineering optimization as future work.

We now describe how to find the minimal cost (hop count) of fetching a block for a TaskTracker. Algorithm 1 illustrates the main change made to the default Hadoop MapReduce scheduler such that the data transfer energy is taken into account when scheduling a task. The improved scheduler uses the current network conditions (topology and node failure probabilities) to estimate the task retrieval energy and assign tasks. Only nodes that are currently functional and available may be selected.


$C^\star_i(b)$ is defined as the minimal cost of fetching block $b$ at node $i$. Let $F_b$ be a $1 \times N$ binary vector where each element $F_b(j)$ indicates whether node $j$ contains a fragment of block $b$ (note that $\sum_{j=1}^{N} F_b(j) = n$ for all $b$); $S_b$ is the size of block $b$; $d_{i,j}$ is the distance (hop count) between node $i$ and node $j$; the pair-wise node distances can be estimated efficiently using an all-pairs shortest path algorithm. $X$ is a $1 \times N$ binary decision variable where $X_j = 1$ indicates that node $j$ sends a data fragment to node $i$:

$$C^\star_i(b) = \min \sum_{j=1}^{N} \frac{S_b}{k}\, F_b(j)\, d_{i,j}\, X_j, \quad \text{s.t.} \quad \sum_{j=1}^{N} X_j = k$$

$C^\star_i(b)$ can be solved by Algorithm 1, which minimizes the communication cost for node $i$ to retrieve $k$ data fragments of block $b$. Once $C^\star_i(b)$ is solved for each node, the processing task for block $b$ is assigned to node $p$, where $p = \arg\min_i C^\star_i(b)$. The time complexity of Algorithm 1 is $N \log N$ due to the sorting procedure, and because it is performed once for each of the $N$ nodes, the time complexity of assigning a block to a TaskTracker is $N^2 \log N$. Considering the size of our network ($\leq 100$ nodes) and the processing power of modern mobile devices, the computational overhead of the scheduling algorithm is minimal.
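For readers who prefer code to pseudocode, a direct Java transcription of Algorithm 1 follows. The array layout, the sentinel handling for zero-cost entries, and all names are ours; this is a sketch of the published pseudocode, not the authors' implementation.

```java
import java.util.Arrays;
import java.util.Comparator;

// Algorithm 1: pick the k storage nodes from which node i can fetch
// block b's fragments at minimal total cost, returning the selection
// vector X and the minimal cost C*_i(b).
public final class TaskSchedulingExample {

  record Result(int[] X, double cost) {}

  // Sb: size of block b; F[j] = 1 iff node j holds a fragment of b;
  // d[j]: hop count from node i to node j; k: fragments needed.
  static Result minFetchCost(double Sb, int[] F, int[] d, int k) {
    int N = F.length;
    double[] cost = new double[N];
    Integer[] order = new Integer[N];
    for (int j = 0; j < N; j++) {
      order[j] = j;
      cost[j] = (Sb / k) * F[j] * d[j];
      if (cost[j] == 0) {
        // Zero-cost entries (e.g. nodes holding no fragment) get a large
        // sentinel, mirroring the pseudocode's "N^2" assignment.
        cost[j] = Double.MAX_VALUE;
      }
    }
    Arrays.sort(order, Comparator.comparingDouble(j -> cost[j]));
    int[] X = new int[N];
    double total = 0;
    for (int s = 0; s < k; s++) {   // take the k cheapest fragment holders
      X[order[s]] = 1;
      total += cost[order[s]];
    }
    return new Result(X, total);
  }

  public static void main(String[] args) {
    // Five nodes; nodes 1, 2 and 4 hold fragments; fetch k = 2 of them.
    Result r = minFetchCost(4.0, new int[]{0, 1, 1, 0, 1},
                            new int[]{3, 1, 2, 1, 4}, 2);
    System.out.println(Arrays.toString(r.X()) + " cost=" + r.cost());
  }
}
```

Running minFetchCost for every candidate node i (with that node's distance vector) and taking the argmin reproduces the task placement rule above.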

4.6 Consistency Model

Like HDFS, MDFS follows a single-writer, multiple-reader model. An application can add data to MDFS by creating a new file and writing data to it (Create mode). The data, once written, cannot be modified or removed, except when the file is reopened for append (Append mode). In both write modes, data is always added to the end of the file. MDFS provides support for overwriting the entire file, but not from an arbitrary offset in the file.

If an MDFS client opens a file in Create or Append mode, the Name Server acquires a write lock on the corresponding file path so that no other client can open the same file for writing. The writer client periodically notifies the Name Server through heartbeat messages to renew the lock. To prevent the starvation of other writer clients, the Name Server releases the lock after a user-configured time limit if the client fails to renew it. The lock is also released when the file is closed by the client.

A file can have concurrent reader clients even if it is locked for a write. When a file is opened for a read, the Name Server acquires a read lock on the corresponding file path to protect it from deletion by other clients. As writes are always executed in the local file system, the data is not written to the network until the file is closed or a block boundary is reached. Consequently, changes made to the last block of the file may not be visible to reader clients while the write operation is in progress. Once the write has completed, the new data is immediately visible across the cluster. In all circumstances, MDFS provides a strong consistency guarantee for reads: all concurrent reader clients read the same data irrespective of their locations.

4.7 Failure Tolerance

This section describes how MDFS uses the k-out-of-n encoding technique and snapshot operations to improve data reliability and tolerate node failures.

4.7.1 k-out-of-n reliability

In HDFS, each block is replicated a specific number of times for fault tolerance, determined by the user-configured replication factor. In MDFS, the k-out-of-n framework ensures data reliability, where the parameters k and n determine the level of fault tolerance. These parameters are configurable per file and are specified at file creation time. Only k of the n storage nodes are required to retrieve the complete file, ensuring data reliability.
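To quantify what (k, n) buys: a file remains readable as long as at least k of its n fragment holders survive, so with an independent per-node failure probability p, file availability is a binomial tail. The snippet below is our own illustration of this arithmetic, not MDFS code:

```java
/** Probability that a (k, n)-encoded file stays readable when each of the
 *  n fragment holders fails independently with probability p. */
public class KofNReliability {
    static double survivalProbability(int k, int n, double p) {
        double total = 0;
        for (int alive = k; alive <= n; alive++) {   // at least k holders must survive
            total += binomial(n, alive)
                   * Math.pow(1 - p, alive)          // 'alive' holders up
                   * Math.pow(p, n - alive);         // the rest failed
        }
        return total;
    }

    static double binomial(int n, int r) {
        double c = 1;
        for (int i = 1; i <= r; i++) c = c * (n - r + i) / i;
        return c;
    }

    public static void main(String[] args) {
        // (k, n) = (3, 10) as in the evaluation: tolerates up to 7 holder failures.
        System.out.printf("p=0.2 -> %.4f%n", survivalProbability(3, 10, 0.2));
    }
}
```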

4.7.2 Snapshot operation

A snapshot operation creates a backup image of the current state, including the in-memory data structures. During a safe shutdown of the Name Server and Fragment Mapper, a snapshot operation is automatically invoked to save the state to disk. On restart, the saved image is used to rebuild the system state. Snapshot operations are particularly useful when a user is experimenting with changes that need to be rolled back easily later. When a client requests a snapshot operation, the Name Server enters a special maintenance state called safe mode. No client operations are allowed while the Name Server is in safe mode. The Name Server leaves safe mode automatically once the backup is created. The Data Server is not backed up, as it mainly handles MDFS communication tasks such as neighbor discovery, file creation, and file retrieval; this information varies with time, so it is unnecessary to include it in the snapshot.
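A minimal sketch of the snapshot flow described above, assuming the in-memory state is serializable; all names are ours:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Sketch of the Name Server snapshot flow; names ours, not the MDFS code. */
public class NameServerSnapshot {
    private volatile boolean safeMode = false;        // client ops are rejected while true
    private final Serializable namespaceImage;        // in-memory directory structures

    public NameServerSnapshot(Serializable namespaceImage) {
        this.namespaceImage = namespaceImage;
    }

    public boolean inSafeMode() { return safeMode; }  // checked on every client request

    /** Client-requested snapshot: block mutations, persist state, resume. */
    public synchronized void snapshot(File out) throws IOException {
        safeMode = true;                              // enter the maintenance state
        try (ObjectOutputStream oos =
                 new ObjectOutputStream(new FileOutputStream(out))) {
            oos.writeObject(namespaceImage);          // Name Server + Fragment Mapper state
        } finally {
            safeMode = false;                         // leave safe mode once backup exists
        }
    }
}
```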

4.8 Diagnostic Tools

MDFS Shell is a handy and powerful debugging tool for executing all available file system commands. It is invoked by hadoop mdfs <command> <command args>. Each command takes a file path URI and other command-specific arguments. The MDFS shell can simulate complex cluster operations such as concurrent reads and writes, device failures, and device reboots. All MDFS-specific parameters can be changed at run time using the MDFS shell, which makes it particularly useful for testing new features and analyzing their impact on the overall performance of the system. All file system operations are logged in a user-specific folder for debugging and performance analysis. If an issue is encountered, the operation logs can be used to reproduce and diagnose it.


5 System Implementation

We used the Apache Hadoop stable release 1.2.1 [21] for our implementation. Our MDFS framework consists of 18,365 lines of Java code, exported as a single jar file. The MDFS code has no dependency on the Hadoop code base. Similar to the DistributedFileSystem class of HDFS, MDFS provides a MobileDistributedFS class that implements FileSystem, the abstract base class of Hadoop, for backward compatibility with all existing HDFS applications. The user invokes this object to interact with the file system. To switch from HDFS to MDFS, the Hadoop user only needs to add the location of the jar file to the HADOOP_CLASSPATH variable and change the file system scheme to 'mdfs'. The parameter 'mdfs.standAloneConf' determines the MDFS architecture to be instantiated. If it is set to false, all the servers are started locally, as in the distributed architecture. If it is set to true, the user additionally needs to configure the parameter 'mdfs.nameservice.rpc-address' to specify the location of the Name Server. In the present implementation, the Fragment Mapper is started on the same node as the Name Server. Since no changes are required in the existing code base for MDFS integration, the user can upgrade to a different Hadoop release without any conflict.
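For instance, a client can be pointed at MDFS programmatically; the host and port below are placeholders of ours, while the two mdfs.* keys are the parameters named above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

/** Sketch of switching a Hadoop 1.x client to MDFS (host/port are placeholders). */
public class UseMdfs {
    public static void main(String[] args) throws Exception {
        // Assumes the MDFS jar is already on HADOOP_CLASSPATH.
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "mdfs://node0:9000");        // file system scheme 'mdfs'
        conf.setBoolean("mdfs.standAloneConf", true);            // centralized architecture
        conf.set("mdfs.nameservice.rpc-address", "node0:8020");  // Name Server location
        FileSystem fs = FileSystem.get(conf);                    // yields MobileDistributedFS
        System.out.println(fs.getUri());
    }
}
```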

6 Performance Evaluation

In this section, we present performance results and identify bottlenecks involved in processing large input datasets. To measure the performance of MDFS on mobile devices, we ran Hadoop benchmarks on a heterogeneous mobile wireless cluster consisting of 1 personal desktop computer (Intel Core 2 Duo 3 GHz processor, 4 GB memory), 10 netbooks (Intel Atom 1.60 GHz processor, 1 GB memory, Wi-Fi 802.11b/g interface) and 3 HTC Evo 4G smartphones running Android 2.3 (1 GHz Scorpion processor, 512 MB RAM, Wi-Fi 802.11b/g interface). As the TaskTracker daemon has not yet been ported to the Android environment, the smartphones are used only for data storage, not for data processing. Note that although the Hadoop framework is not yet completely ported to Android smartphones in our experiments (this is future work), the results obtained from the netbooks should be very similar to the results on real smartphones, as modern smartphones are equipped with more powerful CPUs, larger memory, and higher communication bandwidth than the netbooks we used.

We used TeraSort, a well-known benchmarking tool included in the Apache Hadoop distribution. Our benchmark run consists of generating a random input data set using TeraGen and then sorting the generated data using TeraSort. We considered the following metrics: 1) job completion time of TeraSort; 2) MDFS read/write throughput; and 3) network bandwidth overhead. We are interested in the following parameters: 1) size of the input dataset; 2) block size; and

Fig. 7. Effect of (a) block size and (b) cluster size on job completion time (TeraSort of a 50 MB dataset).

3) cluster size. Each experiment was repeated 15 times and average values were computed. The parameters k and n are set to 3 and 10, respectively, for all runs. Each node is configured to run 1 Map task and 1 Reduce task per job, controlled by the parameters 'mapred.tasktracker.map.tasks.maximum' and 'mapred.tasktracker.reduce.tasks.maximum', respectively. As this paper is the first work that addresses the challenges of processing large datasets in a mobile environment, we do not have other solutions to compare against. MDFS pays the overhead of data encoding/decoding and encryption/decryption, but achieves better reliability and energy efficiency. Furthermore, MDFS uses Android phones as storage nodes and performs read/write operations on these phones, whereas the HDFS data node is not ported to Android devices. As a result, it is difficult to compare the performance of HDFS and MDFS directly.

6.1 Effect of Block Size on Job Completion Time

The parameter 'dfs.block.size' in the configuration file determines the default block size. It can be overridden by the client during file creation if needed. Figure 7(a) shows the effect of block size on job completion time. For our test cluster setup, we found that the optimal block size for a 50 MB dataset is 4 MB. The results show that performance degrades when the block size is reduced or increased beyond this value.

A larger block size reduces the number of blocks and thereby limits the amount of possible parallelism in the cluster. By default, each Map task processes one block of data at a time, so there must be a sufficient number of tasks in the system that can run in parallel for maximum throughput. If the block size is small, there are more Map tasks, each processing less data, which leads to more read and write requests across the network and proves costly in a mobile environment. For example, a 50 MB dataset with 4 MB blocks yields 13 Map tasks, enough to keep the cluster busy without excessive per-task setup overhead. Figure 7(a) shows that the processing time is 70% smaller than the network transmission time for the TeraSort benchmark, so tasks have to be sufficiently long to compensate for the overhead of task setup and data transfer. For real-world clusters, the optimal block size should be determined experimentally.


Fig. 8. (a) Job completion rate of HDFS (1,3) vs. MDFS (3,9) under increasing node failures. (b) Job completion time across iterations with 1, 2, and 3 node failures (TeraSort of a 100 MB dataset).

6.2 Effect of Cluster Size on Job Completion Time

The cluster size determines the level of possible parallelization in the cluster. As the cluster size increases, more tasks can be run in parallel, reducing the job completion time. Figure 7(b) shows the effect of cluster size on job completion time. For larger files, there are several Map tasks that can operate in parallel, depending on the configured block size, so performance improves significantly with cluster size, as shown in the figure. For smaller files, performance is not affected much by the cluster size, as the performance gained from parallelism is comparable to the additional cost incurred in task setup.

6.3 Effect of node failures on MDFS and HDFS

In this section, we compare the fault-tolerance capabilities of MDFS and HDFS. We consider a simple failure model in which a task fails with its processor node and a taskTracker cannot be restarted once it fails (crash failure). There are 12 nodes in total in the network. In HDFS, each data block is replicated to 3 different nodes, so HDFS can tolerate losing at most 2 data nodes; in MDFS, each data block is encoded and stored on 9 different nodes (k = 3, n = 9), so MDFS can tolerate losing up to 6 data nodes. The reason we set (k, n) = (3, 9) in this experiment (rather than using n = 10 as in the previous experiments) is that (k, n) = (3, 9) has the same data redundancy as the default HDFS 3-replication scheme; this experiment is independent of the previous experiments, in which n = 10. Note that although HDFS can tolerate losing at most 2 data nodes, this does not mean the job fails whenever more than 2 nodes fail: if a failed node does not carry a data block of the current job, it does not affect the taskTrackers. As a result, the completion rate gradually drops from 1 after more than 3 nodes fail. Figure 8(a) shows that MDFS clearly achieves better fault tolerance when 3 or more nodes fail.

6.4 Effect of Node Failure Rate on Job Completion Time

Our system is designed to tolerate failures. Figure 8(b) shows the reliability of our system in the presence of node failures. The benchmark is run for 10 iterations on

Fig. 9. MDFS read/write throughput for (a) large files and (b) small files.

Fig. 10. (a) Job completion time vs. input dataset size for TeraGen and TeraSort. (b) Processing (I/O) time and network transmission time vs. input dataset size.

100 MB of data. Node failures are induced by turning off the wireless interface during the processing stage, emulating real-world situations in which devices get disconnected from the network due to hardware or connection failures. In Figure 8(b), one, two, and three simultaneous node failures are induced in iterations 3, 5, and 8, respectively, and the original state is restored in the succeeding iteration. The job completion time increases by about 10% for each failure, but the system successfully recovers from these failures.

In the MDFS layer, the k-out-of-n framework provides data reliability. If a node containing fragments is not available, the k-out-of-n framework chooses another node for data retrieval. Since k and n are set to 3 and 10, respectively, the system can tolerate up to 7 node failures before the data becomes unavailable. If a task fails due to unexpected conditions, the TaskTracker notifies the JobTracker of the task status, and the JobTracker is responsible for re-executing the failed task on another machine. The JobTracker also considers a task failed if the assigned TaskTracker does not report its status within a configured timeout interval.
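This retrieval fallback can be pictured as re-running the closest-k selection over the surviving fragment holders only; a sketch under that assumption (names are ours):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

/** Sketch of failure-tolerant fragment retrieval; names ours, not MDFS code. */
public class FragmentRetriever {
    /**
     * Pick k holders to fetch from, preferring the closest and skipping failed
     * nodes; returns null if fewer than k holders remain (data unavailable).
     */
    static List<Integer> chooseHolders(List<Integer> holders, Set<Integer> failed,
                                       int[] hopCount, int k) {
        List<Integer> alive = new ArrayList<>();
        for (int node : holders) {
            if (!failed.contains(node)) alive.add(node);   // skip unreachable holders
        }
        if (alive.size() < k) return null;                 // fewer than k fragments survive
        alive.sort(Comparator.comparingInt(node -> hopCount[node]));
        return alive.subList(0, k);                        // k closest surviving holders
    }
}
```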

6.5 Effect of Input Data Size

Figure 9(a) and Figure 9(b) show the effect of input dataset size on MDFS throughput. The experiment measures the average read and write throughput for different file sizes, with the block size set to 4 MB. The results show that the system is less efficient with small files due to the overhead of setting up creation and retrieval tasks. Maximum throughput is observed for file sizes that are multiples of the block size, which minimizes the total number of subtasks needed to read/write the whole file and so decreases the overall overhead. In Figure 9(b), the throughput gradually increases as the input dataset size grows from 1 MB to 4 MB because more data


can be transferred in a single block read/write request. However, when the input dataset size is increased further, one additional request is required for the extra data and throughput drops suddenly. The results show that the maximum MDFS throughput is around 2.83 MB/s for reads and 2.12 MB/s for writes for file sizes that are multiples of the block size.

Figure 10 shows the effect of input dataset size on job completion time. The experiment measures the job completion time for file sizes ranging from 5 MB to 100 MB. Files generated on mobile devices are unlikely to exceed 100 MB; however, MDFS does not impose any hard limit on input dataset size and can take any input size allowed in the standard Hadoop release. The results show that job completion time grows sublinearly with input dataset size. For larger datasets, a sufficient number of tasks can be executed in parallel across the cluster, resulting in better node utilization and improved performance.

6.6 Centralized versus Distributed Architecture

The major difference between the distributed solution and the centralized solution is that nodes in the distributed solution need to continuously synchronize their Name Server and Fragment Mapper. Due to this synchronization, the distributed solution must broadcast directory information after a file is created or updated. These broadcast messages directly determine the performance difference between the centralized and distributed architectures. When a MapReduce job has more read operations, the distributed architecture may perform better, as all metadata can be queried locally rather than by contacting a centralized server; when a MapReduce job has more write operations, the centralized architecture may perform better due to fewer broadcast messages.
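The contrast can be summarized as two metadata update paths triggered by each block allocation; the following is a schematic sketch of the trade-off with invented names, not the actual MDFS RPC layer:

```java
/** Sketch contrasting the two metadata update paths described above. */
interface MetadataChannel {
    void blockAllocated(String file, long blockId, int[] fragmentHolders);
}

/** Centralized: one RPC to the singleton Name Server / Fragment Mapper. */
class CentralizedChannel implements MetadataChannel {
    public void blockAllocated(String file, long blockId, int[] holders) {
        rpcToMaster(file, blockId, holders);          // a single unicast message
    }
    private void rpcToMaster(String f, long b, int[] h) { /* network call */ }
}

/** Distributed: broadcast so every node's cached metadata stays in sync. */
class DistributedChannel implements MetadataChannel {
    public void blockAllocated(String file, long blockId, int[] holders) {
        broadcastToAllNodes(file, blockId, holders);  // N-1 messages per update
    }
    private void broadcastToAllNodes(String f, long b, int[] h) { /* network call */ }
}
```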

Figure 11(a) compares the number of broadcast messages sent during file creation for different input dataset sizes, with the block size set to 4 MB. As the input dataset size increases, the number of file blocks also increases. In the distributed architecture, each block allocation in the Name Server, and the subsequent fragment information update in the Fragment Mapper, must be broadcast to all other nodes in the cluster so that their individual caches remain in sync. The large bandwidth usage makes broadcasting a costly operation in wireless networks, and the effect is much worse as the cluster size grows. Figure 11(b) compares the number of broadcast messages sent during file creation for varying cluster sizes. The updates are not broadcast in the centralized approach, as the Name Server and Fragment Mapper are singleton instances.

The results show that the distributed architecture is ideal for medium-sized clusters with independent devices and no central server; the broadcast overhead is minimal if the cluster is not large. For large clusters, the communication cost required to keep the metadata synchronized across all nodes becomes

Fig. 11. Number of broadcast packets in the centralized and distributed architectures vs. (a) input dataset size and (b) cluster size.

Fig. 12. New task scheduling algorithm vs. random scheduling: job completion time of TeraGen and TeraSort with energy-aware and energy-unaware scheduling.

significant. Hence, a centralized approach is preferred for large clusters. However, data reliability is guaranteed by the k-out-of-n framework in both architectures.

6.7 Energy-Aware Task Scheduling versus Random Task Scheduling

As mentioned in Section 4.5, our energy-aware task scheduling assigns tasks to taskTrackers considering the locations of data fragments. The default task scheduling algorithm in the MapReduce component is ineffective in a mobile ad hoc network, as the network topology in a traditional data center is completely different from that of a mobile network. Figure 12 compares the job completion time of our energy-aware scheduling algorithm and random task scheduling; the default MapReduce task scheduling in a mobile ad hoc network is essentially random task allocation. In both the TeraGen and TeraSort experiments, our scheduling algorithm cuts the job completion time by more than half (the random scheduler takes more than twice as long). A lower job completion time indicates lower data retrieval time and lower data retrieval energy for each taskTracker.

7 Roadmap for Mobile Hadoop 2.0

Porting both the JobTracker and the TaskTracker daemons to the Android environment is ongoing work. In the future, we plan to integrate the energy-efficient and fault-tolerant k-out-of-n processing framework proposed in [9] into Mobile Hadoop to provide better energy efficiency and fault tolerance in a dynamic network. We will also look into the heterogeneous properties of the hardware and prioritize among various


devices based on their specifications. For example, if static devices are available in the network, they can be prioritized over mobile nodes for data storage, as they are less likely to fail; if a more powerful node such as a laptop is present, it can be assigned more tasks than a smartphone. How to balance the processing load as well as the communication load of each node is a practical question that needs to be addressed. To mitigate the single point of failure in the centralized architecture, we plan to develop a hybrid model in which the Name Server and Fragment Mapper run concurrently on multiple master nodes. The hybrid model can reduce the load on each master node, tolerate failures, and improve RPC execution time.

8 Conclusions

The Hadoop MapReduce framework over MDFS demonstrates the capability of mobile devices to capitalize on the steady growth of big data in the mobile environment. Our system addresses all the major constraints of data processing in a mobile cloud: energy efficiency, data reliability, and security. The evaluation results show that our system is capable of big data analytics on unstructured data such as media files, text, and sensor data, and our performance results are promising for the deployment of the system in real-world clusters.

Acknowledgment

This work was supported by the Naval Postgraduate School under Grant No. N00244-12-1-0035 and by NSF under Grants #1127449, #1145858 and #0923203.

References

[1] S. Huchton, G. Xie, and R. Beverly, "Building and evaluating a k-resilient mobile distributed file system resistant to device compromise," in Proc. of MILCOM, 2011.

[2] G. Huerta-Canepa and D. Lee, "A virtual cloud computing provider for mobile devices," in Proc. of MobiSys, 2010.

[3] K. Kumar and Y.-H. Lu, "Cloud computing for mobile users: Can offloading computation save energy?" Computer, 2010.

[4] "Apache Hadoop," http://hadoop.apache.org/.

[5] S. George, W. Zhou, H. Chenji, M. Won, Y. O. Lee, A. Pazarloglou, R. Stoleru, and P. Barooah, "DistressNet: a wireless ad hoc and sensor network architecture for situation management in disaster response," Comm. Mag., IEEE, 2010.

[6] J.-P. Hubaux, L. Buttyan, and S. Capkun, "The quest for security in mobile ad hoc networks," in Proc. of MobiHoc, 2001.

[7] H. Yang, H. Luo, F. Ye, S. Lu, and L. Zhang, "Security in mobile ad hoc networks: challenges and solutions," Wireless Communications, IEEE, 2004.

[8] C. A. Chen, M. Won, R. Stoleru, and G. Xie, "Resource allocation for energy efficient k-out-of-n system in mobile ad hoc networks," in Proc. of ICCCN, 2013.

[9] C. Chen, M. Won, R. Stoleru, and G. Xie, "Energy-efficient fault-tolerant data storage and processing in dynamic network," in Proc. of MobiHoc, 2013.

[10] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop distributed file system," in Proc. of MSST, 2010.

[11] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Commun. ACM, 2008.

[12] E. E. Marinelli, "Hyrax: Cloud computing on mobile devices using MapReduce," Master's thesis, School of Computer Science, Carnegie Mellon University, 2009.

[13] T. Kakantousis, I. Boutsis, V. Kalogeraki, D. Gunopulos, G. Gasparis, and A. Dou, "Misco: A system for data analysis applications on networks of smartphones using MapReduce," in Proc. of Mobile Data Management, 2012.

[14] P. Elespuru, S. Shakya, and S. Mishra, "MapReduce system over heterogeneous mobile devices," in Software Technologies for Embedded and Ubiquitous Systems, 2009.

[15] F. Marozzo, D. Talia, and P. Trunfio, "P2P-MapReduce: Parallel data processing in dynamic cloud environments," J. Comput. Syst. Sci., 2012.

[16] C. A. Chen, M. Won, R. Stoleru, and G. G. Xie, "Energy-efficient fault-tolerant data storage and processing in dynamic networks," in Proc. of MobiHoc, 2013.

[17] S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google file system," SIGOPS Oper. Syst. Rev., 2003.

[18] P. H. Carns, W. B. Ligon, III, R. B. Ross, and R. Thakur, "PVFS: A parallel file system for Linux clusters," in Proc. of Annual Linux Showcase & Conference, 2000.

[19] S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long, and C. Maltzahn, "Ceph: A scalable, high-performance distributed file system," in Proc. of OSDI, 2006.

[20] "Lustre file system," http://www.lustre.org.

[21] "Hadoop 1.2.1 release," http://hadoop.apache.org/docs/r1.2.1/releasenotes.html.

Johnu George received his B.Tech degree in Computer Science from the National Institute of Technology Calicut, India, in 2009. He was a research engineer at Tejas Networks, Bangalore, during 2009-2012. He completed his M.S. degree in Computer Science at Texas A&M University in 2014. His research interests include distributed processing and big data technologies for the mobile cloud.

Chien-An Chen received the BS and MS degrees in Electrical Engineering from the University of California, Los Angeles, in 2009 and 2010, respectively. He is currently working toward the PhD degree in Computer Science and Engineering at Texas A&M University. His research interests are mobile computing, energy-efficient wireless networks, and cloud computing on mobile devices.

Radu Stoleru is currently an associate professor in the Department of Computer Science and Engineering at Texas A&M University. Dr. Stoleru's research interests are in deeply embedded wireless sensor systems, distributed systems, embedded computing, and computer networking. He is the recipient of the NSF CAREER Award in 2013. He has authored or co-authored over 80 conference and journal papers with over 3,000 citations.

Geoffrey G. Xie received the BS degree in computer science from Fudan University, China, and the PhD degree in computer sciences from the University of Texas at Austin. He is a professor in the Computer Science Department at the US Naval Postgraduate School. He has published more than 60 articles in various areas of networking. His current research interests include network analysis, routing design and theories, underwater acoustic networks, and abstraction-driven design and analysis of enterprise networks.