MapReduce: Simplified Data Processing on Large Clusters Authors: Jeffrey Dean and Sanjay Ghemawat Presenter: Guangdong Liu Jan 28th, 2011


Page 1:

MapReduce: Simplified Data Processing on

Large Clusters

Authors: Jeffrey Dean and Sanjay Ghemawat

Presenter: Guangdong Liu

Jan 28th, 2011

Page 2:

Presentation Outline

Motivation

Goal

Programming Model

Implementation

Refinement

Page 3:

Motivation

Large-scale data processing: many data-intensive applications involve processing huge amounts of data and then producing lots of other data.

Certain common themes are shared when executing such applications:

Hundreds or thousands of machines are used

Two categories of basic operation on the input data:

1) Map(): process a key/value pair to generate a set of intermediate key/value pairs

2) Reduce(): merge all intermediate values with the same key

Page 4:

Goal

MapReduce: an abstraction that allows users to perform simple computations across large data sets distributed on large clusters of commodity PCs, while hiding the details of parallelization, data distribution, load balancing and fault tolerance.

User-defined functions

Automatic parallelization and distribution

Fault tolerance

I/O scheduling

Status monitoring

Page 5:

Programming Model

Inspired by Lisp primitives map and reduce

Map(key, val): written by the user

Process a key/value pair to generate intermediate key/value pairs

The MapReduce library groups all intermediate values associated with the same key together and passes them to the reduce function

Reduce(key, vals): also written by the user

Merge all intermediate values associated with the same key

Page 6:

Programming Model

Page 7:

Programming Model

Example: count words in a collection of documents. Input consists of (doc_url, doc_contents) pairs.

Map(key=doc_url, val=doc_contents): for each word w in contents, emit (w, "1")

Reduce(key=word, values=counts_list): sum all the "1"s in the value list and emit (word, sum)
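The word-count example can be sketched in Python. This is a single-process illustration of the programming model, not the paper's C++ API; the function names `map_fn`, `reduce_fn` and the driver `run_mapreduce` are hypothetical:

```python
from collections import defaultdict

def map_fn(doc_url, doc_contents):
    # Emit (word, 1) for every word in the document.
    for word in doc_contents.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # Sum all the partial counts for one word.
    return (word, sum(counts))

def run_mapreduce(docs):
    # Group intermediate values by key, as the MapReduce library would.
    intermediate = defaultdict(list)
    for url, contents in docs.items():
        for key, value in map_fn(url, contents):
            intermediate[key].append(value)
    return dict(reduce_fn(k, v) for k, v in intermediate.items())

print(run_mapreduce({"doc1": "hello world hello"}))
# {'hello': 2, 'world': 1}
```

The grouping step in the middle is exactly the work the library does for the user; the user supplies only the two functions.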

Page 8:

Programming Model

Example walkthrough (three input lines stored in the DFS):

"Hello World, Bye World!"
"Hello MapReduce, Goodbye to MapReduce."
"Welcome to UNL, Goodbye to UNL."

Map phase (three map tasks M1-M3 emit intermediate pairs):

M1: (Hello, 1) (World, 1) (Bye, 1) (World, 1)
M2: (Hello, 1) (MapReduce, 1) (Goodbye, 1) (to, 1) (MapReduce, 1)
M3: (Welcome, 1) (to, 1) (UNL, 1) (Goodbye, 1) (to, 1) (UNL, 1)

Reduce phase (two reduce tasks R1-R2 merge the intermediate values by key and write the results to the DFS):

R1: (Hello, 2) (Bye, 1) (Welcome, 1) (to, 3)
R2: (World, 2) (UNL, 2) (Goodbye, 2) (MapReduce, 2)

Page 9:

Implementation

User to-do list:

Indicate input and output files

M: number of map tasks

R: number of reduce tasks

W: number of machines

Write map and reduce functions

Submit jobs

This requires no knowledge of parallel/distributed systems!

Page 10:

Implementation

[Figure: execution overview. The master assigns map tasks (M1...Mn) and reduce tasks (R1...Rr). Each mapper reads an input block B1...Bn from the DFS and writes R intermediate partitions P1...Pr to its local disk. Reducers remote-read these partitions and write outputs 1...r back to the DFS.]

Page 11:

Implementation

1. Input files are split into M pieces

Each block is typically 16~64 MB

Many copies of the user program are started on a cluster of machines

2. Master & workers

One special instance becomes the master

Workers are assigned tasks by the master

There are M map tasks and R reduce tasks to assign

The master finds idle workers and assigns map or reduce tasks to them

Page 12:

Implementation

3. Map tasks

Map workers read the contents of the corresponding input partition

They perform the user-defined map computation to create intermediate <key, value> pairs

The intermediate <key, value> pairs produced by the map function are buffered in memory

4. Writing intermediate data to disk (R regions)

Buffered output pairs are written to local disk periodically

They are partitioned into R regions by a partitioning function

The locations of these buffered pairs on the local disk are passed back to the master
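A minimal sketch of such a partitioning function (the paper's default is a hash of the key mod R; the helper name `partition` is illustrative):

```python
import hashlib

R = 4  # number of reduce tasks

def partition(key, num_reducers=R):
    # Hash the key and take it mod R to pick a region.
    # A stable digest (not Python's per-process salted hash()) keeps
    # the partitioning consistent across worker processes.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_reducers

# Every occurrence of the same key lands in the same region,
# so a single reduce worker sees all the values for that key.
assert partition("hello") == partition("hello")
```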

Page 13:

Implementation

5. Reading & sorting

Reduce workers use remote procedure calls to read the buffered data from the local disks of the map workers

Intermediate data is sorted by the intermediate keys

6. Reduce tasks

The reduce worker iterates over the sorted intermediate data

For each unique key encountered, the key and the corresponding set of values are passed to the user's reduce function

The output of the user's reduce function is written to an output file on a global file system

7. When all tasks have completed, the master wakes up the user program
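Steps 5 and 6 (sort, then iterate over runs of equal keys) can be sketched as follows; `reduce_phase` is a hypothetical single-machine stand-in for what one reduce worker does:

```python
from itertools import groupby
from operator import itemgetter

def reduce_phase(intermediate_pairs, reduce_fn):
    # Sort by the intermediate key so equal keys become adjacent.
    pairs = sorted(intermediate_pairs, key=itemgetter(0))
    output = []
    # Iterate over each run of identical keys and hand the key plus
    # all its values to the user's reduce function.
    for key, group in groupby(pairs, key=itemgetter(0)):
        values = [v for _, v in group]
        output.append(reduce_fn(key, values))
    return output

# e.g. summing counts:
result = reduce_phase([("b", 1), ("a", 1), ("b", 1)],
                      lambda k, vs: (k, sum(vs)))
# [('a', 1), ('b', 2)]
```

Sorting (rather than hashing into a dictionary) is what lets a reduce worker stream over intermediate data that is far larger than memory.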

Page 14:

Implementation

Fault tolerance: in a word, redo

Workers are periodically pinged by the master

No response = failed worker

Failed tasks are rescheduled

Note: completed map tasks of a failed worker also need to be re-executed, because their output is stored on that worker's local disk

Page 15:

Implementation

Locality

Input data is managed by GFS and has several replicas

Schedule a task on a machine containing a local replica, or near one

Task granularity

M map tasks and R reduce tasks

Make M and R much larger than the number of worker machines

Page 16:

Implementation

Backup tasks

Straggler: a machine that takes an unusually long time to complete one of the last few map or reduce tasks in the computation

Causes: a bad disk, competition for CPU, ...

Resolution: schedule backup executions of in-progress tasks when a MapReduce operation is close to completion

Page 17:

Source

The example is quoted from: Wei Wei, Juan Du, Ting Yu and Xiaohui Gu, "SecureMR: A Service Integrity Assurance Framework for MapReduce," Annual Computer Security Applications Conference (ACSAC '09), pp. 73-82, Dec. 2009.

Page 18:

Making Cluster Applications Energy-Aware

Authors: Nedeljko Vasic, Martin Barisits and Vincent Salzgeber

Jan 28th, 2011

Page 19:

Outline

Introduction

Case Study

Approach

Page 20:

Introduction

Power consumption

A critical issue in large-scale clusters

Data centers consume as much energy as a city, costing 7.4 billion dollars per year

Current techniques for efficiency

Consolidate workload onto fewer machines

Minimize energy consumption while keeping the same overall performance level

Problems

Cannot operate at multiple power levels

Cannot deal with energy consumption limits

Page 21:

Case Study

Google’s Server Utilization and Energy

Consumption

Page 22:

Case Study

Hadoop Distributed File System (HDFS)

Page 23:

Case Study

Hadoop Distributed File System (HDFS)

Page 24:

Case Study

MapReduce

Page 25:

Case Study

Conclusions

It is a wise decision to aggregate load onto fewer machines to save energy

Distributed applications must actively participate in power management in order to avoid poor performance

Page 26:

Approach

Page 27:

On the Energy (In)efficiency of Hadoop Clusters

Authors: Jacob Leverich, Christos Kozyrakis

Jan 28th, 2011

Page 28:

Introduction

Improving the energy efficiency of a cluster

Place some nodes into low-power standby modes

Avoid energy waste on oversized components for each node

Problems

Page 29:

Approach

Hadoop data layout overview

Replicas are distributed across different nodes in order to improve performance and reliability

The user specifies a block replication factor n to ensure n identical copies of any data block are stored across the cluster (typically n = 3)

The largest number of nodes that can be disabled without impacting data availability is n-1

Page 30:

Approach

Covering subset

At least one replica of every data block must be stored in a subset of nodes called the covering subset

This ensures that a large number of nodes can be gracefully removed from the cluster without affecting the availability of data or interrupting the normal operation of the cluster
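One way to sketch covering-subset placement (the helper `place_replicas` and its signature are hypothetical, not the paper's or Hadoop's code): force the first replica of each block onto a node in the covering subset and spread the remaining replicas anywhere else.

```python
import random

def place_replicas(block_id, nodes, covering_subset, n=3):
    # The first replica must live in the covering subset, so the
    # block stays available when non-covering nodes are powered down.
    first = random.choice(covering_subset)
    # The other n-1 replicas go on any distinct remaining nodes.
    others = random.sample([m for m in nodes if m != first], n - 1)
    return [first] + others

nodes = [f"node{i}" for i in range(10)]
covering = nodes[:3]
placement = place_replicas("blk_42", nodes, covering)
# The block survives even if every node outside the covering
# subset is disabled.
assert any(node in covering for node in placement)
```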

Page 31: