
Page 1: Papers on Storage Systems

Papers on Storage Systems

1) Purlieus: Locality-aware Resource Allocation for MapReduce in a Cloud, SC 2011.

2) Making Cloud Intermediate Data Fault-Tolerant, SOCC 2010.

Presented by: Qiangju Xiao

Page 2: Papers on Storage Systems

Purlieus: Locality-aware Resource Allocation for MapReduce in a Cloud

SC ’11, 2011

Authors:

Balaji Palanisamy, Aameek Singh, Ling Liu, Bhushan Jain

Page 3: Papers on Storage Systems

Introduction (1)

• What does the paper present?
– The paper presents Purlieus, a MapReduce resource allocation system designed to improve the performance of MapReduce jobs in the cloud.

• How does Purlieus work?
– Provisions virtual MapReduce clusters in a locality-aware manner;
– Enables MapReduce VMs to access input data (map phase) and intermediate data (reduce phase) from local or close-by physical machines.

Page 4: Papers on Storage Systems

Introduction (2)

• What improvements does Purlieus achieve?
– Reduces cumulative data center network traffic;
– Up to 50% reduction in job execution times across a variety of workloads, since network transfer time is a large component of total execution time.

Page 5: Papers on Storage Systems

Impact of Reduce Locality

Page 6: Papers on Storage Systems

System Model (1) – Current Cloud Infrastructure

Data Load

Page 7: Papers on Storage Systems

System Model (2) – Purlieus Infrastructure

1) Data is broken into chunks;
2) Blocks are stored on the distributed file system of the physical machines;
3) VMs access data on the physical machines.

Page 8: Papers on Storage Systems

System Model (3) – Dataflow from physical to virtual machines

Page 9: Papers on Storage Systems

Two Key Questions

• Data Placement
– Which physical machines should be used for each dataset?

• VM Placement
– Where should the VMs be provisioned to process these data blocks?

Page 10: Papers on Storage Systems

Purlieus’ Solution – Principles (1)

• Job-specific Locality Awareness
– Placing data in the MapReduce cloud service should incorporate job characteristics such as the amount of data accessed in the map and reduce phases.
– Three distinct classes of jobs: (1) map-input heavy; (2) map-and-reduce heavy; (3) reduce-input heavy.
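This classification can be sketched as a simple function of per-phase data volumes. The size arguments and the 1 GB "heavy" threshold below are illustrative assumptions, not values from the paper.

```python
# Sketch: classifying a MapReduce job into the three Purlieus classes
# by how much data its map and reduce phases consume.
# The 1 GB threshold is a hypothetical cutoff chosen for illustration.

def classify_job(map_input_bytes: int, reduce_input_bytes: int,
                 heavy_threshold: int = 1 << 30) -> str:
    """Return the Purlieus job class for the given data volumes."""
    map_heavy = map_input_bytes >= heavy_threshold
    reduce_heavy = reduce_input_bytes >= heavy_threshold
    if map_heavy and reduce_heavy:
        return "map-and-reduce-heavy"
    if reduce_heavy:
        return "reduce-input-heavy"
    return "map-input-heavy"
```

A real system would derive these volumes from job profiles or history rather than take them as inputs.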

Page 11: Papers on Storage Systems

Purlieus’ Solution – Principles (2)

• Load Awareness
– Placing data in a MapReduce cloud should also account for the computational load (CPU, memory) on the physical machines.
– Ensure that the expected load on the servers does not exceed a configurable threshold.

Page 12: Papers on Storage Systems

Purlieus’ Solution – Principles (3)

• Job-specific Data Replication
– Replicas of a dataset are placed based on the type and frequency of the jobs that use it. For example, if an input dataset is used by three sets of MapReduce jobs, two of which are reduce-input heavy and one map-input heavy, Purlieus places two replicas of the data blocks in a reduce-input-heavy fashion and the third using the map-input-heavy strategy.
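One way to realize this principle is to hand each replica the placement strategy of one of the dataset's job classes, weighted by how often that class occurs. The function below is a sketch under that assumption, not the paper's actual algorithm.

```python
# Sketch: assigning a placement strategy to each replica of a dataset
# based on the mix of job classes that read it. With two reduce-input
# heavy jobs and one map-input heavy job (the slide's example), three
# replicas get two reduce-heavy placements and one map-heavy placement.
from collections import Counter

def replica_strategies(job_classes: list[str], n_replicas: int) -> list[str]:
    """Cycle through job classes by frequency until every replica has one."""
    ranked = [cls for cls, _ in Counter(job_classes).most_common()]
    return [ranked[i % len(ranked)] for i in range(n_replicas)]
```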

Page 13: Papers on Storage Systems

Purlieus – Placement Techniques (1)

• Map-input heavy jobs
– Data placement
• These jobs do not require reducers to execute close to each other;
• Purlieus chooses the machines that have the least expected load.
– VM placement
• Attempt to place VMs on the physical machines that contain the input data chunks for the map phase; if those machines are too heavily loaded, the VM is instead placed close to the node that stores the actual data chunk.
• Among the physical machines at the same network distance, the one having the least load is chosen.
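The VM-placement rule above (prefer the data-local machine, otherwise search outward and break distance ties by load) can be sketched as follows; the dictionaries and the load threshold are illustrative stand-ins for the scheduler's real state.

```python
# Sketch: VM placement for a map-input-heavy job.
# distance maps machine -> network distance from the chunk's host;
# load maps machine -> expected load. Both are hypothetical inputs.

def place_vm(chunk_host: str,
             distance: dict[str, int],
             load: dict[str, float],
             max_load: float) -> str:
    if load[chunk_host] <= max_load:
        return chunk_host  # data-local placement when load permits
    # Otherwise: nearest network distance first, least load as tie-break.
    candidates = [m for m in distance if load[m] <= max_load]
    return min(candidates, key=lambda m: (distance[m], load[m]))
```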

Page 14: Papers on Storage Systems

Purlieus – Placement Techniques (2)

• Map- and reduce-input heavy jobs
– Data placement
• Should support reduce locality – VMs should be on machines close to each other;
• Data blocks are placed on a set of closely connected physical machines.
– VM placement
• Ensure that VMs are placed either on the physical machines storing the input data or on close-by ones.
• Map tasks use local reads, and reduce tasks also read within the same rack, maximizing reduce locality.

Page 15: Papers on Storage Systems

Purlieus – Placement Techniques (3)

• Reduce-input heavy jobs
– Data placement
• Map locality is not as important;
• Purlieus chooses the physical machine with the maximum free storage.
– VM placement
• Network traffic for transferring intermediate data among MapReduce VMs is intense in reduce-input heavy jobs, so the set of VMs for the job should be placed close to each other.

Page 16: Papers on Storage Systems

Experiments

• Data placement techniques:
– Purlieus’ proposed locality- and load-aware data placement (LLADP)
– Random data placement (RDP)

• VM placement techniques:
– Locality-unaware VM placement (LUAVP)
– Map-locality aware VM placement (MLVP)
– Reduce-locality aware VM placement (RLVP)
– Map- and reduce-locality aware VM placement (MRLVP)
– Hybrid locality-aware VM placement (HLVP): adaptively picks the placement strategy based on the type of the input job – MLVP for map-input heavy jobs, RLVP for reduce-input heavy jobs, and MRLVP for map- and reduce-input heavy jobs.
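HLVP's adaptive choice is essentially a dispatch table from job class to technique. A minimal sketch, using the slide's acronyms as stand-in strategy names:

```python
# Sketch of HLVP dispatch: the job class selects the VM-placement
# technique. The returned acronyms stand in for the actual placement
# routines (MLVP, RLVP, MRLVP).

def hlvp(job_class: str) -> str:
    return {
        "map-input-heavy": "MLVP",
        "reduce-input-heavy": "RLVP",
        "map-and-reduce-heavy": "MRLVP",
    }[job_class]
```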

Page 17: Papers on Storage Systems

Results – Map and Reduce-input heavy workload

Page 18: Papers on Storage Systems

Results – Map-input heavy workload

Page 19: Papers on Storage Systems

Results – Reduce-input heavy workload

Page 20: Papers on Storage Systems

Results – Macro analysis using MapReduce simulator, PurSim (1)

Page 21: Papers on Storage Systems

Results – Macro analysis using MapReduce simulator, PurSim (2)

Page 22: Papers on Storage Systems

Conclusions

• Purlieus’ proposed placement techniques optimize for data locality during both the map and reduce phases of a job by considering VM placement, MapReduce job characteristics, and the load on the physical cloud infrastructure at the time of data placement.

• Purlieus’ evaluation shows significant performance gains with some scenarios showing close to 50% reduction in the cross-rack network traffic.

Page 23: Papers on Storage Systems

Making Cloud Intermediate Data Fault-Tolerant

SOCC 2010

Authors:

Steven Y. Ko, Imranul Hoque, Brian Cho, Indranil Gupta

Page 24: Papers on Storage Systems

MapReduce

• Phases
– Map
– Shuffle
– Reduce

• Data
– Input
– Intermediate
– Output

Page 25: Papers on Storage Systems

Intermediate Data

• Short-lived
• Used immediately
• Discarded on completion
• Written once
• Read a bounded number of times
• Large
• Many blocks

Page 26: Papers on Storage Systems

Intermediate Data – Failures

Cascaded re-execution

Page 27: Papers on Storage Systems

Intermediate Data

Loss requires recomputation

Page 28: Papers on Storage Systems

Intermediate Data – Behavior breakdown

(Figure: behavior breakdown for two experiment configurations, labeled 0f-10min and 1f-30sec.)

Page 29: Papers on Storage Systems

Intermediate Data – Replication

Traditional replication expensive

Page 30: Papers on Storage Systems

Can replication be accomplished without significantly affecting execution speed?

Page 31: Papers on Storage Systems

Extend HDFS

• Asynchronous replication
• Replicate within rack
• Minimize replicated data

Page 32: Papers on Storage Systems

Asynchronous Replication

• HDFS replication is usually pessimistic
– Blocks until replicas are made

• Do not block (async)
– Loss of consistency is not a problem – there is only one writer

Page 33: Papers on Storage Systems

Asynchronous Replication

Page 34: Papers on Storage Systems

Replicate within Rack

• HDFS replicates to a different rack for greater availability
• The lifespan of intermediate data is short
• It is “safe” to replicate to a machine in the same rack

Page 35: Papers on Storage Systems

Replicate within Rack

Page 36: Papers on Storage Systems

Minimize Data Replicated

• HDFS replication
– The shuffle phase replicates most data as a side effect
– Only data used locally is not copied

• ISS
– Replicates only local data

Page 37: Papers on Storage Systems

Minimize Data Replicated

Page 38: Papers on Storage Systems

ISS under failure

Page 39: Papers on Storage Systems

Conclusion

• Intermediate data properties allow a tailored replication strategy to outperform a traditional one

• Replication improves MapReduce performance in the case of failure

Page 40: Papers on Storage Systems

References

1) Balaji Palanisamy, Aameek Singh, Ling Liu, Bhushan Jain. Purlieus: Locality-aware Resource Allocation for MapReduce in a Cloud. SC 2011.

2) Steven Y. Ko, Imranul Hoque, Brian Cho, Indranil Gupta. Making Cloud Intermediate Data Fault-Tolerant. SOCC 2010.