MapReduce: Simplified Data Processing on Large Clusters

B97902029 葉彥廷, B97902083 林廷韋, B97902085 王頃恩

Outline

- Why we choose this topic
- Introduction
- Programming Model
- Example
- Implementation
- Conclusion

Why we choose this topic

Trend Micro "騰雲駕霧" ("Riding the Clouds") Programming Contest (2010): a miserable memory from last summer vacation. In the end, we did not manage to design a working distributed system, so we want to learn more about the ideas behind cloud computing.

Introduction(1)

How long would you spend searching for the answers to your automata homework? A week? A day? Or would you ask Google for an instant answer?

Introduction(2)

But how can Google answer so fast? Is Google good at automata? It's MapReduce! Google has used MapReduce to build the index behind its web search.

And what can MapReduce do?

Introduction(3)

MapReduce can:
- Simplify the processing of large amounts of data.
- Split the work into independent tasks that can be computed in parallel on a distributed cluster.
- Let the programmer implement only the Map and Reduce functions, without much effort; the runtime takes care of partitioning the input, scheduling tasks on machines, and handling machine failures.

But how does it work?
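To make the claim that a programmer only writes Map and Reduce concrete, here is a minimal single-machine sketch in Python. The driver `run_mapreduce` and the word-count functions are hypothetical names chosen for illustration (word count is the standard example from the MapReduce paper). A real implementation distributes both phases across a cluster and survives failures, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical type aliases for this sketch: string inputs, integer values.
MapFn = Callable[[str, str], Iterable[Tuple[str, int]]]
ReduceFn = Callable[[str, List[int]], int]


def run_mapreduce(inputs: Dict[str, str], map_fn: MapFn,
                  reduce_fn: ReduceFn) -> Dict[str, int]:
    """Run a MapReduce job on one machine (no distribution, no fault tolerance)."""
    # Map phase: every input (key, value) pair is processed independently.
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)
    # Reduce phase: each intermediate key is merged independently, in key order.
    return {k: reduce_fn(k, vs) for k, vs in sorted(intermediate.items())}


# The only code the user writes: one Map function and one Reduce function.
def word_count_map(doc_name: str, text: str) -> Iterable[Tuple[str, int]]:
    for word in text.split():
        yield word, 1


def word_count_reduce(word: str, counts: List[int]) -> int:
    return sum(counts)


if __name__ == "__main__":
    docs = {"a.txt": "the cat sat", "b.txt": "the dog sat"}
    print(run_mapreduce(docs, word_count_map, word_count_reduce))
    # {'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}
```

Because each map call and each reduce call depends only on its own input, the two loops in `run_mapreduce` are exactly the parts a distributed runtime can spread over many machines.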

Programming Model(1)

Map function:
- Takes an input key/value pair (KEY, VALUE).
- Splits the VALUE into several intermediate key/value pairs, using a user-defined implementation (which may or may not use KEY).
- Sends the intermediate key/value pairs on to the Reduce functions.
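The step "send key/value pairs to Reduce functions" is usually done by partitioning: each intermediate key goes to one of R reduce tasks via hash(key) mod R, which is the default partitioning function described in the MapReduce paper. In this sketch, the value R = 3, the word-count Map, and the helper names `partition` and `run_map_task` are assumptions for illustration; `zlib.crc32` is used only because it gives the same hash on every run.

```python
import zlib
from typing import Iterator, List, Tuple

R = 3  # number of reduce tasks (assumed value for illustration)


def word_count_map(key: str, value: str) -> Iterator[Tuple[str, int]]:
    """User-defined Map: KEY is a document name (unused), VALUE its contents."""
    for word in value.split():
        yield word, 1


def partition(key: str, num_reduce: int) -> int:
    """Choose the reduce task for an intermediate key: hash(key) mod R."""
    # crc32 is deterministic across runs, unlike Python's randomized str hash.
    return zlib.crc32(key.encode("utf-8")) % num_reduce


def run_map_task(key: str, value: str,
                 num_reduce: int) -> List[List[Tuple[str, int]]]:
    """Run Map on one input and split its output into num_reduce regions."""
    regions: List[List[Tuple[str, int]]] = [[] for _ in range(num_reduce)]
    for k, v in word_count_map(key, value):
        regions[partition(k, num_reduce)].append((k, v))
    return regions


if __name__ == "__main__":
    for i, region in enumerate(run_map_task("doc1", "to be or not to be", R)):
        print(i, region)
```

Since the partition depends only on the key, both ("to", 1) pairs land in the same region, so a single reduce task will see every count for "to".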

Programming Model(2)

Reduce function:
- Receives the intermediate key/value pairs from the Map functions, grouped by key.
- Merges together the values that share the same key to form a possibly smaller set of values.
- The output from all machines in the cluster is collected and returned to the user.
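Before Reduce can merge "values with the same key", a reduce task has to gather its pairs from every map task and bring equal keys together. The MapReduce paper does this by sorting the intermediate data by key; the sketch below mimics that with `sorted` and `itertools.groupby`. The names `run_reduce_task` and `word_count_reduce` and the sample data are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, List, Tuple


def word_count_reduce(key: str, values: Iterable[int]) -> int:
    """User-defined Reduce: merge all values for one key into a single count."""
    return sum(values)


def run_reduce_task(map_outputs: List[List[Tuple[str, int]]]) -> Dict[str, int]:
    """Run one reduce task over the region it received from every map task."""
    # Gather pairs from all map tasks, then sort so that equal keys are adjacent.
    pairs = sorted((p for output in map_outputs for p in output),
                   key=itemgetter(0))
    result: Dict[str, int] = {}
    for key, group in groupby(pairs, key=itemgetter(0)):
        # sum() consumes the group before groupby advances to the next key.
        result[key] = word_count_reduce(key, (v for _, v in group))
    return result


if __name__ == "__main__":
    from_map_1 = [("be", 1), ("to", 1), ("be", 1)]
    from_map_2 = [("to", 1), ("not", 1)]
    print(run_reduce_task([from_map_1, from_map_2]))
    # {'be': 2, 'not': 1, 'to': 2}
```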

Example

Assume we have log files of web page requests, each identified by its file name.

We want to know which web pages appear in the logs and how often each one is requested (URL access frequency).

Map function
Input: <log file name, web page requests>
Output: <URL, 1> for each request

Reduce function
Input: <URL, [1, 1, ...]>
Output: <URL, total count>
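The URL example above can be sketched directly in Python. The log format is an assumption made only for this sketch: each line is "<METHOD> <URL>", and lines with fewer than two fields are skipped. The function names and sample logs are hypothetical; grouping is done in memory rather than across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def url_map(log_name: str, requests: str) -> Iterator[Tuple[str, int]]:
    """Map: <log file name, web page requests> -> <URL, 1> per request."""
    for line in requests.splitlines():
        fields = line.split()
        if len(fields) >= 2:  # assumed format: "<METHOD> <URL> ..."
            yield fields[1], 1


def url_reduce(url: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: <URL, [1, 1, ...]> -> <URL, total count>."""
    return url, sum(counts)


def url_frequency(logs: Dict[str, str]) -> Dict[str, int]:
    """Group Map output by URL in memory, then apply Reduce to each group."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for name, contents in logs.items():
        for url, one in url_map(name, contents):
            grouped[url].append(one)
    return dict(url_reduce(url, counts) for url, counts in grouped.items())


if __name__ == "__main__":
    logs = {
        "access-1.log": "GET /index.html\nGET /about.html\nGET /index.html",
        "access-2.log": "GET /index.html\nPOST /login",
    }
    print(url_frequency(logs))
    # {'/index.html': 3, '/about.html': 1, '/login': 1}
```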

Implementation(1)

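Implementation(2)

Master data structure
- For each map task and reduce task, the master stores its state (idle, in-progress, or completed) and, for non-idle tasks, the identity of the worker machine.

Fault tolerance
- Worker failure: the master pings every worker periodically. If a worker stops responding, its in-progress tasks, and its completed map tasks (whose output was on its local disk), are reset to idle and rescheduled on other workers.
- Master failure: the master could write periodic checkpoints, but since there is only one master, the paper's implementation simply aborts the computation and lets the client retry.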
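The master's bookkeeping and the worker-failure rule can be sketched as a small task table. The three states and the re-execution rule (completed map tasks are redone, completed reduce tasks are not, because reduce output is already in the global file system) follow the MapReduce paper; the `Master` class, its method names, and the task ids are hypothetical, and the periodic pings that detect a failure are not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class State(Enum):
    IDLE = "idle"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"


@dataclass
class Task:
    kind: str                     # "map" or "reduce"
    state: State = State.IDLE
    worker: Optional[str] = None  # identity of the worker machine, if assigned


class Master:
    def __init__(self, num_map: int, num_reduce: int) -> None:
        self.tasks: Dict[str, Task] = {}
        for i in range(num_map):
            self.tasks[f"map-{i}"] = Task("map")
        for i in range(num_reduce):
            self.tasks[f"reduce-{i}"] = Task("reduce")

    def assign(self, task_id: str, worker: str) -> None:
        task = self.tasks[task_id]
        task.state, task.worker = State.IN_PROGRESS, worker

    def complete(self, task_id: str) -> None:
        self.tasks[task_id].state = State.COMPLETED

    def worker_failed(self, worker: str) -> None:
        """Reset the failed worker's tasks to idle so they can be rescheduled."""
        for task in self.tasks.values():
            if task.worker != worker:
                continue
            # Completed map output lived on the failed machine's local disk and
            # is lost; completed reduce output is already in the global file system.
            if task.state == State.IN_PROGRESS or (
                task.state == State.COMPLETED and task.kind == "map"
            ):
                task.state, task.worker = State.IDLE, None


if __name__ == "__main__":
    m = Master(num_map=2, num_reduce=1)
    m.assign("map-0", "w1")
    m.complete("map-0")
    m.assign("reduce-0", "w1")
    m.complete("reduce-0")
    m.assign("map-1", "w2")
    m.worker_failed("w1")
    print({tid: t.state.value for tid, t in m.tasks.items()})
    # {'map-0': 'idle', 'map-1': 'in-progress', 'reduce-0': 'completed'}
```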

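Implementation(3)

Locality
- The master tries to schedule each map task on a machine that already stores a replica of its input data, so the input is read locally without much use of the network.

Task granularity
- The map phase is split into M tasks and the reduce phase into R tasks, with M and R much larger than the number of worker machines. This improves dynamic load balancing and speeds up recovery when a worker fails.

Backup tasks
- When the job is close to completion, the master schedules backup executions of the remaining in-progress tasks. A task is marked completed as soon as either copy finishes, which limits the delay caused by slow "straggler" machines.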
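Locality-aware scheduling and backup tasks can both be sketched as small scheduling rules for the master. This is a simplified illustration, not the paper's scheduler: `pick_map_task`, `backup_candidates`, the replica map, and the 95% completion threshold are all assumptions (the paper does not give a specific threshold), and the paper's preference for a replica "near" the worker, such as on the same network switch, is omitted.

```python
from typing import Dict, List, Optional, Set


def pick_map_task(worker: str, idle_tasks: List[str],
                  replicas: Dict[str, Set[str]]) -> Optional[str]:
    """Prefer an idle map task whose input split is stored on this worker."""
    for task in idle_tasks:
        if worker in replicas.get(task, set()):
            return task  # local read: no network transfer for the input
    # No local split available: fall back to any idle task (remote read).
    return idle_tasks[0] if idle_tasks else None


def backup_candidates(in_progress: List[str], completed: int, total: int,
                      threshold: float = 0.95) -> List[str]:
    """Near the end of the job, return the in-progress tasks to run again."""
    # threshold is an assumed value; the job counts as "near the end" once
    # at least this fraction of its tasks has completed.
    if total and completed / total >= threshold:
        return list(in_progress)
    return []


if __name__ == "__main__":
    replicas = {"split-0": {"w1", "w2"}, "split-1": {"w3"}}
    print(pick_map_task("w3", ["split-0", "split-1"], replicas))  # split-1
    print(backup_candidates(["split-7"], completed=96, total=100))  # ['split-7']
```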

Conclusion

Please DO NOT assign papers without informing us at the beginning of the semester.

Please stop FLIRTING with the student from CHINA.

Please PREPARE the course content instead of discussing it for 5 minutes.

Please, OK?