B97902029 葉彥廷
B97902083 林廷韋
B97902085 王頃恩
MapReduce: Simplified Data Processing on Large Clusters
Why we choose this topic
Introduction
Programming Model
Example
Implementation
Conclusion
Outline
趨勢騰雲駕霧程式競賽 (Trend Micro cloud computing programming contest, 2010)
A miserable memory from last summer vacation.
We did not manage to design a working distributed system in the end.
So we want to learn more about the ideas behind cloud computing.
Why we choose this topic
How long can you stand searching for the answers to your automata homework?
A week? A day?
Or do you ask Google for instant answers?
Introduction(1)
But how can Google do it so fast?
Is Google good at automata?
It’s MapReduce!!
And what can MapReduce do?
Introduction(2)
MapReduce can:
Simplify the procedure of computing over large amounts of data.
Split the work into independent jobs, which can be computed on the machines of a distributed cluster.
Spare the programmer most of the effort: he/she only needs to implement the Map and Reduce interfaces.
But how does it work?
Introduction(3)
Map function:
Takes two input parameters: KEY and VALUE.
Splits the VALUE into several intermediate key/value pairs using a user-defined implementation (which may or may not use KEY).
Sends the intermediate key/value pairs to the Reduce functions.
Programming Model(1)
Reduce function:
Receives the intermediate key/value pairs from the Map functions.
Merges the values that share the same key to form a possibly smaller set of values.
The output from all workers in the cluster is collected and shown to the user.
Programming Model(2)
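To make the two functions above concrete, here is a minimal single-machine sketch of the model. The driver `run_mapreduce` and the word-count functions are hypothetical names chosen for illustration; a real MapReduce system distributes the same steps across many workers.

```python
from collections import defaultdict


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Apply map_fn to every (key, value) input, group by key, then reduce."""
    # Map phase: each input pair may emit any number of intermediate pairs.
    intermediate = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            # Grouping step: collect all values that share an intermediate key.
            intermediate[out_key].append(out_value)
    # Reduce phase: merge the values of each key into one result.
    return {k: reduce_fn(k, values) for k, values in intermediate.items()}


def count_words_map(doc_name, text):
    # The KEY (doc_name) is not used here, which the model allows.
    for word in text.split():
        yield word, 1


def count_words_reduce(word, counts):
    return sum(counts)


result = run_mapreduce(
    [("a.txt", "to be or not to be")],
    count_words_map,
    count_words_reduce,
)
print(result)
```

The user writes only `count_words_map` and `count_words_reduce`; the grouping in the middle is what the framework provides.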
Assume we have a log file of web page requests, together with its name.
We want to know which web pages appear in the log file and how often each one is requested.
Map function
Input: <log file name, web page requests>
Output: <URL, 1>
Reduce function
Input: <URL, 1> pairs grouped by URL
Output: <URL, total count>
Example
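The URL-frequency example can be sketched in Python as follows. This is an illustration under stated assumptions: each log is assumed to hold one requested URL per line, and the names `map_requests`, `reduce_counts`, `url_frequency`, and the sample log files are hypothetical.

```python
from collections import defaultdict


def map_requests(log_name, requests):
    """Map: emit <URL, 1> for every request line in one log file."""
    for line in requests.splitlines():
        url = line.strip()
        if url:  # skip blank lines
            yield url, 1


def reduce_counts(url, ones):
    """Reduce: add up the 1s for one URL to get <URL, total count>."""
    return url, sum(ones)


def url_frequency(logs):
    # Group the intermediate <URL, 1> pairs by URL before reducing.
    grouped = defaultdict(list)
    for log_name, requests in logs.items():
        for url, one in map_requests(log_name, requests):
            grouped[url].append(one)
    return dict(reduce_counts(url, ones) for url, ones in grouped.items())


# Assumed log format: one requested URL per line.
logs = {
    "access-1.log": "/index.html\n/about.html\n/index.html\n",
    "access-2.log": "/index.html\n",
}
totals = url_frequency(logs)
print(totals)
```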
Implementation(1)
Master Data Structure
For each map task and reduce task, the master stores the state and the identity of the worker machine.
Fault Tolerance
Worker Failure
Master Failure
Implementation(2)
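A rough sketch of the master's bookkeeping and its reaction to a worker failure is given below. The class and method names are hypothetical. The three task states and the re-execution rule follow the original MapReduce paper rather than these slides: completed map tasks on a failed worker are redone because their output lives on that worker's local disk, while completed reduce tasks keep their output in the global file system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    IDLE = "idle"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"


@dataclass
class Task:
    kind: str                      # "map" or "reduce"
    state: State = State.IDLE
    worker: Optional[str] = None   # identity of the worker machine, if any


class Master:
    def __init__(self, num_map, num_reduce):
        self.tasks = {}
        for i in range(num_map):
            self.tasks[("map", i)] = Task("map")
        for i in range(num_reduce):
            self.tasks[("reduce", i)] = Task("reduce")

    def assign(self, task_id, worker):
        task = self.tasks[task_id]
        task.state, task.worker = State.IN_PROGRESS, worker

    def complete(self, task_id):
        self.tasks[task_id].state = State.COMPLETED

    def worker_failed(self, worker):
        """Reset every task whose result was lost with the failed worker."""
        for task in self.tasks.values():
            if task.worker != worker:
                continue
            lost_map_output = task.kind == "map" and task.state is State.COMPLETED
            if task.state is State.IN_PROGRESS or lost_map_output:
                task.state, task.worker = State.IDLE, None


master = Master(num_map=2, num_reduce=1)
for task_id, worker in [(("map", 0), "w1"), (("map", 1), "w2"), (("reduce", 0), "w1")]:
    master.assign(task_id, worker)
    master.complete(task_id)
master.worker_failed("w1")
print(master.tasks)
```

After `w1` fails, only map task 0 goes back to idle and becomes eligible for rescheduling; the completed reduce task and the map task on `w2` are untouched.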
Locality
Read the input locally, without much use of the network.
Task Granularity
Backup Tasks
Implementation(3)
Please DO NOT assign papers without informing us at the beginning of the semester.
Please stop FLIRTING with the student from CHINA.
Please PREPARE the course content instead of discussing for 5 minutes.
Please, OK?
Conclusion