DESCRIPTION
Radu Pastia: I began working with Hadoop two years ago, when I started the Big Data Team at Avira. At first I was oriented more towards the operations side: sizing and setting up our new Hadoop cluster to run smoothly. As our setup stabilized, I started delving deeper into data science and machine learning. I have been coding ever since I had my first home computer running BASIC, and my background before Hadoop is in backend scripting for web-based applications.
pastiaro.wordpress.com
@rpastia
Building a connector – The Wrong Way
Mapper Reducer
Building a connector – The Right Way
Mapper Reducer Partitioner
Input Split
Input Format
Record Reader
Record Writer
Output Format
The InputFormat: From Input to Mapper
--range 2014-09-01;2014-09-20
--number_of_mappers 4
Days in range: 2014-09-01, 2014-09-02, 2014-09-03, 2014-09-04, 2014-09-05, 2014-09-06, …, 2014-09-20
Input Split 1: 2014-09-01, 2014-09-02, 2014-09-05, …
Record Reader 1:
(2014-09-01-A; record A)
(2014-09-01-B; record B)
(2014-09-01-…; record …)
(2014-09-02-A; record A)
(2014-09-02-B; record B)
(2014-09-02-…; record …)
(2014-09-05-A; record A)
(2014-09-05-B; record B)
(2014-09-05-…; record …)
Mapper
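The slide above shows a date range being divided among four mappers, with each record keyed by its day plus a record identifier. The following is a minimal sketch of that planning step in plain Java, without any Hadoop dependency. The class `DateRangeSplitter` and its methods `plan` and `recordKey` are hypothetical names, not taken from the talk. The contiguous, evenly sized chunks are also an assumption: the slide lists 2014-09-01, 2014-09-02 and 2014-09-05 for Input Split 1, so the actual connector may assign days by a different rule.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop or of the connector in the talk.
class DateRangeSplitter {

    // Divide the inclusive range [start, end] into at most numSplits
    // contiguous chunks of days (assumed rule). Earlier chunks take one
    // extra day when the range does not divide evenly.
    static List<List<LocalDate>> plan(LocalDate start, LocalDate end, int numSplits) {
        if (numSplits < 1 || end.isBefore(start)) {
            throw new IllegalArgumentException("need numSplits >= 1 and start <= end");
        }
        long totalDays = ChronoUnit.DAYS.between(start, end) + 1;
        long base = totalDays / numSplits;
        long extra = totalDays % numSplits;

        List<List<LocalDate>> splits = new ArrayList<>();
        LocalDate cursor = start;
        for (int i = 0; i < numSplits; i++) {
            long size = base + (i < extra ? 1 : 0);
            if (size == 0) {
                break; // more mappers requested than days available
            }
            List<LocalDate> days = new ArrayList<>();
            for (long d = 0; d < size; d++) {
                days.add(cursor);
                cursor = cursor.plusDays(1);
            }
            splits.add(days);
        }
        return splits;
    }

    // Build a key in the shape shown on the slide: "<day>-<record id>".
    static String recordKey(LocalDate day, String recordId) {
        return day + "-" + recordId;
    }

    public static void main(String[] args) {
        List<List<LocalDate>> splits =
                plan(LocalDate.parse("2014-09-01"), LocalDate.parse("2014-09-20"), 4);
        for (int i = 0; i < splits.size(); i++) {
            List<LocalDate> days = splits.get(i);
            System.out.println("Input Split " + (i + 1) + ": "
                    + days.get(0) + " .. " + days.get(days.size() - 1));
        }
        System.out.println(recordKey(LocalDate.parse("2014-09-01"), "A"));
    }
}
```

In a real connector this logic would presumably sit in the InputFormat's split computation, and each split's record reader would then emit the keyed records for its days to one mapper.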