Big Data Project on
Crystal Ball
Submitted By:
Sushil Sedai (984474)
Suvash Shah (984461)
Submitted to: Prof. Prem Nair
Pair approach (Mapper) – pseudo code
method map(docid id, doc d)
    for each term w in doc d do
        total = 0;
        for each neighbor u in Neighbor(w) do
            Emit(Pair(w, u), 1);
            total++;
        Emit(Pair(w, *), total);
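The mapper pseudo code above can be tried without a cluster. The following is a minimal plain-Java sketch, not the project's Hadoop code: the class name `PairMapperSketch` and the string key format are hypothetical, and it assumes that Neighbor(w) means every term after w up to the next occurrence of w (or the end of the record), which is the reading suggested by the sample outputs in this deck.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of the pair mapper; no Hadoop types are used. */
class PairMapperSketch {

    /**
     * Returns the emitted (key, count) records in emission order.
     * Assumption: Neighbor(w) is every term after w up to, but not
     * including, the next occurrence of w (or the end of the record).
     */
    static List<Map.Entry<String, Integer>> map(String doc) {
        String[] terms = doc.trim().split("\\s+");
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            String w = terms[i];
            int total = 0;
            for (int j = i + 1; j < terms.length && !terms[j].equals(w); j++) {
                out.add(Map.entry("(" + w + "," + terms[j] + ")", 1)); // Emit(Pair(w, u), 1)
                total++;
            }
            out.add(Map.entry("(" + w + ",*)", total)); // Emit(Pair(w, *), total)
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints one emitted record per line for a short sample record.
        map("34 56 29 12 34").forEach(System.out::println);
    }
}
```

The special `(w,*)` record carries the marginal count for w, so the reducer can divide by it as long as that record is processed before the ordinary `(w,u)` pairs.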
Pair approach (Mapper) – Java Code
Pair approach (Reducer) – pseudo code
method reduce(Pair p, Iterable<Int> values)
    if p.secondValue == *
        if p.firstValue is new
            currentValue = p.firstValue;
            marginal = sum(values);
        else
            marginal += sum(values);
    else
        Emit(p, sum(values) / marginal);
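As a rough illustration of the reducer logic above, here is a plain-Java sketch in which the shuffle-and-sort phase is simulated with a `TreeMap`. The class name `PairReducerSketch`, the `run` helper, and the `"w,u"` string key format are assumptions made for this example; the sketch relies on terms being alphanumeric so that `"w,*"` sorts ahead of every `"w,u"` key.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of the pair reducer; the shuffle is simulated with a TreeMap. */
class PairReducerSketch {
    private String current = null; // left term the marginal belongs to
    private double marginal = 0;   // total co-occurrence count for that term
    private final Map<String, Double> output = new LinkedHashMap<>();

    /** Mirrors reduce(Pair p, Iterable<Int> values) from the pseudo code. */
    void reduce(String w, String u, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        if (u.equals("*")) {
            if (!w.equals(current)) { // first marginal record for a new term
                current = w;
                marginal = sum;
            } else {
                marginal += sum;
            }
        } else {
            output.put("(" + w + "," + u + ")", sum / marginal); // relative frequency
        }
    }

    /**
     * Assumption: keys are "w,u" strings with alphanumeric terms, so natural
     * String order places "w,*" before every "w,u" ('*' sorts below digits and letters).
     */
    static Map<String, Double> run(TreeMap<String, List<Integer>> grouped) {
        PairReducerSketch reducer = new PairReducerSketch();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            String[] key = e.getKey().split(",");
            reducer.reduce(key[0], key[1], e.getValue());
        }
        return reducer.output;
    }

    public static void main(String[] args) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("10,*", List.of(2));
        grouped.put("10,12", List.of(1));
        grouped.put("10,34", List.of(1));
        System.out.println(run(grouped)); // {(10,12)=0.5, (10,34)=0.5}
    }
}
```

In a real job the same ordering guarantee would have to come from the key's comparator, and all pairs sharing a left term would need to reach the same reducer.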
Pair approach (Reducer) – Java Code
Pair approach - input
Mapper1 input
34 56 29 12 34 56 92 10 34 12
Mapper2 input
18 29 12 34 79 18 56 12 34 92
Pair approach – Output (Reducer1)
(10,12) 0.5
(10,34) 0.5
(12,10) 0.09090909090909091
(12,18) 0.09090909090909091
(12,34) 0.36363636363636365
(12,56) 0.18181818181818182
(12,79) 0.09090909090909091
(12,92) 0.18181818181818182
(18,12) 0.25
(18,29) 0.125
(18,34) 0.25
(18,56) 0.125
(18,79) 0.125
(18,92) 0.125
(29,10) 0.06666666666666667
(29,12) 0.26666666666666666
(29,18) 0.06666666666666667
(29,34) 0.26666666666666666
(29,56) 0.13333333333333333
(29,79) 0.06666666666666667
(29,92) 0.13333333333333333
(34,10) 0.08333333333333333
(34,12) 0.25
(34,18) 0.08333333333333333
(34,29) 0.08333333333333333
(34,56) 0.25
(34,79) 0.08333333333333333
(34,92) 0.16666666666666666
(56,10) 0.1
(56,12) 0.3
(56,29) 0.1
(56,34) 0.3
(56,92) 0.2
(92,10) 0.3333333333333333
(92,12) 0.3333333333333333
(92,34) 0.3333333333333333
Pair approach – Output (Reducer2)
(79,12) 0.2
(79,18) 0.2
(79,34) 0.2
(79,56) 0.2
(79,92) 0.2
Stripe approach (Mapper) – pseudo code
method map(docid id, doc d)
    Stripe H;
    for each term w in doc d do
        clear(H);
        for each neighbor u in Neighbor(w) do
            if H.containsKey(u)
                H{u} += 1;
            else
                H.add(u, 1);
        Emit(w, H);
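The stripe mapper above can be sketched in plain Java as follows. This is an illustrative stand-in rather than the project's Hadoop mapper: the class name `StripeMapperSketch` is hypothetical, a `LinkedHashMap` plays the role of the Stripe type, and Neighbor(w) is again assumed to be the terms after w up to its next occurrence.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of the stripe mapper; a LinkedHashMap stands in for Stripe. */
class StripeMapperSketch {

    /**
     * Returns one (term, stripe) record per term occurrence.
     * Assumption: Neighbor(w) is every term after w up to, but not
     * including, the next occurrence of w (or the end of the record).
     */
    static List<Map.Entry<String, Map<String, Integer>>> map(String doc) {
        String[] terms = doc.trim().split("\\s+");
        List<Map.Entry<String, Map<String, Integer>>> out = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            String w = terms[i];
            // A fresh map per occurrence replaces clear(H) in the pseudo code.
            Map<String, Integer> stripe = new LinkedHashMap<>();
            for (int j = i + 1; j < terms.length && !terms[j].equals(w); j++) {
                stripe.merge(terms[j], 1, Integer::sum); // H{u} += 1, or H.add(u, 1)
            }
            out.add(Map.entry(w, stripe)); // Emit(w, H)
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints one "term=stripe" record per term occurrence.
        map("18 29 12 34 79 18 56 12 34 92").forEach(System.out::println);
    }
}
```

Compared with the pair approach, each term occurrence produces a single record, at the cost of a larger value per record.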
Stripe approach (Mapper) – Java Code
Stripe approach (Reducer) – pseudo code
method reduce(Text key, Stripes [H1, H2, …])
    Stripe H;
    for each Stripe Hi in [H1, H2, …] do
        H = elementWiseSum(H, Hi);
    total = sumValues(H);
    for each Item h in H do
        h.secondValue /= total;
    Emit(key, H);
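A possible plain-Java rendering of the stripe reducer is shown below. It is a sketch under assumptions: the class name `StripeReducerSketch` is hypothetical, stripes are ordinary maps, and the incoming stripes for a key are summed element-wise before each count is divided by the overall total.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of the stripe reducer for a single key. */
class StripeReducerSketch {

    /** Merges all stripes of one term and converts counts to relative frequencies. */
    static Map<String, Double> reduce(List<Map<String, Integer>> stripes) {
        Map<String, Integer> merged = new TreeMap<>();
        int total = 0;
        for (Map<String, Integer> stripe : stripes) {
            for (Map.Entry<String, Integer> e : stripe.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum); // element-wise sum
                total += e.getValue();
            }
        }
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : merged.entrySet()) {
            result.put(e.getKey(), e.getValue() / (double) total); // count / total
        }
        return result; // empty when the term never had a neighbor
    }

    public static void main(String[] args) {
        // Stripes for term 10: one occurrence with neighbors 34 and 12.
        System.out.println(reduce(List.of(Map.of("34", 1, "12", 1)))); // {12=0.5, 34=0.5}
    }
}
```

Because a whole stripe arrives with its key, the total is available inside a single reduce call and no special marginal record is needed.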
Stripe approach (Reducer) – Java Code
Stripe approach – input
Mapper1 input
34 56 29 12 34 56 92 10 34 12
Mapper2 input
18 29 12 34 79 18 56 12 34 92
Stripe approach – Output (Reducer1)
10 [ (34,0.5000) (12,0.5000) ]
12 [ (56,0.1818) (92,0.1818) (34,0.3636) (18,0.0909) (79,0.0909) (10,0.0909) ]
18 [ (56,0.1250) (92,0.1250) (34,0.2500) (79,0.1250) (29,0.1250) (12,0.2500) ]
29 [ (56,0.1333) (92,0.1333) (34,0.2667) (18,0.0667) (79,0.0667) (10,0.0667) (12,0.2667) ]
34 [ (56,0.2500) (92,0.1667) (18,0.0833) (79,0.0833) (29,0.0833) (10,0.0833) (12,0.2500) ]
56 [ (92,0.2000) (34,0.3000) (29,0.1000) (10,0.1000) (12,0.3000) ]
92 [ (34,0.3333) (10,0.3333) (12,0.3333) ]
Stripe approach – Output (Reducer2)
79 [ (56,0.2000) (92,0.2000) (34,0.2000) (18,0.2000) (12,0.2000) ]
Hybrid approach (Mapper) – pseudo code
method map(docid id, doc d)
    HashMap H;
    for each term w in doc d do
        for each neighbor u in Neighbor(w) do
            if H.contains(Pair(w, u))
                H{Pair(w, u)} += 1;
            else
                H.add(Pair(w, u), 1);
    for each Pair p in H do
        Emit(p, H(p));
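The hybrid mapper combines pair keys with in-mapper aggregation. A minimal plain-Java sketch of that idea follows; the class name `HybridMapperSketch` and the `"(w,u)"` string keys are hypothetical, the first sighting of a pair is stored with count 1, and Neighbor(w) is assumed to be the terms after w up to its next occurrence.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java sketch of the hybrid mapper: pair keys, counts combined in the mapper. */
class HybridMapperSketch {

    /**
     * Returns the locally aggregated pair counts for one record; in a real
     * job each entry would be emitted once, after the whole record is scanned.
     * Assumption: Neighbor(w) is every term after w up to, but not
     * including, the next occurrence of w (or the end of the record).
     */
    static Map<String, Integer> map(String doc) {
        String[] terms = doc.trim().split("\\s+");
        Map<String, Integer> counts = new LinkedHashMap<>(); // H, keyed by "(w,u)"
        for (int i = 0; i < terms.length; i++) {
            String w = terms[i];
            for (int j = i + 1; j < terms.length && !terms[j].equals(w); j++) {
                // Adds the pair with count 1, or increments an existing count.
                counts.merge("(" + w + "," + terms[j] + ")", 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints one "pair=count" line per distinct pair in the record.
        map("34 56 29 12 34 56 92 10 34 12")
            .forEach((pair, count) -> System.out.println(pair + "=" + count));
    }
}
```

Aggregating inside the mapper means a repeated pair is sent over the network once per record instead of once per occurrence.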
Hybrid approach (Mapper) – Java Code
Hybrid approach (Reducer) – pseudo code
prev = null;
HashMap H;
method reduce(Pair p, Iterable<Int> values)
    if prev != null and p.firstValue != prev
        total = sumValues(H);
        for each item h in H do
            h.value /= total;
        Emit(prev, H);
        clear(H);
    end if
    prev = p.firstValue;
    H.add(p.secondValue, sum(values));
method close
    // flush the stripe of the last term
    total = sumValues(H);
    for each item h in H do
        h.value /= total;
    Emit(prev, H);
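The reducer side of the hybrid approach turns sorted pairs back into one stripe per term. The plain-Java sketch below illustrates this under assumptions: the class name `HybridReducerSketch` and its method names are hypothetical, pairs are expected to arrive grouped by their left term, and each finished stripe is emitted under the term it was built for.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of the hybrid reducer: pairs in, one normalized stripe per term out. */
class HybridReducerSketch {
    private String prev = null;                                  // term being accumulated
    private final Map<String, Double> stripe = new LinkedHashMap<>();
    final Map<String, Map<String, Double>> output = new LinkedHashMap<>();

    /** Assumption: calls arrive sorted so that all pairs of one left term are adjacent. */
    void reduce(String w, String u, List<Integer> values) {
        if (prev != null && !prev.equals(w)) {
            flush(); // the left term changed, so the previous stripe is complete
        }
        prev = w;
        double sum = 0;
        for (int v : values) {
            sum += v;
        }
        stripe.put(u, sum);
    }

    /** Emits the stripe of the last term once the input is exhausted. */
    void close() {
        if (prev != null) {
            flush();
        }
    }

    private void flush() {
        double total = 0;
        for (double v : stripe.values()) {
            total += v;
        }
        Map<String, Double> normalized = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : stripe.entrySet()) {
            normalized.put(e.getKey(), e.getValue() / total);
        }
        output.put(prev, normalized);
        stripe.clear();
    }

    public static void main(String[] args) {
        HybridReducerSketch r = new HybridReducerSketch();
        r.reduce("10", "12", List.of(1));
        r.reduce("10", "34", List.of(1));
        r.reduce("79", "12", List.of(1));
        r.close();
        System.out.println(r.output); // {10={12=0.5, 34=0.5}, 79={12=1.0}}
    }
}
```

Without the `close` step the stripe of the final term would never be written, since no later key triggers the flush.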
Hybrid approach (Reducer) – Java Code
Hybrid approach - Input
Mapper1 input
34 56 29 12 34 56 92 10 34 12
Mapper2 input
18 29 12 34 79 18 56 12 34 92
Hybrid approach – Output (Reducer1)
10 (12,0.5) (34,0.5)
12 (10,0.09090909) (18,0.09090909) (34,0.36363637) (56,0.18181819) (79,0.09090909) (92,0.18181819)
18 (12,0.25) (29,0.125) (34,0.25) (56,0.125) (79,0.125) (92,0.125)
29 (10,0.06666667) (12,0.26666668) (18,0.06666667) (34,0.26666668) (56,0.13333334) (79,0.06666667) (92,0.13333334)
34 (10,0.083333336) (12,0.25) (18,0.083333336) (29,0.083333336) (56,0.25) (79,0.083333336) (92,0.16666667)
56 (10,0.1) (12,0.3) (29,0.1) (34,0.3) (92,0.2)
92 (10,0.33333334) (12,0.33333334) (34,0.33333334)
Hybrid approach – Output (Reducer2)
79 (12,0.2) (18,0.2) (34,0.2) (56,0.2) (92,0.2)
Comparison
Apache Spark
Write a Java program on Spark to calculate the total number of students at MUM across the different entries. The program should display the total number of students by country.
Spark - Java Code
Spark - input
2014 Feb Nepal 20
2014 Feb India 15
2014 Oct Italy 2
2014 July France 1
2015 Feb Nepal 10
2015 Feb India 25
2015 Oct Italy 7
Spark - Output
(France,1)
(Italy,9)
(Nepal,30)
(India,40)
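The aggregation behind this output is a sum per key. The sketch below reproduces it in plain Java without the Spark runtime, so it is a stand-in for the technique rather than the project's Spark program; the class name `StudentsByCountrySketch` is hypothetical, and each input line is assumed to hold year, month, country, and count separated by whitespace. In a Spark job the same step would typically be expressed by mapping each line to a (country, count) pair and reducing by key.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java stand-in for the per-country sum; it does not use Spark. */
class StudentsByCountrySketch {

    /** Assumption: each line is "year month country count", whitespace separated. */
    static Map<String, Integer> totalByCountry(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            // fields[2] is the country, fields[3] the number of students
            totals.merge(fields[2], Integer.parseInt(fields[3]), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
            "2014 Feb Nepal 20", "2014 Feb India 15", "2014 Oct Italy 2",
            "2014 July France 1", "2015 Feb Nepal 10", "2015 Feb India 25",
            "2015 Oct Italy 7");
        System.out.println(totalByCountry(input)); // {France=1, India=40, Italy=9, Nepal=30}
    }
}
```

The `TreeMap` orders countries alphabetically, so the order differs from the Spark output above even though the totals match.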
Tools Used
• VMware Player Pro 7
• cloudera-quickstart-vm-5.4.0-0-vmware
• Eclipse Version: Luna Service Release 2 (4.4.2)
• Windows 8.1
Thank You