
MapReduce Use Case: Uber Data Analysis


ACADGILD

In this post, we will analyze the Uber dataset in Hadoop using MapReduce in Java. The Uber dataset consists of four columns: dispatching_base_number, date, active_vehicles, and trips. You can download the dataset from here.

Problem Statement 1:

In this problem statement, we will find the days of the week on which each basement (dispatching base) has the most trips, by totalling the trips per basement for each day of the week.

Source Code

Mapper Class:

public static class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {

    java.text.SimpleDateFormat format = new java.text.SimpleDateFormat("MM/dd/yyyy");
    // Indexed by Date.getDay(): 0 = Sunday ... 6 = Saturday
    String[] days = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};
    private Text basement = new Text();
    Date date = null;
    private int trips;

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] splits = line.split(",");
        basement.set(splits[0]);
        try {
            date = format.parse(splits[1]);
        } catch (ParseException e) {
            // Skip records whose date cannot be parsed (for example, a header row)
            // instead of continuing with a null or stale date.
            return;
        }
        // trips is the fourth column (index 3)
        trips = Integer.parseInt(splits[3]);
        // Key: "<basement> <day of week>", value: number of trips
        String keys = basement.toString() + " " + days[date.getDay()];
        context.write(new Text(keys), new IntWritable(trips));
    }
}

From the Mapper, we emit the combination of the basement and the day of the week as the key and the number of trips as the value.

First, we parse the date, which arrives as a string, into a Date object using the SimpleDateFormat class in Java. To extract the day of the week, we call getDay(), which returns an integer from 0 (Sunday) to 6 (Saturday). We have created an array holding the days from Sunday to Saturday and use the value returned by getDay() as the index into that array to get the name of the day.

After this operation, we emit the combination of Basement_number+Day of the week as the key and the number of trips as the value.
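Date.getDay() is deprecated in current Java versions, so here is a minimal sketch of the same key construction using java.time instead. It is not the code used in this post: the class name DayKeyDemo, the method toKey, and the sample record are assumptions made for illustration, and the pattern M/d/uuuu is chosen so that dates with or without leading zeros are accepted.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.TextStyle;
import java.util.Locale;

public class DayKeyDemo {
    // Accepts both 1/1/2015 and 01/01/2015.
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("M/d/uuuu");

    /** Returns "<basement> <day>" for a CSV record, or null if the date cannot be parsed. */
    public static String toKey(String line) {
        String[] splits = line.split(",");
        try {
            LocalDate date = LocalDate.parse(splits[1], FORMAT);
            // Short English day name, e.g. "Thu"
            String day = date.getDayOfWeek().getDisplayName(TextStyle.SHORT, Locale.ENGLISH);
            return splits[0] + " " + day;
        } catch (DateTimeParseException e) {
            return null; // for example, a header row
        }
    }

    public static void main(String[] args) {
        // Illustrative record, not taken from the dataset.
        System.out.println(toKey("B02512,01/01/2015,100,1000")); // B02512 Thu
        System.out.println(toKey("dispatching_base_number,date,active_vehicles,trips")); // null
    }
}
```

In a mapper, a null result would simply mean that the record is skipped.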

Reducer Class:

In the reducer, we calculate the sum of trips for each basement and each day of the week, using the lines of code below.

public static class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
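To see what the reducer computes without running Hadoop, the following sketch sums values per key in plain Java. The class SumByKeyDemo, the method sumByKey, and the input numbers are hypothetical; a TreeMap stands in for the sorted grouping that the framework performs before it calls reduce().

```java
import java.util.Map;
import java.util.TreeMap;

public class SumByKeyDemo {
    /** Sums the values of {key, value} pairs per key, as the reducer does for each key. */
    public static Map<String, Integer> sumByKey(String[][] pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String[] pair : pairs) {
            totals.merge(pair[0], Integer.parseInt(pair[1]), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up mapper output: key = "<basement> <day>", value = trips.
        String[][] mapperOutput = {
            {"B02512 Thu", "100"},
            {"B02598 Fri", "300"},
            {"B02512 Thu", "150"},
        };
        System.out.println(sumByKey(mapperOutput)); // {B02512 Thu=250, B02598 Fri=300}
    }
}
```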

Whole Source Code:

import java.io.IOException;
import java.text.ParseException;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Uber1 {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        java.text.SimpleDateFormat format = new java.text.SimpleDateFormat("MM/dd/yyyy");
        // Indexed by Date.getDay(): 0 = Sunday ... 6 = Saturday
        String[] days = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};
        private Text basement = new Text();
        Date date = null;
        private int trips;

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] splits = line.split(",");
            basement.set(splits[0]);
            try {
                date = format.parse(splits[1]);
            } catch (ParseException e) {
                // Skip records whose date cannot be parsed (for example, a header row)
                // instead of continuing with a null or stale date.
                return;
            }
            // trips is the fourth column (index 3)
            trips = Integer.parseInt(splits[3]);
            // Key: "<basement> <day of week>", value: number of trips
            String keys = basement.toString() + " " + days[date.getDay()];
            context.write(new Text(keys), new IntWritable(trips));
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Uber1");
        job.setJarByClass(Uber1.class);
        job.setMapperClass(TokenizerMapper.class);
        // Summing is associative and commutative, so the reducer also serves as the combiner.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input dataset
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
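The driver registers IntSumReducer as the combiner as well as the reducer. That works because addition gives the same total whether the values are summed all at once or summed per map task first and then combined. The sketch below illustrates this in plain Java; CombinerDemo, its methods, and the numbers are hypothetical and only model the idea, not Hadoop's actual execution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CombinerDemo {
    /** Same operation as the reducer: add up all values. */
    public static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    /** Sums each map task's values locally first, then sums the partial results. */
    public static int sumWithCombiner(List<List<Integer>> perMapTask) {
        List<Integer> partials = new ArrayList<>();
        for (List<Integer> taskValues : perMapTask) {
            partials.add(sum(taskValues)); // combiner step on one map task's output
        }
        return sum(partials); // reducer step on the combined values
    }

    public static void main(String[] args) {
        // Made-up trip counts for one key, emitted by two map tasks.
        List<Integer> task1 = Arrays.asList(10, 20);
        List<Integer> task2 = Arrays.asList(5, 15, 25);
        System.out.println(sumWithCombiner(Arrays.asList(task1, task2))); // 75
        System.out.println(sum(Arrays.asList(10, 20, 5, 15, 25)));        // 75
    }
}
```

An operation such as an average would not have this property, so the reducer could not be reused as the combiner unchanged.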

Running the Program:

First, we need to build a jar file for the above program and run it as a normal Hadoop job, passing the input dataset and the output directory path as shown below.

hadoop jar uber1.jar /uber /user/output1

In the output directory, a part file is created that contains the output below:

B02512 Sat 15026

B02512 Sun 10487

B02512 Thu 15809

B02512 Tue 12041

B02512 Wed 12691

B02598 Fri 93126

B02598 Mon 60882

B02598 Sat 94588

B02598 Sun 66477

B02598 Thu 90333

B02598 Tue 63429

B02598 Wed 71956

B02617 Fri 125067

B02617 Mon 80591

B02617 Sat 127902

B02617 Sun 91722

B02617 Thu 118254

B02617 Tue 86602

B02617 Wed 94887

B02682 Fri 114662

B02682 Mon 74939

B02682 Sat 120283

B02682 Sun 82825

B02682 Thu 106643

B02682 Tue 76905

B02682 Wed 86252

B02764 Fri 326968

B02764 Mon 214116

B02764 Sat 356789

B02764 Sun 249896

B02764 Thu 304200

B02764 Tue 221343

B02764 Wed 241137

B02765 Fri 34934

B02765 Mon 21974

B02765 Sat 36737

B02765 Sun 22536

B02765 Thu 30408

B02765 Tue 22741

B02765 Wed 24340
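The job output lists a total for every basement and day, so the busiest day per basement still has to be read off the listing. A small post-processing step can do that. The following is a sketch under assumptions: BusiestDayDemo and busiestDay are hypothetical names, and each output line is assumed to hold the basement, the day, and the total separated by whitespace. The sample lines in main are the B02598 rows from the output above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BusiestDayDemo {
    /** Maps each basement to the day with the largest total. */
    public static Map<String, String> busiestDay(String[] outputLines) {
        Map<String, String> bestDay = new LinkedHashMap<>();
        Map<String, Integer> bestTotal = new LinkedHashMap<>();
        for (String line : outputLines) {
            String[] fields = line.trim().split("\\s+"); // basement, day, total
            int total = Integer.parseInt(fields[2]);
            Integer current = bestTotal.get(fields[0]);
            if (current == null || total > current) {
                bestTotal.put(fields[0], total);
                bestDay.put(fields[0], fields[1]);
            }
        }
        return bestDay;
    }

    public static void main(String[] args) {
        String[] lines = {
            "B02598 Fri 93126", "B02598 Mon 60882", "B02598 Sat 94588",
            "B02598 Sun 66477", "B02598 Thu 90333", "B02598 Tue 63429",
            "B02598 Wed 71956",
        };
        System.out.println(busiestDay(lines)); // {B02598=Sat}
    }
}
```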

Problem Statement 2:

In this problem statement, we will find the days of the week on which each basement has the highest number of active vehicles.

Source Code

Mapper Class:

public static class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {

    java.text.SimpleDateFormat format = new java.text.SimpleDateFormat("MM/dd/yyyy");
    // Indexed by Date.getDay(): 0 = Sunday ... 6 = Saturday
    String[] days = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};
    private Text basement = new Text();
    Date date = null;
    private int active_vehicles;

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] splits = line.split(",");
        basement.set(splits[0]);
        try {
            date = format.parse(splits[1]);
        } catch (ParseException e) {
            // Skip records whose date cannot be parsed (for example, a header row)
            // instead of continuing with a null or stale date.
            return;
        }
        // active_vehicles is the third column (index 2); index 3 holds trips
        active_vehicles = Integer.parseInt(splits[2]);
        // Key: "<basement> <day of week>", value: number of active vehicles
        String keys = basement.toString() + " " + days[date.getDay()];
        context.write(new Text(keys), new IntWritable(active_vehicles));
    }
}

From the Mapper, we emit the combination of the basement and the day of the week as the key and the number of active vehicles as the value.

First, we parse the date, which arrives as a string, into a Date object using the SimpleDateFormat class in Java. To extract the day of the week, we call getDay(), which returns an integer from 0 (Sunday) to 6 (Saturday). We have created an array holding the days from Sunday to Saturday and use the value returned by getDay() as the index into that array to get the name of the day.

After this operation, we emit the combination of Basement_number+Day of the week as the key and the number of active vehicles as the value.
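The two jobs differ only in which column the mapper reads, so it can help to name the column positions rather than use bare indexes. The sketch below assumes the column order given at the start of this post; the class UberRecord, its constants, and the sample record are hypothetical and not part of the original program.

```java
public class UberRecord {
    // Column positions, following the order dispatching_base_number, date, active_vehicles, trips.
    public static final int BASE = 0;
    public static final int DATE = 1;
    public static final int ACTIVE_VEHICLES = 2;
    public static final int TRIPS = 3;

    /** Reads the active_vehicles column of a CSV record. */
    public static int activeVehicles(String line) {
        return Integer.parseInt(line.split(",")[ACTIVE_VEHICLES].trim());
    }

    /** Reads the trips column of a CSV record. */
    public static int trips(String line) {
        return Integer.parseInt(line.split(",")[TRIPS].trim());
    }

    public static void main(String[] args) {
        // Illustrative record, not taken from the dataset.
        String line = "B02512,01/01/2015,100,1000";
        System.out.println(activeVehicles(line)); // 100
        System.out.println(trips(line));          // 1000
    }
}
```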

Reducer Class:

Now, in the reducer, we calculate the sum of active vehicles for each basement and each day of the week, using the lines of code below.

public static class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}

Whole Source Code:

import java.io.IOException;
import java.text.ParseException;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Uber2 {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        java.text.SimpleDateFormat format = new java.text.SimpleDateFormat("MM/dd/yyyy");
        // Indexed by Date.getDay(): 0 = Sunday ... 6 = Saturday
        String[] days = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};
        private Text basement = new Text();
        Date date = null;
        private int active_vehicles;

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] splits = line.split(",");
            basement.set(splits[0]);
            try {
                date = format.parse(splits[1]);
            } catch (ParseException e) {
                // Skip records whose date cannot be parsed (for example, a header row)
                // instead of continuing with a null or stale date.
                return;
            }
            // active_vehicles is the third column (index 2)
            active_vehicles = Integer.parseInt(splits[2]);
            // Key: "<basement> <day of week>", value: number of active vehicles
            String keys = basement.toString() + " " + days[date.getDay()];
            context.write(new Text(keys), new IntWritable(active_vehicles));
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Uber2");
        job.setJarByClass(Uber2.class);
        job.setMapperClass(TokenizerMapper.class);
        // Summing is associative and commutative, so the reducer also serves as the combiner.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input dataset
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Running the Program:

First, we need to build a jar file for the above program and run it as a normal Hadoop job, passing the input dataset and the output directory path as shown below.

hadoop jar uber2.jar /uber /user/output2

In the output directory, a part file is created that contains the output below:

B02512 Fri 2221

B02512 Mon 1676

B02512 Sat 1931

B02512 Sun 1447

B02512 Thu 2187

B02512 Tue 1763

B02512 Wed 1900

B02598 Fri 9830

B02598 Mon 7252

B02598 Sat 8893

B02598 Sun 7072

B02598 Thu 9782

B02598 Tue 7433

B02598 Wed 8391

B02617 Fri 13261

B02617 Mon 9758

B02617 Sat 12270

B02617 Sun 9894

B02617 Thu 13140

B02617 Tue 10172

B02617 Wed 11263

B02682 Fri 11938

B02682 Mon 8819

B02682 Sat 11141

B02682 Sun 8688

B02682 Thu 11678

B02682 Tue 9075

B02682 Wed 10092

B02764 Fri 36308

B02764 Mon 26927

B02764 Sat 33940

B02764 Sun 26929

B02764 Thu 35387

B02764 Tue 27569

B02764 Wed 30230

B02765 Fri 3893

B02765 Mon 2810

B02765 Sat 3612

B02765 Sun 2566

B02765 Thu 3646

B02765 Tue 2896

B02765 Wed 3152

We hope this post has been helpful in understanding the Uber Data Analysis use case using MapReduce. If you have any queries, feel free to comment below and we will get back to you at the earliest. Keep visiting our site www.acadgild.com for more updates on Big Data and other technologies.
