Thailand Hadoop Big Data Challenge #1 13-15 March 2015


Special thanks to Amazon Web Services (AWS) for providing the AWS credits used to run the EMR Hadoop cluster


Schedule

13 March

– 16.00-18.00 Workshop / demo on Big Data analytics using Amazon EMR

– 18.00 Registration opens for those interested in running the cluster for 30 hours; access to Amazon EMR accounts will be given

14 March

– 06.00 The Amazon EMR cluster opens

– Discussion with participants takes place online and via social media

15 March (@ EGA Office)

– 12.00 The Amazon EMR cluster closes

– 13.00 Each competitor presents their results

– 15.30 Winner announcement


Architecture Overview of Amazon EMR


Hadoop Cluster for the challenge

10 AWS m3.xlarge EC2 instances, each with 4 vCPUs, 15 GB of memory and 80 GB of SSD storage

A sample dataset with more than 10 million records will be provided
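For a quick sense of scale, the per-instance figures (4 vCPUs, 15 GB of memory and 80 GB of SSD per m3.xlarge) can simply be multiplied out. This is a back-of-the-envelope sketch: it ignores memory reserved for the operating system and Hadoop daemons, and the "worker nodes only" line rests on our own assumption, not stated in the slides, that one of the ten instances acts as the EMR master node and runs no tasks.

```python
# Aggregate raw capacity of the challenge cluster, using the per-instance
# figures quoted for m3.xlarge (4 vCPUs, 15 GB memory, 80 GB SSD).
INSTANCES = 10
VCPU_PER_INSTANCE = 4
MEMORY_GB_PER_INSTANCE = 15
SSD_GB_PER_INSTANCE = 80


def cluster_totals(instances: int) -> dict:
    """Return summed vCPUs, memory and SSD across `instances` identical nodes."""
    return {
        "vcpus": instances * VCPU_PER_INSTANCE,
        "memory_gb": instances * MEMORY_GB_PER_INSTANCE,
        "ssd_gb": instances * SSD_GB_PER_INSTANCE,
    }


if __name__ == "__main__":
    print("all nodes:", cluster_totals(INSTANCES))
    # Assumption: one instance is the EMR master node and runs no tasks.
    print("worker nodes only:", cluster_totals(INSTANCES - 1))
```

For all ten nodes this gives 40 vCPUs, 150 GB of memory and 800 GB of SSD; under the master-node assumption, 36 vCPUs, 135 GB and 720 GB remain for processing.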


Challenge rules

Competitors can analyse the sample data with Hive, Pig or Map/Reduce.

Competitors may also use their own large datasets.

The winner will be the competitor with the most innovative approach or the best analytics results.

Those who just want to try out the cluster are also welcome.


Judging Criteria:

Complexity of the problem & Data Set 30%

Benefit to the society 20%

Innovation 30%

Presentation 20%
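The four weights add up to 100%, so a competitor's overall mark is a weighted average of the per-criterion marks. The slides do not say how each criterion is scored or how the judges' marks are combined; the sketch below assumes a 0-10 mark per criterion, and the criterion keys are our own shorthand.

```python
# Weighted score for one competitor, using the judging weights listed above.
# Assumption: each criterion is marked on a 0-10 scale (not stated in the slides).
WEIGHTS = {
    "complexity": 0.30,    # Complexity of the problem & data set
    "benefit": 0.20,       # Benefit to society
    "innovation": 0.30,    # Innovation
    "presentation": 0.20,  # Presentation
}


def weighted_score(scores: dict) -> float:
    """Combine per-criterion marks (0-10) into a single 0-10 total."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {"complexity": 8, "benefit": 6, "innovation": 9, "presentation": 7}
    print(round(weighted_score(example), 2))
```

With marks of 8, 6, 9 and 7 the total is 0.3 × 8 + 0.2 × 6 + 0.3 × 9 + 0.2 × 7 = 7.7, so the two 30% criteria (complexity and innovation) move the result the most.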


Judges

Assoc. Prof. Dr. Jirapun Daengdej

Mr. Danairat Thanabodithammachari

Dr. Thanachart Numnonda

Ms. Nantawan Wongkachonkitti


Awards

The overall winner will receive an Apple TV.

Two winners will be selected for free places on two training courses:

– Big Data using Hadoop Workshop; 30-31 March 2015

– Business Intelligence Design and Process; 18-20 and 25-26 May 2015

A Starbucks Card worth 200 Baht


EMR Cluster Setup (this will be done by IMC Institute)

Thanachart Numnonda, [email protected], Feb 2015, Big Data Hadoop on Amazon EMR – Hands On Workshop

Select EMR


Creating a cluster in EMR


Creating a cluster in EMR (cont.)

Name the cluster and specify a log folder


Creating a cluster in EMR (cont.)

Leave the Software Configuration as default


Creating a cluster in EMR (cont.)

Leave the Hardware Configuration as default

Choose an existing EC2 key pair


Creating a cluster in EMR (cont.)

Leave the other settings as default

Select Create Cluster


EMR Cluster Details

Note the Master public DNS.

To see how to connect to the master node using SSH, click the SSH link.


Running the cluster


Set Up an SSH Tunnel to the Master Node

– See the instructions at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-ssh-tunnel.html
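The AWS guide sets up the tunnel with SSH dynamic port forwarding, which opens a local SOCKS proxy through the master node. The sketch below only assembles that command line in Python so each flag can be annotated; the `hadoop` login user, the local port 8157 and the placeholder key path and DNS name are assumptions to check against the guide before use.

```python
import os
import shlex


def tunnel_command(key_file: str, master_dns: str, local_port: int = 8157) -> list:
    """Return the argv list for an SSH SOCKS tunnel to the EMR master node.

    Assumptions: the master node accepts logins as the 'hadoop' user and
    8157 is a free local port; both are illustrative defaults.
    """
    return [
        "ssh",
        "-i", os.path.expanduser(key_file),  # private key of the EC2 key pair chosen at cluster creation
        "-N",                                # run no remote command, only forward traffic
        "-D", str(local_port),               # open a local SOCKS proxy on this port
        f"hadoop@{master_dns}",              # login user @ Master public DNS
    ]


if __name__ == "__main__":
    # Placeholder values: substitute your own key file and Master public DNS.
    cmd = tunnel_command("~/mykeypair.pem", "master-public-dns-name")
    print(shlex.join(cmd))
    # To open the tunnel: subprocess.run(cmd)  (blocks until interrupted)
```

Returning an argument list rather than a single string avoids shell-quoting problems if the command is later passed to `subprocess.run`; `os.path.expanduser` is needed because no shell is involved to expand `~`.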


SSH Instructions for Mac/Linux


SSH Instructions for Windows


Connect to the master node


Launch the Hue Web Interface

Set Up an SSH Tunnel to the Master Node

– See the instructions at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-ssh-tunnel.html

Configure Proxy Settings to View Websites

– See the instructions at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-connect-master-node-proxy.html


Launch the Hue Web Interface (Cont.)

http://master-public-dns-name:8888/


Web Interfaces Hosted on the EMR Cluster


Running Hive Demo


MovieLens Data

http://grouplens.org/datasets/movielens/

MovieLens 10M

(http://files.grouplens.org/datasets/movielens/ml-10m.zip)

– ratings.dat

– tags.dat

– movies.dat


Transfer Data to Hadoop Cluster

wget http://files.grouplens.org/datasets/movielens/ml-10m.zip
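The `wget` step fetches a zip archive, which still has to be unpacked before the `.dat` files can be converted or uploaded. A minimal Python sketch of that step follows; `extract_dat_files` is a hypothetical helper, and it flattens whatever folder the archive uses internally so the files land directly in the output directory.

```python
import shutil
import zipfile
from pathlib import Path


def extract_dat_files(zip_path: Path, out_dir: Path) -> list:
    """Extract only the *.dat members of `zip_path` into `out_dir`.

    The archive's internal folder is dropped, so <folder>/ratings.dat
    ends up as out_dir/ratings.dat. Members are streamed to disk rather
    than read fully into memory, because ratings.dat is large.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.endswith(".dat"):
                continue  # skip folders, README and helper scripts
            target = out_dir / Path(info.filename).name
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return sorted(written)
```

After the download, `extract_dat_files(Path("ml-10m.zip"), Path("ml-10m"))` returns the paths of the extracted files. Keeping only the base name of each member also prevents an archive entry from writing outside `out_dir`.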


Change data format
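The MovieLens `.dat` files separate fields with the two-character delimiter `::` (for example `UserID::MovieID::Rating::Timestamp` in ratings.dat), whereas Hive's default text SerDe handles single-character field delimiters, which is presumably why the data is converted to CSV before upload. The sketch below does that conversion in Python; the function names and the UTF-8 default are assumptions, so check the dataset README for the actual encoding.

```python
import csv
import io


def dat_line_to_csv(line: str, delimiter: str = ",") -> str:
    """Convert one '::'-separated MovieLens line to a delimited line.

    Fields are quoted only when needed (e.g. titles containing commas),
    following the csv module's default QUOTE_MINIMAL behaviour.
    """
    fields = line.rstrip("\r\n").split("::")
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerow(fields)
    return buf.getvalue()


def convert_file(src_path: str, dst_path: str, delimiter: str = ",",
                 encoding: str = "utf-8") -> int:
    """Convert a whole .dat file line by line; returns the number of rows written."""
    count = 0
    with open(src_path, encoding=encoding) as src, \
            open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(dat_line_to_csv(line, delimiter))
                count += 1
    return count
```

Movie titles such as `American President, The (1995)` contain commas, so the csv module wraps them in quotes. A Hive table declared with a plain `FIELDS TERMINATED BY ','` does not interpret those quotes; either convert with `delimiter="\t"` and declare a tab-delimited table, or read the file with a CSV-aware SerDe.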


Upload Data to Amazon S3

hadoop fs -put movies.csv s3://imcinstitute/data


Running Hive from CLI


Running Hive from Hue


Running Example: https://github.com/myui/hivemall/wiki/MovieLens-Dataset


Data Challenge


Flight Details Data

http://stat-computing.org/dataexpo/2009/the-data.html


Data Description


Snapshot of Dataset
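The rules allow plain Map/Reduce as well as Hive and Pig. As an illustration on the flight data, here is a Hadoop Streaming-style mapper and reducer in Python that compute the average arrival delay per carrier. Assumptions: the yearly files are comma-separated with `UniqueCarrier` as the 9th and `ArrDelay` as the 15th column (taken from the Data Expo 2009 variable list; verify against the data description), missing values are written as `NA`, and `flights_mr.py` and `run_locally` are hypothetical names.

```python
import sys
from itertools import groupby

# 0-based column positions in the Data Expo 2009 airline CSV files, assumed
# from the published variable list (9th field UniqueCarrier, 15th ArrDelay).
CARRIER_COL = 8
ARR_DELAY_COL = 14


def mapper(lines):
    """Map step: emit 'carrier<TAB>arrival delay' for each usable record."""
    for line in lines:
        fields = line.rstrip("\r\n").split(",")
        if len(fields) <= ARR_DELAY_COL or fields[0] == "Year":
            continue  # header row or malformed line
        delay = fields[ARR_DELAY_COL]
        if delay in ("", "NA"):
            continue  # missing value, e.g. a cancelled flight
        yield f"{fields[CARRIER_COL]}\t{float(delay)}"


def reducer(lines):
    """Reduce step: input must be sorted by key, as Hadoop guarantees.

    Emits 'carrier<TAB>average arrival delay' in minutes, 2 decimals.
    """
    def parse(line):
        carrier, delay = line.rstrip("\r\n").split("\t")
        return carrier, float(delay)

    for carrier, group in groupby(map(parse, lines), key=lambda kv: kv[0]):
        total = count = 0
        for _, delay in group:  # running sum: no per-carrier list in memory
            total += delay
            count += 1
        yield f"{carrier}\t{total / count:.2f}"


def run_locally(lines):
    """Hypothetical test helper: map, sort (stand-in for the shuffle), reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Streaming usage: "flights_mr.py map" or "flights_mr.py reduce" on stdin.
    stage = sys.argv[1] if len(sys.argv) > 1 else ""
    if stage == "map":
        for out in mapper(sys.stdin):
            print(out)
    elif stage == "reduce":
        for out in reducer(sys.stdin):
            print(out)
```

With Hadoop Streaming the same file would serve as both mapper and reducer, and Hadoop performs the sort between the two stages that `run_locally` imitates with `sorted`. The header check only matters for the split that contains the first line of a file, which is why the column positions are fixed constants rather than looked up from the header.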


Register for the challenge


Registration

Provide your name, organization, mobile number and e-mail address

On-site registration at 17.00, 13 March

E-mail: [email protected]

Facebook message to Thanachart Numnonda

Your username, password, key and public DNS will be sent to your e-mail by 6 am, 14 March


On-line communication

Facebook Group: Hadoop-Thailand

Line group

Facebook message

E-mail to [email protected]


www.facebook.com/imcinstitute


Thank you

[email protected]/imcinstitute

www.slideshare.net/imcinstitute

www.thanachart.org