Netherlands | USA | India | UK | France

SOFTWARE DEVELOPMENT DONE RIGHT

Hadoop

What is Big Data?

Big Data generally refers to data that cannot be processed efficiently by traditional systems, mainly because of its size.

Examples:
• Facebook: 500 TB of data daily
• Twitter: 250 million tweets daily

90% of all data has been generated in the last 2-3 years.

Big Data Sources

Sources:
• Social networking sites such as Twitter and Facebook
• Smartphones
• Trading platforms
• Machines
• Log files

This data is used for purposes such as:
• Product trends
• Market analysis

What is Hadoop?

Apache Hadoop is a framework for running applications on large clusters built from commodity hardware.
• It transparently provides applications with both reliability and data motion.
• It implements a computational paradigm named Map/Reduce, in which an application is divided into many small fragments of work.
• It provides a distributed file system (HDFS).
• It moves the code to the data, rather than the data to the code.

Hadoop opened the gates for processing Big Data.
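The idea of moving code to the data can be illustrated with a toy scheduler. This is a minimal sketch in Python, not Hadoop's actual scheduler (which also considers rack locality and cluster load); the function `schedule_tasks` and the node and block names are hypothetical. Given which nodes hold a replica of each block, it assigns each task to a free node that already stores that block, and only falls back to a remote node when no local one is free.

```python
def schedule_tasks(block_locations, free_nodes):
    """Assign one map task per block, preferring a node that already stores the block.

    block_locations: dict mapping a block id to the nodes holding a replica of it.
    free_nodes: nodes with one free task slot each (each node is used at most once).
    Returns a dict mapping block id -> (assigned node, True if the task is data-local).
    """
    available = list(free_nodes)
    plan = {}
    for block, replicas in block_locations.items():
        if not available:
            raise ValueError("not enough free task slots for all blocks")
        # Data-local candidates: free nodes that already hold a replica of this block.
        local = [node for node in replicas if node in available]
        # Ship the task to the data when possible; otherwise fall back to any free
        # node, which would then have to read the block over the network.
        node = local[0] if local else available[0]
        available.remove(node)
        plan[block] = (node, bool(local))
    return plan


if __name__ == "__main__":
    blocks = {"blk_1": ["node1", "node2"], "blk_2": ["node2", "node3"], "blk_3": ["node1", "node3"]}
    print(schedule_tasks(blocks, ["node1", "node2", "node3"]))
```

In this example all three tasks run data-local, so no block has to cross the network; only the small task code is shipped to each node.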

Hadoop's History

Hadoop is based on work done by Google:
• GFS (Google File System) inspired HDFS
• Google MapReduce inspired Hadoop MapReduce
• BigTable inspired HBase

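Hadoop Features

• Partial failure support: the failure of a component degrades performance gracefully instead of bringing down the whole system.
• Data recoverability: if a component fails, its workload is taken over by the components that are still running.
• Component recovery: a failed component that recovers can rejoin the system without a full restart.
• Consistency: component failures during a job do not affect the outcome of the job.
• Scalability: adding resources supports a proportional increase in load capacity.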
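Partial failure support and data recoverability rely largely on HDFS replication. The sketch below assumes a simplified model in which block placement is a plain dictionary, and shows how a cluster could restore the replication factor after a node dies. `re_replicate` is a hypothetical helper; the real NameNode also weighs rack placement and disk usage when choosing target nodes.

```python
DEFAULT_REPLICATION = 3  # HDFS default replication factor (dfs.replication)


def re_replicate(block_map, live_nodes, replication=DEFAULT_REPLICATION):
    """Restore the replication factor of every block after DataNode failures.

    block_map: dict mapping block id -> set of nodes that held a replica.
    live_nodes: ordered list of nodes that are still alive.
    Returns a new dict mapping block id -> set of nodes holding a replica afterwards.
    """
    live = set(live_nodes)
    repaired = {}
    for block, holders in block_map.items():
        # Replicas on failed nodes are gone; the others survive (partial failure).
        survivors = {node for node in holders if node in live}
        if not survivors:
            # Every replica was lost: this block cannot be recovered.
            raise RuntimeError(f"block {block} has no live replica")
        # Copy the block from a survivor onto live nodes that do not hold it yet.
        candidates = [node for node in live_nodes if node not in survivors]
        while len(survivors) < replication and candidates:
            survivors.add(candidates.pop(0))
        repaired[block] = survivors
    return repaired


if __name__ == "__main__":
    before = {"blk_1": {"n1", "n2", "n3"}, "blk_2": {"n2", "n3", "n4"}}
    # Node n3 fails; every block still has two live replicas and is copied once more.
    print(re_replicate(before, ["n1", "n2", "n4", "n5"]))
```

Because each block starts with three replicas, losing a single node never loses data: the job keeps running on the surviving copies while the missing replicas are recreated.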

Hadoop Components

Core components:
• HDFS: Hadoop Distributed File System
• MapReduce

Projects in the Hadoop ecosystem:
• Pig, Hive, HBase, Flume, Oozie, Sqoop, etc.

HDFS
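HDFS stores a file as a sequence of large fixed-size blocks, each replicated on several DataNodes. The Python sketch below assumes the Hadoop 2.x defaults of a 128 MB block size and a replication factor of 3; the round-robin `place_replicas` function is a hypothetical simplification, not HDFS's real placement policy.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # default dfs.blocksize in Hadoop 2.x and later (64 MB in Hadoop 1.x)
REPLICATION = 3        # default dfs.replication


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs; every block is full-size except possibly the last."""
    return [(offset, min(block_size, file_size - offset))
            for offset in range(0, file_size, block_size)]


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Hypothetical round-robin placement: each block gets `replication` distinct DataNodes."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return {block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
            for block in range(num_blocks)}


if __name__ == "__main__":
    blocks = split_into_blocks(300 * MB)
    print([length // MB for _, length in blocks])  # [128, 128, 44]
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

A 300 MB file therefore occupies two full blocks and one 44 MB block. The real default policy is rack-aware: roughly, the first replica goes on the writer's node, the second on a node in a different rack, and the third on another node in that second rack.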

Map/Reduce
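The Map/Reduce model can be sketched with the classic word-count job. This single-process Python version only imitates the three phases (map, shuffle/sort, reduce) that Hadoop runs across a cluster; it does not use the Hadoop API, and the function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle/sort: group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(mapper(line) for line in lines)
    return dict(reducer(key, values) for key, values in shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["the quick fox", "The lazy dog", "the end"]))
```

In a real job the mappers run in parallel on the nodes that hold the input blocks, and the framework moves the grouped pairs over the network to the reducers.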

Case Study

Product: data quality and cleansing solutions.

Before Hadoop:
• Two-node database cluster
• Multi-threaded Java application for de-duplication
• 1 million records took 10 hours to process

After Hadoop:
• 4-machine cluster (8 GB RAM, 4 cores)
• 1 million records took 30 minutes to process, a 20x speed-up
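The case study does not describe how the de-duplication job was written. One common way to express de-duplication as MapReduce is to have the mapper emit a normalized match key for each record, so that all duplicates meet at the same reducer. The sketch below assumes a hypothetical exact-match key built from name and email and simulates the grouping in a single process; a real data-quality product would typically use fuzzier matching and merge rules.

```python
from collections import defaultdict


def match_key(record):
    """Hypothetical match key: name and email, ignoring case and extra whitespace."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return name, email


def dedup_mapper(record):
    """Map: emit (match key, record) so that duplicates share a key."""
    return match_key(record), record


def dedup_reducer(key, records):
    """Reduce: keep the first record seen for each key (a simple survivorship rule)."""
    return records[0]


def deduplicate(records):
    # Simulated shuffle: group the mapper output by key.
    groups = defaultdict(list)
    for key, record in map(dedup_mapper, records):
        groups[key].append(record)
    return [dedup_reducer(key, group) for key, group in groups.items()]


if __name__ == "__main__":
    rows = [
        {"id": 1, "name": "Ann  Lee", "email": "ann@example.com"},
        {"id": 2, "name": "ann lee", "email": " ANN@example.com"},
        {"id": 3, "name": "Bob Ray", "email": "bob@example.com"},
    ]
    print([row["id"] for row in deduplicate(rows)])
```

Because each key is reduced independently, the work splits naturally across the machines in the cluster, which is where the speed-up over a single multi-threaded application comes from.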

Hadoop In Use

Any application that has more than 10 TB of data and needs fast, cheap processing, for example:
• Log analysis
• Recommendation engines
• Feed analysis
• Data mining
• Statistical analysis
• ETL processing
• Business intelligence

Cloudera

Cloudera is “the commercial Hadoop company”.
• Founded by leading Hadoop experts from Facebook, Google, Oracle and Yahoo.
• Provides consulting and training services for Hadoop users.
• Its staff includes committers to virtually all Hadoop projects.

Resources

Books:
• Hadoop: The Definitive Guide (Tom White)
• HBase: The Definitive Guide (Lars George)
• MapReduce Design Patterns (Donald Miner and Adam Shook)

Web:
• http://hadoop.apache.org/
• http://hbase.apache.org/
• http://research.google.com/archive/bigtable.html
• http://research.google.com/archive/mapreduce-osdi04.pdf

Contact us

Xebia India
Website: www.xebia.com, www.xebia.in, www.xebia.fr
Thought Leadership: http://blog.xebia.com, http://podcast.xebia.com