Big Data Tools
Hadoop
S.S. Mulay, Sr. V.P. Engineering
February 1, 2013
Confidential Netmagic Internal Use Only
Hadoop - A Prelude
Apache Project and Animal Friendly names
Some of the projects under the Apache Foundation worth mentioning:
● Apache Zookeeper
● Apache Tomcat
● Apache Pig
And now Hadoop
Hadoop – The Name
Hadoop – The Relevance
Two important things to know when discussing Big Data:
● MapReduce
● Hadoop
Hadoop – How was it Born?
● To process huge volumes of data, as the amount of generated data continued to increase rapidly (Big Data).
● The Web was also generating more and more information, which made indexing its content increasingly challenging.
Hadoop – The Reality Vs Myth
Hadoop – Some Use Cases
Hadoop – What do we expect from it?
If we analyze the mentioned use cases, we realize that
Hadoop – Components which come to the rescue
Hadoop – Who’s Using It?
Uses Hadoop and HBase for:
• Social services
• Structured data storage
• Processing for internal use
Uses Hadoop for:
• Amazon's product search indices
They process millions of sessions daily for analytics.
Uses Hadoop for:
• Search optimization
• Research
Uses Hadoop for:
• Databasing and analyzing Next Generation Sequencing (NGS) data produced for the Cancer Genome Atlas (TCGA) project and other groups
Uses Hadoop for:
• Internal log reporting/parsing systems designed to scale to infinity and beyond
• Web-wide analytics platform
Uses Hadoop:
• As a source for reporting/analytics and machine learning
And many more…
Hadoop – The Various Forms Today
Hadoop – Use Case Example – Log Processing
● Some of the practical use cases for log processing generally in use today:
Assume we have huge logs, generated over a period of time and running into terabytes, and we want to know:
Hadoop – Use Case Example – Log Processing
In the conventional method: parallelism is on a per-file basis, not within a single file.
[Diagram labels: Task - new, Concatenate Data Set, Final Data Set]
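The conventional flow on this slide can be sketched roughly as follows. This is only an illustration, not the setup the slide describes in detail: the file names, the log-line layout, and the helper names `count_status_codes` and `conventional_count` are all assumptions, and small in-memory lists stand in for real log files. The point is that the unit of parallelism is the file, so a single very large file is still handled by one worker.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for log files (assumed line layout:
# "<method> <path> <status>"); real logs would be read from disk.
LOG_FILES = {
    "web1.log": ["GET /index 200", "GET /cart 500", "GET /index 200"],
    "web2.log": ["GET /index 404", "GET /cart 200"],
}

def count_status_codes(lines):
    """One task: scan a whole file and count the status codes it contains."""
    return Counter(line.split()[-1] for line in lines)

def conventional_count(files):
    # One worker per file: however large a file is, it gets a single worker.
    with ThreadPoolExecutor(max_workers=len(files)) as pool:
        partials = list(pool.map(count_status_codes, files.values()))
    # Concatenate the per-file results, then produce the final data set.
    final = Counter()
    for partial in partials:
        final.update(partial)
    return final

FINAL_COUNTS = conventional_count(LOG_FILES)
```

With two files, at most two workers are ever busy; adding more workers would not help unless the input were split into more files first.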
Hadoop – Use Case Example – Log Processing
With MapReduce:
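A minimal sketch of the MapReduce idea applied to log processing is shown here in plain Python. It does not use the Hadoop API; the function names, the log-line layout, and the split size are assumptions chosen for illustration. Unlike the per-file approach, a single input is cut into splits, each split goes to its own map task, and the reduce step combines the grouped results. On a real cluster the map tasks would run on different nodes; here they run one after another.

```python
from collections import defaultdict

def split_input(lines, split_size):
    """Cut one large input into fixed-size splits, one per map task."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]

def map_phase(split):
    """Map: emit a (status_code, 1) pair for every log line in the split."""
    return [(line.split()[-1], 1) for line in split]

def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

# Hypothetical single log file (assumed layout: "<method> <path> <status>").
SINGLE_LOG = [
    "GET /index 200",
    "GET /cart 500",
    "GET /index 200",
    "GET /index 404",
    "GET /cart 200",
]

SPLITS = split_input(SINGLE_LOG, split_size=2)
PAIRS = [pair for split in SPLITS for pair in map_phase(split)]
STATUS_COUNTS = reduce_phase(shuffle(PAIRS))
```

Here the five-line input yields three splits, so three map tasks could work on the same file at once.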
Hadoop – Use Case Example – Log Processing
● Infrastructure realities in the conventional method:
● How things change with MapReduce:
● Assuming:
● A single disk can transfer data at 75 MB/sec
● A Hadoop cluster of 4000 nodes, with 6 disks per server
● The overall throughput of the setup would be
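The aggregate figure follows directly from the three numbers stated above. The short calculation below works it out as an idealized upper bound: it assumes every disk streams at full speed in parallel, uses decimal units (1 TB = 1,000,000 MB), and ignores network, replication, and scheduling overhead. The variable and function names are illustrative only.

```python
DISK_MB_PER_SEC = 75    # per-disk transfer rate from the slide
NODES = 4000            # cluster size from the slide
DISKS_PER_NODE = 6      # disks per server from the slide

total_disks = NODES * DISKS_PER_NODE                      # 24,000 disks
aggregate_mb_per_sec = total_disks * DISK_MB_PER_SEC      # 1,800,000 MB/sec
aggregate_tb_per_sec = aggregate_mb_per_sec / 1_000_000   # 1.8 TB/sec (decimal units)

def seconds_to_scan(data_mb, throughput_mb_per_sec):
    """Idealized time to read data_mb at the given sustained throughput."""
    return data_mb / throughput_mb_per_sec

ONE_TB_MB = 1_000_000
single_disk_seconds = seconds_to_scan(ONE_TB_MB, DISK_MB_PER_SEC)   # roughly 3.7 hours
cluster_seconds = seconds_to_scan(ONE_TB_MB, aggregate_mb_per_sec)  # well under a second
```

Under these assumptions the cluster reads at about 1.8 TB/sec, so a terabyte that would take a single disk several hours to scan could in principle be read in under a second.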
Hadoop – Big Data Integration Challenges
Hadoop – Native Solutions & Challenges
Hadoop – Advantages of Commercial Solutions
Hadoop – Commercial Solutions For Hadoop
The solutions fit into 2 categories:
● Infrastructure Automation
● Application Automation
Gartner Report – Magic Quadrant for Data Integration Tools
Hadoop & Cloud – Hand in Hand?
What advantages does the Cloud bring:
Thus, running Hadoop on the Cloud brings the above advantages to enterprises. All the commercial distributions available today offer a virtual image option for deployment on a Cloud / virtualization platform.
Virtualization solution providers like VMware have come up with Project “Serengeti” to support quick deployment and management of Hadoop on the Cloud.
Cloud service providers like Amazon, Netmagic and others offer a deployment option for Hadoop infrastructure on the Cloud.
Contact Details
For related queries / feedback, mail to ssmulay@netmagicsolutions.com
+91-9820453568
Thank You
http://www.linkedin.com/companies/netmagic
http://twitter.com/netmagic
http://www.facebook.com/NetmagicSolutions
http://www.youtube.com/user/netmagicsolutions