8/4/2019 Big Data Camp Intro Hadoop
Big Data Camp, Delhi, Sep 10, 2011
Introduction to Hadoop / Big Data
Good Times < Year 2000
Online Applications (OLTP): Web Users, Web Servers, RDBMS
Analytics and Reporting (OLAP): Report Users, Reporting Servers, RDBMS DW
Year 2000 +
Online Applications (OLTP): Web Users, Web Servers, RDBMS
Analytics and Reporting (OLAP): Report Users, Reporting Servers, RDBMS DW
Big Data: Problems to Solve
Scalability
Storage
Fail
The Knight in Shining Armor
Engine + Logic
File system
Video: What can Apache Hadoop Do for You?
Who Uses Hadoop?
Search: Yahoo, Amazon, Zvents
Log processing: Facebook, Yahoo
Recommendation Systems: Facebook
Data Warehouse: Facebook, AOL
Video and Image Analysis: New York Times, Eyealike
Indian Government: UUID project
HDFS: Design Principles
Hardware will Fail!
Petabyte-Scale Store!
Map Reduce
Origin in Lisp!
Google: GFS paper!
Divide and Rule!
Map Reduce Programming Model
Borrows from functional programming
Users implement an interface of two functions:
map (in_key, in_value) -> (out_key, intermediate_value) list
reduce (out_key, intermediate_value list) -> out_value list
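The two-function interface above can be illustrated with a small in-memory sketch. This is not Hadoop's API: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names chosen for this example, word counting is an assumed task, and the grouping step only imitates, on a single machine, what a real framework does between the map and reduce phases.

```python
from collections import defaultdict


def map_fn(in_key, in_value):
    """map (in_key, in_value) -> (out_key, intermediate_value) list.

    Here in_key is a document name and in_value is its text (assumed task:
    word count). Each word is emitted with an intermediate value of 1.
    """
    return [(word.lower(), 1) for word in in_value.split()]


def reduce_fn(out_key, intermediate_values):
    """reduce (out_key, intermediate_value list) -> out_value list."""
    return [sum(intermediate_values)]


def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for in_key, in_value in records:
        for out_key, value in map_fn(in_key, in_value):
            groups[out_key].append(value)  # group intermediate values by key
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}


records = [("doc1", "big data big ideas"), ("doc2", "Big clusters")]
word_counts = run_mapreduce(records, map_fn, reduce_fn)
print(word_counts)
```

Only `map_fn` and `reduce_fn` carry the application logic; everything in `run_mapreduce` is the part a framework would take over (distribution, grouping, and retries are not modelled here).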
Hadoop Map Reduce
Hadoop Example
Weather sensors collecting data every hour at many locations across the globe gather a large volume of log data, which is a good candidate for analysis with MapReduce, since it is semistructured and record-oriented.
Data Format:
The data is stored using a line-oriented ASCII format, in which each line is a record. The format supports a rich set of meteorological elements, many of which are optional or with variable data lengths. For simplicity, we shall focus on the basic elements, such as temperature, which are always present and are of fixed width.
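As a rough illustration of how fixed-width records like these could be analysed in the map/reduce style, the sketch below computes the maximum temperature per year. The record layout is an assumption made for this example only (a 4-character year followed by a 5-character signed temperature in tenths of a degree Celsius), not the sensors' real format, and the function names and the choice of analysis are likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical fixed-width layout, assumed for this sketch only:
YEAR = slice(0, 4)  # characters 0-3: year, e.g. "1950"
TEMP = slice(4, 9)  # characters 4-8: signed temperature in tenths of a degree C


def map_record(line):
    """Map step: extract (year, temperature) from one record line."""
    return [(line[YEAR], int(line[TEMP]))]


def reduce_max(year, temperatures):
    """Reduce step: keep the highest temperature seen for a year."""
    return [max(temperatures)]


def max_temperature_by_year(lines):
    """Single-process stand-in for the framework's group-by-key step."""
    grouped = defaultdict(list)
    for line in lines:
        for year, temperature in map_record(line):
            grouped[year].append(temperature)
    return {year: reduce_max(year, temps)[0] for year, temps in grouped.items()}


# Made-up sample records in the assumed layout.
sample = ["1949+0111", "1949+0078", "1950-0011", "1950+0022", "1950+0000"]
result = max_temperature_by_year(sample)
print(result)
```

Because the fields sit at fixed positions, the map step is a pair of string slices; any optional or variable-length elements later in the line are simply ignored.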
Hadoop Ecosystem Map
(Diagram; surviving labels listed below.)
OLTP
Java Applications
RDBMS
Structured Data
Unstructured Data
hiho
Sqoop
File system
Engine + Logic
High Level Interfaces
More High Level Interfaces
JAQL
Workflow
Cascading
Support
Monitor/manage Hadoop ecosystem
How can You Contribute?
Apache Hadoop Projects
Learn more about Hadoop
Contribute to source code
Participate in Mailing Lists/Forums
Share blogs etc.
Impetus Open Source Projects
Github/Google code hosted projects
Contribute to source code
Thank you
Visit bigdata.impetus.com
Big Data in EDW
Building Big Data Analytics Platform
Commercial
Teradata/Netezza
Greenplum/Vertica/Aster
Informatica
SAS/Microstrategy/BusinessObjects
Pentaho/Jasper
Open source
CloverETL/Kettle/Talend
Jaspersoft/Pentaho
Reporting
Hadoop
Apache Cassandra
Hybrid
ETL - Open Source and Commercial
Analytics - Open Source or Commercial
Commercial Hadoop Versions
Web Analytics