Presentation on Big Data & Hadoop
PRESENTED BY: AROHI KHANDELWAL
Contents:
• What is BIG DATA?
• Why BIG DATA?
• Hadoop
• Hadoop Architecture
• Hadoop Distributed File System
• HDFS Architecture
• MapReduce
• How MapReduce Works
• Hadoop Ecosystem
• What is Hadoop used for?
• Users of Hadoop
• Advantages & Disadvantages of Hadoop
• Conclusion
What Is BIG DATA?
Big Data is commonly characterized by three Vs:
• Volume
• Variety
• Velocity
Why BIG DATA?
• Mobile phone users increased by 70.3% to 918m in the last two years.
• Twitter has 328m monthly active users (55% growth).
• Facebook has 765m active users.
• Google+ has 495m monthly active users (45% growth).
• LinkedIn has 300m users.
• Every minute, 48 hours of video are posted.
Hadoop:
An open-source distributed computing framework, written primarily in Java. It was named by Doug Cutting after his son's toy elephant.
Hadoop provides two core capabilities:
• Storage
• Processing
Hadoop Architecture:
Hadoop is designed and built on two independent frameworks:
• Hadoop Distributed File System (HDFS), for storage
• MapReduce, for processing
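Hadoop Distributed File System:
• Based on the Google File System (GFS).
• Data is stored in the form of blocks, and each block is replicated on several machines.
• Provides data reliability: if one machine fails, its blocks are still available elsewhere.
• Provides fast processing of data, since computation can run on the nodes that already hold the blocks.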
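To make the block idea concrete, here is a minimal Python sketch of how a file might be split into fixed-size blocks, each stored on several machines. It assumes the common HDFS defaults of a 128 MB block size and a replication factor of 3; the round-robin placement, the function names, and the node names (`dn1` … `dn4`) are hypothetical simplifications, since real HDFS uses a rack-aware placement policy.

```python
from __future__ import annotations

BLOCK_SIZE = 128 * 1024 * 1024  # assumed default HDFS block size (128 MB)
REPLICATION = 3                 # assumed default replication factor


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size in bytes of each block a file of `file_size` bytes occupies."""
    full, rest = divmod(file_size, block_size)
    sizes = [block_size] * full
    if rest:
        sizes.append(rest)  # the last block only holds the remaining bytes
    return sizes


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = REPLICATION) -> list[list[str]]:
    """Hypothetical round-robin placement: each block goes to `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("not enough DataNodes for the requested replication")
    return [[nodes[(b + r) % len(nodes)] for r in range(replication)]
            for b in range(num_blocks)]


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)   # a 300 MB file
    print([s // (1024 * 1024) for s in blocks])     # block sizes in MB: [128, 128, 44]
    for i, replicas in enumerate(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])):
        print(f"block {i}: {replicas}")
```

A 300 MB file therefore takes three blocks, and the last one holds only 44 MB; HDFS does not pad the final block out to the full block size on disk.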
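HDFS Architecture:
An HDFS cluster has:
• a NameNode, which manages the file system namespace and metadata (which blocks make up each file and which DataNodes hold them)
• DataNodes, which store the actual data blocks and serve read and write requests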
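The division of labour between the NameNode and the DataNodes can be sketched as a toy in-memory model. This is not the HDFS client API: the classes, `read_file`, and the block and node names are hypothetical. The NameNode holds only metadata, the DataNodes hold the bytes, and a reader asks the NameNode for block locations, then fetches each block from the first live DataNode that has a copy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical DataNode model: stores raw block bytes keyed by block ID."""
    name: str
    alive: bool = True
    blocks: dict[str, bytes] = field(default_factory=dict)


@dataclass
class NameNode:
    """Hypothetical NameNode model: metadata only, never the file contents."""
    files: dict[str, list[str]] = field(default_factory=dict)      # path -> ordered block IDs
    locations: dict[str, list[str]] = field(default_factory=dict)  # block ID -> DataNode names

    def get_block_locations(self, path: str) -> list[tuple[str, list[str]]]:
        return [(block_id, self.locations[block_id]) for block_id in self.files[path]]


def read_file(path: str, namenode: NameNode, datanodes: dict[str, DataNode]) -> bytes:
    """Ask the NameNode where each block lives, then read it from the first live replica."""
    data = b""
    for block_id, replicas in namenode.get_block_locations(path):
        for node_name in replicas:
            node = datanodes[node_name]
            if node.alive and block_id in node.blocks:
                data += node.blocks[block_id]
                break
        else:  # loop finished without `break`: no replica could serve this block
            raise IOError(f"no live replica for {block_id}")
    return data


if __name__ == "__main__":
    dns = {name: DataNode(name) for name in ("dn1", "dn2", "dn3")}
    dns["dn1"].blocks["blk_1"] = b"Hello, "
    dns["dn2"].blocks["blk_1"] = b"Hello, "
    dns["dn2"].blocks["blk_2"] = b"HDFS!"
    dns["dn3"].blocks["blk_2"] = b"HDFS!"
    nn = NameNode(files={"/greeting.txt": ["blk_1", "blk_2"]},
                  locations={"blk_1": ["dn1", "dn2"], "blk_2": ["dn2", "dn3"]})
    dns["dn2"].alive = False  # simulate a failed DataNode
    print(read_file("/greeting.txt", nn, dns))  # b'Hello, HDFS!'
```

Even with `dn2` down, the file is still readable because every block has a surviving replica on another DataNode.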
MapReduce:
• Map: takes a set of data and breaks individual elements into tuples (key/value pairs).
• Reduce: takes Map's output as input and combines those data tuples into a smaller set of tuples.
How MapReduce Works
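The Map, shuffle/sort, and Reduce phases can be sketched in plain Python with the classic word-count example. Between Map and Reduce, the framework groups all tuples that share a key so each Reduce call sees one key with all of its values. This sketch runs in a single process and only mimics the data flow; on a real cluster Hadoop runs many map and reduce tasks in parallel and performs the shuffle across the network. The function names are illustrative, not Hadoop's Java API.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable
from itertools import chain


def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: break a line of text into (word, 1) tuples."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group every value emitted for the same key, sorted by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all tuples for one key into a single (word, total) tuple."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
    print(word_count(docs))  # {'bear': 2, 'car': 3, 'deer': 2, 'river': 2}
```

Nine (word, 1) tuples go into the shuffle, and Reduce turns them into four (word, count) tuples, which is the "smaller set of tuples" described above.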
Hadoop Ecosystem:
• HDFS
• YARN (MapReduce v2)
• HBase
• Hive
• Apache Pig
• Oozie
• ZooKeeper
• Sqoop
What is Hadoop used for?
• Search: Yahoo, Amazon
• Log processing: Facebook, Yahoo
• Data warehouse: Facebook, AOL
• Video & image analysis: New York Times
Users of Hadoop:
Advantages of Hadoop:
• Platform independent (it runs on the Java Virtual Machine).
• Block-structured file system.
• Can store any kind of data, structured or unstructured.
• Huge storage capacity.
• Rapidly processes large amounts of data in parallel.
• Fault tolerance.
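Much of Hadoop's fault tolerance comes from replication: when a DataNode fails, the NameNode notices (it stops receiving that node's heartbeats) and schedules new copies of the blocks that are now under-replicated. The sketch below models only that bookkeeping step. The function name, the data layout, and the rule of sending each missing copy to the least-loaded live node are hypothetical simplifications, not HDFS's actual placement logic.

```python
from __future__ import annotations


def rereplicate(placement: dict[str, set[str]], dead: str,
                live_nodes: list[str], replication: int = 3) -> dict[str, set[str]]:
    """Drop `dead` from every block's replica set and top blocks back up to `replication`.

    Hypothetical policy: each missing copy goes to the live node that holds the fewest
    blocks and does not already store this block (ties broken by node name).
    """
    new = {blk: nodes - {dead} for blk, nodes in placement.items()}  # input left untouched
    load = {n: 0 for n in live_nodes}
    for nodes in new.values():
        for n in nodes:
            if n in load:
                load[n] += 1
    for blk in sorted(new):
        nodes = new[blk]
        if not nodes:
            raise RuntimeError(f"{blk} is lost: no surviving replica to copy from")
        while len(nodes) < replication:
            candidates = [n for n in live_nodes if n not in nodes]
            if not candidates:
                break  # cluster too small to reach the target replication
            target = min(candidates, key=lambda n: (load[n], n))
            nodes.add(target)  # in HDFS this would be a copy from a surviving replica
            load[target] += 1
    return new


if __name__ == "__main__":
    placement = {"blk_1": {"dn1", "dn2", "dn3"}, "blk_2": {"dn2", "dn3", "dn4"}}
    result = rereplicate(placement, dead="dn2", live_nodes=["dn1", "dn3", "dn4", "dn5"])
    print({blk: sorted(nodes) for blk, nodes in result.items()})
    # {'blk_1': ['dn1', 'dn3', 'dn5'], 'blk_2': ['dn1', 'dn3', 'dn4']}
```

A block is only lost if every one of its replicas fails before the cluster can copy it, which is why a replication factor of 3 tolerates individual machine failures.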
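Disadvantages of Hadoop:
• Not a good fit for small data: HDFS is designed for large files, and many small files strain the NameNode, which keeps all file system metadata in memory.
• Setup issues: installing and configuring a cluster is complex.
• The programming model is very restrictive: processing must be expressed as Map and Reduce steps.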
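The small-data problem can be checked with simple arithmetic. This sketch assumes a 128 MB block size and counts one NameNode metadata object per file and one per block, which is a simplification of what the NameNode actually tracks; the function name is hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed HDFS block size: 128 MB


def metadata_objects(num_files: int, file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Simplified NameNode cost: one object per file plus one per block it occupies."""
    blocks_per_file = -(-file_size // block_size)  # ceiling division; 0 bytes -> 0 blocks
    return num_files * (1 + blocks_per_file)


if __name__ == "__main__":
    # The same 1 GB (2**30 bytes) of data, stored two ways:
    print(metadata_objects(1, 1024 ** 3))     # one 1 GB file -> 9
    print(metadata_objects(1024 ** 2, 1024))  # 1,048,576 files of 1 KB -> 2097152
```

Both layouts hold exactly the same bytes, but the small-file layout needs over 200,000 times as many metadata objects in the NameNode's memory.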
Summary
Hadoop excels at Big Data, analytics, and batch processing.
It is not real-time, does not offer random access, and is not a database.
HDFS makes it all possible: a fault-tolerant file system with high-throughput data access.
Pig and Hive are easy to use.
THANK YOU