Big data and Hadoop session - KnowBigData (techgig)

Page 1: Big data and Hadoop session - KnowBigData (techgig)

www.KnowBigData.com Hadoop

What, Why & How to Get Started with Big Data & Hadoop

Page 2: Big data and Hadoop session - KnowBigData (techgig)

ABOUT THE INSTRUCTOR

2014: KnowBigData. Founded KnowBigData.
2012-2014: Amazon. Built high-throughput systems, similar to Storm, for the Amazon.com site.
2011-2012: InMobi. Built a recommender that churns 200 TB.
2006-2011: tBits Global. Founded tBits Global; built an enterprise-grade document management system.
2002-2006: D.E. Shaw. Built big data systems before the term was coined.
2002: IIT Roorkee. Finished B.Tech.

Page 3: Big data and Hadoop session - KnowBigData (techgig)

TODAY’S CLASS

❏ What/why of Big Data?
❏ Why Now?
❏ Example Customers
❏ What is Hadoop?
❏ Components of Hadoop
❏ Further Reading/Assignment

Page 4: Big data and Hadoop session - KnowBigData (techgig)

WHAT IS BIG DATA?

• Simply: Data of Very Big Size

• Can’t process with usual tools

• Distributed Architecture Needed

Page 5: Big data and Hadoop session - KnowBigData (techgig)

DISTRIBUTED COMPUTING

1. Groups of networked computers
2. Interact with each other
3. To achieve a common goal.
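The three points on this slide (several computers, interacting, toward one goal) can be illustrated with a small split-work-combine sketch. This is only an analogy: threads on one machine stand in for networked computers, and the names `partial_sum` and `distributed_sum` are hypothetical, chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work done by one hypothetical 'node': sum its own share of the data."""
    return sum(chunk)


def distributed_sum(numbers, workers=4):
    """Split the data into chunks, sum each chunk in a worker, then combine."""
    # Ceiling division so that every element lands in some chunk.
    size = max(1, -(-len(numbers) // workers))
    chunks = [numbers[i:i + size] for i in range(0, len(numbers), size)]
    # Threads stand in for separate machines here (an assumption for the sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # The "common goal": one combined answer built from the partial results.
    return sum(partials)


total = distributed_sum(list(range(1, 101)))
print(total)  # 5050
```

In a real cluster the chunks would live on different machines and the partial results would travel over the network, which is where reliability and coordination become the hard part.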

Page 6: Big data and Hadoop session - KnowBigData (techgig)

CHARACTERISTICS OF BIG DATA

VOLUME (Data At Rest)
Problems related to storing huge amounts of data reliably, e.g. storing the logs of a website, storage of data by Gmail. Facebook: 300 PB; 600 TB/day.

VELOCITY (Data In Motion)
Problems involving the handling of data arriving at a fast rate, e.g. the number of requests being received by Facebook, YouTube streaming, Google Analytics.

VARIETY (Data in Many Forms)
Problems involving complex data structures, e.g. maps, social graphs, recommendations.
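To get a feel for the Facebook figures quoted on this slide (300 PB and 600 TB/day), a rough back-of-the-envelope calculation helps. The sketch below assumes decimal units (1 TB = 10^12 bytes) and a constant daily rate; the slide states neither, and the function names are hypothetical.

```python
# Assumption: decimal units; the slide does not say which convention it uses.
TB = 10**12
PB = 10**15
SECONDS_PER_DAY = 24 * 60 * 60

daily_ingest_bytes = 600 * TB  # "600 TB/day" from the slide
stored_bytes = 300 * PB        # "300 PB" from the slide


def sustained_rate_gb_per_s(bytes_per_day):
    """Average rate in GB/s needed to absorb a given daily volume."""
    return bytes_per_day / SECONDS_PER_DAY / 10**9


def days_to_accumulate(total_bytes, bytes_per_day):
    """Days needed to reach a total size at a constant daily rate."""
    return total_bytes / bytes_per_day


print(round(sustained_rate_gb_per_s(daily_ingest_bytes), 2))  # 6.94
print(days_to_accumulate(stored_bytes, daily_ingest_bytes))   # 500.0
```

Roughly 7 GB arriving every second, around the clock, is well beyond what a single disk or machine can absorb, which is the motivation for a distributed architecture.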

Page 7: Big data and Hadoop session - KnowBigData (techgig)

WHY IS IT IMPORTANT NOW?

Smart Phones
4.6 billion mobile phones. 1 - 2 billion people accessing the internet.

Connectivity: Social Networks
Facebook: 1.06 bn monthly active users, 30 billion pieces of content shared monthly. ~175 million tweets every day.

Connectivity: Internet Of Things
The connectivity improved. The devices became cheaper, faster and smaller.

Page 8: Big data and Hadoop session - KnowBigData (techgig)

EXAMPLE BIG DATA CUSTOMERS

Web and e-commerce
1. Recommendation Engines
2. Search Quality
3. Sentiment Analysis
4. A/B Testing

Telecommunications
1. Customer Churn Prevention
2. Network Performance Optimization
3. Call Detail Record (CDR) Analysis
4. Analyzing the Network to Predict Failure

Page 9: Big data and Hadoop session - KnowBigData (techgig)

EXAMPLE BIG DATA CUSTOMERS

Government
1. Fraud Detection
2. Cyber Security, Welfare
3. Justice

Healthcare & Life Sciences
1. Health Information Exchange
2. Gene Sequencing
3. Healthcare Improvements
4. Drug Safety

Page 10: Big data and Hadoop session - KnowBigData (techgig)

EXAMPLE BIG DATA PROBLEMS: Recommendations

Page 11: Big data and Hadoop session - KnowBigData (techgig)

EXAMPLE BIG DATA PROBLEMS: Recommendations

Page 12: Big data and Hadoop session - KnowBigData (techgig)

EXAMPLE BIG DATA PROBLEMS: Sentiment Analysis

Page 14: Big data and Hadoop session - KnowBigData (techgig)

BIG DATA SOLUTIONS

1. Apache Hadoop
   ○ Apache Spark
2. Cassandra
3. MongoDB
4. Google Compute Engine

Page 15: Big data and Hadoop session - KnowBigData (techgig)

WHAT IS HADOOP?

A. Created by Doug Cutting (of Yahoo) and Mike Cafarella
B. Based on Google's GFS, MapReduce (GMR) and Bigtable
C. Built for the Nutch search engine project
D. Named after a toy elephant
E. Open source - Apache
F. Powerful, popular & well supported
G. Framework to handle Big Data
H. For reliable, scalable, distributed computing
I. Written in Java
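The MapReduce model that Hadoop borrows from Google (the "GMR" item on this slide) is easiest to see with the classic word-count example. The sketch below is plain Python on a single machine, not the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical and only mirror the stages of the model.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "Big Data and Hadoop"])
print(counts)  # {'big': 2, 'data': 2, 'and': 1, 'hadoop': 1}
```

In a real cluster the map and reduce calls would run on many machines in parallel, each working on the part of the data stored nearest to it, and the framework would handle the shuffle and any machine failures.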

Page 16: Big data and Hadoop session - KnowBigData (techgig)

COMPONENTS

[Diagram: Hadoop components grouped by role. Legend: Main Component, Compute Engine, NoSQL Datastore, SQL Interface, New Language, Machine Learning / Stats, Workflow]

Page 17: Big data and Hadoop session - KnowBigData (techgig)

ABOUT KNOWBIGDATA

❏ Expert Instructors

❏ CloudxLab

❏ Lifetime access to LMS

❏ Presentations

❏ Class Recording

❏ Assignments + Quizzes

❏ Project Work

❏ Real Life Project

❏ Course Completion Certificate

❏ 24x7 support

❏ KnowBigData - Alumni

❏ Jobs

❏ Stay Abreast (Updated Content, Complimentary Sessions)

❏ Stay Connected

Page 18: Big data and Hadoop session - KnowBigData (techgig)

WHAT IS CLOUDxLABS™?

1. For real-life experience
2. An online cluster of servers
3. With all required tools installed
4. Accessible globally
5. Does not require a high-end configuration

Page 19: Big data and Hadoop session - KnowBigData (techgig)

Upcoming Courses

www.KnowBigData.com

1. Starting on...
   ● 12 Dec - 7am Big Data & Hadoop
   ● 12 Dec - 8:30pm Big Data & Spark
2. Sat-Sun - 3 hours
3. 33 hrs - 3 hr x 11 classes
4. ₹19999 (25% off) (Incl. Taxes) - $369
5. Includes CloudxLabs + Support + LMS
6. Every class is also recorded.

[email protected] +1 419 665 3276 (US) +91 803 959 1464 (IN)

Page 20: Big data and Hadoop session - KnowBigData (techgig)

Thank you.

[email protected] +1 419 665 3276 (US) +91 803 959 1464 (IN)

Subscribe to our YouTube channel for the latest videos - https://www.youtube.com/channel/UCxugRFe5wETYA7nMH6VGyEA