INTRODUCTION TO BIG DATA

Oh! Session on Introduction to BIG Data


Page 1: Oh! Session on Introduction to BIG Data



Topics Covered
Table of Contents

• What is BIG DATA?
• Characteristics of Big Data
• What is BIG DATA Analysis?
• Traditional vs. Current Analytics Trends
• BIG Data using Hadoop!
• Hadoop History
• Hadoop – High Level Architecture
• Hadoop Variants
• Hadoop Skills
• NOSQL Introduction
• Big Data – Case Studies

2 | Oh! Session - Introduction to Big Data


What is BIG DATA?

Big Data, simply put, is data which is very BIG!


Big data is a new, “ginormous” and scary term – a very, very scary term. No, wait. It is not.

Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate.

Examples of Big Data:
• SOCIAL MEDIA ACTIVITY – like Facebook, Twitter, LinkedIn, etc.
• FINANCIAL TRANSACTIONS – Internet Banking logs, Share Market, etc.
• LOCATION TRACKING – Global Positioning System data, etc.
• WEB BEHAVIOUR – Internet browsing, Google searches, etc.


Characteristics of BIG DATA

Big data can be described by the following characteristics:


• Volume: The quantity of generated and stored data. Size determines whether data is considered big data.
• Variety: The type and nature of the data.
• Velocity: The speed of data generation.
• Variability: Inconsistency of the data set.
• Veracity: The quality of captured data can vary greatly, affecting accurate analysis.


What is BIG DATA ANALYSIS?


Big data analytics is the process of examining large data sets containing a variety of data types (i.e. Big Data) to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information.

Benefits of Big Data Analytics

Analytical findings from Big Data can lead to:
• more effective marketing
• new revenue opportunities
• better customer service
• improved operational efficiency
• competitive advantages over rival organizations
• other business benefits


Traditional vs. Current Analytics Trends


Data processing and Analytics: The old way

Traditionally, analytics followed the creation of modest amounts of structured data by enterprise applications (CRM, ERP, etc.). The modeled and cleansed data was loaded into an enterprise data warehouse. The data analyzed was limited to relational data, so Teradata, Exadata and Netezza were running the show.

Data processing and Analytics: The new way

Currently, data is growing exponentially, and its variety has grown from text and relational (i.e. structured) data to a mix of structured, semi-structured and unstructured data. The analytical toolset had to change to handle the unstructured part of the data, which is why technologies like Hadoop, Spark and NoSQL have become popular. They have reduced cost by providing open source systems, and they offer resilience with parallel processing.
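To make the three kinds of data concrete, here is a minimal sketch using only the Python standard library. The sample values, field names and variable names are made up for illustration and do not come from the session.

```python
import csv
import io
import json

# Structured: fixed columns, as in a relational table or CSV export.
structured = "customer_id,amount\n101,250.00\n102,99.50\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing JSON; fields may nest or vary per record.
semi_structured = '{"user": "u1", "action": "click", "meta": {"device": "mobile"}}'
event = json.loads(semi_structured)

# Unstructured: free text with no predefined fields.
unstructured = "Loved the product, but delivery was late."
word_count = len(unstructured.split())

if __name__ == "__main__":
    print(rows[0]["amount"])        # a column looked up by name
    print(event["meta"]["device"])  # a nested field
    print(word_count)               # only generic processing is possible
```

Structured data can be queried by column name, semi-structured data by navigating its keys, while unstructured text needs some processing step before it yields fields at all.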


BIG Data using Hadoop!

Why Hadoop?

The best-known technology for managing structured and unstructured data is Hadoop, an open source, Java-based framework.

It is flexible, scalable, robust, cost effective and adaptive to upcoming technologies.


Hadoop in Action:

Hadoop is a great framework for advertising companies as well. It keeps track of the millions of clicks on ads and of how users respond to the ads posted by the big ad agencies!

• Facebook – over 1.3 billion active users – storing, managing & keeping track of all profiles along with the related posts, comments, images, videos, and so on.

• LinkedIn – managing over 1 billion personalized recommendations/week using MapReduce & HDFS features!

• Walmart – helping handle more than 1 million customer transactions/hour

• Twitter – managing and handling 85 million tweets from users/day

• Google – managing more than 1 terabyte of data/hour

• eBay – handling and managing 80 terabytes of data/day and suggesting additional suitable products to their customers

• Spadac.com – helps run spatial intelligence & predictive analytics on huge volumes of data for providing actionable intelligence to its customers
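The ad-click tracking mentioned above can be sketched in the MapReduce style. This is a minimal single-process illustration in plain Python, not Hadoop's API. The click log, the function names and the ad identifiers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical click log of (user_id, ad_id) pairs; not real data.
CLICKS = [
    ("u1", "ad_a"),
    ("u2", "ad_b"),
    ("u3", "ad_a"),
    ("u1", "ad_c"),
    ("u2", "ad_a"),
]

def map_phase(record):
    """Emit one (ad_id, 1) pair per click record."""
    _user_id, ad_id = record
    yield ad_id, 1

def shuffle(pairs):
    """Group emitted values by key, the step a framework runs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts collected for one ad."""
    return key, sum(values)

def count_clicks(records):
    """Run map, shuffle and reduce in sequence over the records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

if __name__ == "__main__":
    print(count_clicks(CLICKS))
```

In a real cluster the map and reduce steps would run in parallel on many machines over data held in HDFS. Here everything runs in one process so the data flow is easy to follow.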


Hadoop History!!

Brief Historical Timeline of Hadoop


Hadoop – High Level Architecture


Hadoop Variants

Major variants for Hadoop and their distribution


1. Cloudera Hadoop (CDH)
2. Hortonworks
3. MapR


Hadoop Skills


Big Data – Case Studies


1. 2012 US Presidential Election
   • Barack Obama's Big Data won the US election
2. Data Storage
   • NetApp
3. Human Sciences
   • NextBio


MONGODB

What is MONGODB?

• Data in this model is stored inside documents.
• Documents are not typically forced to have a schema and are therefore flexible and easy to change.
• No joins required.
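The document model can be illustrated with a minimal sketch using plain Python dictionaries. It does not use a MongoDB driver. The `posts` collection, its field names and the `author_names` helper are hypothetical and serve only to show schema flexibility and embedding.

```python
# Hypothetical "posts" collection, modelled as a plain list of dicts.
posts = [
    {
        "_id": 1,
        "title": "Intro to Big Data",
        # Related data is embedded in the document, so no join is needed.
        "author": {"name": "Asha", "city": "Pune"},
        "comments": [{"user": "Ravi", "text": "Nice overview"}],
    },
    {
        "_id": 2,
        "title": "Hadoop basics",
        # This document has a field the first one lacks, and no author at all.
        "tags": ["hadoop", "hdfs"],
    },
]

def author_names(collection):
    """Return author names from the documents that carry an embedded author."""
    return [doc["author"]["name"] for doc in collection if "author" in doc]

if __name__ == "__main__":
    print(author_names(posts))
```

Because each document carries its own structure, the second post can gain a `tags` field without any change to the first, and the author details travel with the post instead of living in a separate table.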


MONGODB

Use of HADOOP with MONGODB


MONGODB

Similarities with HADOOP

MONGODB
• Replication is possible
• Horizontally scalable
• Master-slave concept
• Can run on commodity hardware

HADOOP
• Replication is possible
• Horizontally scalable
• Master-slave concept
• Can run on commodity hardware


MONGODB

Differences with HADOOP

MONGODB
• Data is stored in a database
• Data is serialized
• Data can be written at any time

HADOOP
• Data is stored in a file system
• Data parallelism
• Data is written once


Thank You

Feel free to drop your queries to:
Benoy Daniel [email protected]
Bibhusisa Pattanaik [email protected]