Introduction: Cloud Computing and Big Data - Hadoop
Presented by: Nagarjuna D.N, SAP CTL, AT&T, Bengaluru
Date: 14-07-2015
Overview
• Cloud Computing Evolution
• Why is Cloud Computing needed?
• Cloud Computing Models
• Cloud Solutions
• Cloud Job Opportunities
• Criteria for Big Data
• Big Data Challenges
• Technologies to process Big Data: Hadoop
• Hadoop History and Architecture
• Hadoop Eco-System
• Hadoop Real-Time Use Cases
• Hadoop Job Opportunities
• Hadoop and SAP HANA Integration
• Summary
Internet of Things (IoT)
Big Data: "One of the reasons is Cloud Computing!"
Cloud Computing (an evolution of the Internet, hidden from the end user)
• Infrastructure is maintained elsewhere, with shared computing resources (servers, storage, and networking) all delivered over the Internet.
• The Cloud delivers a hosting environment that is immediate, flexible, scalable, secure, and available, and that saves corporations money, time, and resources.
Cloud Computing (Cont….)
• In addition, the platform provides on-demand services: always on, anywhere, anytime.
• "Pay-for-what-you-use": billed on a metered basis.
• It is based on utility computing and virtualization.
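To make "pay-for-what-you-use" concrete, here is a minimal metered-billing sketch. All rates and usage figures are hypothetical, chosen only for illustration; real cloud pricing varies by provider and region.

```python
# Metered "pay-for-what-you-use" billing sketch.
# Rates and usage figures are hypothetical, for illustration only.

RATES = {
    "vm_hour": 0.10,           # $ per VM-hour of compute
    "storage_gb_month": 0.02,  # $ per GB-month of storage
    "egress_gb": 0.09,         # $ per GB of outbound traffic
}

def monthly_bill(vm_hours, storage_gb, egress_gb):
    """Compute a metered bill: you pay only for what you actually used."""
    return round(
        vm_hours * RATES["vm_hour"]
        + storage_gb * RATES["storage_gb_month"]
        + egress_gb * RATES["egress_gb"],
        2,
    )

# A server used only 200 hours this month costs far less than
# owning hardware that sits idle the rest of the time.
print(monthly_bill(vm_hours=200, storage_gb=50, egress_gb=10))  # → 21.9
```

The key property is that an unused resource costs nothing, which is exactly what the capital-versus-demand charts in the next slides illustrate.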
Cloud Computing History
Traditional Infrastructure Model
[Chart: forecasted infrastructure demand, capital vs. time]

Acceptable Surplus
[Chart: provisioned capacity tracking forecasted demand with a small surplus]

Actual Infrastructure Model
[Chart: actual infrastructure demand, capital vs. time]

Unacceptable Surplus
[Chart: provisioned capacity far above actual demand]

Unacceptable Deficit
[Chart: actual demand exceeding provisioned capacity]

Utility Infrastructure Model (Concept of Cloud Computing)
[Chart: capacity scaling with actual infrastructure demand]
Cloud Flavors (Service Models)
• IaaS – Infrastructure as a Service
• PaaS – Platform as a Service
• SaaS – Software as a Service
SaaS Examples
IaaS Examples
PaaS Examples
Cloud Deployment Models
• Public Cloud
• Private Cloud
• Hybrid Cloud
• Community Cloud
Cloud Distribution Examined
Enterprise Cloud Solutions
1. Test / Development / QA Platform
o Use cloud infrastructure servers as a test and development platform
2. Disaster Recovery
o Keep images of servers on cloud infrastructure, ready to go in case of a disaster
3. Cloud File Storage
o Backup or archive company data to cloud file storage
4. Load Balancing
o Use cloud infrastructure for overflow management during peak usage times
Enterprise Cloud Solutions (cont)
5. Overhead Control
o Lower overhead costs and make bids more competitive
6. Distributed Network Control and Cost Reporting
o Create an individual private network (VPC) for each subsidiary or contract
7. Rapid Deployment
o Turn up servers immediately to fulfill project timelines
8. Functional IT Labor Shift
o Refocus IT labor expense on revenue-producing activities
Preparing for the Future: Cloud IT Jobs
A sampling of IT skills likely to be in demand in the future:
o Functional application development and support
  i.e. Oracle, SAP, SQL, linking hardware to software
o Leveraging data to make strategic business decisions
  i.e. Business Intelligence: applying sales forecasts to inventory and manufacturing decisions
o Mobile apps: Android, iPhone, Windows Mobile
o Wi-Fi engineers: USF to include broadband communications (LTE replaces GSM/CDMA)
o Optical engineers: optical offers the highest bandwidth today (PON, CWDM, DWDM)
o Virtualization specialists: economies of scale require virtualization (server, storage, client…)
o IP engineers
o Network security specialists
o Web developers
o Social media developers
o Business Intelligence application development and support
IT Cloud infrastructure
"Big Data: Big Thing"
• Big Data is exactly like a Rubik's cube.
• Just like a Rubik's cube, Big Data has many different solutions.
• Take five Rubik's cubes, mix them up the same way, and give them to five different experts.
• They will each solve the cube in seconds.
• But if you watch closely, you will notice that even though the final outcome is the same, the route taken to solve the cube is not.
• Every expert will start at a different place (color) and will try to solve it with a different method.
• It is nearly impossible for two experts to take exactly the same route.
Beginning Big Data
Big Data Definition in general
• Big Data is a collection of data sets that are large and complex in nature.
• They constitute both structured and unstructured data, which grow so large and so fast that they are not manageable by traditional relational database systems (e.g., an RDBMS).
Big Data Technically
i. Volume: petabytes or zettabytes.
ii. Velocity: batch or real-time (stream) processing.
iii. Variety: structured, semi-structured & unstructured. It is estimated that 80% of the world's data is unstructured, and the rest is semi-structured and structured.
iv. Veracity: the quality of the data being captured can vary greatly.
Fig. Big Data, based on Doug Laney's 3Vs model.
Variety of Data
1. Structured Data: data that is identifiable because it is organized in a structure (a standard, defined format).
   E.g.: databases, data warehouses & electronic spreadsheets.
2. Semi-Structured Data: data that is neither raw data nor typed data in a conventional database system.
   E.g.: wiki pages, tweets, Facebook data & instant messages.
3. Unstructured Data: data that doesn't have a standard, defined structure.
   E.g.: data files, audio files, video, graphics & multimedia.
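The three varieties can be illustrated in a few lines of Python using the standard csv and json modules. The sample records are invented for the example; the point is that structured data fits a fixed schema, semi-structured data describes its own (irregular) structure, and unstructured data is just raw bytes.

```python
import csv, json, io

# Structured: rows conform to a fixed, predefined schema.
structured = io.StringIO("id,name,amount\n1,Asha,250\n2,Ravi,175\n")
rows = list(csv.DictReader(structured))
print(rows[0]["name"])  # → Asha

# Semi-structured: self-describing, but fields vary record to record.
semi = '{"user": "ravi", "tags": ["hadoop", "cloud"], "geo": null}'
tweet = json.loads(semi)
print(tweet["tags"])  # → ['hadoop', 'cloud']

# Unstructured: no schema at all; just raw content to interpret later.
unstructured = b"\x89PNG..."  # e.g. the first bytes of an image file
print(len(unstructured))  # → 7
```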
Traditional Data v/s Big Data
Attributes         | Traditional Data            | Big Data
Volume             | Gigabytes to terabytes      | Petabytes to zettabytes
Organization       | Centralized                 | Distributed
Structure          | Structured                  | Semi-structured & unstructured
Data model         | Strict schema-based         | Flat schema
Data relationships | Complex interrelationships  | Almost flat, with few relationships
Criteria of Big Data
1. 72 hours of video are uploaded to YouTube every minute, and over 3 billion hours of video are watched every month.
2. Radio Frequency ID (RFID) systems generate up to 1,000 times more data than conventional bar-code systems.
3. 340 million tweets are sent every day, amounting to 7 TB of data.
4. The social networking site Facebook processes over 10 TB of data every day.
5. Over 5 billion people use cell phones to call, send SMS, email, browse the Internet, and interact via social networking sites.
6. The Square Kilometre Array radio telescope project is expected to receive 700 TB of data per second.
Challenges with Big Data
1. Scaling is costly.
2. A strategy must be in place before you hit the limit of a single computer.
3. Most enterprises respond to scalability needs only when they start facing poor response times and low throughput.
4. Adding hardware to an existing system is manpower-intensive and hence error-prone.
5. Mixed data types, structured and unstructured, make scaling even harder.
Exploring Big Data for business insights
Big Data solutions with Hadoop
Organizations Adopted Big Data
How are Organizations using Big Data Technology?
Feb 14th, 2011: Watson is IBM's supercomputer built using Big Data technology. It is not connected to the Internet while playing, and it processes information much like a human brain.
Tools typically used in Big Data Scenarios
Technology to process Big Data: Hadoop (an open-source software framework written in Java)
• Open-source software: it's free to download, though more and more commercial distributions of Hadoop are becoming available.
• Framework: everything you need to develop and run software applications is provided: programs, connections, etc.
• Distributed storage: the Hadoop framework breaks big data into blocks, which are stored on clusters of commodity hardware.
• Processing power: Hadoop processes large amounts of data concurrently across multiple low-cost computers for fast results.
• Hadoop is a distributed file system plus a processing framework, not a database. It is designed for information in many forms.
• An open-source project started by Doug Cutting, then an employee of Yahoo. "Hadoop" is the name of his son's toy elephant.
• Hosted by the Apache Software Foundation: Apache Hadoop.
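The "distributed storage" bullet above can be sketched as follows: a file is cut into fixed-size blocks, and each block is copied to several DataNodes. A toy 8-byte block size and invented node names are used so the sketch runs instantly; real HDFS defaults to 128 MB blocks and 3 replicas, and its actual placement policy is more sophisticated than this round-robin.

```python
# Sketch of HDFS-style block splitting and replication.
# Block size and node names are toy values for illustration;
# real HDFS defaults to 128 MB blocks and 3 replicas.

BLOCK_SIZE = 8          # bytes (HDFS default: 128 MB)
REPLICATION = 3
NODES = ["node1", "node2", "node3", "node4", "node5"]

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Break a file into fixed-size blocks, as the NameNode plans it."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, nodes=NODES, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes (round-robin)."""
    placement = {}
    for b in range(len(blocks)):
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement

blocks = split_into_blocks(b"a big file that will not fit on one machine")
print(len(blocks))                # → 6 blocks of up to 8 bytes each
print(place_replicas(blocks)[0])  # → ['node1', 'node2', 'node3']
```

Because every block lives on three different machines, losing any one machine loses no data, which is the basis of the fault tolerance described later.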
Hadoop Creation History
Hadoop Architecture
Hadoop core has two major components (daemons):
1. HDFS
   a. NameNode
   b. Secondary NameNode
   c. DataNode
2. MapReduce Engine (distributed data processing framework)
   a. JobTracker
   b. TaskTracker
What components make up Hadoop?
• Hadoop Common – the libraries and utilities used by other Hadoop modules.
• Hadoop Distributed File System (HDFS) – the Java-based scalable system that stores data across multiple machines without prior organization.
• MapReduce – a software programming model for processing large sets of data in parallel.
• YARN – resource management framework for scheduling and handling resource requests from distributed applications. (YARN is an acronym for Yet Another Resource Negotiator.)
Hadoop Architecture
[Diagram: the master node runs the JobTracker (MapReduce) and the NameNode (HDFS); each slave node runs a TaskTracker and a DataNode.]
[Diagram: HDFS topology: nodes are grouped into racks, racks form a cluster, and clusters reside in a data center.]
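The rack/cluster topology above matters because HDFS places replicas rack-aware: by default the first replica goes on the writer's node, and the second and third on two nodes of a different rack, so data survives the loss of an entire rack. A minimal sketch, with invented rack and node names:

```python
# Sketch of HDFS's default rack-aware replica placement:
# replica 1 on the writer's node, replicas 2 and 3 on two nodes
# of a different rack. Rack and node names are invented.

RACKS = {
    "rack1": ["n1", "n2", "n3"],
    "rack2": ["n4", "n5", "n6"],
}

def rack_of(node):
    """Find which rack a node belongs to."""
    return next(r for r, nodes in RACKS.items() if node in nodes)

def place(writer_node):
    """Choose three replica locations for a block written at writer_node."""
    local_rack = rack_of(writer_node)
    remote_rack = next(r for r in RACKS if r != local_rack)
    remote = RACKS[remote_rack]
    # 1st replica local; 2nd and 3rd on two nodes of one remote rack.
    return [writer_node, remote[0], remote[1]]

replicas = place("n2")
print(replicas)                             # → ['n2', 'n4', 'n5']
print(len({rack_of(n) for n in replicas}))  # → 2 (spans two racks)
```

Spanning two racks balances safety (a whole rack can fail) against write cost (only one copy crosses the inter-rack link).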
MapReduce Example
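The example slide here is an image in the original deck. The canonical MapReduce example is word count; below is a plain-Python simulation of the three phases (map, shuffle, reduce). This is a sketch of the programming model only, not the Hadoop Java API: in real Hadoop, map tasks run on many nodes, the framework performs the shuffle and sort, and reduce tasks aggregate per key.

```python
from collections import defaultdict

# MapReduce word count, simulated in a single process.

def mapper(line):
    """Map phase: emit (word, 1) for every word in the line."""
    for word in line.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(word, counts):
    """Reduce phase: sum the counts for one word."""
    return (word, sum(counts))

lines = ["hadoop stores big data", "hadoop processes big data in parallel"]
mapped = [pair for line in lines for pair in mapper(line)]
result = dict(reducer(w, c) for w, c in shuffle(mapped).items())
print(result["hadoop"], result["big"], result["parallel"])  # → 2 2 1
```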
Benefits of Hadoop
• Scalable: new nodes can be added without needing to change data formats.
• Cost-effective: Hadoop brings massively parallel computing to commodity hardware.
• Flexible: Hadoop is schema-less and can absorb any type of data, structured or not, from any number of sources.
• Fault-tolerant: when you lose a node, the system redirects work to another location of the data and continues processing without missing a heartbeat.
• Programming languages: Java (default) / Python.
• Last but not least: it's free (open source).
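The fault-tolerance bullet above can be made concrete: because each block is replicated on several DataNodes, the loss of a single node never blocks a read. A minimal sketch, with invented block and node names:

```python
# Fault-tolerance sketch: every block has replicas on several nodes,
# so reads survive the loss of any single node. Names are invented.

replicas = {
    "block0": {"node1", "node2", "node3"},
    "block1": {"node2", "node3", "node4"},
}
live_nodes = {"node1", "node2", "node3", "node4"}

def read_block(block):
    """Return a live node holding the block, as the NameNode would."""
    candidates = replicas[block] & live_nodes
    if not candidates:
        raise IOError(f"all replicas of {block} lost")
    return sorted(candidates)[0]

live_nodes.discard("node2")  # a DataNode dies...
print(read_block("block0"))  # → node1 (work is redirected)
print(read_block("block1"))  # → node3 (two replicas still remain)
```

In real HDFS the NameNode additionally re-replicates the under-replicated blocks onto healthy nodes, restoring the replication factor in the background.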
Hadoop is not Suitable for All Kinds of Applications
Hadoop is not suitable to:
• perform real-time, stream-based processing where data is processed immediately upon its arrival.
• perform online access where low latency is required.
Hadoop Eco-System
Real-Time Hadoop Use Cases
1. Risk Modeling (How can banks understand customers & markets?)
2. Customer churn analysis (Why do companies really lose customers?)
3. Ad Targeting (How can companies increase campaign efficiency?)
4. Point-of-sale transaction analysis (How do retailers target promotions guaranteed to make you buy?)
5. Search quality (What's in your search?)
Hadoop Job Opportunities
Apache Hadoop & SAP HANA Integration(Future Generation Technologies)
In Real-Time Business
Resources
Summary
o Cloud Computing
o Big Data
o Apache Hadoop
o Hadoop and SAP HANA integration
Thank You
More Details
Nagarjuna D.N
[email protected]
[email protected]
More Cloud Solutions Architect Skills:
• Amazon Cloud (Amazon Web Services)
• MongoDB (NoSQL Database)
• Play Framework (Web Application Framework)
• Domain/ SSL Certificate setup
• Apache Hadoop, Apache Pig, Apache Hive
Your Valuable Feedback, Please
• Especially tell me where I must improve!