© 2014 IBM Corporation 1 Getting started with Hadoop on the Cloud Nicolas Morales – Solutions Engineer – [email protected] @NicolasJMorales October 11, 2014

Getting started with Hadoop on the Cloud with Bluemix

DESCRIPTION

Silicon Valley Code Camp -- October 11, 2014. Session: Getting started with Hadoop on the Cloud. Hadoop and the Cloud are an almost perfect marriage. Hadoop is a distributed computing framework that leverages a cluster built on commodity hardware. The Cloud simplifies provisioning of machines and software. Getting started with Hadoop on the Cloud makes it simple to provision your environment quickly and actually get started using Hadoop. IBM Bluemix has democratized Hadoop for the masses! This session will provide a brief introduction to what Hadoop is and how the cloud works, and will then focus on how to get started via a series of demos. We will conclude with a discussion of the tutorials and public datasets: all of the tools needed to get you started quickly. Learn more about BigInsights for Hadoop: https://developer.ibm.com/hadoop/

Page 1: Getting started with Hadoop on the Cloud with Bluemix

Getting started with Hadoop on the Cloud

Nicolas Morales – Solutions Engineer – [email protected] @NicolasJMorales

October 11, 2014

Page 2: Getting started with Hadoop on the Cloud with Bluemix

Welcome

Goal: Get you started with Hadoop on the Cloud

• Hadoop

− What technical problem is it helping solve? → BIG DATA

− What is Hadoop?

− BigInsights (IBM’s Hadoop distro)

• Bluemix (IBM’s PaaS cloud solution)

− What technical problem is it helping solve?

− Analytics for Hadoop in the Cloud

• Demo & Get hands-on

− Bluemix: bluemix.net

− Hadoop Dev: ibm.biz/hadoopdev

Page 3: Getting started with Hadoop on the Cloud with Bluemix

It starts with a line of code.

Page 4: Getting started with Hadoop on the Cloud with Bluemix

Page 5: Getting started with Hadoop on the Cloud with Bluemix

Source: Wikibon, 2/12/2014 (Link)

Page 6: Getting started with Hadoop on the Cloud with Bluemix

Source: The Forrester Wave: Big Data Hadoop Solutions, Q1 2014, 2/27/2014 (Link)

Page 7: Getting started with Hadoop on the Cloud with Bluemix

What is Big Data?

A way to describe data problems that are unsolvable using traditional tools

More Analytics on More Data for More People

Page 8: Getting started with Hadoop on the Cloud with Bluemix

What Data?

Transactional & Application Data

Machine Data

Social Data

Enterprise Content

More Analytics on More Data for More People

Page 9: Getting started with Hadoop on the Cloud with Bluemix

Page 10: Getting started with Hadoop on the Cloud with Bluemix

In 2005 there were 1.3 billion RFID tags in circulation around the world…

…by the end of 2011, this was about 30 billion and growing even faster.

Page 11: Getting started with Hadoop on the Cloud with Bluemix

An increasingly sensor-enabled and instrumented business environment generates HUGE volumes of data with MACHINE SPEED characteristics…

1 BILLION lines of code

EACH engine generating 10 TB every 30 minutes!

Page 12: Getting started with Hadoop on the Cloud with Bluemix

Welcome to the Instrumented Interconnected World!

• 12+ TBs of tweet data every day

• 25+ TBs of log data every day

• ? TBs of data every day

• 2+ billion people on the Web by end 2011

• 30 billion RFID tags today (1.3B in 2005)

• 4.6 billion camera phones worldwide

• 100s of millions of GPS-enabled devices sold annually

• 76 million smart meters in 2009… 200M by 2014

Page 13: Getting started with Hadoop on the Cloud with Bluemix

6,000,000 users on Twitter pushing out 300,000 tweets per day

500,000,000 users on Twitter pushing out 400,000,000 tweets per day

83x (users)

1333x (tweets per day)
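The two growth factors on this slide follow directly from the four figures quoted. A minimal sketch of the arithmetic, where the variable and function names are ours and the numbers are copied from the slide:

```python
# Figures copied from the slide; names are illustrative only.
users_before, users_after = 6_000_000, 500_000_000
tweets_before, tweets_after = 300_000, 400_000_000


def growth_factor(before: int, after: int) -> int:
    """Return how many times larger `after` is than `before`, rounded down."""
    return after // before


user_growth = growth_factor(users_before, users_after)
tweet_growth = growth_factor(tweets_before, tweets_after)
print(f"users: {user_growth}x, tweets per day: {tweet_growth}x")
```

Tweet volume grew roughly sixteen times faster than the user base, which is the point of the slide: data volume does not scale linearly with the number of people producing it.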

Page 14: Getting started with Hadoop on the Cloud with Bluemix

We’ve Moved into a New Era of Computing

Volume: 12+ terabytes of Tweets created daily.

Velocity: 5+ million trade events per second.

Variety: 100’s of different types of data.

Veracity: Only 1 in 3 decision makers trust their information.

Page 15: Getting started with Hadoop on the Cloud with Bluemix

Imagine the Possibilities of Harnessing Your Data Resources

Retailer reduces time to run queries by 80% to optimize inventory

Stock Exchange cuts queries from 26 hours to 2 minutes on 2 PB

Government cuts acoustic analysis from hours to 70 milliseconds

Utility avoids power failures by analyzing 10 PB of data in minutes

Telco analyses streaming network data to reduce hardware costs by 90%

Hospital analyses streaming vitals to detect illness 24 hours earlier

Big data challenges exist in every organization today

Page 16: Getting started with Hadoop on the Cloud with Bluemix

Insurance

• 360˚ View of Domain or Subject

• Catastrophe Modeling

• Fraud & Abuse

• Producer Performance Analytics

• Analytics Sandbox

Banking

• Optimizing Offers and Cross-sell

• Customer Service and Call Center Efficiency

• Fraud Detection & Investigation

• Credit & Counterparty Risk

Every Industry can Leverage Big Data and Analytics

Telco

• Pro-active Call Center

• Network Analytics

• Location Based Services

Energy & Utilities

• Smart Meter Analytics

• Distribution Load Forecasting/Scheduling

• Condition Based Maintenance

• Create & Target Customer Offerings

Media & Entertainment

• Business process transformation

• Audience & Marketing Optimization

• Multi-Channel Enablement

• Digital commerce optimization

Retail

• Actionable Customer Insight

• Merchandise Optimization

• Dynamic Pricing

Travel & Transport

• Customer Analytics & Loyalty Marketing

• Predictive Maintenance Analytics

• Capacity & Pricing Optimization

Consumer Products

• Shelf Availability

• Promotional Spend Optimization

• Merchandising Compliance

• Promotion Exceptions & Alerts

Government

• Civilian Services

• Defense & Intelligence

• Tax & Treasury Services

Healthcare

• Measure & Act on Population Health Outcomes

• Engage Consumers in their Healthcare

Automotive

• Advanced Condition Monitoring

• Data Warehouse Optimization

• Actionable Customer Intelligence

Life Sciences

• Increase visibility into drug safety and effectiveness

Chemical & Petroleum

• Operational Surveillance, Analysis & Optimization

• Data Warehouse Consolidation, Integration & Augmentation

• Big Data Exploration for Interdisciplinary Collaboration

Aerospace & Defense

• Uniform Information Access Platform

• Data Warehouse Optimization

• Airliner Certification Platform

• Advanced Condition Monitoring (ACM)

Electronics

• Customer/ Channel Analytics

• Advanced Condition Monitoring

Page 17: Getting started with Hadoop on the Cloud with Bluemix

Enabling everybody to leverage Big Data

GPS

External Data

Business Users... offer personalized price promotions to different customer segments in real-time

Business Development... find and deliver new mechanisms to monetize network traffic and partner with upstream content providers

Administrators...secure, manage, and optimize data access and analysis operations

Executive Leaders...get real-time reports and analysis based on data inside as well as outside the enterprise (web, social media etc.)

Business Analysts... analyze social media buzz for the new services/offerings to gauge initial success and any course correction needed

Developers... develop new Apps and detailed algorithms in response to user and business requirements

Data Scientists... analyze subscriber usage pattern in real-time and combine that with the profile for delivering promotional or retention offers

Page 18: Getting started with Hadoop on the Cloud with Bluemix

Leveraging Big Data Requires Multiple Platform Capabilities

Manage & store huge volume of any data: Hadoop File System, MapReduce

Manage streaming data: Stream Computing

Analyze unstructured data: Text Analytics Engine

Structure and control data: Data Warehousing

Integrate and govern all data sources: Integration, Data Quality, Security, Lifecycle Management, MDM

Understand and navigate federated big data sources: Federated Discovery and Navigation

Page 19: Getting started with Hadoop on the Cloud with Bluemix

What is Hadoop?

• Apache open source software framework for reliable, scalable, distributed computing of massive amounts of data

• Hides underlying system details and complexities from user

• Developed in Java

• Core sub projects:

− MapReduce

− Hadoop Distributed File System a.k.a. HDFS

• Supported by several Hadoop-related projects

− HBase

− ZooKeeper

− Avro

− Flume

− etc.

• Meant for heterogeneous commodity hardware
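To make the MapReduce half of that list concrete, here is a minimal single-process sketch of the programming model (map, shuffle, reduce) applied to a word count. It is plain Python rather than the Hadoop Java API, and every function name is ours; on a real cluster the framework runs the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data", "big insights"]))
```

Because each map call sees one line and each reduce call sees one key, neither needs to know how many machines are involved, which is how the framework hides the underlying system details from the user.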

Page 20: Getting started with Hadoop on the Cloud with Bluemix

Design Principles of Hadoop

• New way of storing and processing the data:

− Let system handle most of the issues automatically:

• Failures

• Scalability

• Reduce communications

• Distribute data and processing power to where the data is

• Make parallelism part of operating system

• Relatively inexpensive hardware

• Bring processing to Data!

• Hadoop = HDFS + MapReduce infrastructure + …

• Optimized to handle

− Massive amounts of data through parallelism

− A variety of data (structured, unstructured, semi-structured)

− Using inexpensive commodity hardware

• Reliability provided through replication
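The "reliability through replication" principle can be illustrated with a toy model. The sketch below assumes a file split into numbered blocks and a simple round-robin placement rule of our own invention; it is not HDFS's actual placement policy. The replication factor of 3 matches the HDFS default.

```python
from typing import Dict, List, Set


def place_replicas(num_blocks: int, nodes: List[str], replication: int = 3) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes, round-robin (toy rule)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + offset) % len(nodes)] for offset in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement: Dict[int, List[str]], failed: Set[str]) -> List[int]:
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    ]


nodes = ["node1", "node2", "node3", "node4"]  # hypothetical cluster
placement = place_replicas(num_blocks=6, nodes=nodes)
survivors = readable_blocks(placement, failed={"node2", "node3"})
print(len(survivors), "of", len(placement), "blocks still readable")
```

With three copies of every block, losing any two nodes still leaves at least one copy of each block, so commodity hardware can fail without losing data.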

Page 21: Getting started with Hadoop on the Cloud with Bluemix

MapReduce → Hadoop → BigInsights

Page 22: Getting started with Hadoop on the Cloud with Bluemix

Hadoop Open Source Projects

• Hadoop is supplemented by an ecosystem of open source projects

Page 23: Getting started with Hadoop on the Cloud with Bluemix

What’s a Hadoop Distribution?

• What’s a Linux Distribution?

− Linux Kernel

− Open Source Tools around Kernel

− Installer

− Administration UI

• Open Source Distribution Formula

− Kernel

− Core Projects around Kernel

− Value Add

• Test Components

• Installer

• Administration UI

• Apps

Page 24: Getting started with Hadoop on the Cloud with Bluemix

IBM Enriches Hadoop

• Scalable

− New nodes can be added on the fly

• Affordable

− Massively parallel computing on commodity servers

• Flexible

− Hadoop is schema-less, and can absorb any type of data

• Fault Tolerant

− Through MapReduce software framework

• Performance & reliability

− Adaptive MapReduce, Compression, Indexing, Flexible Scheduler, +++

• Enterprise Hardening of Hadoop

• Productivity Accelerators

− Web-based UIs and tools

− End-user visualization

− Analytic Accelerators

− +++

• Enterprise Integration

− To extend & enrich your information supply chain

Page 25: Getting started with Hadoop on the Cloud with Bluemix

IBM BigInsights – Open Source and IBM Value Adds

Real-time Analytics: InfoSphere Streams

Enterprise Performance: Adaptive MapReduce & Big SQL

Storage Integration: GPFS POSIX Distributed Filesystem

Data Governance and Security: Data Click, LDAP and Secured Cluster

Search: BigIndex and Data Explorer

Data Exploration: BigSheets “schema-on-read” tooling

Predictive Modeling: Big R “scalable data mining” on R

Text Analytics: Text processing with AQL

ANSI SQL: Big SQL optimized SQL support

Application Tooling: Toolkits and accelerators

100% based on Apache Open Source Hadoop Components: MapReduce, HDFS, HBase, Flume, Pig, Lucene, Jaql, ZooKeeper, Oozie, Hive, Sqoop, HCatalog

Page 26: Getting started with Hadoop on the Cloud with Bluemix

Manage your cluster from the integrated Web Console

• Start or stop services

• Monitor overall system health

• Inspect status of specific services

• Add / remove nodes

• Manage your Apps and workflows from the console

• Drill down into Map/Reduce, Tasks, Attempts

• Access status, logs, counters of individual flows / jobs

Page 27: Getting started with Hadoop on the Cloud with Bluemix

Manage your HDFS Files

• Navigate the distributed file system to see what’s stored

• Create/remove/rename directories

• Modify permissions

• Upload / download files, remove/rename files, edit files

• Execute Hadoop file system shell commands
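The console operations listed above map onto the standard `hadoop fs` shell. As a rough sketch, the Python below builds (and by default only prints) a few such commands; the `-ls`, `-mkdir`, `-put`, `-chmod`, and `-mv` subcommands are part of the Hadoop file system shell, while the paths, file names, and helper functions are hypothetical.

```python
import subprocess
from typing import List


def hdfs_command(action: str, *args: str) -> List[str]:
    """Build the argument list for one `hadoop fs -<action> ...` call."""
    return ["hadoop", "fs", f"-{action}", *args]


def run(command: List[str]) -> None:
    """Execute a command; needs the `hadoop` client on the PATH."""
    subprocess.run(command, check=True)


# Hypothetical paths and file names, chosen only for illustration.
commands = [
    hdfs_command("ls", "/user/biadmin"),                            # navigate
    hdfs_command("mkdir", "-p", "/user/biadmin/demo"),              # create a directory
    hdfs_command("put", "tweets.csv", "/user/biadmin/demo/"),       # upload a file
    hdfs_command("chmod", "640", "/user/biadmin/demo/tweets.csv"),  # modify permissions
    hdfs_command("mv", "/user/biadmin/demo/tweets.csv",
                 "/user/biadmin/demo/tweets-2014.csv"),             # rename a file
]

for command in commands:
    print(" ".join(command))  # swap print(...) for run(command) where Hadoop is installed
```

Building each command as an argument list, rather than a single shell string, avoids quoting problems when file names contain spaces.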

Page 28: Getting started with Hadoop on the Cloud with Bluemix

Monitoring cluster, components and applications

• Cluster: system load average, CPU/Disk/Memory/Network utilization, nodes live status

• HDFS: block and file info, NameNode JVM and GC info, throughput bytes written/read

• MapReduce: Jobs status, Mapper, Reducer, JobTracker

• HBase: region split info, # of queries/stored files/regions etc.

• Hive: metadata store (call frequency and duration)

• Oozie statistics

• ZooKeeper: queries, latency, watcher count, followers etc.

• Flume: source and sink, # of retries and bytes written etc.

EXTENSIBLE!!

Build your own Monitoring Dashboards, with the KPIs that matter to you!

Page 29: Getting started with Hadoop on the Cloud with Bluemix

Text Analytics: Getting measurable insights

• Most of the world’s data is in unstructured or semi-structured text.

• Social media is full of discussions about products and services

• Company Internal Information is locked in blobs, description fields, and sometimes even discarded

• How do you get a metrics-based understanding of facts from unstructured text?

Healthcare Analytics: E-Medical records, hospital reports

Public Sector: Case files, police records, emergency calls…

Automotive Quality Insight: Tech notes, call logs, online media

Insurance Fraud: Insurance claims

Social Media for Marketing: twitter, facebook, blogs, forums

Over 80% of stored information is unstructured*

Structural analysis

Mining and visualization

Page 30: Getting started with Hadoop on the Cloud with Bluemix

Big R

R Clients

Data Sources

Embedded R Execution

R Packages

R Packages

1. Explore, visualize, transform, and model big data using familiar R syntax and paradigm

2. Scale out R

• Partitioning of large data (“divide”)

• Parallel cluster execution of pushed down R code (“conquer”)

• All of this from within the R environment (Jaql, Map/Reduce are hidden from you)

• Almost any R package can run in this environment

“End-to-end integration of R into IBM BigInsights”

Pull data (summaries) to R client

Or, push R functions right on the data
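The "divide" and "conquer" steps can be sketched outside of R as well. The example below is a Python analogy, not Big R code: it partitions a list, pushes a function down to each partition through a thread pool standing in for the cluster, and pulls only the small summaries back to the client. All names are ours.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def partition(data: Sequence[float], parts: int) -> List[Sequence[float]]:
    """Divide: split the data into at most `parts` contiguous chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division, never zero
    return [data[i:i + size] for i in range(0, len(data), size)]


def push_down(func: Callable, chunks: List[Sequence[float]], workers: int = 4) -> list:
    """Conquer: apply `func` to every chunk in parallel and collect the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, chunks))


def partial_stats(chunk: Sequence[float]) -> Tuple[float, int]:
    """Summary computed next to the data: (sum, count) for one chunk."""
    return sum(chunk), len(chunk)


data = list(range(1, 1001))
partials = push_down(partial_stats, partition(data, parts=4))
total, count = map(sum, zip(*partials))  # combine the summaries on the client
mean = total / count
print(mean)
```

Only four `(sum, count)` pairs travel back to the client instead of a thousand values, which is the reason for pushing functions to the data rather than pulling the data to the function.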

Page 31: Getting started with Hadoop on the Cloud with Bluemix

BigSheets - Spreadsheet-style Analytic Tool

No programming knowledge needed!

How it works

• Model “big data” collected from various sources as collections

• Filter and enrich content with built-in functions

• Combine data in different collections

• Visualize results through spreadsheets, charts

• Export data into common formats (if desired)

Page 32: Getting started with Hadoop on the Cloud with Bluemix

Overview of Application Development Lifecycle

Package and publish your application using the BigInsights Eclipse Task Launcher

How it works

• Sample your Data

• Develop your application using BigInsights tools

• Test your application

• Package and publish your application

• Deploy your application on the cluster

Task Wizards for ease of use to Develop Applications

Editors for: Java, Java MapReduce, Hive, Jaql, Pig, Big SQL, BigSheets Reader, BigSheets Macro, AQL module, Jaql Module, etc …

Page 33: Getting started with Hadoop on the Cloud with Bluemix

Running Applications in Big Data

How it works

Built-in Apps make it easy to run Big Data applications & tasks:

• Import and Export Data from a Database or files

• Import and Export Web and Social Data

• Perform Text Analytics on specified content

• Query HBase Content

• Query content stored in BigInsights using Big SQL.

• Execute Pig or Jaql applications.

EXTENSIBLE!! Build your own applications and make them easy to execute from an appealing Application launcher

Page 34: Getting started with Hadoop on the Cloud with Bluemix

Big SQL

SQL-based Application

Big SQL Engine

Data Sources

IBM data server client

SQL MPP Run-time

CSV, Seq, Parquet, RC, ORC, Avro, Custom, JSON

• IBM’s SQL engine for Hadoop

• Comprehensive, standard SQL

– SELECT: joins, unions, aggregates, subqueries . . .

– GRANT/REVOKE, INSERT … INTO

– PL/SQL

– Stored procs, user-defined functions

– IBM data server JDBC and ODBC drivers

• Optimization and performance

– Java MapReduce layer replaced with high performance IBM MPP engine (C++)

– Continuous running daemons (no start up latency)

– Message passing allows data to flow between nodes without persisting intermediate results

– In-memory operations with ability to spill to disk (useful for aggregations, sorts that exceed available RAM)

– Cost-based query optimization with 140+ rewrite rules

• Various storage formats supported

– Data persisted in DFS, Hive

– No IBM proprietary format required

• Integration with RDBMSs via LOAD, query federation

BigInsights
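To show the kind of standard SQL the slide refers to (a join, an aggregate, and a subquery in one SELECT), the sketch below runs a query against an in-memory SQLite database. SQLite is only a stand-in so the example is runnable anywhere; it is not Big SQL, and the tables and data are invented for illustration.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in SQL engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'West'), (2, 'East'), (3, 'West');
    INSERT INTO orders VALUES (1, 120.0), (1, 80.0), (2, 90.0), (3, 300.0);
""")

# Join + aggregate + subquery: revenue per region, ignoring the smallest order.
query = """
    SELECT c.region, COUNT(*) AS order_count, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.amount > (SELECT MIN(amount) FROM orders)
    GROUP BY c.region
    ORDER BY revenue DESC
"""
rows = conn.execute(query).fetchall()
for region, order_count, revenue in rows:
    print(region, order_count, revenue)
```

A query written this way contains nothing Hadoop-specific, which is the appeal of a SQL layer: existing SQL skills and tools carry over while the engine decides how to execute the work on the cluster.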

Page 35: Getting started with Hadoop on the Cloud with Bluemix

Big Data Accelerators Make it Easier than Ever to Build Big Data Applications

Telecommunications Event Data: CDR streaming analytics, Deep Customer Event Analytics (ships with InfoSphere Streams)

Social Data Analytics: Sentiment Analytics, Intent to purchase (ships with InfoSphere BigInsights & Streams)

Machine Data Analytics: Operational data including logs for operations efficiency (ships with InfoSphere BigInsights)

Page 36: Getting started with Hadoop on the Cloud with Bluemix

Maybe our politicians should take a playbook out of the rivalry between duke/unc and take it to the courts http://ity.com/wfUsir

I'm at Mickey's Irish Pub Downtown (206 3rd St, Court Ave, Raleigh) w/ 2 others http://4sq.com/gbsaYR

@silliesylvia good!!! U shouldnt! Think about the important stuff, like ur 43rd birthday ;) btw happy birthday Sylvia ;)

Location

Intent to consume

@silliesylvia I <3 your leather leggings!! Its so katniss!!

Age

Personal Attributes

• Sylvia Campbell, Female, In a Relationship

• 32 years old, birthday on 7/17

• Lives near Raleigh, NC

• College graduate; Income of 80-120k

Buzz/Sentiment

• Retweets BF’s comments

• Interest in BBC shows: Downton Abbey, Sherlock, Fringe, (P&P?)

• Sherlock Holmes, Robert Downey, Jr.

• Hunger Games, Katniss/J. Lawrence

Interests/Behavior

• Watch movies, tv shows

• Romance plots, “hero types”, strong women

• Uses iPad 3, Redbox, Hulu

• Shopping, interest in sales/deals

• Duke/UNC basketball

@silliesylvia $10 dollars says matthew & mary get married next season :) #downtownabbey

Behavior

Interest

@bamagirl can’t wait to watch sherlock with you! Oh, robert downey jr, I still love you but bbc is so amazing

OMG OMG. just dropped my new ipad3 crappola!!!

Interest

Consumption

Prediction

dear redbox please have kings speech for my new tv colin firth movie marathon

360 degree profile

Intent to consume

Consumption

Social Data Analytics: Using social media as a rich source of information
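As a very rough illustration of how labels such as "Location" or "Consumption" might be attached to posts, the sketch below applies a handful of keyword rules to tweets quoted on this slide. The rules are hypothetical and ours, the pairing of labels to tweets is a guess since the slide's arrows did not survive extraction, and a real text-analytics pipeline would be far richer than a few regular expressions.

```python
import re
from typing import List

# Hypothetical keyword rules, one regular expression per label.
RULES = {
    "Location": re.compile(r"\bI'm at\b", re.IGNORECASE),
    "Intent to consume": re.compile(r"\b(can.t wait to watch|please have)\b", re.IGNORECASE),
    "Consumption": re.compile(r"\bmy new\b", re.IGNORECASE),
    "Age": re.compile(r"\b\d{1,3}(st|nd|rd|th) birthday\b", re.IGNORECASE),
}


def tag(tweet: str) -> List[str]:
    """Return every label whose rule matches the tweet text."""
    return [label for label, pattern in RULES.items() if pattern.search(tweet)]


tweets = [
    "I'm at Mickey's Irish Pub Downtown (206 3rd St, Court Ave, Raleigh) w/ 2 others",
    "Think about the important stuff, like ur 43rd birthday ;)",
    "OMG OMG. just dropped my new ipad3 crappola!!!",
]
for tweet in tweets:
    print(tag(tweet), "<-", tweet)
```

Aggregating such labels per author over many posts is what produces the 360 degree profile shown on the slide.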

Page 37: Getting started with Hadoop on the Cloud with Bluemix

Machine Data Analysis is a Business Imperative

• Cost of system down-time

− 49 percent of Fortune 500 companies experience more than 80 hours of system down time annually [1]

• Cost of down-time varies from $90,000/hour in the media sector to $6.48 million/hour for large online brokerages

• 80 hours * $6.48M = approx. $500M per year

− System downtime costs North American businesses $26.5 billion a year in lost revenue [2]

• When systems go down

− Sales and other processes stop

− Work in progress may be destroyed

− Failure to meet SLAs and contractual obligations can result in damages, fees, adverse publicity and damage to reputation

− Customers are lost to competitors, some permanently

− Productivity suffers and remediation costs additional $$$
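The downtime estimate on this slide is a single multiplication, and the same calculation gives the low end of the range. A small worked sketch, using only the hourly rates and the 80-hour figure quoted on the slide (the names are ours):

```python
HOURS_DOWN_PER_YEAR = 80  # "more than 80 hours" per the slide
COST_PER_HOUR = {         # dollars per hour, as quoted on the slide
    "media": 90_000,
    "large online brokerage": 6_480_000,
}

annual_cost = {
    sector: HOURS_DOWN_PER_YEAR * rate for sector, rate in COST_PER_HOUR.items()
}
for sector, cost in annual_cost.items():
    print(f"{sector}: ${cost / 1_000_000:,.1f}M per year")
```

The brokerage figure comes to $518.4M, which the slide rounds to roughly $500M; the media figure is $7.2M, so the exposure spans nearly two orders of magnitude depending on the sector.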

Page 38: Getting started with Hadoop on the Cloud with Bluemix

Page 39: Getting started with Hadoop on the Cloud with Bluemix

Evolution of Cloud Technologies

Virtualization: “I want to get more out of my existing hardware”

Dynamic Hybrid: “I want to strategically use public and private cloud together”

Cloud Native: “I want to rapidly build new, born on the cloud, engaging applications in a continuous delivery model”

Business Services (SaaS): “I want to use an app without having to own it”

Cloud Enabled: “I want to move my existing middleware workloads to the cloud”

Page 40: Getting started with Hadoop on the Cloud with Bluemix

PaaS sits at the center of the cloud delivery model

Delivery models: Infrastructure as a Service, Platform as a Service, Software as a Service

Stack shown for each model: Networking, Storage, Servers, Virtualization, O/S, Middleware, Runtime, Data, Applications

Each stack is divided into layers the Vendor Manages in Cloud and layers the Client Manages

Customization; higher costs; slower time to value

Standardization; lower costs; faster time to value

Personas: IT Admin, Developer, Business Person
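The slide's figure did not survive extraction, so the exact boundary it drew between vendor-managed and client-managed layers is not recoverable here. The sketch below encodes the commonly cited split for the three delivery models as an assumption; the layer names are the slide's, while the counts and function name are ours.

```python
from typing import List, Tuple

# Layers as listed on the slide, bottom of the stack first.
LAYERS = ["Networking", "Storage", "Servers", "Virtualization", "O/S",
          "Middleware", "Runtime", "Data", "Applications"]

# Assumption: how many layers, counted from the bottom, the vendor manages
# under the commonly cited definition of each model.
VENDOR_MANAGED_COUNT = {"IaaS": 4, "PaaS": 7, "SaaS": 9}


def split(model: str) -> Tuple[List[str], List[str]]:
    """Return (vendor-managed layers, client-managed layers) for a model."""
    n = VENDOR_MANAGED_COUNT[model]
    return LAYERS[:n], LAYERS[n:]


for model in VENDOR_MANAGED_COUNT:
    vendor, client = split(model)
    print(f"{model}: client manages {client or 'nothing'}")
```

Under this split a PaaS user is left with only data and applications to manage, which is why the slide places PaaS at the center for developers.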

Page 41: Getting started with Hadoop on the Cloud with Bluemix

Developers, Developers, Developers!

• Move quickly, see results fast.

• Learn by tinkering and playing.

• Needs to learn new skills through playing and experimenting safely.

• Needs freedom to experiment without worrying about pricing right away.

Page 42: Getting started with Hadoop on the Cloud with Bluemix

Bluemix is an open-standard, cloud-based platform for building, managing, and running applications of all types (web, mobile, big data, new smart devices, and so on).

Go Live in Seconds

The developer can choose any language runtime or bring their own. Zero to production in one command.

DevOps

Development, monitoring, deployment, and logging tools allow the developer to run the entire application.

APIs and Services

A catalog of IBM, third party, and open source API services allow the developer to stitch an application together in minutes.

On-Prem Integration

Build hybrid environments. Connect to on-premise assets plus other public and private clouds.

Flexible Pricing

Sign up in minutes. Pay as you go and subscription models offer choice and flexibility.

Layered Security

IBM secures the platform and infrastructure and provides you with the tools to secure your apps.

What is Bluemix?
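To give a feel for how small an app can be on a Cloud Foundry style platform, here is a minimal sketch of a web app using only the Python standard library. It assumes the platform passes the listening port in the `PORT` environment variable, as Cloud Foundry does; the handler, the greeting text, and the default port are ours. The one command would then be `cf push` from the app directory, which this sketch does not run.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def listen_port(default: int = 8080) -> int:
    """Return the port assigned by the platform via $PORT, or a local default."""
    return int(os.environ.get("PORT", default))


class Hello(BaseHTTPRequestHandler):
    """Answer every GET request with a short plain-text greeting."""

    def do_GET(self) -> None:
        body = b"Hello from the cloud\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main() -> None:
    """Start serving; call this from the app's entry point."""
    HTTPServer(("0.0.0.0", listen_port()), Hello).serve_forever()
```

Reading the port from the environment instead of hard-coding it is what lets the same code run unchanged on a laptop and on the platform.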

Page 43: Getting started with Hadoop on the Cloud with Bluemix

Create apps quickly with prebuilt services

• Runtimes, services, and tooling up to you

Choice

Industry Leading IBM Capabilities

• Services leveraging the depth of IBM software

• Full range of capabilities

Completeness

• Open source platform and services

• Third party to enable key use cases

Security Services

Web and application services

Cloud Integration Services

Mobile Services

Database services

Big Data services

Internet of Things Services

Watson Services

DevOps Services

A full range of capabilities to suit any great idea.

Page 44: Getting started with Hadoop on the Cloud with Bluemix

Embracing Cloud Foundry as an Open Source PaaS

Continuing our history of embracing and extending Open Source

Page 45: Getting started with Hadoop on the Cloud with Bluemix

Cloud Foundry is more than code

Meets Developer’s Needs: Focus on app development, not provisioning VMs, databases, messaging servers, etc. Agile development model. Deploy and scale in seconds.

Open Cloud Platform: There is an increasing appetite for cloud-based mobile, social and analytics applications from line-of-business executives, which drives the need for a more open cloud development platform.

Compelling Community: Cloud Foundry has a compelling community and emerging ecosystem as well as a mature set of capabilities and robustness.

Page 46: Getting started with Hadoop on the Cloud with Bluemix

Capabilities include Java, mobile backend development, application monitoring, as well as capabilities from ecosystem partners and open source — all through an as-a-service model in the cloud.

IBM extends CF by adding developer tools, runtimes, & services

Page 47: Getting started with Hadoop on the Cloud with Bluemix

An Entire Continuum Working Together

Infrastructure Services

Virtual Appliance: Metadata, Application Server, Operating system

Virtual Appliance: Metadata, Application Server, Operating system

Virtual Appliance: Metadata, HTTP Server, Operating system

Defined Pattern Services

Composable Services

Systems of Record

Business Services

Analytics

Page 48: Getting started with Hadoop on the Cloud with Bluemix

IBM Analytics for Hadoop Service

• Powered by

− BigInsights 3.0 & Bluemix

• Get started with Hadoop in Minutes

− Tutorial: https://developer.ibm.com/hadoop/docs/tutorials/

• Dedicated Single Node Env

− BIAdmin Authority

− Access to the Web console

− Secure HTTPS channel powered by SSL certificates

− Bluemix Single Sign On (SSO)
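On Cloud Foundry based platforms, an app normally learns about a bound service through the `VCAP_SERVICES` environment variable, a JSON object keyed by service label. The sketch below shows how an app might look up credentials that way; the service label and the credential field names in the sample payload are hypothetical, since the slides do not show what the Analytics for Hadoop service actually exposes.

```python
import json
import os
from typing import Any, Dict, Mapping, Optional


def service_credentials(label: str, environ: Mapping[str, str] = os.environ) -> Optional[Dict[str, Any]]:
    """Return the credentials of the first bound instance of `label`, if any."""
    services = json.loads(environ.get("VCAP_SERVICES", "{}"))
    instances = services.get(label, [])
    return instances[0].get("credentials") if instances else None


# Hypothetical payload for local testing; real labels and fields will differ.
sample_env = {
    "VCAP_SERVICES": json.dumps({
        "example-hadoop-service": [{
            "name": "my-hadoop",
            "credentials": {
                "console_url": "https://example.invalid/console",
                "userid": "biadmin",
            },
        }]
    })
}

creds = service_credentials("example-hadoop-service", sample_env)
print(creds["console_url"] if creds else "service not bound")
```

Passing the environment in as a parameter keeps the lookup testable on a laptop, where `VCAP_SERVICES` is not set.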

Page 49: Getting started with Hadoop on the Cloud with Bluemix

Register today at bluemix.net

With on-demand services and infrastructure, developers can go from 0 to running code in a matter of minutes.

When coupled with DevOps, teams both large and small can automate the development and delivery of many applications.

By connecting securely to on-prem infrastructure, organizations can extend their existing investments.

1. Rapidly bring products and services to market at lower cost

2. Continuously deliver new functionality to their applications

3. Extend existing investments in IT infrastructure

Page 50: Getting started with Hadoop on the Cloud with Bluemix

Want to learn more?

• Download Quick Start Edition

• Test drive the technologies

– Follow online tutorials

– Enroll in online classes

– Watch video demos, read articles, etc.

• Links all available from HadoopDev

– https://developer.ibm.com/hadoop/

Page 51: Getting started with Hadoop on the Cloud with Bluemix

BigInsights Quick Start Edition

• Download: http://ibm.co/QuickStart

Page 52: Getting started with Hadoop on the Cloud with Bluemix

• FREE

• All types of practitioners

• All skill levels

• Hands-on Labs

• Future Meetups:

− Hadoop

− Text Analytics

− Real-time Analytics

− SQL for Hadoop

− HBase

− Social Media Analytics

− Machine Data Analytics

− Security and Privacy

Big Data Developers

http://www.meetup.com/BigDataDevelopers/

http://bigdatadevelopers.meetup.com/