SD Big Data Monthly Meetup #4 - Session 1 - IBM

DESCRIPTION

Overview of IBM BigInsights, BigSQL, BigSheets, BigR

IBM Big Data & Analytics © 2013 IBM Corporation

© 2014 IBM Corporation

Information Management

BigInsights: Technical Overview

OC Big Data Meetup

Lynn Hedegard

Technical Sales Specialist

West Region

15th of October, 2014

Real-Time CRM in the Social World (Meet Lisa)

Telco Customer Profile

Retailer Customer Profile

Lisa registers with Retailer. Gives Retailer & Telco permissions to “Opt In”

Lisa uses promo code to purchase product from offer AND a few more items that go with the outfit ☺

Lisa “follows” a friend’s post on FB and clicks the “Like” button on an Item she likes

Retailer Fan Page

Intelligent Advisor Platform

Product Catalog

The “Intelligent Advisor” platform processes Lisa’s recent on-line activity and constructs a targeted offer based on recent behavior AND internal marketing strategy

While walking past the store, Lisa receives a promo code for a product we think she might like

Lisa receives a message with an offer reminding her to stop by if she’s in the area

Problem Statement — Complex Environment

• The Local Environment is Complex:
  • A single large retail store (1.5 million SKUs)
  • Large manufacturing floor (~6 million parts)
  • Vegas Casino (20 million card carrying customers)
• The Global Environment is Complex:
  • The number of variables affecting business performance is huge.
    • US citizens (source: Google population)
      • 300+ Million total
      • (21M+ teenagers) + (40M+ in their 20’s) (that’s a lot of calls & text messages!)
  • The interrelationships between these variables are very complex (e.g., an N² problem)
    • Multiple customer touch points
    • Multiple suppliers & distribution methods
    • Market forces (cost of raw goods & services, pricing dynamics, supply/demand)
• Working Premise: Few people in the enterprise can make “good” Operational Decisions, consistently & quickly
  • Few people can “see” all the necessary data.
  • Few people can “analyze” all the necessary data.
  • Few people understand all the inter-relationships between business variables.

Businesses can no longer tolerate inconsistent Business Processes
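
To make the "N² problem" concrete, here is a minimal Python sketch that counts how many pairwise relationships exist among a set of business variables. The variable names are hypothetical, loosely taken from the bullets on this slide. Only pairs are counted, so it understates the complexity of higher-order interactions.

```python
from itertools import combinations


def pairwise_interactions(n_variables: int) -> int:
    """Number of distinct variable pairs: n * (n - 1) / 2, which grows as N²."""
    return n_variables * (n_variables - 1) // 2


# Hypothetical business variables, loosely based on the slide's bullets.
variables = [
    "customer touch points",
    "suppliers",
    "distribution methods",
    "raw goods cost",
    "pricing",
    "supply/demand",
]

# Enumerate the pairs explicitly and check them against the closed form.
pairs = list(combinations(variables, 2))
assert len(pairs) == pairwise_interactions(len(variables))

for n in (10, 100, 1_000):
    print(f"{n} variables -> {pairwise_interactions(n)} pairwise relationships")
```

Going from 100 to 1,000 variables multiplies the number of pairs by roughly 100, which is why few people can keep all the inter-relationships in view.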

IBM’s Big Data Reference Architecture — High Level

Big Data Reference Architecture (diagram)

Analytic Applications: BI and Reporting; Exploration / Visualization; Functional App; Industry App; Predictive Analytics; Content Analytics

IBM Big Data Platform: Visualization & Discovery; Application Development; Systems Management; Accelerators; Hadoop System; Stream Computing; Data Warehouse; Information Integration & Governance

An Enterprise Eco-System for Big Data

• Integration of all classes of Data Repositories (e.g., DW, Hadoop, & Streaming Data)
• Management
  • Enterprise Class Security & Data Governance
  • Workload Optimization
  • Workload Scheduling
  • Dynamic Reconfiguration
• Advanced Analytics
  • Complete set of reusable analysis components (i.e., Accelerators)
  • Apply analysis to data in its native form (i.e., in the repository)
• Data Exploration of data from myriad repositories using a common interface
• Powerful Visualization Tools
• Eclipse based Development Environments

Application Accelerators Improve Time to Value

• Finance Analytics: Streaming options trading; Insurance and banking DW models
• Telecommunications: CDR streaming analytics; Deep Customer Event Analytics
• Social Data Analytics: Sentiment Analytics, Intent to purchase
• Machine Data Analytics: Operational data including logs for operations efficiency
• Text Analytics: Natural Language Processing; Multi-Language Support; Domain Specific

IBM’s Big Data / Analytics Reference Architecture

Diagram labels, grouped by area:

• Data Sources
  • Traditional Data Sources: Third-Party Data; Transactional Data; Application Data
  • New Data Sources: Machine & Sensor Data; Image & Video; Enterprise Content Data; Social Data; Internet Data
• Data Acquisition & Application Access
• Streaming Computing: Real-Time Analytical Processing
• Data Integration: Data Quality, Xfrm & Load
• Shared Operational Information: Master & Reference; Content Hub; Activity Hub; Metadata Catalog
• Analytical Sources
  • Landing, Exploration & Archive: Big Data Repository
  • Integrated Data Warehouse: Enterprise Warehouse
  • Deep Analytics & Modeling: Analytical Appliances
  • Interactive Analysis & Reporting: Data Marts
• Actionable Insight: Decision Management; Modeling & Predictive Analytics; Discovery & Exploration; Analysis & Reporting; Planning & Forecasting; Content Analytics
• Enhanced Applications: Customer Experience; Financial Performance; New Business Model; Risk; Operations & Fraud; IT Economics
• Other labels: Security & Business Continuity Management; Event Detection and Action; Platforms; Governance

Merging the Traditional and Big Data Approaches

Traditional Approach: Structured & Repeatable Analysis
• Business Users: Determine what question to ask
• IT Group: Structures the data to answer that question
• Examples: Monthly sales reports; Profitability analysis; Customer surveys

Big Data Approach: Iterative & Exploratory Analysis
• IT Group: Delivers a platform to enable creative discovery
• Business Users & Data Scientists: Explore what questions could be asked
• Examples: Brand sentiment; Product strategy; Maximum asset utilization

BigInsights

BigInsights: Value Beyond Open Source

Key differentiators
• Built-in text analytics
• Enterprise software integration
• SQL support
• Spreadsheet-style analysis
• Integrated installation of supported open source and other components
• Web Console for admin and application access
• Platform enrichment: additional security, performance features, GPFS (alternative file system), . . .
• World-class support
• Full open source compatibility

Business benefits
• Quicker time-to-value due to IBM technology and support
• Reduced operational risk
• Enhanced business knowledge with flexible analytical platform
• Leverages and complements existing software

Diagram layers
• IBM’s Value Add: Visualization & Exploration; Development Tool; Advanced Engines; Connectors; Workload Optimization; Administration & Security
• Open Source Components: IBM-certified Apache Hadoop and related projects

BigSheets

• Model “big data” collected from various sources in spreadsheet-like structures
• Filter and enrich content with built-in functions
• Combine data in different workbooks
• Visualize results through spreadsheets, charts
• Export data into common formats (if desired)

No programming knowledge needed!

Social Data Analytics Accelerator

What does it do?
• Provides the ability to analyze large volumes of various types of social media data with real-time processing

Example Application: Movie Campaign Effectiveness
• Large movie studio wants to understand the reaction to movie commercials around events (e.g., the Super Bowl)
• Over 30 million social media consumer profiles built and used in the analysis
• Real-time summary of insights correlated with the airing of the commercial

Why should you care?
• It enables clients to easily obtain insights necessary for:
  – Effective/targeted Marketing Campaigns
  – Timely product/marketing decisions
  – Gaining competitive Intelligence
  – Building customer retention and new customer acquisition programs

Big SQL

• Standard SQL syntax and data types
  • Joins, unions, aggregates . . .
  • VARCHAR, decimal, TIMESTAMP, . . .
• JDBC/ODBC drivers
  • Prepared statements
  • Cancel support
  • Database metadata API support
  • Secure socket connections (SSL)
• Optimization
  • MapReduce parallelism or “Local” access for low-latency queries
• Varied storage mechanisms appropriate for Hadoop ecosystem
• Integration
  • Eclipse tools
  • DB2, Netezza, Teradata (via LOAD)
  • Cognos Business Intelligence
  . . .
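
As a rough illustration of the "standard SQL" and "prepared statements" bullets, the sketch below runs a join with an aggregate through a parameterized query. It uses Python's built-in `sqlite3` module purely as a stand-in: a real Big SQL client would connect through the JDBC/ODBC drivers listed above, and the table names, columns, and data here are hypothetical.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a SQL-on-Hadoop engine.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name VARCHAR(40), region VARCHAR(20));
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         amount DECIMAL(10, 2), placed_at TIMESTAMP);
    """
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Lisa", "West"), (2, "Raj", "East")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 1, 25.5, "2014-10-01 09:00:00"),
        (2, 1, 74.5, "2014-10-02 17:30:00"),
        (3, 2, 10.0, "2014-10-03 12:15:00"),
    ],
)


def totals_by_customer(connection, region):
    """Join + aggregate, with the region bound as a statement parameter."""
    sql = """
        SELECT c.name, COUNT(o.id) AS n_orders, SUM(o.amount) AS total
        FROM customers c
        JOIN orders o ON o.customer_id = c.id
        WHERE c.region = ?
        GROUP BY c.name
        ORDER BY c.name
    """
    return connection.execute(sql, (region,)).fetchall()


west_totals = totals_by_customer(conn, "West")
print(west_totals)
```

Binding `region` as a parameter rather than formatting it into the SQL string is the same idea as a JDBC prepared statement: the statement text stays fixed while the value varies.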

Big R

Diagram components: R Clients; Scalable Statistics Engine; Embedded R Execution; R Packages; Data Sources

1. Explore, visualize, transform, and model big data using familiar R syntax and paradigm
2. Scale out R
   • Partitioning of large data (“divide”)
   • Parallel cluster execution of pushed down R code (“conquer”)
   • All of this from within the R environment (Jaql, Map/Reduce are hidden from you)
   • Almost any R package can run in this environment
3. Scalable machine learning
   • A scalable statistics engine that provides canned algorithms, and an ability to author new ones, all via R

“End-to-end integration of R into IBM BigInsights”

Pull data (summaries) to R client

Or, push R functions right on the data
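
The "divide" and "conquer" idea can be sketched in a few lines. This is not the Big R API: it is a minimal Python illustration in which a thread pool stands in for the cluster, and the function names and sample data are hypothetical. Each partition is summarized where it lives, and only the small summaries are pulled back and combined.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """'Divide': split the rows into up to n_parts roughly equal partitions."""
    return [part for part in (rows[i::n_parts] for i in range(n_parts)) if part]


def summarize(part):
    """Function pushed to each partition; returns a small (count, sum) summary."""
    return len(part), sum(part)


def pushed_down_mean(rows, n_parts=4):
    """'Conquer': run summarize on every partition in parallel, then combine."""
    if not rows:
        raise ValueError("rows must not be empty")
    parts = partition(rows, n_parts)
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(summarize, parts))
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return total / count


# Hypothetical sample: arrival delays in minutes.
delays = [12, 0, 45, 7, 3, 30, 18, 5]
result = pushed_down_mean(delays)
print(result)
```

Only the per-partition `(count, sum)` pairs travel back to the caller, which mirrors the "pull data (summaries) to R client" path on the slide.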

Text Analytics Toolkit

• Mature System: “System T” text analytics engine embedded in IBM products
  • Found in Lotus Notes, IBM e-discovery Analyzer, CCI, InfoSphere Warehouse, and others
  • Almost a decade since initial release
• Extensible: User can customize Text Analytics Engine
• Toolkit: BigInsights Text Analytic Toolkit provides
  • Developer tools
  • Easy to use text analytics language
  • Set of extractors for fast adoption
  • Multilingual support, including support for DBCS languages
• AQL: BigInsights includes Annotator Query Language (AQL): SQL-like!
  • Fully declarative text analytics language
  • No “black boxes” or modules that can’t be customized.
  • Tooling for easy customization because you are abstracted from the programmatic details
  • Competing solutions make use of locked-up black-box modules that cannot be customized, which restricts flexibility and makes them difficult to optimize for performance
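
To give a feel for what "declarative" extraction means, here is a small Python sketch. It is not AQL and does not use the System T engine. The extractors are declared as data (a rule name plus a pattern), and one generic function applies them, so a rule can be inspected or replaced without touching the engine. The rule names, patterns, and sample text are hypothetical.

```python
import re
from typing import Dict, List, NamedTuple, Pattern


class Span(NamedTuple):
    rule: str   # name of the rule that produced the match
    begin: int  # start offset in the document
    end: int    # end offset (exclusive)
    text: str   # matched text


# Declarative rule set: each extractor is just a name and a pattern.
RULES: Dict[str, Pattern[str]] = {
    "PhoneNumber": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "PromoCode": re.compile(r"\b[A-Z]{4}\d{2}\b"),
}


def extract(document: str, rules: Dict[str, Pattern[str]] = RULES) -> List[Span]:
    """Apply every rule to the document and return matches in document order."""
    spans = [
        Span(name, m.start(), m.end(), m.group())
        for name, pattern in rules.items()
        for m in pattern.finditer(document)
    ]
    return sorted(spans, key=lambda s: s.begin)


doc = "Call 555-123-4567 and mention code SAVE20 at checkout."
spans = extract(doc)
for span in spans:
    print(span.rule, span.text)
```

Because the rules are plain data, customizing an extractor means editing one entry in `RULES`, which is the opposite of a black-box module.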

BigInsights Enterprise Edition

IBM InfoSphere BigInsights (diagram)

Diagram legend: Open Source; IBM Value Add

Diagram areas: Visualization and Discovery; Dashboards And Visualizations; Deep Analytics; Data Integration; Analytics of Data in Motion; System Mgmt; Deploy Applications; Monitor Workflow; Dynamic Configuration; Engines; Parallel Processing; File Systems; Infrastructure

Diagram components: Cognos BI; DataStage; Guardium; DataExplorer; Flume; R; Streams; Netezza; DB2; Sqoop; JDBC; HDFS; Map Reduce; Hive; Pig; HCatalog; ZooKeeper; HBase; Jaql; Oozie; Big SQL; GPFS-FPO; Lucene; Flexible Scheduler; Indexing; Enhanced Security; Adaptive Map Reduce; Text Compression; Integrated Installer; Machine Learning; DB Import; DB Export; Distributed File Copy; BoardReader; Web Crawler; Accelerator for Social Data Analysis; Accelerator for Machine Data Analysis; Text Processing Engine & Library; BigSheets

Web Console

Welcome Tab: Your Starting Point

Tasks: Where and how to begin performing common administrative or analytical tasks

Quick links to common functions

Learn more through external Web resources

Overview of Web Console Capabilities

• Manage BigInsights
  • Inspect / monitor system health
  • Add / drop nodes
  • Start / stop services
  • Launch / monitor jobs
  • Explore / modify file system
  • Create custom dashboards
  • . . .
• Launch applications
  • Spreadsheet-like analysis tool
  • Pre-built applications (IBM supplied or user developed)
• Publish applications
• Monitor cluster, applications, data, etc.

BigInsights Applications Catalog (Web Console)

• Browse available applications

• Manage and deploy applications (administrators only)

• Execute (or schedule execution of) a deployed application

• Monitor job (application) status

• Link or chain applications for sequential execution
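
The last bullet, chaining applications for sequential execution, can be pictured with the following Python sketch. It does not reflect how the Web Console implements chaining. It assumes, for illustration only, that each application consumes the previous one's output. The application names and behavior are hypothetical.

```python
from typing import Any, Callable, List, Tuple

App = Callable[[Any], Any]


def run_chain(apps: List[Tuple[str, App]], initial_input: Any) -> List[Tuple[str, Any]]:
    """Run applications one after another, feeding each output to the next.

    Returns a (name, output) entry per application so status can be inspected.
    """
    history = []
    current = initial_input
    for name, app in apps:
        current = app(current)
        history.append((name, current))
    return history


# Two hypothetical applications: a crawler stub and a word counter.
def crawl(urls):
    return [f"page text from {url}" for url in urls]


def count_words(pages):
    return sum(len(page.split()) for page in pages)


history = run_chain(
    [("crawl", crawl), ("count_words", count_words)],
    ["a.example", "b.example"],
)
print(history[-1])
```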

BigSheets

Why Did IBM Develop BigSheets?

A Browser-Based Analytics Tool For Business Users.

Why BigSheets?
• Business users need an intuitive non-technical approach for analyzing Big Data.
• Translating untapped data into actionable business insights is a common requirement.
• Visualizing and drilling down into enterprise and Web data promotes new business intelligence.

How can BigSheets help?
• Spreadsheet-like interface enables business users to gather and analyze data easily.
• Built-in “readers” can work with data in several common formats (JSON arrays, CSV, TSV, Web crawler output, . . . )
• Users can combine and explore various types of data to identify “hidden” insights.
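
The notion of built-in "readers" for common formats can be sketched as follows. This is not BigSheets code: it is a minimal Python illustration, with a hypothetical `read_rows` function, of how JSON-array, CSV, and TSV input can all be normalized into the same row structure before spreadsheet-style analysis.

```python
import csv
import io
import json
from typing import Any, Dict, List

# Delimiters for the delimited-text readers.
DELIMITERS = {"csv": ",", "tsv": "\t"}


def read_rows(text: str, fmt: str) -> List[Dict[str, Any]]:
    """Parse text in the given format into a list of row dictionaries."""
    if fmt == "json":
        data = json.loads(text)
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of objects")
        return data
    if fmt in DELIMITERS:
        reader = csv.DictReader(io.StringIO(text), delimiter=DELIMITERS[fmt])
        return [dict(row) for row in reader]
    raise ValueError(f"no reader for format: {fmt}")


csv_rows = read_rows("sku,qty\nA1,2\nB7,5\n", "csv")
tsv_rows = read_rows("sku\tqty\nA1\t2\n", "tsv")
json_rows = read_rows('[{"sku": "A1", "qty": 2}]', "json")
print(csv_rows, tsv_rows, json_rows)
```

Note that the delimited readers return every value as a string, while the JSON reader keeps native types. A real tool would add a type-inference or column-typing step.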

Accessing BigSheets

• Ensure BigInsights Enterprise is running
• Launch the Web console with URL http://<host>:<port> or http://<host>:<port>/data/html/index.html
• Follow on-screen Task prompt or click on the BigSheets tab
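
For scripting access checks, the two console URL forms above can be built from a host and port as in this small Python sketch. The host name and port number in the example call are placeholders, not values from the deck.

```python
from typing import Tuple


def console_urls(host: str, port: int) -> Tuple[str, str]:
    """Return the short and the full Web console URL for a given host and port."""
    base = f"http://{host}:{port}"
    return base, f"{base}/data/html/index.html"


# Placeholder host and port; substitute the values for your own cluster.
short_url, full_url = console_urls("bigdata.example.com", 8080)
print(short_url)
print(full_url)
```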

BigSQL

MS Excel: Big SQL integration via ODBC

Demo

Analyst Comments Regarding BigInsights

The Forrester Wave™ - Hadoop Solutions Q1 2014

• Hadoop momentum is unstoppable
  • Its open source roots grow deeply and wildly into the enterprise. Its refreshingly unique approach is transforming how companies process, analyze and share big data
• Hadoop vendors face a cut-throat market
  • The buying cycle is on the upswing, and Hadoop vendors know it. Pure-play upstarts must capture market share quickly to make investors happy; stalwart enterprise vendors need to avoid being disintermediated; cloud vendors must make solutions cheaper.
• Hadoop is open, but vendors add differentiated features
  • Hadoop is an Apache open-source project that anyone can download for free. Vendors all support, extend and augment Apache Hadoop and add differentiated features.

The Forrester Wave™ - Hadoop Solutions Q1 2014

• Distributed computing platforms not new to IBM
• Advanced analytic tools
• Global presence
• Deep implementation services
• Complete big data solution
• Compelling roadmap

http://www.forrester.com/pimages/rws/reprints/document/112461/oid/1-PBE69P

The Forrester Wave is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave are trademarks of Forrester Research, Inc. The Forrester Wave is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.

InfoSphere BigInsights 3.0 – Worth a look!

Comparison table

Columns: Capability; IBM InfoSphere BigInsights; Cloudera CDH5; HortonWorks HDP 2.1; MAP-R 3.1; Pivotal HD 2.0; Amazon Elastic MapReduce

Rows (capabilities):
• Open Source Hadoop Components – PIG, Hive, HBASE, Oozie, Avro, etc.
• Big SQL – Rich, high-performance ANSI compliant SQL on Hadoop
• BigSheets – Spreadsheet style visualization tool for business users
• Text Analytics Accelerator – Simplified development for text analytics (AQL)
• Social Data Accelerator – Developer toolkit for social media applications
• Machine Data Accelerator – Developer toolkit for building log analytics apps
• Adaptive MapReduce – High-performance MR with recoverable jobs
• GPFS-FPO – POSIX, HDFS compatible file system with enterprise features
• IDE – ECLIPSE based integrated development environment
• Big R – full R language integration
• Watson Explorer – search and index all data within BigInsights

BigInsights On-Line Resources

InfoSphere BigInsights 3.0 – QuickStart Edition

• Free, no limit, non-production version of BigInsights
• Big SQL, BigSheets, Text Analytics, Big R, management console, development tools
• Tutorials and education
• Installable images or VM
• Single or multi-node clusters
• Over 53,000 downloads to date

http://IBM.co/QuickStart
http://www.ibm.com/developerworks/downloads/im/biginsightsquick/
http://www.ibm.com/software/data/infosphere/biginsights/quick-start/

Web Resources

External Hadoop Resource
• IBM.com/Hadoop
• Messaging aimed at Hadoop and open source enthusiasts
• Extensive resources, links to other IBM Big Data sites

External BigInsights Resource
• Developer.IBM.com/Hadoop
• Referred to as “Hadoop.dev”
• Site and resources tailored to technical buyers and evaluators

BigSQL Value Add To Hadoop

• SQL on Hadoop without Compromise
  • http://public.dhe.ibm.com/common/ssi/ecm/en/sww14019usen/SWW14019USEN.PDF
• New Big SQL Datasheet – Covers key value propositions & differentiation + HIVE 0.12 vs. Big SQL 3.0 benchmarks (20x performance advantage on average)
• Key Big SQL advantages
  • Enterprise features
  • Compatibility
  • Performance
  • Federation

IBM BigInsights on Cloud

• Enterprise Hadoop as a Service
  Focus on analyzing data using BigInsights features including Big SQL, BigSheets and text analytics rather than managing infrastructure
• High performance hardware environment
  Hadoop specific reference architecture implemented on dedicated bare metal nodes
• Auto-provision BigInsights on nodes through a simple web interface

Thank You