© 2014 IBM Corporation Innovations in Big Data & Analytics Phil Thomas IBM Big Data & Analytics Architect [email protected]

Innovations in Big Data & Analytics

DESCRIPTION

Phil Thomas, an IBM Big Data and Analytics Architect, presented recently at Internet World; this is the presentation he used. Read his supporting blog post about the event here: http://ukbigdataevents.wordpress.com/2014/08/18/innovations-in-big-data-analytics-by-philip-thomas-software-client-architect/

Page 1: Innovations in Big Data & Analytics

© 2014 IBM Corporation

Innovations in Big Data & Analytics

Phil Thomas

IBM Big Data & Analytics Architect

[email protected]

Page 2: Innovations in Big Data & Analytics

Big Data & Analytics is a journey.

• Be proactive about privacy, security and governance
• Build a culture that infuses analytics everywhere
• Invest in a big data & analytics platform

Imagine It. Realise It. Trust It.

Page 3: Innovations in Big Data & Analytics

Big Data Is All Data

• Volume: Data at Scale
• Variety: Data in Many Forms
• Velocity: Data in Motion
• Veracity: Data Uncertainty

But without a value target and case, it is simply a waste

• Value: Growth, Efficiency, Engagement, Automation

Page 4: Innovations in Big Data & Analytics

Analytics and big data include traditional and new techniques

Traditional Analytics + Big Data Analytics

Data Access

Traditional Analytics:
• Reconcile data sources together
• Query relational warehouses
• Individual transaction records

Big Data Analytics:
• Surface data directly from its source
• Query specialized systems
• Data relationships and networks

User Interaction

Traditional Analytics:
• Graphs and reports
• Hierarchical navigation
• Managed and ad hoc delivery
• Manual analysis and action

Big Data Analytics:
• Visualize masses of data
• Context and relationship navigation
• Exploration of what’s important
• Automated action

Applied Analytics

Traditional Analytics:
• Numeric data and text attributes
• Sample based models
• Data analyzed at rest
• Humans interpret patterns

Big Data Analytics:
• Linguistic interpretation of meaning
• More accurate models
• Analyze stream data in motion
• Algorithms uncover hidden patterns

Page 5: Innovations in Big Data & Analytics

New/Enhanced Applications

• Why did it happen? Reporting, Analysis, Content Analytics
• What did I learn, what’s best? Cognitive
• What is happening? Exploration & Discovery
• What action should I take? Decision Management
• What could happen? Predictive Analytics & Modelling

Be More Right, More Often

Realise It. Invest

Page 6: Innovations in Big Data & Analytics

IBM Big Data & Analytics Platform

IBM Big Data & Analytics Infrastructure: Systems, Security, Storage

All Data

• Reporting, Analysis, Content Analytics
• Cognitive
• Exploration & Discovery
• Decision Management
• Predictive Analytics & Modeling

New/Enhanced Applications

• Information Governance Zone
• Real-time Analytics Zone
• Exploration, Landing & Archive Zone
• Information Ingestion & Operational Information Zone
• Enterprise Warehouse, Data Mart & Analytic Appliance Zone

Realize It. Invest in a Big Data & Analytics platform.

Page 7: Innovations in Big Data & Analytics

• Unique – fuels journey to Cognitive
• Innovative – easy to consume
• Complete – enterprise-ready
• Fast – start anywhere and grow

Watson Foundations: the cornerstone of our IBM Big Data & Analytics Portfolio

SOLUTIONS

IBM Watson™ and Industry Solutions

Sales, Marketing, Finance, Operations, HR, Risk, IT, Fraud

CONSULTING AND IMPLEMENTATION SERVICES

WATSON FOUNDATIONS

• Decision Management
• Planning & Forecasting
• Discovery & Exploration
• Business Intelligence & Predictive Analytics
• Content Analytics
• Information Integration & Governance
• Data Mgmt & Warehouse
• Hadoop System
• Stream Computing
• Content Management

BIG DATA & ANALYTICS INFRASTRUCTURE

Page 8: Innovations in Big Data & Analytics

Watson Foundations uniquely…

…Helps me discover fresh insights

• Predictive and content analytics to uncover patterns not yet known
• Interactive exploration across all data

…Operates in a timely fashion

• Real-time analytics as data flows through an organisation
• Enterprise-class Hadoop that runs 4x faster
• In-memory computing for speed of thought analytics

…Establishes trust so I can act with confidence

• Governance across complete data lifecycle including Hadoop
• Security and privacy with compliance
• Transparency and context to decision-making process

WATSON FOUNDATIONS

• Decision Management
• Planning & Forecasting
• Discovery & Exploration
• Business Intelligence & Predictive Analytics
• Content Analytics
• Information Integration & Governance
• Data Mgmt & Warehouse
• Hadoop System
• Stream Computing
• Content Management

Page 9: Innovations in Big Data & Analytics

Next generation architecture for delivering information and insights

Data types:
• Transaction and application data
• Machine and sensor data
• Enterprise content
• Social data
• Image and video
• Third-party data

• Operational systems
• Exploration, landing and archive
• Trusted data
• Real-time processing & analytics
• Reporting & interactive analysis
• Deep analytics & modeling
• Information Integration & Governance

Actionable insight:
• Decision management
• Predictive analytics and modeling
• Reporting, analysis, content analytics
• Discovery and exploration

Page 10: Innovations in Big Data & Analytics

What Differentiates IBM’s Hadoop Offering?

BigInsights brings the power of Hadoop to the Enterprise by providing administration, discovery, development, security, and best-in-class analytic capabilities.

Pure Open Source Code + Optional Enterprise Class Extensions + IBM Support Infrastructure = BigInsights (Blue Suit Hadoop)

“Our customers send roughly 35 billion emails every year, and with every email they send, we have more data that we can analyse and feed back to them to help improve their success. Our work analysing email delivery times has already given our customers a 15-25% lift in their email campaign performance – and that means more customers in their doors and increased revenue.” – Jesse Harriott, Chief Analytics Officer

Page 11: Innovations in Big Data & Analytics

BigInsights builds on open source Hadoop capabilities for Enterprise Class Deployments

InfoSphere BigInsights

• Open source base: open source based components
• Enterprise capabilities
• Workload Management
• Security
• Development Environment
• Analytics
• Extractors and APIs
• Accelerators
• General Parallel Filesystem
• Big R
• BigSheets

Also shown: Streams, Watson Explorer, Cognos BI

performance gains* on average over open source Hadoop

* Audited STAC® Report, Securities Technology Analysis Center

Page 12: Innovations in Big Data & Analytics

Big SQL 3.0: Native SQL Query Access for Hadoop

Diagram labels: Application; JDBC / ODBC Driver; JDBC / ODBC Server; Big SQL Engine (BigInsights); SQL; Data Sources: Hive Tables, HBase tables, CSV Files

• Native SQL Access to data stored in BigInsights
• Rich SQL support (ANSI, IBM, Oracle, Teradata)
• IBM Optimiser, Compiler and Runtime ported to Hadoop
• Native Hadoop data formats
• High performance, highly scalable
• Federated query
• Granular row / column security

Get the technical white paper at

https://ibm.biz/BdRWsK
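
The access pattern on this slide (an application issuing ordinary SQL against tabular data that started life as files such as CSV) can be illustrated with a small sketch. This is not Big SQL: Python's built-in `sqlite3` module stands in for the driver, server and engine, and the `sales` table, its columns and the sample rows are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a file held in the data store.
CSV_TEXT = """region,product,amount
north,widget,120.50
south,widget,80.00
north,gadget,42.25
"""


def load_csv(conn, text):
    """Create a table and load the CSV rows into it."""
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    rows = [
        (r["region"], r["product"], float(r["amount"]))
        for r in csv.DictReader(io.StringIO(text))
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


def total_by_region(conn):
    """Run a standard SQL aggregate, as an application would through a driver."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real connection
load_csv(conn, CSV_TEXT)
print(total_by_region(conn))
```

The point of the sketch is only that the application code sees a SQL interface and result rows; where and how the data is stored is hidden behind the connection.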

Page 13: Innovations in Big Data & Analytics

Big Data Exploration

Quick time to value for big data discovery & exploration

• Locate and understand existing data sources
• Expose data for new uses, without copying the data to a central location
• Get up & running quickly; discover and tag relevant big data
• Develop new insights and hypotheses
• Connect employees with all of the data at the point of impact
• Use big data sources in new information-centric applications

Page 14: Innovations in Big Data & Analytics

Watson Explorer

Find, visualise, understand all big data to improve decision making

• Increase revenue, productivity and efficiency by facilitating navigation of Big Data (structured & unstructured)
• Discover new insights by combining and analysing various data types residing in various federated data repositories

Diagram labels: UI / User; App Builder; Data Explorer; Connector Framework; BigInsights; Streams; Warehouse; Integration & Governance

Data sources: CM, RM, DM; RDBMS; Feeds; Web 2.0; Email; Web; CRM, ERP; File Systems

Page 15: Innovations in Big Data & Analytics

• Highly relevant, personalised results
• Access across many sources
• Dynamic categorisation
• Leveraging structured and unstructured content
• Tagging and collaboration
• Virtual folders for organising content
• Refinements based on structured information
• Expertise location

Page 16: Innovations in Big Data & Analytics

Information Integration and Governance in times of Big Data

Mask and Redact (InfoSphere Optim)

• De-identify sensitive data at source or within Hadoop
• Apply obfuscation techniques to both structured and unstructured data

Monitor Data Activity (InfoSphere Guardium)

• Monitor big data sources and Hadoop stack
• Real-time alerts
• Centralised reporting of audit data

Find & Integrate Master Data (InfoSphere Master Data Management)

• Probabilistic matching on big data platform (BigInsights/Hadoop)
• Matching at a higher volume
• Matching of a wider variety of data sets

Diagram labels: IBM InfoSphere BigInsights; MDM; BigInsights Big Match Engine
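
Two of the ideas on this slide, de-identifying data and probabilistic matching, can be sketched in a few lines. This is an illustration only and does not use or reproduce any InfoSphere API. The salt, the email pattern, the field names and the m/u probabilities are all hypothetical, and the scoring follows the general Fellegi-Sunter style of summing log-likelihood weights per field, which is one common approach to probabilistic matching rather than a description of the product's algorithm.

```python
import hashlib
import math
import re

# Hypothetical salt; a real deployment would manage this as a secret.
SALT = b"example-salt"

# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymise(value: str) -> str:
    """De-identify a structured field with a stable, salted hash token."""
    digest = hashlib.sha256(SALT + value.strip().lower().encode("utf-8"))
    return "tok_" + digest.hexdigest()[:12]


def redact_text(text: str) -> str:
    """Obfuscate email addresses embedded in unstructured text."""
    return EMAIL_RE.sub("[REDACTED]", text)


# Hypothetical per-field probabilities:
#   m = P(field agrees | records describe the same entity)
#   u = P(field agrees | records describe different entities)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "postcode": (0.90, 0.05),
    "birth_year": (0.98, 0.02),
}


def match_score(a: dict, b: dict) -> float:
    """Sum log2 agreement/disagreement weights across the compared fields."""
    score = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if a.get(field) == b.get(field):
            score += math.log2(m / u)  # agreement raises the score
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return score
```

A pair of records would then be treated as a likely match when `match_score` exceeds a threshold chosen for the data set; the threshold itself is a tuning decision outside this sketch.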

Page 17: Innovations in Big Data & Analytics

InfoSphere Streams - Real-Time Analytics on Big Data

• Volume
  − Gigabytes per second or more
  − Terabyte per day or more
• Variety
  − All kinds of data
  − All kinds of analytics
• Velocity
  − Insights in microseconds
• Agility
  − Dynamically responsive
  − Rapid application development

Diagram labels: Millions of events per second; Microsecond Latency; Sensor, video, audio, text, and relational data sources; Just-in-time decisions; Powerful Analytics
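
Analysing data in motion usually means computing over a moving window of recent events rather than over data at rest. The sketch below shows that idea in plain Python; it is not written in the InfoSphere Streams toolchain, and the class name, window size, threshold and sample readings are hypothetical.

```python
from collections import deque


class SlidingWindowAverage:
    """Keep a running average over the most recent `size` readings."""

    def __init__(self, size: int):
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.window = deque()
        self.total = 0.0

    def push(self, value: float) -> float:
        """Add one reading, evict the oldest if needed, return the average."""
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            self.total -= self.window.popleft()
        return self.total / len(self.window)


def alerts(readings, size=3, threshold=50.0):
    """Yield (index, average) each time the windowed average exceeds threshold."""
    win = SlidingWindowAverage(size)
    for i, value in enumerate(readings):
        avg = win.push(value)
        if avg > threshold:
            yield i, avg


# Hypothetical sensor readings arriving one at a time.
for index, avg in alerts([10, 20, 90, 100, 30, 10]):
    print(f"alert at event {index}: window average {avg:.2f}")
```

Each event is processed once as it arrives and only the window is kept in memory, which is what lets this style of processing act on an event before it is stored.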

Page 18: Innovations in Big Data & Analytics

Market changes driving the need for next generation databases

Are you ready to respond?

How to do it leveraging existing investments?

How to achieve the full potential without disrupting the business?

• Technology allows us to consume more data and generate new insight: the scale and scope of big data present new opportunities for innovation and competitive advantage
• Fast access to insight is a top requirement: businesses need to more quickly generate insight from information to accelerate decision making
• These insights are sparking new & rapidly evolving analytic requests: organisations need fast, simple and agile technology strategies for manipulating data and developing new applications

Page 19: Innovations in Big Data & Analytics

Multi-workload database software for the era of big data

DB2 10.5 with BLU Acceleration

• Everything you need for your business in ONE database
  − Optimized for transactions and analytics
  − Enterprise NoSQL for greater application flexibility – JSON, RDF-Graph, XML
• Always available, fast transactions
  − Online rolling maintenance updates with no planned downtime (1)
  − Designed for disaster recovery over distances of 1000s of km (2)
• Real benefits, low risk
  − In-memory speed and simplicity on existing infrastructure
  − Optimized for SAP workloads
  − Average 98% Oracle Database application compatibility (3)

1) Based on IBM design for normal operation with rolling maintenance updates of DB2 server software on a pureScale cluster. Individual results will vary depending on individual workloads, configurations and conditions, network availability and bandwidth.

2) Based on IBM design for normal operation under typical workload. Individual results will vary depending on individual workloads, configurations and conditions, network availability and bandwidth.

3) Available with DB2 Advanced Enterprise Server Edition.

Page 20: Innovations in Big Data & Analytics

What makes BLU Acceleration different?

Unmatched innovations from IBM Research & Development labs

• Next Generation In-Memory: In-memory columnar processing with dynamic movement of data from storage
• Analyse Compressed Data: Patented compression technique that preserves order so data can be used without decompressing
• CPU Acceleration: Multi-core and SIMD parallelism (Single Instruction Multiple Data)
• Data Skipping: Skips unnecessary processing of irrelevant data

[Diagram labels: Instructions, Data, Results; columns C1 to C8; Encoded]
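
Two of the ideas above, comparing values while they are still encoded and skipping blocks that cannot contain matches, can be shown with a toy column store. This is a sketch under stated assumptions, not IBM's patented technique: it uses a plain sorted dictionary encoding (which preserves order) and a per-block min/max summary, and every function name, the block size and the sample data are hypothetical.

```python
from bisect import bisect_right


def build_dictionary(values):
    """Map each distinct value to an integer code that preserves sort order."""
    ordered = sorted(set(values))
    return {v: code for code, v in enumerate(ordered)}, ordered


def encode_column(values, block_size=4):
    """Encode a column into blocks of codes plus a (min, max) summary per block."""
    codes_by_value, ordered = build_dictionary(values)
    codes = [codes_by_value[v] for v in values]
    blocks = [codes[i:i + block_size] for i in range(0, len(codes), block_size)]
    synopsis = [(min(b), max(b)) for b in blocks]
    return ordered, blocks, synopsis


def count_greater_than(ordered, blocks, synopsis, threshold):
    """Count values > threshold by comparing codes only.

    Returns (matches, blocks_scanned) so the effect of skipping is visible.
    """
    # Smallest code whose original value is strictly greater than threshold.
    first_code = bisect_right(ordered, threshold)
    matches = 0
    scanned = 0
    for block, (_, block_max) in zip(blocks, synopsis):
        if block_max < first_code:
            continue  # data skipping: no value in this block can qualify
        scanned += 1
        # Codes keep the original order, so no decoding is needed here.
        matches += sum(1 for code in block if code >= first_code)
    return matches, scanned


values = [5, 7, 6, 5, 20, 30, 25, 22, 8, 6, 7, 5]
ordered, blocks, synopsis = encode_column(values)
print(count_greater_than(ordered, blocks, synopsis, threshold=10))
```

With this sample data only one of the three blocks has to be scanned, and the predicate is evaluated entirely on integer codes; the threshold is translated into code space once, up front.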

Page 21: Innovations in Big Data & Analytics

The benefits of DB2 with BLU Acceleration: Analytics for the NOW business

BLU Acceleration: Fast, Simple, Agile

• Answers at the speed of thought for growing revenue, reducing cost and lowering risk
• Next generation in-memory with IBM Research innovations
• 8x-25x faster analytics, with some queries running more than 1000x faster (1, 2)
• In-memory performance not limited by availability of memory
• Operational simplicity with “load and go” performance
• No need for indexes, aggregates, or tuning
• Compression savings: “10x. That's how much smaller our tables are with BLU Acceleration” – Andrew Juarez, Coca-Cola Bottling Co.
• Automatically adapts to any server, large or small
• Available on premise or via the cloud

1 Based on internal IBM testing of sample analytic workloads comparing queries accessing row-based tables on DB2 10.1 vs. columnar tables on DB2 10.5. Performance improvement figures are cumulative of all queries in the workload. Individual results will vary depending on individual workloads, configurations and conditions.

2 Based on internal IBM tests of pure analytic workloads comparing queries accessing row-based tables on DB2 10.1 vs. columnar tables on DB2 10.5. Results not typical. Individual results will vary depending on individual workloads, configurations and conditions, including size and content of the table, and number of elements being queried from a given table.

Page 22: Innovations in Big Data & Analytics

IBM PureData System for Analytics

Built-in Expertise

• No indexes and minimal tuning
• Data model agnostic
• Fully parallel, optimised In Database Analytics

Integration by Design

• Server, Storage, Database in one easy to use package
• Automatic parallelisation and resource optimisation to scale economically
• Enterprise-class security and platform management

Simplified Experience

• Up and running in hours
• Minimal ongoing administration
• Standard interfaces to best of breed Analytics, BI, and data integration tools
• Built-in analytics capabilities allow users to derive insight from data quickly
• Easy connectivity to other IBM Big Data Platform components

Page 23: Innovations in Big Data & Analytics

Cognos – mobile, interactive visualisation capabilities

Interactive Visualisation

• Animated charts enhance the user experience of general reporting and Cognos Active Report and allow users to pinpoint trends faster.
• A paradigm shift for delivering value to users with the introduction of visualization extensibility with RAVE (Rapid Adaptive Visualization Engine).

Page 24: Innovations in Big Data & Analytics

New visualisations are a simple download away

Browse, find and download visualisations from the extensible visualisation community to quickly provide the best visual for your reporting needs

• Scatter
• Gantt
• Area
• Radar
• Boxplot
• Dial
• Treemap / Heatmap
• Plus a continually growing set of visualisations

analyticszone.com/visualization

Page 25: Innovations in Big Data & Analytics

IBM SPSS Modeler predictive analytics

• Hadoop, Netezza, R, DB2 … support
• Graphical interface, rich visualisations
• Real-time deployment / execution
• Analytic Catalyst – “Analyst in the software”

Page 26: Innovations in Big Data & Analytics

Watson is cognitive computing

• Understands natural language
• Generates and evaluates hypotheses
• Adapts and learns

Watson understands me.

Watson engages me.

Watson learns and improves over time.

Watson helps me discover.

Watson establishes trust.

Watson has endless capacity for insight.

Watson operates in a timely fashion.

Page 27: Innovations in Big Data & Analytics

Watson can transform the way people interact over the lifetime of their relationship

Client

• Know me: Leverage profile data for personalized insight into client wants and needs to contextualize experience
• Empower me: Interactive, informed natural language dialogue that enables insights at the point of action
• Engage me: Dynamic, evidence-based omni-channel experiences that adapt to client preferences

Page 28: Innovations in Big Data & Analytics

This will be Watson

Sees

Hears

Experiences

Understands natural language

Generates and evaluates hypotheses

Adapts and learns

Reasons

Explores

Visualizes

Page 29: Innovations in Big Data & Analytics

Thank You

Page 30: Innovations in Big Data & Analytics

Legal Disclaimer

• © IBM Corporation 2014. All Rights Reserved.

• The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

• References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.

• Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.