© 2014 IBM Corporation The sensor data challenge Innovations (not only) for the Internet of Things Big Data Meetup Berlin 2014-10-23 Stephan Reimann IT Specialist Big Data [email protected] @stereimann de.linkedin.com/in/stephanreimann/

The sensor data challenge - Innovations (not only) for the Internet of Things


DESCRIPTION

Slides from my talk at the Big Data Meetup Berlin on Oct 23rd, 2014, about innovations to simplify sensor (big) data analytics


Page 1: The sensor data challenge - Innovations (not only) for the Internet of Things


Take action on sensor data in real-time based on analytics in R Easy streaming analytics with InfoSphere Streams Stephan Reimann – IT Specialist Big Data - [email protected] d Wilfried Hoge – IT Architect Big Data – [email protected]

The sensor data challenge Innovations (not only) for the Internet of Things Big Data Meetup Berlin – 2014-10-23 Stephan Reimann – IT Specialist Big Data – [email protected] @stereimann de.linkedin.com/in/stephanreimann/

Page 2: The sensor data challenge - Innovations (not only) for the Internet of Things


Sensor data present an enormous business opportunity across industries

Manufacturing

IBM uses automated quality testing data for detection of anomalies in the semiconductor wafer manufacturing process to minimize wafer loss

Source and more details

Connected Car

PSA Peugeot Citroën uses IBM Big Data technologies including InfoSphere BigInsights as the basis of their connected services initiative to bring additional services to vehicle owners

Source

Industrial / Transport

Pratt & Whitney

“reduce maintenance costs by up to 20 percent”

“... less disruptions, and removals, and when the engine is in the shop, targeted repairs so the engine can come out of the shop quickly”

Source

Industrie 4.0 Energy & Utilities Connected Car Healthcare

... It improves efficiency & quality and enables new business models ... and there are many more opportunities

Your fitness devices are also part of it!


Page 3: The sensor data challenge - Innovations (not only) for the Internet of Things


Making sense means ...

– detecting hidden correlations

– predicting future behavior, e.g. predictive maintenance

– detecting outliers

It is not about having the data, it is about using analytics to make sense of it and creating value

The hard part of making sense is doing it, because ...

– the topic is relatively new and there are not many out-of-the-box solutions, so you will probably have to create your own solution

– it is a great opportunity to be innovative and gain competitive advantage

– creating your own solution will typically require tools such as the Hadoop framework, R, and probably something for data preparation and reporting, ...

– this often means heavy programming, ... Think of available skills and time to market


Page 4: The sensor data challenge - Innovations (not only) for the Internet of Things


Analyzing large historical sensor data sets requires flexible and easy-to-use tools

Innovation #1: SQL on Hadoop

Page 5: The sensor data challenge - Innovations (not only) for the Internet of Things


Sensor data is a very special kind of structured data

• A lot of different sources

• Structure differs between sources (e.g. number of attributes, value encodings, ...) and is usually evolving

• (very) high volume

Use SQL on Hadoop -> Big SQL

– widely used, leverage existing skills

– Declarative: what you want vs. how to get it

– Use your existing tools

Sensor data usually requires flexible schemas

Analytics on sensor data isn’t special: it should be as simple as always

[Figure: sample records from different sources (Source A, Source B)]

Databases are not the primary choice, due to the flexible schema

Hadoop is pretty well suited to analyze sensor data in its raw format

But databases have SQL, which offers a very easy way to prepare and analyze structured data
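To illustrate the idea of a flexible schema combined with a declarative query, here is a minimal sketch in Python. It is not Big SQL: SQLite from the standard library only stands in as a convenient SQL engine, and the record layout, the field names (`temp_c`, `temperature_f`, `humidity`) and the helper `read_with_schema` are assumptions made up for this example. Note also that the sketch loads the records into an in-memory table, whereas the point of SQL on Hadoop is to query the raw files where they are.

```python
import json
import sqlite3

# Hypothetical raw records: two sources with different attributes and encodings.
RAW_LINES = [
    '{"source": "A", "ts": 1, "temp_c": 21.5}',
    '{"source": "A", "ts": 2, "temp_c": 22.5}',
    '{"source": "B", "ts": 1, "temperature_f": 68.0, "humidity": 40}',
    '{"source": "B", "ts": 2, "temperature_f": 77.0, "humidity": 42}',
]


def read_with_schema(line):
    """Apply the schema at read time: map a raw record to (source, ts, temp_c)."""
    rec = json.loads(line)
    if "temp_c" in rec:
        temp_c = rec["temp_c"]
    else:
        # Source B reports Fahrenheit; convert to Celsius while reading.
        temp_c = (rec["temperature_f"] - 32.0) * 5.0 / 9.0
    return rec["source"], rec["ts"], temp_c


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (source TEXT, ts INTEGER, temp_c REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    (read_with_schema(line) for line in RAW_LINES),
)

# Declarative: state what is wanted (average per source), not how to compute it.
avg_by_source = dict(
    conn.execute(
        "SELECT source, AVG(temp_c) FROM readings GROUP BY source ORDER BY source"
    ).fetchall()
)
print(avg_by_source)
```

The raw lines stay untouched; only the reading function has to change when a source adds an attribute or switches its encoding.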


Page 6: The sensor data challenge - Innovations (not only) for the Internet of Things


Big SQL combines the simplicity of SQL with the flexibility of Hadoop

Big SQL is an IBM innovation that provides rich, robust, standards-based SQL support for data stored in InfoSphere BigInsights (IBM’s Hadoop distribution)

– Full support for subqueries

– OLAP operations, grouping sets, analytic aggregates, ...

– All standard join operators (get value from combining data)

– Use your existing queries and tools

No proprietary storage format

– Never need to copy data to a proprietary representation

– It is not a database; it runs in Hadoop, on standard data formats

Big SQL = easy-to-do SQL combined with the flexibility of Hadoop (like schema-on-read)

[Diagram: a SQL-based application queries the Big SQL MPP runtime in InfoSphere BigInsights, which reads data sources in Parquet, CSV, Seq, RC, Avro, ORC, JSON and custom formats]
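As a rough illustration of what subqueries and joins buy you on sensor data, the following sketch finds readings that exceed their own device's average by 25 percent and enriches them with device master data. It runs on SQLite via Python purely for convenience; the tables `devices` and `readings`, their columns and the threshold are hypothetical, and nothing here is specific to Big SQL beyond being ordinary SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE devices  (device_id TEXT PRIMARY KEY, site TEXT);
    CREATE TABLE readings (device_id TEXT, ts INTEGER, value REAL);
    INSERT INTO devices VALUES ('d1', 'Berlin'), ('d2', 'Munich');
    INSERT INTO readings VALUES
        ('d1', 1, 10.0), ('d1', 2, 10.0), ('d1', 3, 16.0),
        ('d2', 1, 5.0),  ('d2', 2, 5.0),  ('d2', 3, 5.0);
    """
)

# Join: combine readings with device master data.
# Correlated subquery: compare each reading with the average of its own device.
QUERY = """
    SELECT d.site, r.device_id, r.ts, r.value
    FROM readings AS r
    JOIN devices AS d ON d.device_id = r.device_id
    WHERE r.value > 1.25 * (SELECT AVG(r2.value)
                            FROM readings AS r2
                            WHERE r2.device_id = r.device_id)
    ORDER BY r.device_id, r.ts
"""
spikes = conn.execute(QUERY).fetchall()
print(spikes)
```

The same question written as hand-coded MapReduce jobs would need a separate pass for the averages and another for the comparison; in SQL it is a single statement.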


Page 7: The sensor data challenge - Innovations (not only) for the Internet of Things


Big SQL is architected for performance


Uses its own engine, replacing MapReduce with a modern MPP architecture

– Compiler and runtime are native code (not Java)

– Big SQL worker daemons live directly on the cluster

– Continuously running (no startup latency)

– Processing happens locally at the data

Architected from the ground up for performance

– Low latency and high throughput

– Comprehensive query rewrite and optimization (cost-based optimizer)

Operations occur in memory with the ability to spill to disk

– Supports aggregations and sorts larger than available RAM
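The general idea behind "in memory with the ability to spill to disk" can be sketched as an external merge sort. This is a textbook technique shown in Python for illustration only, not a description of how Big SQL implements it; the function names and the `max_in_memory` parameter are made up for this example.

```python
import heapq
import os
import tempfile


def _spill(sorted_chunk):
    """Write one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{x!r}\n" for x in sorted_chunk)
    return path


def _read_run(path):
    """Stream the values of one run back from disk, one at a time."""
    with open(path) as f:
        for line in f:
            yield float(line)


def external_sort(values, max_in_memory=1000):
    """Yield values in ascending order, buffering at most max_in_memory
    values before spilling a sorted run to disk."""
    runs, chunk = [], []
    for v in values:
        chunk.append(v)
        if len(chunk) >= max_in_memory:
            runs.append(_spill(sorted(chunk)))
            chunk = []
    try:
        # Merge the on-disk runs with the remaining in-memory chunk.
        yield from heapq.merge(sorted(chunk), *(_read_run(p) for p in runs))
    finally:
        for p in runs:
            os.remove(p)


result = list(external_sort([5.0, 3.0, 9.0, 1.0, 7.0, 2.0, 8.0], max_in_memory=3))
print(result)
```

With `max_in_memory=3`, two runs are written to disk and merged with the last in-memory chunk, so the sort works even though the full input never sits in the buffer at once.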


Page 9: The sensor data challenge - Innovations (not only) for the Internet of Things


Innovation #2: Streaming Analytics


Data don’t have to be stored to be analyzed

Streaming analytics is the key enabler for real-time use cases with sensor data

Page 10: The sensor data challenge - Innovations (not only) for the Internet of Things


Traditional approach

– Historical fact finding

– Analyze persisted data

– (Micro-) Batch philosophy

– PULL approach

Streaming analytics

– Analyze the current moment / the now

– Analyze data directly “in motion”, without storing it

– Analyze data at the speed it is created

– PUSH approach

Data don’t need to be persisted to be analyzed: streaming analytics represents a paradigm shift that enables real-time use cases

[Diagram labels: traditional approach: Data, Repository, Analysis, Insight; streaming analytics: Data, Analysis, Insight]
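To make the difference to the "store first, analyze later" approach concrete, here is a small Python sketch of the push idea: every arriving value is checked and folded into running statistics, and the values themselves are never kept. It uses Welford's online algorithm for mean and variance; the class `RunningStats`, the callback `on_reading` and the parameters `k` and `warmup` are assumptions for this example and have nothing to do with any product API.

```python
import math


class RunningStats:
    """Welford's online algorithm: mean and variance without storing the stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def stddev(self):
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def on_reading(stats, value, alerts, k=3.0, warmup=5):
    """Called once per arriving value: check it against the statistics
    seen so far, act if it deviates strongly, then update the statistics."""
    if stats.n >= warmup and abs(value - stats.mean) > k * stats.stddev:
        alerts.append(value)  # stand-in for an immediate action
    stats.push(value)


stats, alerts = RunningStats(), []
for value in [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 14.0, 10.1]:
    on_reading(stats, value, alerts)
print(alerts, stats.n)
```

The memory footprint is three numbers regardless of how long the stream runs, which is what makes analysis at the speed of data creation feasible.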


Page 11: The sensor data challenge - Innovations (not only) for the Internet of Things


InfoSphere Streams is the result of an IBM research project, designed for high throughput and low latency, and to make streaming analytics easy

InfoSphere Streams capabilities

Scale out
– Millions of events per second

Complex data & analytics
– All kinds of data
– Complex analytics: everything you can express via an algorithm

Low latency
– Analyzes data at the speed it is created
– Latencies down to µs
– Immediate action in real time

How it works

– Define apps as flow graphs consisting of sources (inputs), operators & sinks (outputs)

– Extend the functionality with your own code if required for full flexibility

– The clustered, distributed runtime on commodity hardware scales almost without limit

– GUIs for rapid development and operations make streaming analytics easy
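The flow-graph idea (source, operators, sink) can be sketched with plain Python generators. This is only an analogy to show the programming model: it is neither the Streams language nor its API, it runs in a single process, and all names (`source`, `filter_op`, `sliding_mean_op`, `sink`) as well as the `(timestamp, value)` tuple layout are invented for this example.

```python
from collections import deque


def source(readings):
    """Source: emits tuples one by one (here from a list; in practice
    this would be a socket, a message queue, a device gateway, ...)."""
    yield from readings


def filter_op(stream, predicate):
    """Operator: passes on only the tuples that satisfy the predicate."""
    return (t for t in stream if predicate(t))


def sliding_mean_op(stream, size):
    """Operator: for each tuple, emits the mean of the last `size` values
    once the window is full."""
    window = deque(maxlen=size)
    for ts, value in stream:
        window.append(value)
        if len(window) == size:
            yield ts, sum(window) / size


def sink(stream, out):
    """Sink: consumes the stream (here it appends to a list)."""
    for t in stream:
        out.append(t)


# Hypothetical (timestamp, value) tuples; -999.0 plays the role of an error code.
data = [(1, 20.0), (2, -999.0), (3, 22.0), (4, 24.0), (5, 26.0)]

# Wire the graph: source -> filter -> sliding mean -> sink.
results = []
valid = filter_op(source(data), lambda t: t[1] > -100.0)
sink(sliding_mean_op(valid, size=2), results)
print(results)
```

Each stage only sees one tuple at a time, so stages can be swapped, extended with custom code, or chained further without touching the others.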


Page 12: The sensor data challenge - Innovations (not only) for the Internet of Things


Streaming analytics is about analyzing all the data, continuously and just in time. It enables a completely new generation of big data apps

Streaming Analytics is already reality ... and is a key component of many innovations: Radio astronomy, Healthcare, TelCo, Transport, Smart Grid, IoT, Connected Car, ...

Stop just dreaming of real-time big data

Start with streaming analytics!!!

– Free Quickstart Edition: ibm.co/streamsqs

– Developer Community (Tutorials, Labs, Forum, ...): www.ibmdw.net/streamsdev/

– GitHub Community (Toolkits, Toolkits, Toolkits): github.com/IBMStreams


Page 13: The sensor data challenge - Innovations (not only) for the Internet of Things


Where technology meets business potential: Start making sense of your sensor data. Everything is prepared!

Big SQL
Easy-to-do SQL analytics combined with fully flexible schema-on-read and Hadoop capabilities

InfoSphere Streams
Analyzes data at the speed it is created with maximum simplicity and minimum latency

Many more, such as
• Time series functionality
• Efficient transport protocols
• Cloud services (Bluemix)

Gain value from your data

Technology innovations make it easy

There are many opportunities to gain value from (not only) sensor data.

Let’s talk about how to make sense of your data! http://www-05.ibm.com/de/events/workshop/bigdata/