Big Data Application Architectures - IoT

Nishant Thacker
Technical Product Manager – Big Data
Microsoft
@nishantthacker



“Information is the oil of the 21st century, and analytics is the combustion engine.”
- Peter Sondergaard, Gartner

Today: More “Connected Things” Than Toothbrushes In The World…

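IoT units installed base by category (millions of units)

| Category | 2013 | 2014 | 2015 | 2020 |
|---|---|---|---|---|
| Automotive | 96 | 190 | 372 | 3,511 |
| Consumer | 1,842 | 2,245 | 2,875 | 13,173 |
| Generic Business | 395 | 479 | 624 | 5,159 |
| Vertical Business | 699 | 837 | 1,009 | 3,164 |
| Grand Total | 3,032 | 3,750 | 4,881 | 25,007 |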
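The category rows should add up to the grand total. A quick consistency check, plus the compound annual growth rate (CAGR) of the grand total between 2015 and 2020, with the values copied from the table; the helper names are illustrative only:

```python
# Installed base per category, copied from the table above.
installed_base = {
    "Automotive":        {2013: 96,    2014: 190,   2015: 372,   2020: 3_511},
    "Consumer":          {2013: 1_842, 2014: 2_245, 2015: 2_875, 2020: 13_173},
    "Generic Business":  {2013: 395,   2014: 479,   2015: 624,   2020: 5_159},
    "Vertical Business": {2013: 699,   2014: 837,   2015: 1_009, 2020: 3_164},
}
reported_totals = {2013: 3_032, 2014: 3_750, 2015: 4_881, 2020: 25_007}


def column_total(year: int) -> int:
    """Sum all categories for one year."""
    return sum(row[year] for row in installed_base.values())


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


for year, reported in reported_totals.items():
    # The source figures are rounded, so small differences are expected.
    diff = column_total(year) - reported
    print(f"{year}: sum={column_total(year):,} reported={reported:,} diff={diff:+d}")

growth = cagr(reported_totals[2015], reported_totals[2020], 2020 - 2015)
print(f"2015-2020 CAGR of the grand total: {growth:.1%}")
```

The category sums differ from the reported totals by at most one unit, which is consistent with rounding, and the grand total grows by roughly 38.6% per year from 2015 to 2020.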

The Internet of Things Story


Customer Examples *

From the $1 WiFi module (1 x CPU, 160 MHz, 80 kB usable RAM)…

… to the $1000+ automotive supercomputer (2 x CPU, 2 x GPU, 8 teraflops)

Get To Know Your Things!

| Device | Supplier | Processor | Memory | IOs | Network | OS | Price* |
|---|---|---|---|---|---|---|---|
| ESP8266 modules | Espressif | 1 x 160 MHz | 128 kB RAM, 1 MB flash | 12 GPIO, 1 ADC, I2C | WiFi 2.4 GHz | n/a | $2.5 |
| Photon | Particle.io | 1 x 120 MHz | 128 kB RAM, 1 MB flash | 18 GPIO, 2 SPI, I2S, I2C, CAN, USB, 9 PWM, ADC, DAC | WiFi 2.4 GHz | n/a | $19 |
| Electron | Particle.io | 1 x 120 MHz | 128 kB RAM, 1 MB flash | 28 GPIO | 3G UMTS | n/a | $39 |
| WiLink 8 family | Texas Instruments | n/a | n/a | n/a | WiFi 2.4/5 GHz, Bluetooth 4.1 LE | n/a | $10–25 (industrial grade) |
| Arduino Leonardo | Arduino LLC, Arduino Sarl | 1 x 16 MHz | 2.5 kB RAM, 32 kB flash | 20 GPIO, 7 PWM, 10 ADC, USB | - | n/a | $10 |
| Raspberry Pi Zero | Raspberry Pi Foundation | 1 x 1 GHz | 512 MB RAM, micro-SD | 10 GPIO, Mini HDMI, USB | - | Linux | $5 |
| Raspberry Pi 2 | Raspberry Pi Foundation | 4 x 900 MHz, GPU | 1 GB RAM, micro-SD | 40 GPIO, 1 PWM, 1 ADC, HDMI, 4 USB | Ethernet | Windows 10, Linux, RISC OS | $35 |
| BeagleBone Black | Beagleboard.org | 1 x 1 GHz | 512 MB RAM, 4 GB flash | 69 GPIO, 2 CAN, 10 ADC, 8 PWM, HDMI, USB | Ethernet | Linux | $55 |
| Drive PX 2 (H2/CY16) | NVIDIA | 2 x CPU, 2 x GPU, 8 Tflops | tbd | 12 cameras, LIDAR, RADAR, Ultrasonic, … | tbd | tbd | $1000+ ? |


IoT Reference Architecture

[Diagram: IoT reference architecture spanning three areas: Device Connectivity; Data Processing, Analytics and Management; and Presentation & Business Connectivity]

Components shown: low power devices, existing IoT devices, IP capable devices and personal mobile devices (with IoT clients); gateways; provisioning API; identity and registry stores; device state store; stream processors; analytics & machine learning; data lake; app backend; solution UX; business integration connectors and gateway(s); business systems.

Legend: data path, optional solution component, IoT solution component.

Reference Architecture & Azure Services

[Diagram: the IoT reference architecture, with the identity and registry stores shown as the Device Registry and the Azure IoT solution components marked (legend: data path, optional solution component, Azure IoT solution component)]

Device Connectivity Options

[Diagram: how devices reach the cloud]

• Devices running an IoT client connect directly to IoT Hub over AMQP, MQTT or HTTPS.

• A field gateway connects local devices that speak CoAP, AllJoyn or OPC, and forwards their data to IoT Hub over AMQP, MQTT or HTTPS.

• A custom cloud gateway (Cloud Service or VM) handles custom protocols; it can also receive data from a field gateway over VPN/ExpressRoute using OPC, HTTP or CoAP.
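The routing decision behind these options can be sketched as a small function. This is a hypothetical helper, not an Azure API: the protocol sets mirror the slide, while the `Device` type, the `connectivity_path` name and the rule ordering are assumptions made for illustration. The field-gateway-to-custom-cloud-gateway path over VPN/ExpressRoute is not modeled.

```python
from dataclasses import dataclass

# Protocol groups taken from the slide; grouping them this way is an assumption.
IOT_HUB_PROTOCOLS = {"AMQP", "MQTT", "HTTPS"}
FIELD_GATEWAY_PROTOCOLS = {"CoAP", "AllJoyn", "OPC"}


@dataclass(frozen=True)
class Device:
    device_id: str
    protocol: str      # protocol the device speaks natively
    ip_capable: bool   # can the device reach the cloud over IP on its own?


def connectivity_path(device: Device) -> list[str]:
    """Return the hops a device's data takes into the cloud (hypothetical helper)."""
    if device.ip_capable and device.protocol in IOT_HUB_PROTOCOLS:
        # Direct connection: the device runs an IoT client itself.
        return ["IoT Hub"]
    if device.protocol in FIELD_GATEWAY_PROTOCOLS:
        # A local field gateway translates to AMQP/MQTT/HTTPS for IoT Hub.
        return ["Field Gateway", "IoT Hub"]
    # Anything else terminates at a custom cloud gateway (Cloud Service or VM).
    return ["Custom Cloud Gateway"]


if __name__ == "__main__":
    for d in [
        Device("thermostat-01", "MQTT", ip_capable=True),
        Device("plc-07", "OPC", ip_capable=False),
        Device("legacy-meter-3", "proprietary", ip_capable=False),
    ]:
        print(d.device_id, "->", " -> ".join(connectivity_path(d)))
```

Keeping the decision in one function makes it easy to add another path (for example, a second gateway type) without touching device code.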

Device Stores

[Diagram: the reference architecture showing the device identity store, device registry store and device state store; the cloud gateway may be Kafka, IoT Hub or Event Hubs]

Device Identity, Registry and State Stores

• Identity Store: authority for all registered devices
  - Stores identity information and authentication secrets

• Registry Store: index in addition to the identity store
  - Contains discovery and reference data related to devices
  - Can define a schema model or use a vertical industry standard schema for metadata
  - Can contain structured metadata and links to externally stored operational data

• Device State Store: contains operational data related to the devices:
  - “Last known values” for each device
  - Aggregated or computed values
  - Stream of device data events
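A minimal in-memory sketch of the three stores and how their responsibilities split. It assumes HMAC-SHA256 over a per-device shared key as the authentication scheme, which is an illustrative choice the slide does not specify; the class and method names are hypothetical, and a real deployment would back each store with durable storage.

```python
import hashlib
import hmac
import secrets
from collections import deque
from dataclasses import dataclass, field
from typing import Any


@dataclass
class IdentityStore:
    """Authority for registered devices: identity plus authentication secret."""
    _keys: dict[str, bytes] = field(default_factory=dict)

    def register(self, device_id: str) -> bytes:
        key = secrets.token_bytes(32)  # per-device shared secret (assumed scheme)
        self._keys[device_id] = key
        return key

    def authenticate(self, device_id: str, message: bytes, signature: bytes) -> bool:
        key = self._keys.get(device_id)
        if key is None:
            return False  # unknown devices are rejected
        expected = hmac.new(key, message, hashlib.sha256).digest()
        return hmac.compare_digest(expected, signature)


@dataclass
class RegistryStore:
    """Index next to the identity store, holding discovery/reference metadata."""
    _metadata: dict[str, dict[str, Any]] = field(default_factory=dict)

    def upsert(self, device_id: str, **metadata: Any) -> None:
        self._metadata.setdefault(device_id, {}).update(metadata)

    def find(self, **criteria: Any) -> list[str]:
        """Discovery: device ids whose metadata matches every criterion."""
        return [d for d, m in self._metadata.items()
                if all(m.get(k) == v for k, v in criteria.items())]


@dataclass
class StateStore:
    """Operational data: last known values, recent events, computed aggregates."""
    history: int = 100
    last_known: dict[str, dict[str, float]] = field(default_factory=dict)
    events: dict[str, deque] = field(default_factory=dict)

    def record(self, device_id: str, reading: dict[str, float]) -> None:
        self.last_known.setdefault(device_id, {}).update(reading)
        self.events.setdefault(device_id, deque(maxlen=self.history)).append(reading)

    def average(self, device_id: str, metric: str) -> float:
        """A computed value over the retained event stream."""
        values = [e[metric] for e in self.events.get(device_id, ()) if metric in e]
        return sum(values) / len(values) if values else float("nan")
```

Keeping secrets only in the identity store means the registry and state stores can be queried broadly without exposing credentials.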

Device Provisioning

Provisioning API is the common external interface for changes to the device identity and device registry stores.

Workflow for processing individual and bulk requests:
• Registering new devices
• Updating or removing existing devices
• Activation or access control

May also include interactions with external systems:
• Billing systems
• Business support systems
• Connectivity management systems
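The provisioning workflow can be sketched as a single entry point that validates each request before touching the stores, and that processes bulk requests item by item so one bad request does not abort the batch. Everything here (`ProvisioningAPI`, `Request`, `Op`, the in-memory stores) is hypothetical; the `audit` list merely stands in for calls to billing, business support or connectivity management systems.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Op(Enum):
    REGISTER = "register"
    UPDATE = "update"
    REMOVE = "remove"
    ACTIVATE = "activate"
    DEACTIVATE = "deactivate"


@dataclass
class Request:
    op: Op
    device_id: str
    metadata: Optional[dict] = None


class ProvisioningAPI:
    """Single external interface for identity/registry changes (sketch)."""

    def __init__(self) -> None:
        self.identities: set[str] = set()       # stands in for the identity store
        self.registry: dict[str, dict] = {}     # stands in for the registry store
        self.enabled: dict[str, bool] = {}      # activation / access control flag
        self.audit: list[str] = []              # hook for external systems

    def handle(self, req: Request) -> None:
        if req.op is Op.REGISTER:
            if req.device_id in self.identities:
                raise ValueError(f"{req.device_id} already registered")
            self.identities.add(req.device_id)
            self.registry[req.device_id] = dict(req.metadata or {})
            self.enabled[req.device_id] = False  # registered, not yet activated
        else:
            if req.device_id not in self.identities:
                raise KeyError(req.device_id)
            if req.op is Op.UPDATE:
                self.registry[req.device_id].update(req.metadata or {})
            elif req.op is Op.REMOVE:
                self.identities.discard(req.device_id)
                del self.registry[req.device_id]
                del self.enabled[req.device_id]
            else:
                self.enabled[req.device_id] = req.op is Op.ACTIVATE
        # Only successful changes reach this point and get reported onward.
        self.audit.append(f"{req.op.value}:{req.device_id}")

    def handle_bulk(self, requests: list[Request]) -> list[tuple[str, str]]:
        """Process a batch in order; a failed request does not abort the rest."""
        results = []
        for req in requests:
            try:
                self.handle(req)
                results.append((req.device_id, "ok"))
            except (KeyError, ValueError) as exc:
                results.append((req.device_id, f"error: {exc}"))
        return results
```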

Stream Processors


Stream Processing: Data Flow

After ingress through the IoT Hub, data flows through the system via data pumps and analytics tasks.

Data flow can be driven by:

• Apache Storm on Azure HDInsight

• Apache Spark on Azure HDInsight

• Azure Stream Analytics

• Custom Event Processors

Each can perform tasks in flight:

• Data aggregation

• Data enrichment

• Complex event processing

… and can output data to:

• Azure Data Lake

• Azure Blobs/Tables

• HDInsight / HBase

• Azure SQL DB

• Time Series Databases

• Event Hub

• Service Bus Queues
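Two of the in-flight tasks listed above, aggregation and enrichment, can be sketched in a few lines: average readings per device over fixed, non-overlapping (tumbling) windows, then join each aggregate with reference data. The `Event` type, the `REFERENCE` table and the 60-second window are hypothetical, and this batch-style sketch ignores late and out-of-order events, which a real stream processor such as Azure Stream Analytics, Storm or Spark has to handle.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    device_id: str
    timestamp: float    # seconds since epoch
    temperature: float


# Hypothetical reference data, e.g. loaded from the device registry.
REFERENCE = {"pump-1": {"site": "plant-a"}, "pump-2": {"site": "plant-b"}}


def tumbling_average(events, window_seconds=60):
    """Aggregation: average temperature per device per fixed window."""
    sums = defaultdict(lambda: [0.0, 0])   # (device, window) -> [total, count]
    for e in events:
        window_start = int(e.timestamp // window_seconds) * window_seconds
        acc = sums[(e.device_id, window_start)]
        acc[0] += e.temperature
        acc[1] += 1
    for (device_id, window_start), (total, count) in sorted(sums.items()):
        yield {"device_id": device_id, "window_start": window_start,
               "avg_temperature": total / count, "count": count}


def enrich(records, reference=REFERENCE):
    """Enrichment: attach reference data about each record's device."""
    for r in records:
        yield {**r, **reference.get(r["device_id"], {"site": "unknown"})}
```

Aggregating before enriching keeps the number of reference lookups proportional to the number of windows rather than the number of raw events.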

Stream Processor Examples

[Diagram: example stream processors consuming device data from the cloud gateway]

• Device Metadata Processor: updates the Device Registry Store
• Device State Processor: updates the Device State Store
• Raw Telemetry Processor: writes raw telemetry to the Data Lake
• Rules Processor and Notification Processor: evaluate rules and deliver notifications to the App Backend through a queue
• Stream Transformation Processor: writes a transformed stream to an Event Hub, from which a Secondary Stream Processor passes results to the App Backend
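The processors above share one pattern: each sees every event independently and writes to its own sink. A minimal fan-out sketch of that pattern follows; the in-memory sinks and the 80.0 temperature limit are hypothetical. On a real gateway, each processor would typically read the stream through its own reader (for Event Hubs, a separate consumer group) instead of a shared loop.

```python
from typing import Callable

Event = dict
Processor = Callable[[Event], None]

# In-memory stand-ins for the sinks named in the diagram (hypothetical).
data_lake: list[Event] = []
device_state: dict[str, Event] = {}
notification_queue: list[str] = []


def raw_telemetry_processor(event: Event) -> None:
    data_lake.append(event)                     # archive every event unchanged


def device_state_processor(event: Event) -> None:
    device_state[event["device_id"]] = event    # keep last known values


def rules_processor(event: Event, limit: float = 80.0) -> None:
    # Hypothetical rule: notify when temperature exceeds a fixed limit.
    if event.get("temperature", 0.0) > limit:
        notification_queue.append(f"{event['device_id']} over {limit}")


PROCESSORS: list[Processor] = [
    raw_telemetry_processor,
    device_state_processor,
    rules_processor,
]


def dispatch(stream: list[Event]) -> None:
    """Fan-out: every processor receives every event."""
    for event in stream:
        for process in PROCESSORS:
            process(event)
```

Because processors do not depend on each other, a new one can be added to the list without changing the others.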

App Backend


High-Scale Compute Models

Scale-appropriate compute models:
• Actor Frameworks / Service Fabric Reliable Actors: distributed compute fabric hosting device actors.
• Service Fabric Reliable Collections: highly available, with replicated and local state management.
• Azure Batch: job scheduling and compute management for highly parallelizable compute workloads.

Simple programming logic in vastly scalable compute nodes.
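The idea behind "device actors" can be illustrated with plain `asyncio`: one actor per device, each with private state and a mailbox processed one message at a time, so the per-device logic stays simple and lock-free. This is a single-process sketch of the actor model, not the Service Fabric Reliable Actors API; `DeviceActor`, `ActorSystem` and the message format are hypothetical.

```python
import asyncio


class DeviceActor:
    """One actor per device: private state, one message at a time."""

    def __init__(self, device_id: str) -> None:
        self.device_id = device_id
        self.mailbox: asyncio.Queue = asyncio.Queue()
        self.reading_count = 0
        self.max_temperature = float("-inf")

    async def run(self) -> None:
        while True:
            message = await self.mailbox.get()
            if message is None:            # stop signal
                break
            # No locks needed: only this coroutine touches the actor's state.
            self.reading_count += 1
            self.max_temperature = max(self.max_temperature, message["temperature"])


class ActorSystem:
    """Creates actors on first use and routes messages by device id."""

    def __init__(self) -> None:
        self.actors: dict[str, DeviceActor] = {}
        self.tasks: list[asyncio.Task] = []

    def send(self, device_id: str, message: dict) -> None:
        actor = self.actors.get(device_id)
        if actor is None:
            actor = self.actors[device_id] = DeviceActor(device_id)
            self.tasks.append(asyncio.create_task(actor.run()))
        actor.mailbox.put_nowait(message)

    async def shutdown(self) -> None:
        for actor in self.actors.values():
            actor.mailbox.put_nowait(None)
        await asyncio.gather(*self.tasks)


async def main() -> ActorSystem:
    system = ActorSystem()
    for device_id, temp in [("pump-1", 70.0), ("pump-2", 65.0), ("pump-1", 74.0)]:
        system.send(device_id, {"temperature": temp})
    await system.shutdown()
    return system
```

A distributed actor runtime adds what this sketch lacks: placing actors across nodes, persisting their state, and reactivating them after failures.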

Data Analytics


Data Analytics

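[Diagram: data analytics pipeline]

• An ingestion gateway receives near-real-time (NRT) events and batch events/logs.
• Stream processing (ASA (Azure Stream Analytics), Storm or Spark) applies an interceptor (rules), fetches and updates reference data, performs real-time scoring with ML models, and raises alerts.
• Batch data lands in Azure Data Lake Store and is processed with Azure Data Lake Analytics (U-SQL), Spark and Hive/Pig.
• R, Azure ML and/or Spark handle training and scoring of ML models.
• Azure SQL DW and SQL DB (transactional data), with federated query, serve reports and dashboards.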
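An interceptor that applies rules in the stream can be sketched as follows: per-device thresholds act as reference data that can be updated at runtime, and a simple piece of state correlation (N consecutive readings over the threshold) suppresses one-off spikes. The class name, the default threshold and the "consecutive readings" rule are all hypothetical illustrations, not the behavior of any specific Azure service.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alert:
    device_id: str
    value: float
    threshold: float


class RulesInterceptor:
    """Alerts once when `consecutive` readings in a row exceed a device's threshold."""

    def __init__(self, thresholds: dict[str, float], default: float,
                 consecutive: int = 3) -> None:
        self.thresholds = thresholds      # reference data
        self.default = default            # used for devices without an entry
        self.consecutive = consecutive
        self.streak = defaultdict(int)    # device id -> readings over threshold in a row

    def update_reference(self, device_id: str, threshold: float) -> None:
        """Fetching & updating reference data while the stream keeps flowing."""
        self.thresholds[device_id] = threshold

    def process(self, device_id: str, value: float) -> Optional[Alert]:
        threshold = self.thresholds.get(device_id, self.default)
        if value > threshold:
            self.streak[device_id] += 1
        else:
            self.streak[device_id] = 0
        # Fire exactly once per streak, when it first reaches the required length.
        if self.streak[device_id] == self.consecutive:
            return Alert(device_id, value, threshold)
        return None
```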

Data Analytics

• Real-Time Analysis: aggregation/reduction, temporal queries, state correlation, threshold detection, alerting

• Data-at-Rest Analysis: time-series, map/reduce, correlation

• Machine Learning: pattern detection, behavior prediction, plausibility analysis, anomaly and fraud detection
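As one simple instance of plausibility and anomaly checking, a rolling z-score flags readings that lie far from the recent mean, measured in standard deviations. This is a deliberately basic statistical technique swapped in for illustration, not what Azure ML or any particular model would do; the window size, the 3.0 threshold and the minimum sample count are assumptions.

```python
import math
from collections import deque


class RollingZScore:
    """Flags values whose distance from the recent mean exceeds `threshold` std devs."""

    def __init__(self, window: int = 20, threshold: float = 3.0,
                 min_samples: int = 5) -> None:
        self.values = deque(maxlen=window)   # most recent readings only
        self.threshold = threshold
        self.min_samples = min_samples

    def is_anomaly(self, x: float) -> bool:
        anomalous = False
        if len(self.values) >= self.min_samples:
            mean = sum(self.values) / len(self.values)
            var = sum((v - mean) ** 2 for v in self.values) / len(self.values)
            std = math.sqrt(var)
            if std > 0:
                anomalous = abs(x - mean) / std > self.threshold
            else:
                anomalous = x != mean       # flat history: any change is unusual
        # Every value joins the window, so a lasting shift becomes the new normal.
        self.values.append(x)
        return anomalous
```

Adding anomalous values to the window is a design choice: it lets the detector adapt to a genuine change in operating level, at the cost of briefly widening the baseline after a spike.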

[Diagram: Azure services for analytics: Power BI, HDInsight, Stream Analytics, Data Factory, Machine Learning, and Azure Data Lake, which comprises the Analytics Service (U-SQL), HDInsight (managed Hadoop clusters) and the Analytics Store, built on YARN and WebHDFS]

Cortana Intelligence Suite

• Data Sources: apps, sensors and devices

• Information Management: Event Hubs, Data Catalog, Data Factory

• Big Data Stores: SQL Data Warehouse, Data Lake Store

• Machine Learning and Analytics: Machine Learning, HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics

• Intelligence: Dashboards & Visualizations (Power BI), Cortana, Bot Framework, Cognitive Services

• Action: people and automated systems, through apps (web, mobile, bots)

Presentation and Business Connectivity


Reference Architecture with Component Services

[Diagram: the IoT reference architecture, annotated with the services that implement each component]


Reference Architecture Guiding Principles

• Heterogeneity: accommodates a vast variety of scenarios, environments, devices, and processing patterns

• Security: considers security and privacy measures across all areas

• Hyper-scale: supports millions of connected devices

• Flexibility: allows for composability and extensibility, enabling the use of various first-party or third-party technologies
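One common ingredient of hyper-scale ingestion is partitioning by device: a stable hash of the device id picks a partition, so each device's events stay ordered within one partition while load spreads evenly across many. A minimal sketch, assuming SHA-256 as the hash; managed gateways such as Event Hubs or Kafka apply their own partition-key hashing, so this only illustrates the principle. Python's built-in `hash()` is avoided because it is randomized per process for strings.

```python
import hashlib
from collections import Counter


def partition_for(device_id: str, partition_count: int) -> int:
    """Map a device to a partition; the same id always lands in the same partition."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


if __name__ == "__main__":
    # Spread 100,000 hypothetical device ids over 32 partitions and inspect balance.
    counts = Counter(partition_for(f"device-{i}", 32) for i in range(100_000))
    print("partitions used:", len(counts))
    print("min/max devices per partition:", min(counts.values()), max(counts.values()))
```

Changing `partition_count` remaps most devices, which is why partition counts are usually fixed up front or changed with a deliberate rebalancing step.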


© 2016 Microsoft Corporation. All rights reserved.