Building A Scalable Big Data System for the Internet of Things (IoT)


The Internet of Things

Company

Vinay Nathan, CEO
15 years of varied experience across sales, marketing, engineering and PM. Most recently, VP Sales at Persistent Systems.

Yogesh Kulkarni, COO
16 years of product engineering experience in global product companies. Most recently, Director of Product Development at BMC Software.

Ranjit Nair, CTO
16 years of software architecture and engineering experience. Most recently, Engineering Manager at Amazon.

About Altizon

And this is what we do

My Motivation

IoT

IoT is the integration of the physical world into the computing world

IoT is a Big-Data problem

• Massive amounts of data
  • Machines are commonly sampled for data at millisecond intervals
  • Volume, variety and velocity
• That needs to be analyzed in real time
  • Condition-based monitoring
  • Anomaly detection
• That needs to be analyzed for actionable insights
  • Efficiency, utilization, machine health
• That needs supervised and unsupervised machine learning
  • Predictive maintenance, proactive support

Topology

[diagram: a central Cloud connected to several Cloud Edge nodes, each fronting many Edge devices]

The Edge

Sensors

• Network and connectivity
  • WiFi, BLE, Zigbee, 6LoWPAN
• Protocols
  • MQTT, CoAP, AMQP
• Low complexity
• Security
• Upgrades
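MQTT, listed above, routes telemetry by hierarchical topics with `+` (one level) and `#` (rest of the tree) wildcards. A minimal Python sketch of that topic-matching rule (the function name and the factory/line topics are illustrative, not part of the MQTT libraries themselves):

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT-style topic filter matches a concrete topic.

    '+' matches exactly one topic level; '#' matches all remaining levels.
    """
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":          # multi-level wildcard: match everything below
            return True
        if i >= len(t_levels):    # filter is deeper than the topic
            return False
        if level != "+" and level != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)
```

A subscriber to `factory/+/temperature` would then receive `factory/line1/temperature` but not `factory/line1/pressure`, while `factory/#` receives everything under `factory`.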

Edge

• Network Protocols

• Higher Complexity

• Bidirectional communication

• Integration

• Device context

• Security

Be cloud agnostic

The Cloud Edge

• Protocol Adapters
  • Edge to cloud protocols
• Filtering rules and aggregations
• Batching
• Local controller
• Highly available
• Load balanced
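The batching bullet above is the usual trade-off between upload frequency and bandwidth: buffer readings at the cloud edge and ship them upstream when the batch is full or stale. A minimal sketch of that idea (the `EdgeBatcher` class and its parameters are hypothetical, not part of the platform described here):

```python
import time

class EdgeBatcher:
    """Buffer telemetry readings and flush them in batches, either when the
    batch is full or when it has aged past flush_interval seconds."""

    def __init__(self, batch_size=100, flush_interval=5.0, sink=print):
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        self.sink = sink                      # callable that ships a batch upstream
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, reading):
        self.buffer.append(reading)
        full = len(self.buffer) >= self.batch_size
        stale = time.monotonic() - self.last_flush >= self.flush_interval
        if full or stale:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(self.buffer)
            self.buffer = []
        self.last_flush = time.monotonic()
```

In a real deployment the sink would publish to the event-ingestion endpoint; filtering and aggregation rules would run before `add`.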

Event Ingestion at Scale

• Device auto-discovery
  • Metadata-driven device discovery
• Device telemetry data
  • Time-series data
  • Which can be out of sequence
• Alerts and logs
• Event validation
• Bandwidth and backpressure
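Out-of-sequence telemetry, noted above, is typically handled by buffering events briefly and releasing them once a watermark (newest timestamp seen minus an allowed lateness) has passed them. A small Python sketch of that reordering buffer (the class and its `allowed_lateness` parameter are illustrative assumptions):

```python
import heapq

class ReorderBuffer:
    """Re-sequence events that arrive slightly out of order."""

    def __init__(self, allowed_lateness):
        self.allowed_lateness = allowed_lateness
        self.heap = []                 # min-heap of (timestamp, event)
        self.max_ts = float("-inf")

    def push(self, ts, event):
        """Accept one event; return events now safe to emit, in timestamp order."""
        heapq.heappush(self.heap, (ts, event))
        self.max_ts = max(self.max_ts, ts)
        watermark = self.max_ts - self.allowed_lateness
        out = []
        while self.heap and self.heap[0][0] <= watermark:
            out.append(heapq.heappop(self.heap))
        return out
```

Events arriving later than `allowed_lateness` would be emitted out of order (or dropped, by policy); that bound is also what keeps the buffer, and hence backpressure, in check.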

• Portable deployment of applications as a single object versus process sandboxing
• Application-centric versus machine/server-centric
• Support for automatic container builds
• Built-in version tracking
• Reusable components
• Public registry for sharing containers
• A growing tools ecosystem from the published API

https://www.docker.com/what-docker

backend datonis-events
    balance source
    server event1 event1.datonis.io:80 check
    server event2 event2.datonis.io:80 check

backend datonis-api
    balance roundrobin
    mode http
    server api1 api1.datonis.io:80 check
    server api2 api2.datonis.io:80 check

frontend http
    bind *:80
    mode http
    acl events path_beg /event
    use_backend datonis-events if events
    default_backend datonis-api

HAProxy

• Entities
  • Broker, Topic, Producer, Consumer
• A sharded write-ahead log
  • Contiguous memory allocation
  • Index and offset
• Messages are not deleted on read
  • But based on a retention SLA
  • Data reloads
• Log replication for fault tolerance
  • Making reads faster
• Kafka-Spark consumer
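The log semantics above are what make Kafka different from a queue: consumers read by offset without deleting anything, and old messages are dropped only by retention. A toy Python model of one partition, to make those semantics concrete (the `PartitionLog` class is an illustration, not Kafka's actual implementation):

```python
class PartitionLog:
    """Toy model of one Kafka partition: an append-only log addressed by offset."""

    def __init__(self):
        self.base_offset = 0     # offset of the first retained message
        self.messages = []

    def append(self, msg):
        """Producer side: append and return the message's offset."""
        self.messages.append(msg)
        return self.base_offset + len(self.messages) - 1

    def read(self, offset, max_count=10):
        """Consumer side: read from an offset; nothing is removed."""
        start = offset - self.base_offset
        return self.messages[start:start + max_count]

    def truncate_before(self, offset):
        """Retention: drop messages older than `offset` (past the SLA)."""
        drop = offset - self.base_offset
        if drop > 0:
            self.messages = self.messages[drop:]
            self.base_offset = offset
```

Because reads are non-destructive, a Kafka-Spark consumer can replay a range of offsets after a failure, and multiple consumer groups can read the same data independently.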

• Caching
  • Redis can be used in the same manner as memcache
• Counting stuff
  • Atomic counters
• Show latest items
  • This is a live in-memory cache and is very fast
• Deletion and filtering
  • If a cached article is deleted it can be removed from the cache
• Leaderboards and related problems
• Implement expires on items
• Unique N items in a given amount of time
• Pub/Sub
• Queues
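The "show latest items" use case above is commonly built in Redis as an LPUSH followed by an LTRIM to cap the list length. The same shape can be sketched with Python's stdlib `deque`, no Redis server needed (the `LatestItems` class is an illustrative stand-in, not a Redis client):

```python
from collections import deque

class LatestItems:
    """Keep only the N most recent items, newest first --
    the same pattern as Redis LPUSH + LTRIM on a capped list."""

    def __init__(self, capacity=5):
        self.items = deque(maxlen=capacity)   # old items fall off the far end

    def push(self, item):
        """Like LPUSH: newest item goes to the front."""
        self.items.appendleft(item)

    def latest(self, n):
        """Like LRANGE 0 n-1: the n most recent items."""
        return list(self.items)[:n]
```

In Redis itself the trim keeps memory bounded the same way `maxlen` does here, and the read stays O(n) in the page size rather than the history size.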

Real time CEP

• Apache Spark
  • Unified stream, batch processing and machine learning
• RDDs
  • Immutable, resilient, distributed collection of records
• DStreams
  • A continuous sequence of RDDs

val textFile = sc.textFile("hdfs://...")
val counts = textFile.flatMap(line => line.split(" "))
                     .map(word => (word, 1))
                     .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")

val ssc = new StreamingContext(sparkConf, Seconds(1))

val lines = ssc.socketTextStream(args(0), args(1).toInt)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
ssc.start()
ssc.awaitTermination()

Spark

Spark Streaming

Actually this is CEP

Why do we love Spark?

• Common logic for stream and batch processing
  • No separate architectures and approaches
  • Storm would have been appropriate for absolute real-time
• A hit with data scientists
  • Rapid iterations on large data sets
• Language support
  • Python, Java, Scala and R
  • R syntax is extremely baffling (or maybe I’m just too old)
• Spark MLlib
  • Statistics, classification, filtering, clustering, feature extraction
  • The list is constantly growing

Persistence

• Why Mongo?
  • Concerns around separate databases for transactional data and event data
  • Premature optimization
• Path
  • Started with 2.x. Collection-level locking
  • Now at 3.2. Document-level locking
  • WiredTiger storage engine. 5x with snappy compression
• Extreme convenience for configuration objects
• Design patterns for time-series data
• Great toolsets
  • Shout out to Mongoid
• Easy data migration
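One widely used design pattern for time-series data in MongoDB is bucketing: fold many readings into one document per device per hour, with pre-computed count and sum for cheap aggregates. A sketch of that pattern in Python, with a dict standing in for the collection (the function and field names are illustrative; in real Mongo this would be an upsert with `$push` and `$inc`):

```python
from datetime import datetime

def bucket_reading(buckets, device_id, ts, value):
    """Fold one sensor reading into an hourly bucket document."""
    hour = ts.replace(minute=0, second=0, microsecond=0)
    key = (device_id, hour)
    doc = buckets.setdefault(key, {
        "device_id": device_id,
        "hour": hour,
        "count": 0,       # pre-aggregated, so averages need no scan
        "sum": 0.0,
        "readings": [],   # raw samples for drill-down
    })
    doc["readings"].append({"ts": ts, "value": value})
    doc["count"] += 1
    doc["sum"] += value
    return doc
```

Bucketing turns millions of tiny per-sample documents into far fewer, larger ones, which helps both index size and range queries over a device's recent history.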

Replica Sets

https://docs.mongodb.org/manual/core/replication-introduction/

• Multiple copies on servers
  • Provides fault tolerance
• All writes to primary
  • Secondaries replicate the primary’s oplog
  • Asynchronous replication
• Improved read performance
  • You can specify reading from a replica
• Automatic failover
  • Election if the primary goes down

Sharding

https://docs.mongodb.org/manual/core/sharding-introduction/

• Horizontal scaling
  • Divides and distributes data over shards
• Entities
  • Shards store data
  • Query routers route requests to shards
  • Config servers. Metadata about the shards
• Shard keys
  • Range-based sharding. Efficient querying
  • Hash-based sharding. Efficient distribution
• Maintenance
  • Splitting and balancer
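The range-versus-hash trade-off above can be made concrete with two toy routing functions (illustrative sketches, not MongoDB's actual chunk logic): range sharding keeps nearby keys together, so range queries touch few shards but hot key ranges skew load; hash sharding spreads even sequential keys evenly, at the cost of scattering range queries to every shard.

```python
import hashlib

def range_shard(key, boundaries):
    """Range-based sharding: shard i holds keys below boundaries[i];
    the last shard holds everything else."""
    for i, upper in enumerate(boundaries):
        if key < upper:
            return i
    return len(boundaries)

def hash_shard(key, num_shards):
    """Hash-based sharding: hash the key, then take it modulo the
    shard count for an even spread."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % num_shards
```

With monotonically increasing keys (timestamps, sequential device IDs), `range_shard` sends every new write to the last shard, while `hash_shard` spreads them out, which is exactly why hashed shard keys are popular for telemetry.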


Questions

Email: ranjit@altizon.com

Altizon is hiring
