The Internet of Things
Company
Vinay Nathan, CEO: 15 years of varied experience across sales, marketing, engineering and PM. Most recently, VP Sales at Persistent Systems.
Yogesh Kulkarni, COO: 16 years of product engineering experience in global product companies. Most recently, Director - Product Development at BMC Software.
Ranjit Nair, CTO: 16 years of software architecture and engineering experience. Most recently, Engineering Manager at Amazon.
About Altizon
And this is what we do
My Motivation
IoT
IoT is the integration of the physical world into the computing world
IoT is a Big-Data problem
• Massive amounts of data
• Machines are commonly sampled for data at millisecond intervals.
• Volume, variety and velocity.
• That need to be analyzed in real time
• Condition based monitoring
• Anomaly detection
• That need to be analyzed for actionable insights
• Efficiency, utilization, machine health
• That need supervised and unsupervised machine learning
• Predictive maintenance, proactive support
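As a rough illustration of the anomaly detection bullet above, here is a minimal sketch that flags a sample when it deviates strongly from the samples just before it. The z-score rule, the object name `AnomalySketch` and the parameters `window` and `threshold` are assumptions made for this example, not something the deck prescribes.

```scala
object AnomalySketch {
  /** Returns the indices of samples that deviate from the mean of the
    * preceding `window` samples by more than `threshold` standard deviations.
    * Hypothetical helper for illustration only. */
  def anomalies(samples: Vector[Double], window: Int, threshold: Double): Vector[Int] =
    (window until samples.length).filter { i =>
      val recent = samples.slice(i - window, i)
      val mean = recent.sum / window
      val variance = recent.map(x => (x - mean) * (x - mean)).sum / window
      val std = math.sqrt(variance)
      // A perfectly flat window has std == 0; skip it rather than divide by zero.
      std > 0 && math.abs(samples(i) - mean) / std > threshold
    }.toVector
}
```

For example, `AnomalySketch.anomalies(Vector(10.0, 10.2, 9.8, 10.1, 9.9, 25.0, 10.0), 5, 3.0)` flags only the spike at index 5. A production system would more likely run this kind of rule inside the stream processor than over an in-memory vector.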
Topology
[Diagram: Edge, Cloud Edge, Cloud]
The Edge
Sensors
• Network and connectivity
• Wi-Fi, BLE, Zigbee, 6LoWPAN
• Protocols
• MQTT, CoAP, AMQP
• Low Complexity
• Security
• Upgrades
Edge
• Network Protocols
• Higher Complexity
• Bidirectional communication
• Integration
• Device context
• Security
Be cloud agnostic
The Cloud Edge
• Protocol Adapters
• Edge to cloud protocols
• Filtering rules and aggregations
• Batching
• Local controller
• Highly available
• Load balanced
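The filtering, aggregation and batching bullets above can be sketched as a single function that drops unwanted readings and then summarizes the rest per device and time window before anything is sent upstream. The `Reading` and `Summary` types, the `keep` rule and the fixed-size window are all hypothetical names chosen for this sketch.

```scala
object EdgeAggregation {
  // Hypothetical shapes for a raw reading and its per-window summary.
  final case class Reading(deviceId: String, timestampMs: Long, value: Double)
  final case class Summary(deviceId: String, windowStartMs: Long, count: Int, mean: Double)

  /** Applies a filtering rule, then aggregates the surviving readings
    * into one summary per device per `windowMs`-wide window. */
  def summarize(readings: Seq[Reading], windowMs: Long, keep: Reading => Boolean): Seq[Summary] =
    readings
      .filter(keep)
      // Align each timestamp to the start of its window.
      .groupBy(r => (r.deviceId, r.timestampMs / windowMs * windowMs))
      .map { case ((device, start), rs) =>
        Summary(device, start, rs.size, rs.map(_.value).sum / rs.size)
      }
      .toSeq
      .sortBy(s => (s.deviceId, s.windowStartMs))
}
```

Sending one summary per window instead of every raw sample is one way to trade resolution for bandwidth; whether that trade is acceptable depends on the use case.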
Event Ingestion at Scale
• Device auto-discovery
• Metadata driven device discovery
• Device Telemetry Data
• Time-series data
• Which can be out of sequence
• Alerts and logs
• Event validation
• Bandwidth and backpressure
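Because telemetry can arrive out of sequence, one common approach (assumed here, not stated in the deck) is to hold events briefly and release them in timestamp order once a lateness allowance has passed. The sketch below keeps the state explicit; `Event`, `State` and `allowedLatenessMs` are hypothetical names.

```scala
object OutOfOrder {
  final case class Event(deviceId: String, timestampMs: Long, value: Double)
  // Events still being held back, plus the newest timestamp seen so far.
  final case class State(pending: Vector[Event], maxSeenMs: Long)

  val empty: State = State(Vector.empty, Long.MinValue)

  /** Accepts one event and returns the new state together with the events
    * that are now old enough to release, sorted by timestamp. */
  def offer(state: State, e: Event, allowedLatenessMs: Long): (State, Vector[Event]) = {
    val maxSeen = math.max(state.maxSeenMs, e.timestampMs)
    // Anything at or before the watermark is considered complete.
    val watermark = maxSeen - allowedLatenessMs
    val (ready, waiting) = (state.pending :+ e).partition(_.timestampMs <= watermark)
    (State(waiting, maxSeen), ready.sortBy(_.timestampMs))
  }
}
```

An event that arrives later than the allowance is released immediately and may still be out of order relative to earlier output; handling such stragglers is a separate policy decision.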
Docker
• Portable deployment of applications as a single object versus process sandboxing
• Application-centric versus machine/server-centric
• Support for automatic container builds
• Built-in version tracking
• Reusable components
• Public registry for sharing containers
• A growing tools ecosystem from the published API
https://www.docker.com/what-docker
HAProxy
    backend datonis-events
        balance source
        server event1 event1.datonis.io:80 check
        server event2 event2.datonis.io:80 check

    backend datonis-api
        balance roundrobin
        mode http
        server api1 api1.datonis.io:80 check
        server api2 api2.datonis.io:80 check

    frontend http
        bind *:80
        mode http
        acl events path_beg /event
        use_backend datonis-events if events
        default_backend datonis-api
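To make the routing rules in the HAProxy configuration above concrete, here is a small model of the decision it describes: paths beginning with `/event` go to the events backend with source-based balancing, and everything else is spread round-robin across the API backend. This is only a sketch of the idea; the `Router` class is hypothetical and the string hash used here is an assumption, not HAProxy's actual hashing.

```scala
object RoutingSketch {
  val eventServers: Vector[String] = Vector("event1.datonis.io:80", "event2.datonis.io:80")
  val apiServers: Vector[String] = Vector("api1.datonis.io:80", "api2.datonis.io:80")

  final class Router {
    private var next = 0 // round-robin cursor for the API backend

    def route(path: String, clientIp: String): String =
      if (path.startsWith("/event")) {
        // "balance source": the same client address always maps to the same server.
        eventServers(Math.floorMod(clientIp.hashCode, eventServers.size))
      } else {
        // "balance roundrobin": take servers in turn.
        val server = apiServers(next)
        next = (next + 1) % apiServers.size
        server
      }
  }
}
```

Pinning a device to one events server keeps its events on a single path, while stateless API calls can be spread evenly.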
Kafka
• Entities
• Broker, Topic, Producer, Consumer
• A sharded write-ahead log
• Contiguous memory allocation
• Index and offset
• Messages are not deleted on read, but according to an SLA
• Data reloads
• Log replication for fault tolerance
• Making reads faster
• Kafka-Spark consumer
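The log semantics listed above (append-only, addressed by offset, not deleted on read, expired on a retention rule) can be illustrated with a tiny in-memory model of a single partition. This is a teaching sketch with hypothetical names (`PartitionLog`, `expire`), not Kafka's client API or storage format.

```scala
object LogSketch {
  final case class Record(offset: Long, timestampMs: Long, payload: String)

  final class PartitionLog {
    private var records = Vector.empty[Record]
    private var nextOffset = 0L

    /** Appends a record and returns the offset assigned to it. */
    def append(timestampMs: Long, payload: String): Long = {
      val offset = nextOffset
      records = records :+ Record(offset, timestampMs, payload)
      nextOffset += 1
      offset
    }

    /** Reads up to `max` records starting at `fromOffset`. Reading never
      * removes anything, so a consumer can re-read (reload) at will. */
    def read(fromOffset: Long, max: Int): Vector[Record] =
      records.dropWhile(_.offset < fromOffset).take(max)

    /** Drops records older than the retention period; offsets are not reused. */
    def expire(nowMs: Long, retentionMs: Long): Unit =
      records = records.dropWhile(r => nowMs - r.timestampMs > retentionMs)
  }
}
```

Because each consumer tracks its own offset, several consumers can read the same data independently, and a consumer can rewind to reprocess it.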
Redis
• Caching
• Redis can be used in the same manner as memcache
• Counting stuff: atomic counters
• Show latest items
• This is a live in-memory cache and is very fast
• Deletion and filtering
• If a cached article is deleted, it can also be removed from the cache
• Leaderboards and related problems
• Implement expires on items
• Unique N items in a given amount of time
• Pub/Sub
• Queues
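As a rough picture of the caching, deletion and expiry use cases above, the sketch below models a cache whose entries expire after a fixed time to live. It uses a plain in-memory map as a stand-in and does not call Redis; the `Cache` class and the explicit `nowMs` clock parameter are assumptions made so the behavior is easy to follow.

```scala
import scala.collection.mutable

object ExpiringCache {
  final class Cache[K, V](ttlMs: Long) {
    // Each entry stores its value together with its expiry time.
    private val entries = mutable.Map.empty[K, (V, Long)]

    def put(key: K, value: V, nowMs: Long): Unit =
      entries.update(key, (value, nowMs + ttlMs))

    /** Returns the value if present and not yet expired. */
    def get(key: K, nowMs: Long): Option[V] =
      entries.get(key) match {
        case Some((value, expiresAt)) if nowMs < expiresAt => Some(value)
        case Some(_) =>
          entries.remove(key) // lazily evict the expired entry
          None
        case None => None
      }

    /** Explicit removal, e.g. when the cached article itself is deleted. */
    def delete(key: K): Unit = {
      entries.remove(key)
      ()
    }
  }
}
```

In Redis itself, expiry is handled by the server, so application code does not need this bookkeeping.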
Real time CEP
• Apache Spark
• Unified stream, batch processing and machine learning
• RDDs
• Immutable, resilient, distributed collection of records
• DStreams
• A continuous sequence of RDDs
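Spark
    // Batch word count; `sc` is the SparkContext provided by spark-shell.
    val textFile = sc.textFile("hdfs://...")
    val counts = textFile.flatMap(line => line.split(" "))
                         .map(word => (word, 1))
                         .reduceByKey(_ + _)
    counts.saveAsTextFile("hdfs://...")

Spark Streaming
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Streaming word count over 1-second micro-batches (app name is illustrative).
    val sparkConf = new SparkConf().setAppName("NetworkWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(1))
    // args(0) is the host and args(1) the port, which must be an Int.
    val lines = ssc.socketTextStream(args(0), args(1).toInt)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()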
Actually this is CEP
Why do we love Spark?
• Common logic for stream and batch processing
• No separate architectures and approaches
• Storm would have been appropriate for absolute real-time
• A hit with data scientists
• Rapid iterations on large data sets
• Language support
• Python, Java, Scala and R
• R syntax is extremely baffling (or maybe I’m just too old)
• Spark MLlib
• Statistics, classification, filtering, clustering, feature extraction
• The list is constantly growing
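The "common logic for stream and batch processing" point can be shown without a cluster: write the transformation once, then apply it either to a whole data set or to each micro-batch in turn. In this sketch plain Scala collections stand in for RDDs and DStreams, and `wordCounts` and `perBatch` are hypothetical names.

```scala
object SharedLogic {
  /** The business logic, written once. */
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" ").toSeq)
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  /** Batch use: call wordCounts on the full data set.
    * Streaming use: treat the stream as a sequence of micro-batches
    * and apply the same function to each one. */
  def perBatch(stream: Seq[Seq[String]]): Seq[Map[String, Int]] =
    stream.map(wordCounts)
}
```

With Spark the same idea holds at scale, since a DStream is processed as a sequence of RDDs.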
Persistence
• Why Mongo?
• Concerns around separate databases for transactional data and event data
• Premature optimization
• Path
• Started with 2.x. Collection-level locking
• Now at 3.2. Document-level locking
• WiredTiger storage engine. 5x with snappy compression.
• Extreme convenience for configuration objects
• Design patterns for time-series data
• Great toolsets
• Shout out to Mongoid
• Easy data migration
Replica Sets
https://docs.mongodb.org/manual/core/replication-introduction/
• Multiple copies on servers
• Provides fault tolerance
• All writes to primary
• Secondaries replicate the primary's oplog
• Asynchronous replication
• Improved read performance
• You can specify reading from a replica
• Automatic failover
• Election if the primary goes down
Sharding
https://docs.mongodb.org/manual/core/sharding-introduction/
• Horizontal scaling
• Divides and distributes data over shards
• Entities
• Shards store data
• Query routers route requests to shards
• Config servers. Metadata about the shards.
• Shard keys
• Range-based sharding. Efficient querying
• Hash-based sharding. Efficient distribution
• Maintenance
• Splitting and balancer
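The trade-off between the two shard key strategies above can be seen in a few lines. In this sketch, range-based placement keeps neighbouring keys on the same shard (good for range queries), while hash-based placement scatters them (good for even distribution). The functions, the boundary list and the use of MurmurHash3 are assumptions for illustration and do not reflect MongoDB's internal chunking or hash function.

```scala
import scala.util.hashing.MurmurHash3

object ShardingSketch {
  /** Range-based: `upperBounds` holds the sorted, exclusive upper bound of
    * every shard except the last, which takes all remaining keys. */
  def rangeShard(key: Long, upperBounds: Vector[Long]): Int = {
    val i = upperBounds.indexWhere(bound => key < bound)
    if (i >= 0) i else upperBounds.size
  }

  /** Hash-based: the shard is chosen from a hash of the key, so adjacent
    * keys are generally not co-located. */
  def hashShard(key: Long, shards: Int): Int =
    Math.floorMod(MurmurHash3.stringHash(key.toString), shards)
}
```

With timestamps as keys, range-based placement sends all recent writes to one shard, which is why monotonically increasing keys are often hashed or combined with another field such as a device identifier.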