claus-matzinger
Containerized DBs in a Machine Data Environment
Docker NYC, 9th March 2017
@claus__m
Claus (me)
● 1.5 yrs at Crate.io: DevRel/Field Engineering/Support/Integrations/…
● Talk to me about Rust, Raspberry Pis, Tech!
Crate.io
● Founded in 2013
● ~25 people and growing
● Offices: San Francisco, Berlin, Dornbirn (AT)
● Dockerized: 1M+ downloads on Docker Hub
Machine Data
Source: HPE Jun 2016 http://www.slideshare.net/penumuru/harness-the-power-of-big-data-with-oracle-63438438/9
Machine Data Characteristics
● Millions of data points/second: streaming in from sensors, devices, logs, etc.
● Data diversity: structured & unstructured JSON, blobs
● Real-time query performance: monitoring & alerting
● Complex queries of big data volumes: with terabytes of historic data
● Growth: adding sources often means exponential growth
Machine Data
● Internet of Things: sensors, cameras, ...
● Wearables, gadgets: location data, interaction data, ...
● Logs & monitoring data: component health monitoring, access logs, ...
● Industry 4.0, digitization: production line insights, automation, ...
● Vehicles: location data, health data, ...
Clickdrive.io
Fleet management & vehicle tracking
Vehicle health and tracking data
2,000 data points per car, per second
In-depth & real-time analysis
● Predictive maintenance
● Fix cars before catastrophe strikes
● Route & driver efficiency
● Accident reconstruction
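To put the 2,000 data points per car, per second in perspective, here is a quick back-of-the-envelope calculation. The fleet size of 100 cars is a made-up illustration value, not a Clickdrive.io figure.

```python
# Back-of-the-envelope ingest volume for vehicle telemetry.
POINTS_PER_CAR_PER_SECOND = 2_000  # figure from the slide
SECONDS_PER_DAY = 24 * 60 * 60


def points_per_day(cars: int) -> int:
    """Data points ingested per day for a fleet of `cars` vehicles."""
    return cars * POINTS_PER_CAR_PER_SECOND * SECONDS_PER_DAY


if __name__ == "__main__":
    # The fleet size is a hypothetical example value.
    print(f"{points_per_day(100):,} data points per day for 100 cars")
```

A single car already produces 172.8 million data points per day at that rate.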
Roomonitor
Smart apartments for property managers
● Climate monitoring & control
● Occupancy
● Noise level
● Access monitoring & control
Alerts
● AC is on with window open!
● Make rental properties more efficient, safe, profitable
Skyhigh Networks
Cloud access security broker (CASB)
● Logging access data from cloud services
● Billions of accesses per day from 600+ customers, 10s of thousands of concurrent TCP connections
Machine data is the fingerprint of fraud
● Run AI algorithms to define “normal”
● Flag deviations/anomalies in real-time
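The idea of defining "normal" and flagging deviations can be sketched with a simple z-score check. This is a deliberately basic statistical stand-in, not the algorithms Skyhigh Networks runs (the talk does not name them); the function name and the threshold of 3 standard deviations are assumptions for illustration.

```python
from statistics import mean, stdev


def find_anomalies(baseline, observed, threshold=3.0):
    """Return observed values that deviate from the baseline mean
    by more than `threshold` sample standard deviations."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # needs at least two baseline values
    if sigma == 0:
        # A perfectly flat baseline: anything different is a deviation.
        return [x for x in observed if x != mu]
    return [x for x in observed if abs(x - mu) / sigma > threshold]


if __name__ == "__main__":
    # Hypothetical requests-per-minute figures for one client.
    normal = [100, 102, 98, 101, 99, 100, 103, 97]
    print(find_anomalies(normal, [101, 250, 99, 5]))
```

The baseline plays the role of "normal"; anything far outside it gets flagged for a closer look.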
Architecture for Machine Data
Microservices
● Containers: isolation by default
● Flexibility: building blocks
● Horizontally scalable: mostly
● Stateful containers: databases?
An Open Stack
Example
An Example
● Sensor: produces data
● Consumer: receives and enriches data
● Visualizer: draws stuff
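The three roles can be sketched as a small in-process pipeline. This is only an illustration under assumptions: the class names, the random temperature-like values, and the "high"/"ok" enrichment are invented here, and in the talk's setup each role would run as its own container rather than as a Python generator.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class Reading:
    sensor_id: str
    value: float


@dataclass
class EnrichedReading:
    sensor_id: str
    value: float
    received_at: float
    status: str


def sensor(sensor_id, count, rng):
    """Sensor: produces data (random values stand in for measurements)."""
    for _ in range(count):
        yield Reading(sensor_id, rng.uniform(15.0, 30.0))


def consumer(readings, limit=25.0):
    """Consumer: receives readings and enriches them with a timestamp and status."""
    for r in readings:
        status = "high" if r.value > limit else "ok"
        yield EnrichedReading(r.sensor_id, r.value, time.time(), status)


def visualizer(enriched):
    """Visualizer: draws stuff, here as one text bar per reading."""
    return [
        f"{e.sensor_id} {e.value:5.1f} {'#' * int(e.value)} [{e.status}]"
        for e in enriched
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    for line in visualizer(consumer(sensor("s1", 5, rng))):
        print(line)
```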
Deploy!
[Diagram: load balancer in front of V and C instances, with S and U nodes; presumably visualizer, consumer, sensor, and user]
● Load balancer: for TLS, reverse proxying, load balancing
● High availability: 3 instances
● A few sensors: one user to actually use it
Go Live
● More users! More sensors and users
● Data storage: slow and fast
● Monitoring & analytics: two different subsystems
[Diagram: the same deployment with more S and U nodes, plus NoSQL DB, message queue, SQL DB, monitoring, and analytics components]
But ...
● Even more users? Horizontal scaling?
● Maintenance & bug hunting? Mostly via scheduled downtimes
● Reporting? Kafka? Elasticsearch?
● Security? Access control?
● Expertise? Knowledge transfer?
A Database for Microservices
● Shared nothing
● Replication
● Sane defaults
● Resilient
● Cross-functional
Another DB?
Yay!
Yay!
…which one though?
CrateDB
github.com/crate/crate
hub.docker.com/r/_/crate
Yay!
CrateDB
● Shared nothing
● Partitioning & auto-sharding
● Replication
● (Almost) Zero config
● Multi model: structured & unstructured
● SQL
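As a rough sketch of how partitioning, sharding, replication, and a mixed structured/unstructured model come together in one SQL statement, the helper below only builds a DDL string. The table and column names are hypothetical, and the statement follows CrateDB's `CREATE TABLE` syntax as far as I know it (`CLUSTERED INTO ... SHARDS`, `PARTITIONED BY`, `number_of_replicas`, `OBJECT(DYNAMIC)`); check it against the documentation for your CrateDB version before running it.

```python
def create_table_sql(table, shards=6, replicas=1):
    """Build a CrateDB-style CREATE TABLE statement (untested sketch).

    - `payload OBJECT(DYNAMIC)` holds unstructured, JSON-like data
    - `day` is a generated column used as the partition key
    - shard and replica counts are illustrative defaults
    """
    return (
        f"CREATE TABLE {table} (\n"
        "  sensor_id STRING,\n"
        "  ts TIMESTAMP,\n"
        "  day TIMESTAMP GENERATED ALWAYS AS date_trunc('day', ts),\n"
        "  payload OBJECT(DYNAMIC)\n"
        f") CLUSTERED INTO {shards} SHARDS\n"
        "PARTITIONED BY (day)\n"
        f"WITH (number_of_replicas = {replicas})"
    )


if __name__ == "__main__":
    print(create_table_sql("sensor_readings"))
```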
CrateDB Fundamentals
● Disk-based index with in-memory caching: fast and efficient OS caching
● Shards (units of data): concurrency by distributing shards
● Distributed query execution engine: “push down” queries
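The "push down" idea can be illustrated with a toy model: each shard reduces its own rows to a small partial result, and a coordinator merges the partials. This is not CrateDB's execution engine; the function names are invented, and plain Python lists and threads stand in for shards and nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(shard):
    """Runs "on the shard": reduce local rows to a small (sum, count) pair."""
    return sum(shard), len(shard)


def distributed_average(shards):
    """Coordinator: fan the aggregation out to all shards in parallel,
    then merge the partial results into one average."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(partial_aggregate, shards))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


if __name__ == "__main__":
    # Three hypothetical shards holding values of the same column.
    print(distributed_average([[1, 2, 3], [4, 5], [6]]))
```

Only the `(sum, count)` pairs travel to the coordinator, not the rows themselves, which is the point of pushing the query down.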
Postgres Wire Protocol
Presto Parser
Distributed Query Planner
Query Execution Engine
Elasticsearch
Lucene
CLIENT
A better setup!
● Horizontal scalability: scale out everything
● Reduced tech stack: get to know it quicker
● Live reporting: use ad-hoc queries on production data
● Flexibility: schema evolution not required
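An ad-hoc reporting query of the kind meant by live reporting might look like the sketch below. Python's built-in `sqlite3` is used purely as a runnable stand-in so the example is self-contained; it is not CrateDB, and the table and column names are hypothetical. The same `SELECT ... GROUP BY` would be issued against the production database instead.

```python
import sqlite3


def average_by_sensor(rows):
    """Load (sensor_id, value) rows into an in-memory table and
    run an ad-hoc aggregate over them."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE readings (sensor_id TEXT, value REAL)")
        conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
        return conn.execute(
            "SELECT sensor_id, AVG(value) FROM readings "
            "GROUP BY sensor_id ORDER BY sensor_id"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("a", 1.0), ("a", 3.0), ("b", 10.0)]
    for sensor_id, avg in average_by_sensor(sample):
        print(sensor_id, avg)
```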
[Diagram: load balancer with V, C, S, and U nodes, plus monitoring and analytics components]
A better setup!
● No single point of failure: as highly available as your service
● Reduced network traffic: better reliability
● No queue: work with real data
● DB isolation: accessible only from the host
Demo Time!
An Open Stack for Machine Data
● Ad-hoc analysis with SQL: instant reporting on production data
● Integrates well: legacy SQL applications included
● Horizontally scalable: container native, highly available
Thanks!
Links
https://github.com/celaus
https://github.com/crate
https://hub.docker.com/r/_/crate
https://crate.io
@crateio
@claus__m