Transcript

Monitoring microservices: Docker, Mesos and Kubernetes visibility at scale

Me

Alessandro Gallotta, Software Engineer @sysdig

@alex_gallotta

@sysdig

Introducing Sysdig

• Capture system events, filter them, run useful scripts • Lua scripting • Open Source • Nice curses UI

lsof, netstat, tcpdump, htop, ps, strace

and more

• track user activity • top files/processes/connections by • cpu • bytes • …

• logs • containers • tracers • you name it, we track it
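As a rough illustration of the "top processes by bytes" idea, here is a minimal Python sketch. It is not sysdig code: the event shape (process name plus byte count) and the function name `top_by_bytes` are assumptions made for the example.

```python
from collections import Counter

# Hypothetical, simplified I/O events: (process name, bytes transferred).
# Real captured events carry many more fields; this shape is an assumption.
events = [
    ("nginx", 1200),
    ("mysqld", 300),
    ("nginx", 800),
    ("redis-server", 450),
    ("mysqld", 700),
]


def top_by_bytes(events, n):
    """Return the n process names with the largest total byte counts."""
    totals = Counter()
    for proc_name, nbytes in events:
        totals[proc_name] += nbytes
    return totals.most_common(n)


top2 = top_by_bytes(events, 2)
print(top2)
```

The same aggregation works for files or connections by swapping the key used for grouping.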

Design Goals

• Production-ready • Simple • Lightweight

• Rich data • Natural workflow • Native support for containers • Native support for and more…

Demo time

Containers are Great…

• Simple • Scalable • Isolated • Service-oriented • Elastic • Flexible • Separation of concerns

But Some Things Are Becoming More Complex

Cache, Webserver, Database

Legacy Monolithic App

But Some Things Are Becoming More Complex

[Diagram: Container-based App, with Service1, Service2 and Service3 spread across six computing nodes]

Two Problems

Problem #1: How Do We Get Data Out of These Guys?

[Diagram: Container-based App, with Service1, Service2 and Service3 spread across six computing nodes]

• System • Network • Process • JVM • Response Time • Requests • Errors

Problem #2: How Do We Make Sense of the Data?

[Diagram: Container-based App, with Service1, Service2 and Service3 spread across six computing nodes]

Complexity Calls for Great Monitoring

• Isolated • Automated • Orchestration-aware • Simple • Scalable

The Orchestrated Version of This

Complexity Also Calls for Great Troubleshooting

What’s the network activity of my Marathon group?

What’s using the CPU in the WordPress task?

How the hell does my Mesos task work?!

Where’s the bottleneck?

What’s the response time of my login service?

What transactions is my Redis service serving?
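Questions like "what's the network activity of my Marathon group?" come down to rolling per-container metrics up by orchestrator metadata. The sketch below shows that roll-up in Python under stated assumptions: the sample records, the `marathon_group` key and the group names are all hypothetical, and this is not how any particular monitoring product stores its data.

```python
from collections import defaultdict

# Hypothetical per-container samples, already tagged with the orchestrator
# group each container belongs to. Field names are assumptions.
samples = [
    {"container": "c1", "marathon_group": "/shop/frontend", "net_bytes": 500},
    {"container": "c2", "marathon_group": "/shop/frontend", "net_bytes": 700},
    {"container": "c3", "marathon_group": "/shop/db", "net_bytes": 300},
]


def network_by_group(samples):
    """Sum network bytes per orchestrator group, ignoring which node ran it."""
    totals = defaultdict(int)
    for sample in samples:
        totals[sample["marathon_group"]] += sample["net_bytes"]
    return dict(totals)


by_group = network_by_group(samples)
print(by_group)
```

The point of the design is that the grouping key is the logical service, not the host or container ID, so the answer stays stable as containers move between nodes.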

How Do I Get Data Out of These Things: VMs

[Diagram: VM1, VM2, VM3 on a Hypervisor]

Monitoring VMs, Option 1

Hypervisor-level instrumentation, Amazon CloudWatch

Monitoring VMs, Option 2

Monitoring Agent

Monitoring Containers

[Diagram: Container1, Container2, Container3 on an OS]

Monitoring Containers, Option 1

Monitoring Agent

• Not scalable • Not composable • Adds dependencies/size • Kills the concept of one process per container

Monitoring Containers, Option 2

Container runtime-level monitoring

Kernel-level instrumentation

Monitoring Containers, Option 3

[Diagram: Container1, Monitoring Container, Container2]

Sysdig Data Collection

[Diagram: Kernel; Container1 (Docker); Container2 (Docker); Container3 (LXC); App; App; sysdig (Docker)]

Instrumentation through kernel module

Capture and analysis

Sky cloud is the limit

• Correlate data • Scale with your infrastructure • Alerts, notifications, visualization tools • Continuous data collection and retention from production systems

Sysdig Cloud

• Sysdig evolution for the cloud • Preserve the premises • production ready • natural workflow • ease of use • 0 to low config needed

Out of the box support

Demo time 2

How About Security?

Did someone log into one of our containers?

Has something been installed in one of the containers?

Have we been hacked?

Were configuration files changed?

An anomaly detection system built on top of the sysdig engine

Falco Architecture

[Diagram: Kernel; Container1 (Docker); Container2 (rkt); Container3 (LXC); App; App; Rule system (Docker)]

• File activity • Network Activity • User Activity • Process execution • IPC • …

Rules Examples

rule: shell_in_container
desc: a shell running in a container
condition: container.id != host and proc.name = bash
output: "Shell running in container (user=%user.name container_id=%container.id container_name=%container.name shell=%proc.name parent=%proc.pname)"
priority: WARNING
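To make the rule above concrete, here is a small Python sketch that mimics its condition and its %field output template against a single event. It only illustrates the logic and is not Falco's implementation: the flat-dict event, the sample values, the `render` helper and the `<NA>` placeholder for missing fields are all assumptions.

```python
import re

# Hypothetical event as a flat dict keyed by the field names used in the rule.
event = {
    "container.id": "3ad7b26ded6d",
    "container.name": "web",
    "proc.name": "bash",
    "proc.pname": "sshd",
    "user.name": "root",
}

OUTPUT = (
    "Shell running in container (user=%user.name container_id=%container.id "
    "container_name=%container.name shell=%proc.name parent=%proc.pname)"
)


def shell_in_container(evt):
    """Mirror of the condition: container.id != host and proc.name = bash."""
    return evt["container.id"] != "host" and evt["proc.name"] == "bash"


def render(template, evt):
    """Replace each %field token with the event's value (<NA> if absent)."""
    pattern = r"%([a-z_]+(?:\.[a-z_]+)*)"
    return re.sub(pattern, lambda m: str(evt.get(m.group(1), "<NA>")), template)


alert = render(OUTPUT, event) if shell_in_container(event) else None
print(alert)
```

An event whose container.id is "host" does not satisfy the condition, so no alert text is produced for it.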

Rules Examples

rule: mysqld_spawn_process
desc: mysqld spawning a new process after startup.
condition: spawn_process and proc.name = mysqld and not proc_is_new
output: "mysqld spawned new process after startup (user=%user.name command=%proc.cmdline file=%fd.name)"
priority: WARNING

Rules Examples

macro: open_connection
condition: syscall.type=connect and evt.dir=< and fd.sockfamily=ip

rule: system_binaries_network_activity
desc: any network connection initiated by system binaries that are not expected to send or receive any network traffic
condition: open_connection and proc.name in (ls, ps, mkdir, … )
output: "Known system binary made network connection (user=%user.name command=%proc.cmdline connection=%fd.name)"
priority: WARNING
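The macro is just a named condition that rules can reuse. A minimal Python sketch of that composition, with predicates over a hypothetical flat-dict event, might look like the following. It does not reproduce the rule engine, and `SYSTEM_BINARIES` contains only the three names visible on the slide, since the rest of the list is elided there.

```python
# Only the binaries named on the slide; the full list is elided in the source.
SYSTEM_BINARIES = {"ls", "ps", "mkdir"}


def open_connection(evt):
    """Macro: syscall.type=connect and evt.dir=< and fd.sockfamily=ip."""
    return (
        evt.get("syscall.type") == "connect"
        and evt.get("evt.dir") == "<"
        and evt.get("fd.sockfamily") == "ip"
    )


def system_binaries_network_activity(evt):
    """Rule condition: open_connection and proc.name in (ls, ps, mkdir, ...)."""
    return open_connection(evt) and evt.get("proc.name") in SYSTEM_BINARIES


# Hypothetical events for illustration.
suspicious = {
    "syscall.type": "connect",
    "evt.dir": "<",
    "fd.sockfamily": "ip",
    "proc.name": "ls",
}
expected = dict(suspicious, **{"proc.name": "curl"})

print(system_binaries_network_activity(suspicious))
print(system_binaries_network_activity(expected))
```

Keeping the macro as its own predicate means other rules can reuse the same definition of "opened a connection" without repeating the three field checks.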

Thank You!

www.sysdig.org

www.sysdig.org/falco

@alex_gallotta

@sysdig

github.com/draios

www.sysdig.com

