Monitoring microservices: Docker, Mesos and Kubernetes visibility at scale

  • Published on
    06-Jan-2017

Transcript

<ul><li><p>Monitoring microservices: Docker, Mesos and Kubernetes visibility at scale</p></li>
<li><p>Me</p><p>Alessandro Gallotta, Software Engineer @sysdig</p><p>@alex_gallotta</p><p>@sysdig</p></li>
<li><p>Introducing Sysdig</p><p>Capture system events, filter them, run useful scripts</p><p>Lua scripting</p><p>Open Source</p><p>Nice curses UI</p><p>lsof</p><p>netstat</p><p>tcpdump</p><p>htop</p><p>ps</p><p>strace</p></li>
<li><p>and more</p><p>track user activity</p><p>top files/processes/connections by CPU, bytes</p><p>logs</p><p>containers</p><p>tracers</p><p>you name it, we track it</p></li>
<li><p>Design Goals</p><p>Production-ready</p><p>Simple, lightweight</p><p>Rich data</p><p>Natural workflow</p><p>Native support for containers</p><p>Native support for and more</p></li>
<li><p>Demo time</p></li>
<li><p>Containers are Great</p><p>Simple</p><p>Scalable</p><p>Isolated</p><p>Service-oriented</p><p>Elastic</p><p>Flexible</p><p>Separation of concerns</p></li>
<li><p>But Some Things Are Becoming More Complex</p><p>Cache</p><p>Webserver</p><p>Database</p><p>Legacy Monolithic App</p></li>
<li><p>But Some Things Are Becoming More Complex</p><p>Computing Node</p><p>Computing Node</p><p>Computing Node</p><p>Service1</p><p>Service2</p><p>Service3</p><p>Computing Node</p><p>Computing Node</p><p>Computing Node</p><p>Container-based App</p></li>
<li><p>But Some Things Are Becoming More Complex</p><p>Computing Node</p><p>Computing Node</p><p>Computing Node</p><p>Computing Node</p><p>Computing Node</p><p>Computing Node</p><p>Container-based App</p><p>Service1</p><p>Service2</p><p>Service3</p></li>
<li><p>But Things Are Becoming More Complex</p><p>Computing Node</p><p>Computing Node</p><p>Computing Node</p><p>Service1</p><p>Service2</p><p>Service3</p><p>Computing Node</p><p>Computing Node</p><p>Computing Node</p><p>Container-based App</p><p>Two Problems</p></li>
<li><p>Problem #1: How Do We Get Data Out of These Guys?</p><p>Computing Node</p><p>Computing Node</p><p>Computing Node</p><p>Service1</p><p>Service2</p><p>Service3</p><p>Computing Node</p><p>Computing Node</p><p>Computing Node</p><p>Container-based App</p><p>System</p><p>Network</p><p>
Process</p><p>JVM</p><p>Response Time</p><p>Requests</p><p>Errors</p></li>
<li><p>Problem #2: How Do We Make Sense of the Data?</p><p>Computing Node</p><p>Computing Node</p><p>Computing Node</p><p>Service1</p><p>Service2</p><p>Service3</p><p>Computing Node</p><p>Computing Node</p><p>Computing Node</p><p>Container-based App</p></li>
<li><p>Complexity Calls for Great Monitoring</p><p>Isolated</p><p>Automated</p><p>Orchestration-aware</p><p>Simple</p><p>Scalable</p></li>
<li><p>The Orchestrated Version of This</p></li>
<li><p>Complexity Also Calls for Great Troubleshooting</p><p>What's the network activity of my Marathon group?</p><p>What's using the CPU in the WordPress task?</p><p>How the hell does my Mesos task work?!</p><p>Where's the bottleneck?</p><p>What's the response time of my login service?</p><p>What transactions is my Redis service serving?</p></li>
<li><p>How Do I Get Data Out of These Things: VMs</p><p>Hypervisor</p><p>VM1</p><p>VM2</p><p>VM3</p></li>
<li><p>Monitoring VMs, Option 1</p><p>Hypervisor</p><p>VM1</p><p>VM2</p><p>VM3</p><p>Hypervisor-level instrumentation, Amazon CloudWatch</p></li>
<li><p>Monitoring VMs, Option 2</p><p>Hypervisor</p><p>VM1</p><p>VM2</p><p>VM3</p><p>Monitoring Agent</p></li>
<li><p>Monitoring Containers</p><p>OS</p><p>Container1</p><p>Container2</p><p>Container3</p></li>
<li><p>Monitoring Containers, Option 1</p><p>OS</p><p>Container1</p><p>Container2</p><p>Container3</p><p>Monitoring Agent</p></li>
<li><p>Monitoring Containers, Option 1</p><p>OS</p><p>Container1</p><p>Container2</p><p>Container3</p><p>Monitoring Agent</p><p>Not scalable</p><p>Not composable</p><p>Adds dependencies/size</p><p>Kills the concept of one process per container</p></li>
<li><p>Monitoring Containers, Option 2</p><p>OS</p><p>Container1</p><p>Container2</p><p>Container3</p><p>Container runtime level monitoring</p><p>Kernel-level instrumentation</p></li>
<li><p>Monitoring Containers, Option 3</p><p>OS</p><p>Container1</p><p>Monitoring Container</p><p>Container2</p></li>
<li><p>Sysdig Data 
Collection</p><p>Kernel</p><p>Container1</p><p>Docker</p><p>Container2</p><p>Docker</p><p>Container3</p><p>LXC</p><p>App</p><p>App</p></li>
<li><p>Sysdig Data Collection</p><p>Kernel</p><p>Container1</p><p>Docker</p><p>Container2</p><p>Docker</p><p>Container3</p><p>LXC</p><p>App</p><p>App</p><p>Instrumentation through kernel module</p></li>
<li><p>Sysdig Data Collection</p><p>Kernel</p><p>Container1</p><p>Docker</p><p>Container2</p><p>Docker</p><p>Container3</p><p>LXC</p><p>App</p><p>App</p><p>sysdig</p><p>Docker</p><p>Capture and analysis</p></li>
<li><p>Sky cloud is the limit</p><p>Correlate data</p><p>Scale with your infrastructure</p><p>Alerts, notifications, visualization tools</p><p>Continuous data collection and retention from production systems</p></li>
<li><p>Sysdig Cloud</p><p>Sysdig evolution for the cloud</p><p>Preserve the premises</p><p>production ready</p><p>natural workflow</p><p>ease of use</p><p>0 to low config needed</p></li>
<li><p>Out of the box support</p></li>
<li><p>Demo time 2</p></li>
<li><p>How About Security?</p><p>Did someone log into one of our containers?</p><p>Has something been installed in one of the containers?</p><p>Have we been hacked?</p><p>Were configuration files changed?</p></li>
<li><p>How About Security?</p><p>Did someone log into one of our containers?</p><p>Have we been hacked?</p><p>Were configuration files changed?</p><p>Has something been installed in one of the containers?</p></li>
<li><p>An anomaly detection system built on top of the sysdig engine</p></li>
<li><p>Falco Architecture</p><p>Kernel</p><p>Container1</p><p>Docker</p><p>Container2</p><p>rkt</p><p>Container3</p><p>LXC</p><p>App</p><p>App</p><p>Rule system</p><p>Docker</p><p>File activity</p><p>Network Activity</p><p>User Activity</p><p>Process execution</p><p>IPC</p></li>
<li><p>Rules Examples</p><p>rule: shell_in_container</p><p>desc: a shell running in a container</p><p>condition: container.id != host and proc.name = bash</p><p>output: Shell running in container (user=%user.name container_id=%container.id container_name=%container.name shell=%proc.name parent=%proc.pname)</p><p>priority: 
WARNING</p></li>
<li><p>Rules Examples</p><p>rule: mysqld_spawn_process</p><p>desc: mysqld spawning a new process after startup.</p><p>condition: spawn_process and proc.name = mysqld and not proc_is_new</p><p>output: mysqld spawned new process after startup (user=%user.name command=%proc.cmdline file=%fd.name)</p><p>priority: WARNING</p></li>
<li><p>Rules Examples</p><p>macro: open_connection</p><p>condition: syscall.type=connect and evt.dir=&lt; and fd.sockfamily=ip</p><p>rule: system_binaries_network_activity</p><p>desc: any network connection initiated by system binaries that are not expected to send or receive any network traffic</p><p>condition: open_connection and proc.name in (ls, ps, mkdir, )</p><p>output: Known system binary made network connection (user=%user.name command=%proc.cmdline connection=%fd.name)</p><p>priority: WARNING</p></li>
<li><p>Thank You!</p><p>www.sysdig.org</p><p>www.sysdig.org/falco</p><p>@alex_gallotta</p><p>@sysdig</p><p>github.com/draios</p><p>www.sysdig.com</p><p>http://www.sysdig.org/</p><p>http://www.sysdig.org/falco/</p><p>https://github.com/draios/sysdig</p><p>http://www.sysdig.com/</p></li></ul>
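
The shell_in_container rule in the transcript pairs a boolean condition over event fields with an output template containing %field placeholders. As a rough illustration of that idea only, the Python sketch below checks the same condition against hand-written event dictionaries and fills in the placeholders. It is not Falco's engine or API: the `shell_in_container` and `render` helpers, the `OUTPUT` constant and every sample value (container ID `3f2a`, user and process names) are hypothetical and exist only for this example.

```python
import re

# Toy illustration only: this is NOT how Falco is implemented.
# Events are plain dicts keyed by the field names used on the slide.

# Output template copied from the slide's shell_in_container rule.
OUTPUT = ("Shell running in container (user=%user.name "
          "container_id=%container.id container_name=%container.name "
          "shell=%proc.name parent=%proc.pname)")


def shell_in_container(event):
    """Slide condition: container.id != host and proc.name = bash."""
    return event["container.id"] != "host" and event["proc.name"] == "bash"


def render(template, event):
    """Replace each %field token with the event's value for that field."""
    pattern = r"%([a-z_]+(?:\.[a-z_]+)*)"
    return re.sub(pattern, lambda m: str(event.get(m.group(1), "<NA>")), template)


# Hypothetical sample events: a shell on the host, a shell in a container,
# and a non-shell process in a container.
events = [
    {"user.name": "root", "container.id": "host", "container.name": "host",
     "proc.name": "bash", "proc.pname": "sshd"},
    {"user.name": "root", "container.id": "3f2a", "container.name": "web",
     "proc.name": "bash", "proc.pname": "docker"},
    {"user.name": "www", "container.id": "3f2a", "container.name": "web",
     "proc.name": "nginx", "proc.pname": "docker"},
]

# Only the second event satisfies the condition and produces an alert line.
alerts = [render(OUTPUT, e) for e in events if shell_in_container(e)]
for line in alerts:
    print(line)
```

Keeping the condition and the output template separate mirrors the rule layout on the slides: the condition decides whether an event matters, and the template decides what gets reported about it.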