Page 1: Vayacondios: Divine into Complex Systems

Vayacondios: Divine into Complex Systems

Huston Hoburg & Flip Kromer, Infochimps, a CSC Company

MongoDB Austin 2014 March 24th

Page 2: Vayacondios: Divine into Complex Systems

Infochimps

• Big Data Platform for Large Companies

• Cloud::Queries (ElasticSearch, MongoDB, HBase)

• Cloud::Hadoop (Dynamic Hadoop)

• Cloud::Streams (Storm+Trident)

• Managed Service, Enterprise Features

• Recently sold to CSC, and it’s quite awesome

• We’re Hiring (natch)

Page 3: Vayacondios: Divine into Complex Systems

Vayacondios

• Built for our Visibility Stack…

• … but we think it has wider use


• “Data Goes In, the Right Thing Happens”

• Prompt, Comprehensive and Faithful

Page 4: Vayacondios: Divine into Complex Systems
Page 5: Vayacondios: Divine into Complex Systems
Page 6: Vayacondios: Divine into Complex Systems

Circulatory

Immune

Clotting

Page 7: Vayacondios: Divine into Complex Systems

OK, Glass

“OK Glass, Show me a skeuomorphism”

Page 8: Vayacondios: Divine into Complex Systems

Immune

Circulatory

Digestive

Respiratory

Page 9: Vayacondios: Divine into Complex Systems

Non-Numeric Metrics

Page 10: Vayacondios: Divine into Complex Systems

Target INR = 2-3

Low Platelets = H.I.T. (bad)

Heparin (Blood Thinner)

Page 11: Vayacondios: Divine into Complex Systems

Low Platelets

• Folic Acid, Vitamin B12

• Medication (Valproic Acid, Singulair, Heparin)

• Sepsis

• HIV

• (about three dozen others)

Page 12: Vayacondios: Divine into Complex Systems

Systems

• Anatomical Systems: Circulatory, Immune, etc

• Interventions: Drugs, Surgeries, …

• Course of Treatment: topline progress indicators

• Diagnosis

• Practitioner

• Medical Devices

Page 13: Vayacondios: Divine into Complex Systems

ICU

• Model the patient, not the data source

• Highlight Interactions among systems

• Highlight Interactions among numbers

• Broaden your view of “systems”

Page 14: Vayacondios: Divine into Complex Systems

Monitoring Sucks

Page 15: Vayacondios: Divine into Complex Systems

Operations

Page 16: Vayacondios: Divine into Complex Systems

System != Machine

• Whole-System MongoDB:

• Machines it runs on, Volumes it uses

• Systems writing to it

• Applications and Collections

• Data Files, Logs, Repl Sets, Oplog, Arbiters

• Codebase repo, Cookbooks, Configuration

• Issue Tracker Tickets, Change Events

Page 17: Vayacondios: Divine into Complex Systems

Operations

• Cognitive model for Humans, not from Robots

• Go beyond the Time-series Graph

• Highlight Interactions

• Link to Systems that write to this DB

• Link to Github for Repos & Cookbooks

• Drill into System

• Issues in Issue Tracker

• Broaden your view of “systems”

Page 18: Vayacondios: Divine into Complex Systems

Our Challenge

• 15 clients, 15 architectures

• < 1 operator per client, 2 continents

• 1500 machines in 150 clusters

• 30+ technologies (HBase, MongoDB, Storm, …)

• 4 Providers (AWS, Metal, VCE, OpenStack)

• 3 Virtualizations (AWS, VMWare, OpenStack)

• Max 21 minutes downtime / month (99.95% SLA)
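The downtime budget follows from the SLA by simple arithmetic. A minimal sketch, assuming a 30-day month (the slide does not say which month length it uses; the function name is illustrative):

```python
def allowed_downtime_minutes(sla_percent: float, days_in_month: int = 30) -> float:
    """Minutes of downtime per month permitted by an availability SLA."""
    minutes_in_month = days_in_month * 24 * 60  # 43,200 for a 30-day month
    return minutes_in_month * (1 - sla_percent / 100)


budget = allowed_downtime_minutes(99.95)
print(f"{budget:.1f} minutes")  # about 21.6, which the slide rounds down to 21
```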

Page 19: Vayacondios: Divine into Complex Systems

Systems to Instrument

• WholeSystems: ZookeeperSystem, ElasticsearchSystem, HbaseSystem, HadoopMapredSystem, HadoopHdfsSystem, KafkaSystem, MysqlSystem, MysqlClientSystem, ListenerSetSystem, StormTridentSystem, MongodbSystem, NfsSystem, VayacondiosSystem, TachyonSystem, SplunkSystem, S3System, RdsSystem, PigSystem, HiveSystem, HueSystem

• Machines: ZookeeperMachine, ElasticsearchDatanodeMachine, HBaseRegionserverMachine, HBaseMasterMachine, HadoopDnttMachine, HadoopTtonlyMachine, HadoopNamenodeMachine, HadoopJobtrackerMachine, HadoopSecondaryNamenodeMachine, HadoopFailoverMonitorMachine, MysqlServerMachine, KafkaBrokerMachine, PlatformListenerMachine, StormBolterMachine, StormMasterMachine, MongodbMachine, NfsServerMachine, VayacondiosServerMachine, PlatformApiMachine, TachyonServerMachine, HueMachine

• Daemons: n, ElasticsearchDaemon, HbaseRegionserverDaemon, HbaseMasterDaemon, HadoopDatanodeDaemon, HadoopTasktrackerDaemon, HadoopNamenodeDaemon, HadoopJobtrackerDaemon, HadoopSecondaryNamenodeDaemon, HadoopFailoverDaemon, KafkaBrokerDaemon, MysqlDaemon, PlatformListenerDaemon, StormNimbusDaemon, StormUiDaemon, StormSupervisorDaemon, MongodbDatanodeDaemon, NfsServerDaemon, NtpDaemon, NfsClientDaemon, VayacondiosServerDaemon, TachyonServerDaemon, PlatformApiServerDaemon, HueBeeswaxDaemon

• Providers: AwsProvider, CloudTrailProvider, OpenstackProvider, VceProvider, ChefServerProvider, Route53Provider, ElbProvider

• Manifests: most of the above have a planned version and the realized version

• Events: MachineLifecycle, CronJobLifecycle, ChefClientLifecycle

• Build Artifacts: FitDeployArtifact, DebArtifact, RpmArtifact, GemArtifact, AmiArtifact, OpenstackImageArtifact, VceTemplateArtifact, NpmArtifact, TarballArtifact

• PlatformApps: HadoopJobLifecycle (Hive, Pig, Wukong), TridentJobLifecycle, MountweaselLifecycle

• OpsProcesses: IncidentLifecycle, ChangeRequestLifecycle, FiredrillLifecycle, GitCommitLifecycle, ProblemLifecycle (JIRA), LunchladyLifecycle

Page 20: Vayacondios: Divine into Complex Systems

Vayacondios

• Visibility Stack for our operations team

• Open-sourcing this summer

• Internals in Ruby

• Access anywhere (HTTP or log file)

• MongoDB (but now please forget that fact)

Page 21: Vayacondios: Divine into Complex Systems
Page 22: Vayacondios: Divine into Complex Systems

Cognitive Model

• MongoDB:

• is_a Data store

• has_many Network Services

• has_many Daemons

• has_many Machines

• has_many Volumes

• has_many Collections

• …etc
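The is_a / has_many relationships above could be written down roughly as follows. This is only an illustrative sketch in Python, not the Vayacondios DSL itself (the slides say the internals are Ruby); the `SystemModel` class, its fields, and the machine name are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SystemModel:
    """Hypothetical container for one system's cognitive model."""
    name: str
    is_a: str                                   # the generic kind of system
    has_many: Dict[str, List[str]] = field(default_factory=dict)

    def related(self, kind: str) -> List[str]:
        """Names of related things of one kind; empty if none are declared."""
        return self.has_many.get(kind, [])


mongodb = SystemModel(
    name="MongoDB",
    is_a="data_store",
    has_many={
        "network_services": [],
        "daemons": [],
        "machines": [],
        "volumes": [],
        "collections": [],
    },
)

# Hypothetical machine name, attached as it is discovered.
mongodb.has_many["machines"].append("mongodb-0")
print(mongodb.related("machines"))
```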

Page 23: Vayacondios: Divine into Complex Systems

Model DSL (domain-specific language)

Page 24: Vayacondios: Divine into Complex Systems

Model DSL (domain-specific language)

Page 25: Vayacondios: Divine into Complex Systems

Faithful

• Whiteboard rule: how do folks talk about system?

• If you need it, it’s in the system

Prompt

• As fast as joint laws of Economics & Physics allow

Comprehensive

Page 26: Vayacondios: Divine into Complex Systems
Page 27: Vayacondios: Divine into Complex Systems
Page 28: Vayacondios: Divine into Complex Systems

Biographizing Isn’t Pretty

Page 29: Vayacondios: Divine into Complex Systems
Page 30: Vayacondios: Divine into Complex Systems

Faithful to Source

• crap data => well-formed data

• uniform JSON-ready hash

• syntax cleaned up

• semantically unchanged

• encouraged to model it, but let Wookiee win
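One way to read "syntax cleaned up, semantically unchanged" is a normalizer that tidies keys and whitespace but never reinterprets values. The sketch below is an assumption about what such a step might look like, not the Vayacondios implementation; the function names and the sample record are hypothetical.

```python
import json
import re


def to_snake_case(key: str) -> str:
    """Normalize keys such as ' HostName ' or 'diskUsed' to snake_case."""
    key = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", key)  # split camelCase words
    key = re.sub(r"[^0-9A-Za-z]+", "_", key)           # spaces, punctuation -> "_"
    return key.strip("_").lower()


def clean(record: dict) -> dict:
    """Return a uniform, JSON-ready dict; values keep their original meaning."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()  # cosmetic only: "71%" stays "71%"
        cleaned[to_snake_case(str(key))] = value
    return cleaned


raw = {" HostName ": "db-1 ", "diskUsed": "71%", "Repl Set": "rs0"}
print(json.dumps(clean(raw), sort_keys=True))
```

Note that the percentage is deliberately left as a string: parsing it into a number would be modeling, which the slide encourages but does not force on the source.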

Page 31: Vayacondios: Divine into Complex Systems
Page 32: Vayacondios: Divine into Complex Systems

Write Contract

• Vaya Con Dios, “Go with God”. As the kingdom of heaven is unknowable, so is further fate of data:

• How used

• By Whom

• How Processed

• Where Stored

Page 33: Vayacondios: Divine into Complex Systems
Page 34: Vayacondios: Divine into Complex Systems

Reporters/Reports

• Assemble Biographies into Reports

• Faithful to application

• Don’t know when will be run, why, etc
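Because a reporter cannot know when or why it will be run, it helps to think of it as a pure function from biographies to a report. A minimal sketch under that assumption; the biography fields (`name`, `state`, `used_pct`) and the report keys are invented for illustration.

```python
def mongodb_report(biographies: dict) -> dict:
    """Assemble raw biographies into one report.

    Pure function: the same biographies always give the same report,
    so it does not matter when or how often it is run.
    """
    machines = biographies.get("machines", [])
    volumes = biographies.get("volumes", [])
    return {
        "system": "mongodb",
        "machine_count": len(machines),
        "machines_down": sorted(
            m["name"] for m in machines if m.get("state") != "running"
        ),
        "fullest_volume_pct": max((v["used_pct"] for v in volumes), default=None),
    }


# Hypothetical biographies, as they might arrive from separate sources.
biographies = {
    "machines": [
        {"name": "mongo-a", "state": "running"},
        {"name": "mongo-b", "state": "stopped"},
    ],
    "volumes": [{"used_pct": 41}, {"used_pct": 68}],
}
report = mongodb_report(biographies)
print(report)
```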

Page 35: Vayacondios: Divine into Complex Systems

Presentation

Page 36: Vayacondios: Divine into Complex Systems

Dashboarding

Page 37: Vayacondios: Divine into Complex Systems

Model-Driven Templates

[Figure: dashboard layout built from repeated “text metric” panels]

Page 38: Vayacondios: Divine into Complex Systems

Repeatable Partials

Page 39: Vayacondios: Divine into Complex Systems

Model/Presenter/View

• Report == Model

• Reporter == Presenter

• Dashboard .xml == View

Page 40: Vayacondios: Divine into Complex Systems

Model/Presenter/View

• More targets than just the dashboard!

• Splunk+PagerDuty Alerts

• Cucumber tests

• Auditing reports (Security, Good Manners)

Page 41: Vayacondios: Divine into Complex Systems

System Checks

• Correctness, Consistency

• Attached Directly to the Model

• No worthwhile distinction between QA (integration tests) and live Alerts

• Drive Splunk+PagerDuty for Alerts

• Author Cucumber specs(!) for QA tests
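The claim that QA tests and live alerts are the same thing can be illustrated by defining each check once and feeding it to two consumers. This is a sketch of the idea only: the `Check` class, the check names, and the report fields are assumptions, and the real system drives Splunk, PagerDuty, and Cucumber rather than plain Python functions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    """One correctness or consistency check, defined once on the model."""
    name: str
    predicate: Callable[[dict], bool]


MONGODB_CHECKS = [
    Check("replica set has exactly one primary", lambda r: r["primaries"] == 1),
    Check("configured members match realized members",
          lambda r: r["configured_members"] == r["realized_members"]),
]


def failures(checks: List[Check], report: dict) -> List[str]:
    return [c.name for c in checks if not c.predicate(report)]


def qa_check(checks: List[Check], report: dict) -> None:
    """QA consumer: fail the test run if any check fails."""
    failed = failures(checks, report)
    if failed:
        raise AssertionError("failed checks: " + "; ".join(failed))


def live_alerts(checks: List[Check], report: dict) -> List[str]:
    """Alerting consumer: one alert message per failing check."""
    return [f"ALERT {report['system']}: {name}" for name in failures(checks, report)]


report = {"system": "mongodb", "primaries": 1,
          "configured_members": 3, "realized_members": 2}
print(live_alerts(MONGODB_CHECKS, report))
```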

Page 42: Vayacondios: Divine into Complex Systems

Safe Systems

Page 43: Vayacondios: Divine into Complex Systems

System Drift

• Cognitive Model

• Discoverable Interface

• Testable Contract

Page 44: Vayacondios: Divine into Complex Systems

Inevitability

• If configured and reported, consistency checks

• If reported, dashboard exists

• If is_a generic system (eg filesystem), gets correctness tests (eg “capacity < 75%”)

• If system A discovers system B:

• dashboard has link from A to B

• connectivity & security checks from A to B
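The rules above amount to deriving dashboards, links, and checks mechanically from the model. A rough sketch of that derivation, assuming a system is described by a plain dict with hypothetical keys (`reported`, `configured`, `is_a`, `discovers`); only the "capacity < 75%" example comes from the slide.

```python
# Correctness tests inherited from the generic kind of system.
GENERIC_CHECKS = {"filesystem": ["capacity < 75%"]}


def derive(system: dict) -> dict:
    """Work out what a system gets automatically from its model."""
    derived = {"dashboard": False, "checks": [], "links": []}

    if system.get("reported"):
        derived["dashboard"] = True                     # reported => dashboard
        if system.get("configured"):
            derived["checks"].append("configured == reported")

    derived["checks"].extend(GENERIC_CHECKS.get(system.get("is_a"), []))

    for other in system.get("discovers", []):           # A discovers B
        derived["links"].append(f"{system['name']} -> {other}")
        derived["checks"].append(f"connectivity {system['name']} -> {other}")
        derived["checks"].append(f"security {system['name']} -> {other}")

    return derived


nfs = {"name": "nfs", "is_a": "filesystem", "configured": True,
       "reported": True, "discovers": ["mongodb"]}
print(derive(nfs))
```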

Page 45: Vayacondios: Divine into Complex Systems

Interaction

• Monitoring systems do a terrible job here

• Hard sources of failure:

• Drift: conceived != realized

• Interaction: unexpected consequences

• Change: oops

Page 46: Vayacondios: Divine into Complex Systems

Application Design

Page 47: Vayacondios: Divine into Complex Systems

Application Design

• Visibility into complex systems:

• Biography of raw parts (raw Model) => Reporter (Presenter) => Summary of Systems (View-ready Model)

• Database-driven Application:

• Model => Presenter => View

Page 48: Vayacondios: Divine into Complex Systems

Simple Blog

Page 49: Vayacondios: Divine into Complex Systems

Blog: Views

Author Page

Post Page

Index Page

Page 50: Vayacondios: Divine into Complex Systems

Blog: Views

Author Page

Post Page

Index Page

PostSynopsisReport

PostReport

UserReport

CommentReport

Page 51: Vayacondios: Divine into Complex Systems

“Query on the way In”!

• New/Updated Post: Update Post triggers…

• Update PostReport

• Update SynopsisReport

• Update UserReport

Page 52: Vayacondios: Divine into Complex Systems

“Query on the way In”!

• User fullname changes: Update User triggers…

• Update UserReport

• Update their SynopsisReports

• Update their PostReports

• Update their CommentReports
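"Query on the way in" means the work of assembling a page is done when data is written, not when the page is read. The sketch below shows that fan-out for the blog example using in-memory dicts in place of a database; the function names and record fields are assumptions, and CommentReports are left out to keep it short.

```python
posts, users, reports = {}, {}, {}   # in-memory stand-ins for collections


def refresh_post_reports(post_id):
    post = posts[post_id]
    author = users[post["author_id"]]["fullname"]
    reports[("PostReport", post_id)] = {
        "title": post["title"], "body": post["body"], "author": author}
    reports[("SynopsisReport", post_id)] = {
        "title": post["title"], "author": author}


def refresh_user_report(user_id):
    reports[("UserReport", user_id)] = {
        "fullname": users[user_id]["fullname"],
        "post_count": sum(1 for p in posts.values() if p["author_id"] == user_id),
    }


def save_post(post_id, author_id, title, body):
    """A new or updated post triggers its reports and the author's report."""
    posts[post_id] = {"author_id": author_id, "title": title, "body": body}
    refresh_post_reports(post_id)
    refresh_user_report(author_id)


def save_user(user_id, fullname):
    """A changed name triggers the user's report and every report citing it."""
    users[user_id] = {"fullname": fullname}
    refresh_user_report(user_id)
    for post_id, post in posts.items():
        if post["author_id"] == user_id:
            refresh_post_reports(post_id)


save_user(1, "Ada L")
save_post(10, 1, "Hello", "First post")
save_user(1, "Ada Lovelace")          # rename fans out to existing reports
print(reports[("PostReport", 10)]["author"])
```

Reading a page then becomes a single lookup in `reports`, at the cost of more work on each write.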

Page 53: Vayacondios: Divine into Complex Systems

Vayacondios Contract

Page 54: Vayacondios: Divine into Complex Systems

Faithful

• Whiteboard rule: how do folks talk about system?

• If you need it, it’s in the system

Prompt

• As fast as joint laws of Economics & Physics allow

Comprehensive

Page 55: Vayacondios: Divine into Complex Systems

Faithful

• Single concern: subject of the biography

• look at what’s offered, look at what reports need

Prompt

• Run as often as needed (not your concern)

Comprehensive

Page 56: Vayacondios: Divine into Complex Systems

Faithful

• One Reporter per Application (*) & Topic

• USCE Method: Utilization, Saturation, Connections, Errors

Prompt

• Run as often as needed (not your concern)

Comprehensive

Page 57: Vayacondios: Divine into Complex Systems

Benefits

• Separation of concerns:

• Source complexity (API, parsing, translation)

• Timing

• Transport

• Individual Applications

• Reliability

Page 58: Vayacondios: Divine into Complex Systems

Benefits

• Separation of concerns: Source, Timing, Transport, Individual Applications, Reliability

• No external libraries in application

• Uniform access times

• Reduce risk from multiple dependencies

Page 59: Vayacondios: Divine into Complex Systems

So What?

• There’s not much to it: shims and conventions

• VCD is not MongoDB

• just like MongoDB is not mmap tables

• Power through constraint:

