Metrics, Metrics Everywhere (Coda Hale)

  • METRICS

    METRICS EVERYWHERE

    Saturday, April 9, 2011

  • Make better decisions by using numbers.

  • Coda Hale, @coda

    github.com/codahale

  • www.yammer.com

    The enterprise social network.

  • I write code.

  • But that's not actually my job.

  • code

    business value

  • What the hell is business value?

  • A new feature.

  • An improved existing feature.

  • Fewer bugs.

  • Not pissing our users off with a slow site.

    ugly

    pretty

  • Making future changes easier.

  • Adding a unit test before fixing that bug.

  • Business value is anything which makes people more likely to give us money.

  • We want to generate more business value.

  • We need to make better decisions about our code.

  • Our code generates business value when it runs,

    not when we write it.

  • We need to know what our code does

    when it runs.

  • We can't do this unless we measure it.

  • Why measure it?

  • territory

    map

  • city of San Francisco

    map of San Francisco

  • the way it is

    the way we talk

  • the thing in itself

    the thing we think of

  • reality

    perception

  • MIND THE GAP

  • We have a mental model

    of what our code does.

  • It's a mental model. It's not the code.

  • It is often wrong.

  • Confusion.

  • This code can't possibly work.

  • (It works.)

  • MIND THE GAP

  • This code can't possibly fail.

  • (It fails.)

  • MIND THE GAP

  • Which is faster?

    items.sort_by { |i| i.name }

    items.sort { |a, b| a.name <=> b.name }

  • We don't know.

    def sort_by(&blk)
      sleep(100) # FIXME: I AM POISON
      super(&blk)
    end

    def sort(&blk)
      # TODO: make not explode
      raise Exception.new("Haw haw!")
    end

  • We can't know until we measure it.

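Measuring the two calls from the slide is cheap to try. A minimal sketch using Ruby's standard `benchmark` library; the `Item` struct, the 100,000-item list, and the random 8-letter names are assumptions made for illustration, and no timings are quoted because they depend on the Ruby version and machine.

```ruby
require "benchmark"

# Hypothetical item type: anything that responds to #name with a comparable value.
Item = Struct.new(:name)

# 100,000 items with reproducible random 8-letter names.
rng = Random.new(42)
items = Array.new(100_000) do
  Item.new(Array.new(8) { ("a".ord + rng.rand(26)).chr }.join)
end

by_key = nil
by_block = nil

# Benchmark.bm prints user, system, total and real time for each report.
Benchmark.bm(8) do |x|
  x.report("sort_by") { by_key = items.sort_by { |i| i.name } }
  x.report("sort")    { by_block = items.sort { |a, b| a.name <=> b.name } }
end

# Sanity check: both calls must produce the same ordering of names.
puts by_key.map(&:name) == by_block.map(&:name)
```

Run it with the data sizes and Ruby version you actually deploy; the relative cost of the two calls can change between environments.
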
  • This affects how we make decisions.

  • Our application is slow. This page takes 500ms.

    Fix it.

  • Find the bottleneck!

    SQL Query

    Template Rendering

    Session Storage

  • We don't know.

  • Find The Bottleneck 2.0!

    SQL Query: 53ms

    Template Rendering: 1ms

    Session Storage: 315ms

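Per-phase numbers like these can be gathered by wrapping each phase of a request in a timer. A minimal sketch, assuming hypothetical phase names; the `sleep` calls merely stand in for real work of roughly the sizes on the slide. It reads `Process::CLOCK_MONOTONIC` so that wall-clock adjustments do not distort the measurements.

```ruby
# Accumulates elapsed wall-clock milliseconds per named phase of a request.
class PhaseTimer
  attr_reader :timings

  def initialize
    @timings = Hash.new(0.0)
  end

  # Times the block with the monotonic clock, adds the elapsed ms to +phase+,
  # and returns the block's result (the time is recorded even if it raises).
  def measure(phase)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    @timings[phase] += (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000.0
  end

  # [phase, ms] pairs, slowest first.
  def report
    @timings.sort_by { |_phase, ms| -ms }
  end
end

# Hypothetical phases; the sleeps only simulate work of roughly the slide's sizes.
timer = PhaseTimer.new
timer.measure("SQL query")          { sleep 0.053 }
timer.measure("template rendering") { sleep 0.001 }
timer.measure("session storage")    { sleep 0.315 }
timer.report.each { |phase, ms| puts format("%-20s %7.1f ms", phase, ms) }
```
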
  • Confusion.

  • We made a better decision.

  • We improve our mental model by measuring what our code does.

  • territory

    map

  • We use our mental model

    to decide what to do.

  • A better mental model

    makes us better at deciding what to do.

  • A better mental model

    makes us better at generating

    business value.

  • Measuring makes your decisions better.

  • But only if we're measuring

    the right thing.

  • We need to measure our code where it matters.

  • In the wild.

  • Generating business value.

  • PRODUCTION

  • Continuously measuring code in production.

  • Metrics

    Java/Scala

    github.com/codahale/metrics

  • Gauges, Counters, Meters,

    Histograms, Timers

  • Each metric is associated with a class

    and has a name.

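A minimal sketch of what "associated with a class and has a name" can look like: a registry keyed by `[class, name]` that returns the same metric object on every lookup, so any code path can find and update it. `Registry`, `Counter`, `Gauge`, and `AutocompleteService` are hypothetical names for illustration, not the Metrics library's API.

```ruby
# A counter: an integer that can be incremented and decremented.
class Counter
  def initialize
    @value = 0
    @lock = Mutex.new
  end

  def inc(n = 1)
    @lock.synchronize { @value += n }
  end

  def dec(n = 1)
    @lock.synchronize { @value -= n }
  end

  def value
    @lock.synchronize { @value }
  end
end

# A gauge: reports an instantaneous value by calling a block on demand.
class Gauge
  def initialize(&reader)
    @reader = reader
  end

  def value
    @reader.call
  end
end

# Metrics keyed by [owning class, name]; repeated lookups return the same object.
class Registry
  def initialize
    @metrics = {}
    @lock = Mutex.new
  end

  def counter(klass, name)
    fetch(klass, name) { Counter.new }
  end

  def gauge(klass, name, &reader)
    fetch(klass, name) { Gauge.new(&reader) }
  end

  # Fully qualified names such as "AutocompleteService.requests".
  def names
    @lock.synchronize { @metrics.keys.map { |klass, name| "#{klass.name}.#{name}" } }
  end

  private

  def fetch(klass, name)
    @lock.synchronize { @metrics[[klass, name]] ||= yield }
  end
end

class AutocompleteService; end # hypothetical owning class

registry = Registry.new
queue = []
requests = registry.counter(AutocompleteService, "requests")
registry.gauge(AutocompleteService, "queue-size") { queue.size }

requests.inc
registry.counter(AutocompleteService, "requests").inc # same Counter object
queue << :pending_job
```
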
  • An autocomplete service for city names.

    > GET /complete?q=San%20Fra

    < HTTP/1.1 200 RAD

  • 1,000 req/sec

    1,000 actions/req

    1 day

    => 86 billion values

    >640GB of data/day

    Not gonna happen.

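The slide's arithmetic, written out. The storage figure assumes (the slide does not say) that each value is kept as an 8-byte number such as a 64-bit integer or double:

```latex
\[
1000\,\tfrac{\text{req}}{\text{s}} \times 1000\,\tfrac{\text{values}}{\text{req}} \times 86\,400\,\tfrac{\text{s}}{\text{day}}
  = 8.64 \times 10^{10}\,\tfrac{\text{values}}{\text{day}}
\]
\[
8.64 \times 10^{10} \times 8\,\text{B} = 6.912 \times 10^{11}\,\text{B} \approx 644\,\text{GiB per day}
\]
```
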
  • COGNITIVE HAZARD

  • Reservoir sampling.

    Keep a statistically representative sample

    of measurements as they happen.

  • Vitter's Algorithm R.

    Vitter, J. (1985). Random sampling with a reservoir.

    ACM Transactions on Mathematical Software (TOMS), 11(1), 57.

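A minimal Ruby sketch of Algorithm R: keep the first `size` values, then let the n-th value replace a uniformly chosen slot with probability size/n. The class and method names are hypothetical, and the random number generator is injectable so runs can be reproduced.

```ruby
# Uniform reservoir sample of at most +size+ values (Vitter's Algorithm R).
class UniformReservoir
  attr_reader :count

  def initialize(size, rng: Random.new)
    @size = size
    @rng = rng
    @values = []
    @count = 0 # number of values offered so far
  end

  def update(value)
    @count += 1
    if @values.size < @size
      @values << value # fill phase: keep everything until full
    else
      # Choose j uniformly from 0...count. With probability size/count it
      # indexes into the reservoir, and the new value evicts that entry.
      j = @rng.rand(@count)
      @values[j] = value if j < @size
    end
  end

  # A copy of the current sample.
  def values
    @values.dup
  end
end

# Offer 100,000 hypothetical latency measurements; keep a uniform sample of 100.
reservoir = UniformReservoir.new(100, rng: Random.new(1))
(1..100_000).each { |latency_ms| reservoir.update(latency_ms) }
sample = reservoir.values
```

Every value offered has the same probability, size/count, of being in the sample, no matter when it arrived.
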
  • [Chart: # of cities vs. time]

  • MIND THE GAP

  • Vitter's Algorithm R produces uniform samples.

  • Recency.

  • SUPER-DUPER COGNITIVE HAZARD

  • Forward-decaying priority sampling.

    Cormode, G., Shkapenyuk, V., Srivastava, D., & Xu, B. (2009). Forward Decay: A Practical Time Decay Model for Streaming Systems.

    ICDE '09: Proceedings of the 2009 IEEE International Conference on Data Engineering.

  • Maintain a statistically representative sample of the last 5 minutes.

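A minimal sketch of forward-decaying priority sampling in the spirit of Cormode et al.: each value gets weight exp(alpha * (t - landmark)) for a fixed landmark time, a priority of weight / u for u uniform in (0, 1], and the reservoir keeps the `size` highest priorities. The decay rate alpha = 0.015 per second, the class name, and the explicit timestamps are assumptions for illustration. Because weights grow exponentially with time, a long-running version would have to periodically rescale priorities against a newer landmark to avoid floating-point overflow; this sketch omits that.

```ruby
# Forward-decaying priority sample: values get weight exp(alpha * (t - landmark)),
# priority weight / u with u uniform in (0, 1], and the +size+ highest priorities are kept.
class DecayingReservoir
  def initialize(size, alpha, landmark, rng: Random.new)
    @size = size
    @alpha = alpha       # decay rate per second (assumed)
    @landmark = landmark # fixed reference time, in seconds
    @rng = rng
    @entries = []        # [priority, value] pairs
  end

  def update(value, timestamp)
    weight = Math.exp(@alpha * (timestamp - @landmark))
    priority = weight / (1.0 - @rng.rand) # 1.0 - rand lies in (0, 1]
    if @entries.size < @size
      @entries << [priority, value]
    else
      # Linear scan for the lowest priority; a heap or sorted map would scale better.
      lowest = (0...@entries.size).min_by { |i| @entries[i][0] }
      @entries[lowest] = [priority, value] if priority > @entries[lowest][0]
    end
  end

  def values
    @entries.map { |_priority, value| value }
  end
end

# One value per second for 1,000 seconds; each value is its own timestamp.
sample = DecayingReservoir.new(100, 0.015, 0, rng: Random.new(7))
(0...1000).each { |t| sample.update(t, t) }
recent = sample.values
```

A uniform sample of the same stream would have a mean timestamp near 500; the decaying sample's mean is pulled strongly toward the end of the stream.
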
  • [Chart: # of cities vs. time]

  • [Chart: Uniform vs. Biased]

  • 95% of autocomplete results return 3 cities or fewer.

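Statements like this come from reading a percentile off the sample. A minimal sketch using the nearest-rank definition, one of several in common use (interpolating definitions can give slightly different answers on small samples). The function name and the sample data are hypothetical.

```ruby
# Nearest-rank percentile: the smallest sample value such that at least
# pct percent of the sample is less than or equal to it (0 < pct <= 100).
def percentile(values, pct)
  raise ArgumentError, "empty sample" if values.empty?
  raise ArgumentError, "pct must be in (0, 100]" unless pct.positive? && pct <= 100

  sorted = values.sort
  rank = (pct * sorted.size).fdiv(100).ceil # 1-based rank
  sorted[rank - 1]
end

# Hypothetical sample: 95 results with 3 cities, 5 results with 7 cities.
cities_per_result = Array.new(95, 3) + Array.new(5, 7)
puts percentile(cities_per_result, 95) # prints 3
puts percentile(cities_per_result, 99) # prints 7
```
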
  • Gauges, Counters, Meters,

    Histograms, Timers

  • Timer

    A histogram of durations and a meter of calls.

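A meter reports how often something happens. One common way to get a smoothed recent rate, sketched here, is an exponentially weighted moving average updated on a fixed tick, the same scheme Unix load averages use. The 5-second tick, the class name, and calling `tick` by hand (instead of from a background timer) are assumptions for this sketch.

```ruby
# Exponentially weighted moving average of an event rate, updated every
# TICK_INTERVAL seconds (the scheme used by Unix load averages).
class EwmaMeter
  TICK_INTERVAL = 5.0 # seconds

  attr_reader :count

  def initialize(minutes: 1)
    @alpha = 1.0 - Math.exp(-TICK_INTERVAL / 60.0 / minutes)
    @uncounted = 0
    @count = 0
    @rate = nil # events per second; nil until the first tick
  end

  # Record +n+ events.
  def mark(n = 1)
    @uncounted += n
    @count += n
  end

  # Fold the events seen since the last tick into the average.
  def tick
    instant_rate = @uncounted / TICK_INTERVAL
    @uncounted = 0
    @rate = @rate.nil? ? instant_rate : @rate + @alpha * (instant_rate - @rate)
  end

  def rate
    @rate || 0.0
  end
end

meter = EwmaMeter.new(minutes: 1)
meter.mark(50)
meter.tick
puts meter.rate # prints 10.0 (50 events / 5 s; the first tick seeds the average)
meter.tick      # no new events, so the rate decays toward zero
puts meter.rate
```
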
  • [Chart: # of ms to respond]

  • import java.util.concurrent.TimeUnit._

    val timer = metrics.timer("requests", MILLISECONDS, SECONDS)

    timer.time { handle(req, resp) }

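A minimal Ruby analogue of the `timer.time { ... }` call above, following the slide's definition of a timer. To stay short it keeps every duration in an array and treats the call count as the meter; a production timer would feed a bounded sampling reservoir and an exponentially weighted rate instead. `SimpleTimer` and the stand-in request handler are hypothetical, not the Metrics API.

```ruby
# A minimal timer: a record of durations plus a count of calls.
# Keeps every duration (unbounded); a production version would sample.
class SimpleTimer
  def initialize
    @durations_ms = []
    @lock = Mutex.new
  end

  # Runs the block, records how long it took (even if it raises), returns its result.
  def time
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000.0
    @lock.synchronize { @durations_ms << elapsed_ms }
  end

  def count
    @lock.synchronize { @durations_ms.size }
  end

  # Nearest-rank percentile of the recorded durations, in ms; nil if none.
  def percentile(pct)
    sorted = @lock.synchronize { @durations_ms.sort }
    return nil if sorted.empty?

    sorted[(pct * sorted.size).fdiv(100).ceil - 1]
  end
end

# Stand-in for handle(req, resp): hypothetical work that takes about 10 ms.
requests = SimpleTimer.new
result = requests.time { sleep 0.01; :handled }
```
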
  • At ~2,000 req/sec, our 99th percentile latency jumps from 13ms to 453ms.

  • Gauges, Counters, Meters,

    Histograms, Timers

  • Now what?

  • Instrument it.

    If it could affect your code's business value, add a metric.

    Our services have 40-50 metrics.

  • Collect it.

    JSON via HTTP.

    Every minute.

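A minimal sketch of the collection side: render a snapshot of metric values as JSON, and poll that endpoint on a fixed interval. The endpoint URL, the metric names, and the injectable `fetch` hook are hypothetical; `JSON` and `Net::HTTP.get` are from Ruby's standard library.

```ruby
require "json"
require "net/http"
require "uri"

# Render a flat snapshot (metric name => current value) as a JSON body.
def metrics_json(snapshot)
  JSON.generate(snapshot.sort.to_h)
end

# Poll +url+ every +interval+ seconds, yielding each parsed snapshot.
# +fetch+ defaults to a plain HTTP GET and can be replaced, e.g. in tests.
def poll_metrics(url, interval: 60, iterations: Float::INFINITY,
                 fetch: ->(u) { Net::HTTP.get(URI(u)) })
  polled = 0
  while polled < iterations
    yield JSON.parse(fetch.call(url))
    polled += 1
    sleep(interval) if polled < iterations
  end
end

# Hypothetical endpoint body and a single poll with a stubbed fetch.
body = metrics_json("requests.count" => 1234, "requests.p99_ms" => 13.2)
seen = []
poll_metrics("http://localhost:8081/metrics", iterations: 1, fetch: ->(_u) { body }) do |snapshot|
  seen << snapshot
end
```
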
  • Monitor it.

    Nagios/Zabbix/Whatever

    If it affects business value, someone should get woken up.

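A minimal sketch of a check a Nagios-style monitor could run against a collected value. It follows the Nagios plugin convention of printing a one-line status and exiting with 0 (OK), 1 (WARNING), or 2 (CRITICAL); the p99 latency metric and the 100 ms / 250 ms thresholds are assumptions.

```ruby
# Nagios plugin exit codes.
STATUS_OK = 0
STATUS_WARNING = 1
STATUS_CRITICAL = 2

# Classify a p99 latency (ms) against thresholds; returns [exit_code, message].
def check_latency(p99_ms, warn_ms: 100, crit_ms: 250)
  if p99_ms >= crit_ms
    [STATUS_CRITICAL, "CRITICAL - p99 latency #{p99_ms}ms >= #{crit_ms}ms"]
  elsif p99_ms >= warn_ms
    [STATUS_WARNING, "WARNING - p99 latency #{p99_ms}ms >= #{warn_ms}ms"]
  else
    [STATUS_OK, "OK - p99 latency #{p99_ms}ms"]
  end
end

# When run as a script, take the latency as the first argument.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  code, message = check_latency(Float(ARGV[0]))
  puts message
  exit code
end
```

Invoked as, for example, `ruby check_latency.rb 453`, it prints a CRITICAL line and exits with status 2.
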
  • Aggregate it.

    Ganglia/Graphite/Cacti/Whatever

    Place current values in historical context.

    See long-term patterns.

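A minimal sketch of feeding values to Graphite, one of the aggregators listed, through Carbon's plaintext protocol: one `path value timestamp` line per data point over TCP, on port 2003 by default. The metric path and host name are hypothetical.

```ruby
require "socket"

# One data point in Carbon's plaintext protocol: "<path> <value> <unix timestamp>\n".
def graphite_line(path, value, time = Time.now)
  "#{path} #{value} #{time.to_i}\n"
end

# Send [path, value] pairs to a Carbon plaintext listener over TCP.
def send_to_graphite(points, host:, port: 2003, time: Time.now)
  socket = TCPSocket.new(host, port)
  begin
    points.each { |path, value| socket.write(graphite_line(path, value, time)) }
  ensure
    socket.close
  end
end

line = graphite_line("autocomplete.requests.p99_ms", 13, Time.at(1_302_307_200))
# line == "autocomplete.requests.p99_ms 13 1302307200\n"

# Against a real (hypothetical) Carbon host:
# send_to_graphite([["autocomplete.requests.p99_ms", 13]], host: "graphite.example.internal")
```
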
  • Go faster.

  • Shorten our decision-making cycle.

  • Observe, Orient, Decide, Act

  • Observe

    What is the 99th percentile latency of our autocomplete service right now?

    ~500ms

  • Orient

    How does this compare to other parts of our system,

    both currently and historically?

    way slower

  • Decide

    Should we make it faster? Or should we add feature X?

    make it faster

  • Act!

    Write some code.

    def sort_by(&blk)
      #sleep(100) # WTF DUDE
      super(&blk)
    end

  • 10 Print "Rinse"
    20 Print "Repeat"
    30 Goto 10

  • If we do this faster, we will win.

  • Fewer bugs.

  • More features.

  • Happier users.

  • Money.

  • tl;dr

  • We might write code.

  • We have to generate business value.

  • In order to know how well our code is generating business value, we need metrics.

  • Gauges, Counters, Meters,

    Histograms, Timers

  • Monitor them for current problems.

  • Aggregate them for historical perspective.

  • territory

    map

  • Improve our mental model of our code.

  • MIND THE GAP

  • Observe, Orient, Decide, Act

  • If you're on the JVM, use Metrics.

    github.com/codahale/metrics

  • If not, you can build this.

  • Please build this.

  • Make better decisions by using numbers.

  • Thank you.