Page 1: Metrics simplified

Metrics Simplified

Mark Lin

Page 2: Metrics simplified

why?

"If you can not measure it, you can not improve it" -Lord Kelvin

99.999% ("five nines") availability = 5.26 minutes of downtime per year
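
The 5.26-minute figure follows from simple arithmetic, sketched here in Python. The function name is illustrative, and the calculation assumes a 365-day year with downtime measured in minutes.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the downtime budget, in minutes per year, for an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


# Compare a few common availability targets.
for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_minutes_per_year(target):.2f} minutes/year")
```

Each extra nine cuts the yearly downtime budget by a factor of ten.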

Page 3: Metrics simplified

previously ...

Sending/collecting is complicated.

Single collection server.

Tedious to configure new metric collection or creation.

Calculating a metric from a file is expensive.

Page 4: Metrics simplified

bottlenecks ...

Poll based collection server

Not easy (!fun) to configure new metric collection or creation.

= grunt work for the ops engineer

uhhhh....

Page 5: Metrics simplified

enabling technology

Graphite

RabbitMQ

Graphite Local Proxy

Rocksteady (w/ Esper)

Page 6: Metrics simplified

path to graph

1min.juicer.output.apple.sc1.jcr1 20 1276822626

echo "1min.juicer.output.apple.sc1.jcr1 20 1276822626" | nc localhost 3400
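
The same one-line send can be done from application code. The following is a minimal Python sketch, assuming the listener on port 3400 shown above accepts the plain "path value timestamp" line over TCP. The function names `format_metric` and `send_metric` are illustrative and not part of Graphite.

```python
import socket
import time
from typing import Optional


def format_metric(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Build one 'path value timestamp' line, terminated by a newline."""
    if timestamp is None:
        timestamp = int(time.time())  # default to the current Unix time
    return f"{path} {value} {timestamp}\n"


def send_metric(line: str, host: str = "localhost", port: int = 3400,
                timeout: float = 2.0) -> None:
    """Open a TCP connection, write the line, and close the connection."""
    # Host and port mirror the nc example; adjust them for your setup.
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(line.encode("ascii"))
```

Calling `send_metric(format_metric("1min.juicer.output.apple.sc1.jcr1", 20, 1276822626))` would then be equivalent to the `echo ... | nc` command, provided something is listening on that port.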

Page 8: Metrics simplified

graph

Page 11: Metrics simplified

graph = post-event forensics

Page 12: Metrics simplified

Rocksteady, metric as event

1min.juicer.common.version.sc1.jcr1 100 1276822626

INSERT INTO Deploy
SELECT * FROM Metric(name='common.revision')
MATCH_RECOGNIZE (
  partition by colo, hostname
  measures A.value as revision, A.colo as colo, A.hostname as hostname, A.app as app, A.timestamp as timestamp
  pattern (A)
  define A as A.value > prev(A.value)
)
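
To show what the statement above computes, here is a rough Python emulation of its logic. It is not Esper and not Rocksteady code. The `Metric` fields are taken from the names used in the query, and `detect_deploys` is a hypothetical helper. An event counts as a deploy when its value is greater than the previous value seen for the same (colo, hostname) pair.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Metric:
    name: str
    app: str
    colo: str
    hostname: str
    value: float
    timestamp: int


def detect_deploys(events: Iterable[Metric],
                   name: str = "common.revision") -> Iterator[Metric]:
    """Yield events whose value rose relative to the previous event of the same host."""
    last_value: Dict[Tuple[str, str], float] = {}  # keyed like "partition by colo, hostname"
    for event in events:
        if event.name != name:
            continue  # mirrors the Metric(name=...) filter
        key = (event.colo, event.hostname)
        previous = last_value.get(key)
        # The first event of a partition has no predecessor, so it never matches.
        if previous is not None and event.value > previous:
            yield event
        last_value[key] = event.value
```

Feeding it revisions 100, 100, 101 for one host yields only the third event, which is the point where a new revision went live.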

Page 14: Metrics simplified

auto threshold, prediction

Page 15: Metrics simplified

correlation

Deployment-related problems.

Capture sets of metrics when important ones cross a threshold.

Determine dependencies, such as CPU to requests per second or response time.
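
The slides do not say how dependencies between metrics are determined. One simple way to quantify such a relationship is a Pearson correlation between two series sampled at the same timestamps, sketched below. This is a stand-in technique chosen for illustration, not necessarily what Rocksteady does, and the function name is an assumption.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(spread_x * spread_y)
```

A value near 1 for CPU against requests per second would suggest the two move together. A value near 0 would suggest CPU is driven by something else.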

Page 17: Metrics simplified

revelation

Page 18: Metrics simplified

beyond simple metric

Timing info per request.

Actual time spent in each component in an application.

Map out dependencies, find the exact area of the problem.
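
A minimal sketch of per-request, per-component timing in Python follows. The `RequestTimer` class is hypothetical and only illustrates the idea. A real setup would also ship the collected timings out as metrics.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class RequestTimer:
    """Accumulates wall-clock time per named component for one request."""

    def __init__(self) -> None:
        self.timings: Dict[str, float] = {}

    @contextmanager
    def component(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the component raised an exception.
            elapsed = time.perf_counter() - start
            self.timings[name] = self.timings.get(name, 0.0) + elapsed

    def slowest(self) -> str:
        """Name of the component that consumed the most time."""
        return max(self.timings, key=lambda name: self.timings[name])
```

Wrapping each stage of a request in `with timer.component("db"):` (the component name is made up) gives a per-request breakdown, and `timer.slowest()` points at the area to investigate first.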

Page 20: Metrics simplified

what we learned?

1. Make metric sending simple.
2. Nice UI to make sense of data.
3. Real-time processing of metrics rocks.