distributed tracing

distributed tracing in 5 minutes

DESCRIPTION

lightning talk from surge 2012

Page 1: distributed tracing in 5 minutes

distributed tracing

Page 2: distributed tracing in 5 minutes

• twitter zipkin
• google dapper
• x-trace
• tracelytics
• ... more!

Page 3: distributed tracing in 5 minutes

motivation

Page 4: distributed tracing in 5 minutes

what is slow?

Page 6: distributed tracing in 5 minutes

causal flow of control

Page 8: distributed tracing in 5 minutes

how to

Page 16: distributed tracing in 5 minutes

possible approaches
• Unique identifier
  • propagate throughout
  • write instrumentation for various transports
• Observe and correlate
  • always on the outside - black box
  • difficult to get threaded + evented processes right
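
As a rough illustration of the "unique identifier, propagate throughout" approach inside a single evented process, the sketch below assigns an ID at the edge and carries it through interleaved asyncio tasks using Python's `contextvars`. The names `current_trace_id`, `record`, `query_db`, and `handle_request` are hypothetical and not from the talk, and the `events` list stands in for a real reporting backend.

```python
import asyncio
import contextvars
import uuid

# Hypothetical name: holds the trace ID of whichever request is currently running.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)

events = []  # stand-in for a real reporting backend


def record(label):
    """Tag an event with the trace ID of the request that caused it."""
    events.append((current_trace_id.get(), label))


async def query_db():
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    record("db")


async def handle_request(name):
    # Assign the unique identifier once, at the edge of the system.
    trace_id = uuid.uuid4().hex
    current_trace_id.set(trace_id)
    record(name + ":start")
    await query_db()
    record(name + ":end")
    return trace_id


async def main():
    # Two interleaved requests; each task runs in its own copy of the context.
    return await asyncio.gather(handle_request("a"), handle_request("b"))


trace_ids = asyncio.run(main())
```

Even though the two requests interleave on one event loop, filtering `events` by trace ID recovers each request's own sequence. An outside observer without the ID would have to guess which `db` event belongs to which request.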

Page 17: distributed tracing in 5 minutes

1BD57B58AE7E315BBEAB6795F0BDC198296357

Page 33: distributed tracing in 5 minutes

[diagram: boxes for nginx, python, cache, db, internet, and "the java", with a timeline from t = start to t = end]

Page 34: distributed tracing in 5 minutes

piggyback rides
• More Doable
  • HTTP: x-headers
  • Thrift: secret argument
  • Internal RPC protocol: you’re the boss
• Less Doable
  • SQL: one way ticket, also you’re not percona
  • memcache: not extensible so not backwards compatible
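
For the HTTP case, piggybacking could look roughly like the sketch below: read the ID from an incoming header if one is present, otherwise start a new trace, and attach the same ID to every outgoing request. The header name `X-Trace-Id` and the helpers `extract_or_start` and `inject` are assumptions made for illustration, not something the talk or any particular tracer specifies. Headers are modeled as plain dicts, so real HTTP's case-insensitive header matching is ignored here.

```python
import uuid

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name, not a standard


def extract_or_start(incoming_headers):
    """Reuse the caller's trace ID if present, otherwise start a new trace."""
    return incoming_headers.get(TRACE_HEADER) or uuid.uuid4().hex


def inject(trace_id, outgoing_headers=None):
    """Return a copy of the outgoing headers with the trace ID attached."""
    headers = dict(outgoing_headers or {})
    headers[TRACE_HEADER] = trace_id
    return headers


# The edge (for example nginx) sees no header and starts a trace...
edge_id = extract_or_start({})
to_python = inject(edge_id, {"Accept": "text/html"})

# ...and each downstream service reuses it on its own outgoing calls.
python_id = extract_or_start(to_python)
to_java = inject(python_id)
```

Every hop ends up carrying the same ID, which is what lets the reports from different services be joined later.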

Page 36: distributed tracing in 5 minutes

timing and structure
• Timing
  • distributed = clock skew
• Structure -- two approaches
  • Encode in ID
  • Encode in back-pointers
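
To make the clock skew point concrete, here is a small worked example with made-up numbers. It assumes host B's clock runs 50 ms behind host A's. Subtracting timestamps taken on different hosts gives nonsense, while differences taken on a single host's clock are unaffected.

```python
# Hypothetical timestamps in milliseconds; host B's clock runs 50 ms behind host A's.
SKEW_MS = -50

a_send = 1000            # A's clock when A sends the request
b_recv = 1010 + SKEW_MS  # true time 1010, as read from B's clock
b_done = 1030 + SKEW_MS  # true time 1030, as read from B's clock
a_recv = 1040            # A's clock when A gets the response

# Mixing clocks: the request appears to arrive before it was sent.
naive_network_ms = b_recv - a_send

# Same-clock differences are unaffected by the skew.
b_duration_ms = b_done - b_recv
a_duration_ms = a_recv - a_send
```

Here `naive_network_ms` comes out negative, while `b_duration_ms` (20) and `a_duration_ms` (40) are still meaningful.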

Page 37: distributed tracing in 5 minutes

encode in ID?
• nginx1
• nginx1python1
• nginx1python1cache1
• nginx1python1cache1python2
• nginx1python1cache1python2sql1
• nginx1python1cache1python2sql1python3
• ...
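
The ID scheme listed above can be reproduced with a few lines of Python. This is only a sketch: `child_id` and the shared `counters` dict are hypothetical, and in a real distributed system the per-service counters would themselves have to travel with the request somehow.

```python
def child_id(parent_id, service, counters):
    """Append '<service><n>' to the parent's ID, where n counts visits to that service."""
    counters[service] = counters.get(service, 0) + 1
    return f"{parent_id}{service}{counters[service]}"


counters = {}
ids = []
current = ""
# Same hop order as the slide: nginx, python, cache, python, sql, python.
for service in ["nginx", "python", "cache", "python", "sql", "python"]:
    current = child_id(current, service, counters)
    ids.append(current)
```

Each ID contains the whole path that led to it, so structure can be read straight off a single report. The cost is that the ID grows with every hop.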

Page 38: distributed tracing in 5 minutes

encode in back-pointer?

nginx python cache python
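
With back-pointers, each report carries only its own ID and its parent's ID, and the structure is rebuilt afterwards. A minimal sketch, assuming hypothetical span IDs `s1` to `s4`, a `(span_id, parent_id, service)` report format, and a simple chain of calls:

```python
from collections import defaultdict

# Hypothetical reports, arriving out of order: (span_id, parent_id, service).
reports = [
    ("s3", "s2", "cache"),
    ("s1", None, "nginx"),
    ("s4", "s3", "python"),
    ("s2", "s1", "python"),
]


def build_tree(reports):
    """Index reports by parent so the call structure can be walked from the root."""
    children = defaultdict(list)
    service = {}
    root = None
    for span_id, parent_id, name in reports:
        service[span_id] = name
        if parent_id is None:
            root = span_id
        else:
            children[parent_id].append(span_id)
    return root, children, service


def render(span_id, children, service, depth=0):
    """Return the subtree under span_id as indented lines."""
    lines = ["  " * depth + service[span_id]]
    for child in children[span_id]:
        lines.extend(render(child, children, service, depth + 1))
    return lines


root, children, service = build_tree(reports)
tree_lines = render(root, children, service)
```

IDs stay a fixed size, but the structure is only known once all the reports for a trace have been collected and joined.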

Page 39: distributed tracing in 5 minutes

reporting

Page 49: distributed tracing in 5 minutes

other things worth figuring out

• sampling

• reporting

• aggregate analysis