Extensible Distributed Tracing from Kernels to Clusters (Fay)
Úlfar Erlingsson, Google Inc. · Marcus Peinado, Microsoft Research · Simon Peter, Systems Group, ETH Zurich · Mihai Budiu, Microsoft Research




Wouldn’t it be nice if…

• We could know what our clusters were doing?

• We could ask any question, easily, using one simple-to-use system.

• We could collect answers so efficiently, so cheaply, that we might even ask continuously.


Let’s imagine...

• Applying data mining to cluster tracing
• Bag-of-words technique
  – Compare documents w/o structural knowledge
  – N-dimensional feature vectors
  – K-means clustering

• Can apply to clusters, too!
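As a concrete illustration of the technique (not Fay's own code, which appears later in C#), here is a minimal Python sketch: each machine/interval becomes a syscall-frequency feature vector, and k-means groups intervals with similar syscall mixes. The function names and toy data are hypothetical.

```python
# Bag-of-words over system calls, then k-means (illustrative sketch).

def nearest(pt, centers):
    # Index of the center closest to pt (squared Euclidean distance).
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centers)), key=lambda i: dist2(pt, centers[i]))

def kmeans_step(vectors, centers):
    # One Lloyd iteration: assign each vector, then recompute the means.
    groups = [[] for _ in centers]
    for v in vectors:
        groups[nearest(v, centers)].append(v)
    return [
        [sum(col) / len(g) for col in zip(*g)] if g else c
        for g, c in zip(groups, centers)
    ]

def kmeans(vectors, centers, steps):
    for _ in range(steps):
        centers = kmeans_step(vectors, centers)
    return centers

# Toy feature vectors: per-interval counts of two system calls.
vs = [[9, 1], [8, 2], [1, 9], [2, 8]]
cs = kmeans(vs, [[9, 1], [1, 9]], steps=5)
```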


Cluster-mining with Fay

• Automatically categorize cluster behavior, based on system call activity


Cluster-mining with Fay

• Automatically categorize cluster behavior, based on system call activity
  – Without measurable overhead on the execution
  – Without any special Fay data-mining support


Vector Nearest(Vector pt, Vectors centers) {
    var near = centers.First();
    foreach (var c in centers)
        if (Norm(pt - c) < Norm(pt - near))
            near = c;
    return near;
}

var kernelFunctionFrequencyVectors =
    cluster.Function(kernel, "syscalls!*")
           .Where(evt => evt.time < Now.AddMinutes(3))
           .Select(evt => new { Machine  = fay.MachineID(),
                                Interval = evt.Cycles / CPS,
                                Function = evt.CallerAddr })
           .GroupBy(evt => evt, (k, g) => new { key = k, count = g.Count() });

Vectors OneKMeansStep(Vectors vs, Vectors cs) {
    return vs.GroupBy(v => Nearest(v, cs))
             .Select(g => g.Aggregate((x, y) => x + y) / g.Count());
}

Vectors KMeans(Vectors vs, Vectors cs, int K) {
    for (int i = 0; i < K; ++i)
        cs = OneKMeansStep(vs, cs);
    return cs;
}

Fay K-Means Behavior-Analysis Code



Fay vs. Specialized Tracing

• Could have built a specialized tool for this
  – Automatic categorization of behavior (Fmeter)

• Fay is general, but can efficiently do
  – Tracing across abstractions, systems (Magpie)
  – Predicated and windowed tracing (Streams)
  – Probabilistic tracing (Chopstix)
  – Flight recorders, performance counters, …


Key Takeaways

Fay: Flexible monitoring of distributed executions
  – Can be applied to existing, live Windows servers

1. Single query specifies both tracing & analysis
   – Easy to write & enables automatic optimizations

2. Pervasively data-parallel, scalable processing
   – Same model within machines & across clusters

3. Inline, safe machine-code at tracepoints
   – Allows us to do computation right at data source


K-Means: Single, Unified Fay Query


Fay is Data-Parallel on Cluster

• View trace query as distributed computation
• Use cluster for analysis


Fay is Data-Parallel on Cluster

System call trace events
• Fay does early aggregation & data reduction
• Fay knows what's needed for later analysis


Fay is Data-Parallel on Cluster

System call trace events
• Fay does early aggregation & data reduction

K-Means analysis
• Fay builds an efficient processing plan from query


Fay is Data-Parallel within Machines

• Early aggregation
• Inline, in OS kernel
• Reduce dataflow & kernel/user transitions

• Data-parallel, per core/thread
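A sketch of the idea in Python (hypothetical names, not Fay's API): each core or thread reduces its own event stream to a small count map at the source, so only aggregates, not raw events, cross the kernel/user and machine/cluster boundaries.

```python
# Early, per-core aggregation followed by a merge (illustrative sketch).
from collections import Counter

def local_aggregate(events):
    # Runs "at the data source": reduce a stream of (function, interval)
    # events to a small frequency map.
    return Counter(events)

def merge(partials):
    # Runs one level up: combine per-core partial counts.
    total = Counter()
    for p in partials:
        total.update(p)
    return total

# Hypothetical per-core event streams (function name, time interval).
per_core_events = [
    [("NtReadFile", 0), ("NtReadFile", 0), ("NtClose", 0)],  # core 0
    [("NtReadFile", 0), ("NtWriteFile", 1)],                 # core 1
]
totals = merge(local_aggregate(ev) for ev in per_core_events)
```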


Processing w/o Fay Optimizations

• Collect data first (on disk)
• Reduce later
• Inefficient, can suffer data overload



Traditional Trace Processing

• First log all data (a deluge)
• Process later (centrally)
• Compose tools via scripting



Takeaways so far

Fay: Flexible monitoring of distributed executions

1. Single query specifies both tracing & analysis

2. Pervasively data-parallel, scalable processing


Safety of Fay Tracing Probes

• A variant of XFI used for safety [OSDI’06]

– Works well in the kernel or any address space
– Can safely use existing stacks, etc.
– Instead of a language interpreter (DTrace)
– Arbitrary, efficient, stateful computation

• Probes can access thread-local/global state
• Probes can try to read any address
  – I/O registers are protected


Key Takeaways, Again

Fay: Flexible monitoring of distributed executions

1. Single query specifies both tracing & analysis

2. Pervasively data-parallel, scalable processing

3. Inline, safe machine-code at tracepoints



Installing and Executing Fay Tracing

• Fay runtime on each machine
• Fay module in each traced address space
• Tracepoints at hotpatched function boundary

[Diagram: per-machine Fay tracing runtime; a query creates XFI-verified probes in the target (user-space or kernel) via hotpatching; trace events flow out through ETW (~200 cycles)]


Low-level Code Instrumentation

Module with a traced function Foo:

    Caller:
        ...
        e8ab62ffff     call Foo
        ...

        ff1508e70600   call [Dispatcher]
    Foo:
        ebf8           jmp Foo-6
        cccccc         (padding)
    Foo2:
        57             push rdi
        ...
        c3             ret

• Replace 1st opcode of functions


Low-level Code Instrumentation

Fay platform module:

    Dispatcher:
        t = lookup(return_addr)
        ...
        call t.entry_probes
        ...
        call t.Foo2_trampoline
        ...
        call t.return_probes
        ...
        return  /* to after call Foo */

• Replace 1st opcode of functions
• Fay dispatcher called via trampoline


Low-level Code Instrumentation

[Diagram: the dispatcher invokes XFI-protected Fay probes at function entry and return]

• Replace 1st opcode of functions
• Fay dispatcher called via trampoline
• Fay calls the function, and entry & exit probes
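The control flow above can be sketched with ordinary Python function wrapping (hypothetical names; real Fay does this with hotpatching and XFI-verified machine code, not interpreted callbacks): the patched entry point routes every call through a dispatcher, which runs the entry probes, invokes the original function body, then runs the return probes.

```python
# Dispatcher-style tracing sketch: probes around an original function.

def make_traced(original, entry_probes, return_probes):
    # Stand-in for the hotpatched entry: route calls via a dispatcher.
    def dispatcher(*args):
        for p in entry_probes:      # call t.entry_probes
            p(args)
        ret = original(*args)       # call t.Foo2_trampoline
        for p in return_probes:     # call t.return_probes
            p(ret)
        return ret                  # back to after "call Foo"
    return dispatcher

calls = []

def foo(x):                         # the traced function ("Foo")
    return x * 2

traced_foo = make_traced(
    foo,
    entry_probes=[lambda a: calls.append(("enter", a))],
    return_probes=[lambda r: calls.append(("ret", r))],
)
result = traced_foo(21)
```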


• Fay adds 220 to 430 cycles per traced function
• Fay adds 180% CPU to trace all kernel functions
• Both approx. 10x faster than DTrace, SystemTap

What’s Fay’s Performance & Scalability?

[Charts: null-probe overhead in cycles, and slowdown (x): Fay 2.8x, Solaris DTrace 17.2x, OS X DTrace 26.7x; Linux SystemTap crashed]


Fay Scalability on a Cluster

• Fay tracing memory allocations, in a loop:
  – Ran workload on a 128-node, 1024-core cluster
  – Spread work over 128 to 1,280,000 threads
  – 100% CPU utilization

• Fay overhead was 1% to 11% (mean 7.8%)


More Fay Implementation Details

• Details of query-plan optimizations
• Case studies of different tracing strategies
• Examples of using Fay for performance analysis

• Fay is based on LINQ and Windows specifics
  – Could build on Linux using Ftrace, Hadoop, etc.

• Some restrictions apply currently
  – E.g., skew towards batch processing due to Dryad


Conclusion

• Fay: Flexible tracing of distributed executions

• Both expressive and efficient
  – Unified trace queries
  – Pervasive data-parallelism
  – Safe machine-code probe processing

• Often as efficient as purpose-built tools


Backup


A Fay Trace Query

from io in cluster.Function("iolib!Read")
where io.time < Now.AddMinutes(5)
let size = io.Arg(2)  // request size in bytes
group io by size/1024 into g
select new { sizeInKilobytes = g.Key,
             countOfReadIOs  = g.Count() };

• Aggregates read activity in iolib module
• Across cluster, both user-mode & kernel
• Over 5 minutes
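The aggregation this query performs can be mimicked in a few lines of Python (a stand-in for illustration only; the sizes below are made up):

```python
# Bucket read-request sizes into KB bins and count reads per bin,
# mirroring "group io by size/1024 into g ... g.Count()".
from collections import Counter

def read_size_histogram(read_sizes_bytes):
    return Counter(size // 1024 for size in read_sizes_bytes)

# Hypothetical values of the 2nd argument to iolib!Read, in bytes.
hist = read_size_histogram([512, 1500, 2048, 3000, 4096, 5000])
```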


A Fay Trace Query


• Specifies what to trace
  – 2nd argument of read function in iolib

• And how to aggregate
  – Group into KB-size buckets and count

[Chart: counts of read I/Os per size bucket: 1024, 2048, 4096, 8192 bytes]