Skua: Extending Distributed Tracing Vertically into the Linux Kernel

Harshal Sheth and Andrew Sun

DevConf 2018

Distributed Systems

• Complex applications are no longer monolithic

• Modular/agile development

• Continuous deployment

• Independent scaling

• Increasingly seen in large companies

• Hard to debug

(Image credit: Twitter, 2013)

Example Distributed System

(Diagram nodes: Frontend, Web Results, Images, PageRank, VisualRank. Legend: request; application within a distributed system; request pathway. The final builds overlay question marks and "#*$@%!" on the diagram.)

Distributed Tracing

• Monitoring and troubleshooting distributed systems

• Discovering latency issues

• Graphing service dependencies

• Root-cause analysis of backend issues

• Tracing a specific request through the entire system

What does distributed tracing miss?

• There’s more to performance than meets the eye of existing distributed tracing tools

• Contention between applications

• Kernel bugs

• Security patches (e.g. Meltdown/Spectre)

• Can we gain visibility regarding these issues via the kernel?

(Diagram labels: User Application, Distributed Tracing Client, Kernel)

Our goal: extend distributed tracing into the kernel

Our Approach

Jaeger (distributed tracing framework from Uber) + LTTng (Linux kernel trace toolkit) = Skua

Traces and Spans

Context: trace ID, parent ID, span ID, sampled

• Frontend Request: Trace ID 1337, Parent ID (none), Span ID 7893

• Web Results: Trace ID 1337, Parent ID 7893, Span ID 3460

• PageRank: Trace ID 1337, Parent ID 3460, Span ID 1231

• Images: Trace ID 1337, Parent ID 7893, Span ID 8652

• VisualRank: Trace ID 1337, Parent ID 8652, Span ID 3460

(Diagram: the spans for Frontend, Web Results, PageRank, Images, and VisualRank are laid out along a time axis; the final builds add a dice icon and a Span Aggregator.)
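As a rough sketch of how an aggregator could reassemble spans into one trace, the following links each span to its parent ID and prints the resulting hierarchy. The `Span` class and `render` helper are hypothetical names for illustration, not Jaeger's API, and the data uses four of the spans listed on the slide.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    name: str
    trace_id: int
    parent_id: Optional[int]  # None marks the root span
    span_id: int


# Four of the spans from the slide (all share trace ID 1337).
SPANS = [
    Span("Frontend Request", 1337, None, 7893),
    Span("Web Results", 1337, 7893, 3460),
    Span("PageRank", 1337, 3460, 1231),
    Span("Images", 1337, 7893, 8652),
]


def render(spans):
    """Return an indented outline of the trace, root first."""
    by_parent = defaultdict(list)
    for span in spans:
        by_parent[span.parent_id].append(span)

    lines = []

    def walk(parent_id, depth):
        # Visit every child of parent_id, then recurse into its own children.
        for span in by_parent[parent_id]:
            lines.append("  " * depth + span.name)
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(render(SPANS)))
```

Because every span carries its parent's ID, the hierarchy can be rebuilt after the fact no matter which service reported which span, or in what order the reports arrived.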

Jaeger

Source: https://eng.uber.com/distributed-tracing/

Skua Design

(Diagram components: User Application with Jaeger Client; Linux Kernel; Kernel Module using procfs; task_struct; LTTng Kernel Modules; LTTng Adapter; Jaeger Framework. Arrow labels: syscalls; Kernel Events. The final build numbers the steps from 1 to 6.)
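A minimal sketch of the hand-off from the Jaeger client to the procfs kernel module, assuming the context is passed as a short text record. The entry name `skua_ctx` and the record layout are hypothetical, since the slides do not give them; the colon-separated hex layout simply mirrors Jaeger's `uber-trace-id` header format, `{trace-id}:{span-id}:{parent-span-id}:{flags}`.

```python
import os
import tempfile


def encode_context(trace_id, span_id, parent_id, sampled):
    """Serialize a span context as colon-separated hex fields."""
    flags = 1 if sampled else 0
    return f"{trace_id:x}:{span_id:x}:{parent_id:x}:{flags:x}"


def decode_context(record):
    """Parse a record produced by encode_context back into its fields."""
    fields = [int(field, 16) for field in record.strip().split(":")]
    trace_id, span_id, parent_id, flags = fields
    return trace_id, span_id, parent_id, bool(flags & 1)


def send_context(path, record):
    """Write the record to `path`; a procfs entry would be written the same way."""
    with open(path, "w") as f:
        f.write(record)


if __name__ == "__main__":
    # A temporary file stands in for the hypothetical procfs entry.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "skua_ctx")
        send_context(target, encode_context(1337, 3460, 7893, True))
        with open(target) as f:
            print(decode_context(f.read()))
```

Writing to a procfs entry costs a syscall per context update, which is consistent with the deck's later observation that propagating the trace context this way is expensive.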

Skua Details

• Jaeger C++ client sends its context into the kernel

• Treats the Linux kernel as the next level of the span hierarchy

• Each syscall is considered a span

• Tracepoint events become span logs

• LTTng kernel modules tag each span and log with the context information

• Custom adapter sends kernel data into Jaeger

(Code-size annotations on the slide: 80 LOC, 25 LOC, 250 LOC)
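The mapping described in this slide can be sketched as follows, assuming a simplified event stream. The event tuples, the `syscall_entry` / `syscall_exit` / `tracepoint` kinds, and the `to_spans` helper are hypothetical stand-ins rather than LTTng's or Jaeger's actual formats: each syscall entry/exit pair on a thread becomes a child span of the application span named in the context, and tracepoint events that fire in between are attached as logs.

```python
def to_spans(events):
    """Pair syscall entry/exit events per thread into spans with logs.

    Each event is (timestamp, tid, kind, name, context), where context is
    (trace_id, parent_span_id) as tagged on the kernel side.
    """
    open_spans = {}  # tid -> span currently in progress on that thread
    finished = []
    for ts, tid, kind, name, ctx in events:
        if kind == "syscall_entry":
            open_spans[tid] = {
                "operation": name,
                "trace_id": ctx[0],
                "parent_id": ctx[1],
                "start": ts,
                "logs": [],
            }
        elif kind == "tracepoint" and tid in open_spans:
            # Tracepoint events inside a syscall become logs on its span.
            open_spans[tid]["logs"].append((ts, name))
        elif kind == "syscall_exit" and tid in open_spans:
            span = open_spans.pop(tid)
            span["end"] = ts
            finished.append(span)
    return finished


# Illustrative events: one write() syscall interrupted by a scheduler tracepoint.
EVENTS = [
    (100, 42, "syscall_entry", "write", (1337, 3460)),
    (105, 42, "tracepoint", "sched_switch", (1337, 3460)),
    (120, 42, "syscall_exit", "write", (1337, 3460)),
]

if __name__ == "__main__":
    for span in to_spans(EVENTS):
        print(span)
```

Keying the in-progress spans by thread ID reflects the assumption that a thread executes at most one syscall at a time, so an exit event always closes the most recent entry on that thread.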

Evaluation

Correctness

Setup

• Small C++ program

• Spawns a few threads

• Makes 10 different syscalls

• Verifies that Skua is correctly recording syscalls

Results

• Syscalls recorded in Jaeger as spans

• Misses a few syscalls

  • vDSO calls (gettimeofday)

  • LTTng instrumentation

• Tracepoint events recorded properly as logs

Performance Benchmarks

• Tests run on a KVM virtual machine assigned 24 vCPUs (2 × Intel Xeon X5670) and 16 GB RAM, running Linux kernel 4.15.14 with Skua modifications

• Traced 0.1% of requests

| Benchmark Scenario | Program Instrumentation | Kernel Tracing via Modified LTTng |
|---|---|---|
| No Tracing | None | No |
| Unmodified Jaeger | Original Jaeger client | No |
| Jaeger + procfs | Jaeger client modified to send trace context into kernel | No |
| LTTng | None | Yes, but output filtered using LTTng filters |
| Skua | Jaeger client modified to send trace context into kernel | Yes, output sent to adapter |
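For context on the 0.1% figure: tracers commonly make the sampling decision once, at the root of a request, and carry it in the context's sampled flag. Below is a minimal sketch of such a probabilistic decision. The `should_sample` helper and the 64-bit ID assumption are illustrative and are not taken from Jaeger's sampler implementation.

```python
import random

SAMPLING_RATE = 0.001  # 0.1% of requests, as in the benchmark setup
ID_SPACE = 2 ** 64     # assumption: trace IDs are uniformly random 64-bit integers


def should_sample(trace_id, rate=SAMPLING_RATE):
    """Sample a trace when its ID falls in the lowest `rate` fraction of the ID space."""
    return trace_id < rate * ID_SPACE


if __name__ == "__main__":
    rng = random.Random(0)
    total = 200_000
    sampled = sum(should_sample(rng.getrandbits(64)) for _ in range(total))
    print(f"sampled {sampled} of {total} requests")
```

Deriving the decision from the trace ID, rather than drawing a fresh random number in each service, means that any component seeing the same trace ID reaches the same verdict.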

Tiny HTTP Server

• Created a small C++ Web server using uWebSockets

• Used benchmarking tool autocannon

• Sent 1 million GET requests using 10 connections

• Evaluated throughput and latency under different tracing scenarios

Web Server Throughput

| Benchmark Scenario | Requests/sec | Decrease vs. No Tracing |
|---|---|---|
| No Tracing | 11628 | baseline |
| Unmodified Jaeger | 11629 | 0.0% |
| Jaeger + procfs | 11111 | 4.0% |
| LTTng | 10310 | 11.3% |
| Skua | 10205 | 12.2% |

Web Server Latency

| Benchmark Scenario | Latency (ms) | Change vs. No Tracing (ms) |
|---|---|---|
| No Tracing | 0.33 | baseline |
| Unmodified Jaeger | 0.36 | +0.03 |
| Jaeger + procfs | 0.45 | +0.12 |
| LTTng | 0.41 | +0.08 |
| Skua | 0.53 | +0.20 |

Fortunes Benchmark

• Code borrowed from TechEmpower’s Web Frameworks benchmark

• Retrieves list of fortunes from database and renders HTML page

• Uses Spring Boot, Kotlin, OpenJDK 10, PostgreSQL 10.4, OpenTracing integration

• “Real-world” Web application

• Similar benchmarking process, but autocannon was run twice (for JIT warmup) and with 100 connections

Fortunes Throughput

| Benchmark Scenario | Requests/sec | Decrease vs. No Tracing |
|---|---|---|
| No Tracing | 5352 | baseline |
| Unmodified Jaeger | 5352 | 0.0% |
| Jaeger + procfs | 5352 | 0.0% |
| LTTng | 5184 | 3.1% |
| Skua | 5026 | 6.1% |

Fortunes Latency

| Benchmark Scenario | Latency (ms) | Change vs. No Tracing (ms) |
|---|---|---|
| No Tracing | 18.05 | baseline |
| Unmodified Jaeger | 18.04 | -0.01 |
| Jaeger + procfs | 18.04 | -0.01 |
| LTTng | 18.63 | +0.58 |
| Skua | 19.23 | +1.18 |
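The overhead labels on the Fortunes charts can be reproduced from the bar values. Below is a small sketch, assuming "No Tracing" is the baseline for both metrics; the `overheads` helper is illustrative, and the numbers are the Fortunes throughput and latency values read off the charts.

```python
FORTUNES = {
    # scenario: (throughput in requests/sec, latency in ms)
    "No Tracing": (5352, 18.05),
    "Unmodified Jaeger": (5352, 18.04),
    "Jaeger + procfs": (5352, 18.04),
    "LTTng": (5184, 18.63),
    "Skua": (5026, 19.23),
}


def overheads(results, baseline="No Tracing"):
    """Return {scenario: (throughput decrease in %, latency change in ms)}."""
    base_tput, base_lat = results[baseline]
    out = {}
    for name, (tput, lat) in results.items():
        if name == baseline:
            continue
        drop = round(100 * (base_tput - tput) / base_tput, 1)
        delta = round(lat - base_lat, 2)
        out[name] = (drop, delta)
    return out


if __name__ == "__main__":
    for name, (drop, delta) in overheads(FORTUNES).items():
        print(f"{name}: throughput decrease {drop}%, latency change {delta:+.2f} ms")
```

Expressing throughput as a relative decrease and latency as an absolute change follows the way the charts are labeled.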

Performance Overheads

• Unmodified Jaeger has a negligible impact on performance

• LTTng causes a moderate decrease in throughput and a small increase in latency

  • This could be improved by enabling only a subset of the available instrumentation points

• Our modifications to Jaeger cause additional latency (depending on scenario)

  • Performing syscalls to propagate the trace context is expensive

  • Ingesting kernel events is more work

• In the tiny HTTP benchmark, Web server transactions took under 1 ms each, so the latency impacts appear large by comparison

Future Work

• Improve performance

• Simplify installation process

• Adaptive sampling reconfiguration

• Attempt tracing Ceph with Skua

• Logging What Matters: Just-In-Time Instrumentation And Tracing (Lily Sturmann, Emre Ates), Friday, Aug. 17, 4:30pm

• Tracing Ceph using Jaeger-Blkkin (Mania Abdi), Saturday, Aug. 18, 12:00pm

Conclusions

• Can use distributed tracing to monitor and debug complex distributed systems

• Current open source distributed tracing frameworks miss kernel information

• Skua integrates LTTng kernel data with Jaeger tracing

• Skua has some impact on throughput and latency

https://github.com/docc-lab/skua

Acknowledgements

• Raja Sambasivan - Mentor

Backup Slides

Correctness Tester Syscalls

• getpid

• getppid

• gettid

• gettimeofday

• nanosleep

• open

• close

• write

• fstat

• futex

Existing Tracing Frameworks

• Dapper (2010): Google

• Zipkin (2012): Twitter

• Canopy (2017): Facebook

• Jaeger (2017): open sourced by Uber