Tracing the Control Flow of Read/Write Operations in the Linux Kernel Matt Weir

Page 1: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Matt Weir

Page 2: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Our Original Goal

• To create a data logging system across the kernel with accurate timing that will monitor data as it moves up and down the data path.

Page 3: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

The Feasibility of that Goal

Page 4: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

The Current Goal

• Produce a framework that will assist in tracing the control flow of read/write operations in the Linux kernel using kernel markers

Page 5: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

A Brief History of the Project

Page 6: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Week of May 26th
3rd Week of Class

• Created our group
• Decided upon our basic goals
• Did research on previous efforts into this field

Page 8: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Week of June 2nd
4th Week of Class

• Talked to Dr. Wang and various graduate students to try and figure out how file IO works in Linux
• This is a generalization from my own imperfect ability to fully follow the conversations but...
  – There’s a lot of mystery about how the current version of Linux really works.
• Started playing around with printk

Page 9: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

First Experience using printk

Jun 2 12:20:30 device85 kernel: DATATAGGING: Someone called kmalloc
(message repeated 212341 times)

• Not so bad

Page 10: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Second Experience using printk

• Decided to add a timestamp

Jun 2 12:20:30 device85 kernel: DATATAGGING: "Someone called kmalloc at 123112321
Jun 2 12:20:30 device85 kernel: DATATAGGING: "Someone called kmalloc at 123128742
Jun 2 12:20:30 device85 kernel: DATATAGGING: "Someone called kmalloc at 123132342
Jun 2 12:20:30 device85 kernel: DATATAGGING: "Someone called kmalloc at 123132323
Jun 2 12:20:30 device85 kernel: DATATAGGING: "Someone called kmalloc at 123141424
Jun 2 12:20:30 device85 kernel: DATATAGGING: "Someone called kmalloc at 123164353
Jun 2 12:20:30 device85 kernel: DATATAGGING: "Someone called kmalloc at 123164353
Jun 2 12:20:30 device85 kernel: DATATAGGING: "Someone called kmalloc at 123173433
Jun 2 12:20:30 device85 kernel: DATATAGGING: "Someone called kmalloc at 123185454
Jun 2 12:20:30 device85 kernel: DATATAGGING: "Someone called kmalloc at 123198567
Jun 2 12:20:30 device85 kernel: DATATAGGING: "Someone called kmalloc at 123206566
Jun 2 12:20:30 device85 kernel: DATATAGGING: "Someone called kmalloc at 123209877
Jun 2 12:20:30 device85 kernel: DATATAGGING: "Someone called kmalloc at 123213421
Jun 2 12:20:30 device85 kernel: DATATAGGING: "Someone called kmalloc at 123223167
Jun 2 12:20:30 device85 kernel: DATATAGGING: "Someone called kmalloc at 123228744
Jun 2 12:20:30 device85 kernel: DATATAGGING: "Someone called kmalloc at 123232148

Page 11: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Size of the Log File

Page 12: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

In Defense of printk

• When I added them to multiple functions, it does show the control flow

• Can grep through the log file to get a smaller snapshot of what is going on

• No noticeable performance issues from the user standpoint

• They work
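The grep step can be approximated in a few lines of user-space Python. This is a minimal sketch, not part of the original project: the function names `tagged_timestamps` and `gaps`, and the regular expression, are assumptions based only on the syslog line shape of the timestamped kmalloc messages.

```python
import re

# Assumed line shape, taken from the timestamped sample messages:
#   Jun 2 12:20:30 device85 kernel: DATATAGGING: "Someone called kmalloc at 123112321
LINE_RE = re.compile(r'kernel: (?P<tag>[A-Z]+): .*? at (?P<ts>\d+)\s*$')


def tagged_timestamps(lines, tag):
    """Return the integer timestamps of all lines that carry the given tag."""
    stamps = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group("tag") == tag:
            stamps.append(int(match.group("ts")))
    return stamps


def gaps(stamps):
    """Differences between consecutive timestamps, in the log's own units."""
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]
```

Feeding the output of `tagged_timestamps` into `gaps` gives a quick view of how closely spaced the calls are, without opening the full log.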

Page 13: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

That being said…

• It is hard to manage a large number of them
• Adding/removing printks is time consuming
• They require an external structure to turn them “on/off” during run time
  – Didn’t even think of this option until I was using markers
• When inserting them make sure you don’t add one right after an “if” statement that doesn’t use {}

Page 14: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Week of June 9th
5th Week of Class

• Decided to move from printk to markers
• Upgraded my kernel version to 2.6.26.6 so that we would be using the same code
• Dr. Baker walked through the control flow of read statements with us
• Figured out how to implement markers and designed some basic test cases

Page 15: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Markers

• Added very recently to the Linux kernel
• In the creator’s own words
  – “It makes sense to offer an instrumentation set of the most relevant events occurring in the Linux kernel that can have the smallest performance cost possible when not active while not requiring a reboot of a production system to activate”

Page 16: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Adding Marker Support

• In menuconfig
  – General->Activate markers

Page 17: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

The Marker Structure

• The Marker
  – A hook in the code to call a function in an attached probe
• The Probe
  – A function that can be attached to markers
• The Manager
  – A kernel module that manages/arms and disarms probes
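The relationship between the three pieces can be modeled in user space. The sketch below is a toy Python model, not kernel code and not the markers API: the class names `Marker` and `Manager` and their methods are hypothetical, and `add_marker` exists only for the model (in the kernel, markers are placed in the source code, not created by the manager).

```python
class Marker:
    """A named hook in the traced code path."""

    def __init__(self, name):
        self.name = name
        self.probe = None      # function attached by the manager, if any
        self.armed = False     # an unarmed marker does nothing when hit

    def fire(self, **event):
        # Called from the instrumented code; a no-op unless armed with a probe.
        if self.armed and self.probe is not None:
            self.probe(self.name, event)


class Manager:
    """Stands in for the module that attaches, arms, and disarms probes."""

    def __init__(self):
        self.markers = {}

    def add_marker(self, name):
        # Model-only convenience so the example is self-contained.
        self.markers[name] = Marker(name)
        return self.markers[name]

    def register(self, name, probe):
        self.markers[name].probe = probe

    def arm(self, name):
        self.markers[name].armed = True

    def disarm(self, name):
        self.markers[name].armed = False
```

In this model a registered probe stays silent until the manager arms its marker, which mirrors the point of the design: the hook stays in the code, and the logging is switched on and off at run time.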

Page 18: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Friday Night

• Started to worry since all we had was glorified printks

• Decided to have a few drinks…

Page 19: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Came up with an Idea

• Focus on the marker management kernel module

• Modify the marker code to support finer grained logging

• Try to trace the control flow in read/write statements

Page 20: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

PirateAcorn

• The management kernel module that I wrote

Page 21: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

PirateAcorn (continued)

• Acorn
  – Counter Intelligence term: Slang for someone who is performing traffic analysis
• Pirate
  – Because they are way cooler than ninjas

Page 22: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

PirateAcorn (continued)

• Manages all the probes via ioctl commands
  – Breaks up the probes into read and write groups
  – Can enable them individually or at the same time
  – Supports the ability to have additional groups added to it
  – Can turn off monitoring for certain threads, such as other logging programs
  – Can be set to monitor a specific thread or all threads
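The grouping and thread-filtering policy listed above can be sketched as a small user-space model. This is an illustration under stated assumptions, not the PirateAcorn source: the class `ProbeGroups`, its attribute names, and the marker names other than `sys_read_start` are hypothetical, and the ioctl interface itself is not modeled.

```python
class ProbeGroups:
    """Toy model of grouping probes and filtering by thread."""

    def __init__(self):
        self.groups = {"read": set(), "write": set()}
        self.enabled = set()     # names of the groups currently switched on
        self.excluded = set()    # PIDs never logged, e.g. other loggers
        self.target = None       # None means all threads, else a single PID

    def add_probe(self, group, marker_name):
        # setdefault allows groups beyond read and write to be added later.
        self.groups.setdefault(group, set()).add(marker_name)

    def enable(self, *groups):
        self.enabled.update(groups)

    def disable(self, *groups):
        self.enabled.difference_update(groups)

    def should_log(self, marker_name, pid):
        """Apply the exclusion list, the optional target PID, then the groups."""
        if pid in self.excluded:
            return False
        if self.target is not None and pid != self.target:
            return False
        return any(marker_name in self.groups.get(g, ()) for g in self.enabled)
```

Keeping the exclusion list separate from the target PID means a logging daemon can stay excluded even when monitoring is switched back to all threads.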

Page 23: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Registering Probes

Page 24: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Arming the Probes

Page 25: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

The Probe Function

Page 26: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Modification to marker.c

• Needed to add support so it would only fire if the marker was called by a thread that is being logged

• Didn’t want to put the check in the probe function since that was called only after the marker fires

• Instead made a quick function that checks to see if a marker should fire
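The ordering described above, check first and call the probe only if the check passes, can be shown with a user-space stand-in. This is a hedged sketch, not the actual change to marker.c: `make_marker`, `probe`, and `should_fire` are hypothetical names.

```python
def make_marker(probe, should_fire):
    """Build a marker that consults should_fire before calling the probe."""

    def marker(pid, **event):
        if not should_fire(pid):   # cheap check, done before any probe work
            return False
        probe(pid, event)
        return True

    return marker


if __name__ == "__main__":
    calls = []
    logged_pids = {3055}
    marker = make_marker(lambda pid, event: calls.append(pid),
                         lambda pid: pid in logged_pids)
    marker(3055)   # passes the check, probe runs
    marker(1)      # fails the check, probe is never entered
    print(calls)
```

Threads that are not being logged never reach the probe at all, which is the reason for placing the check on the marker side.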

Page 27: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

The Marker Check Code

Page 28: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

The Marker Code

Page 29: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Adding Markers

• For the most part I concentrated on mapping the VFS layer and the file system

Page 30: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Finding the current PID

• Most of the time it was easy
  – current->pid

• But in some cases I wasn’t allowed to reference current
  – seq_read() in linux/fs/seq_file.c
  – Called by vfs_read() as file->f_op->read

Page 31: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Possible Solutions

• Just print out all calls to seq_read(), and filter them out when processing the log file
• Don’t bother to log seq_read() at all
• Implement a binary value in marker.c that is set true when a previous marker is allowed to fire, and false when a marker is denied
• Create a wrapper function
• Include the PID value in a structure that is already being passed to it
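The third option, a binary value remembered from the previous marker, can be modeled as a simple latch. This is a user-space sketch under assumptions, not code from marker.c: the class `LatchedFilter` and its method names are hypothetical.

```python
class LatchedFilter:
    """Remember whether the most recent PID-aware marker was allowed to fire."""

    def __init__(self, logged_pids):
        self.logged_pids = set(logged_pids)
        self.last_allowed = False

    def check_with_pid(self, pid):
        # Normal markers know the PID and update the latch as a side effect.
        self.last_allowed = pid in self.logged_pids
        return self.last_allowed

    def check_without_pid(self):
        # Markers that cannot see the PID reuse the previous decision.
        return self.last_allowed
```

The latch is only correct when the PID-less marker is hit immediately after a marker from the same thread. If another thread hits a marker in between, the latch reflects that thread instead, so this option trades accuracy for simplicity.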

Page 32: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Well what values are passed to it?

• file->f_op->read(file, buf, count, pos)
  – count is an integer and pos points to the file offset, so neither can carry a PID
  – buf is the buffer from the user, really don’t want to mess with that
  – What about file?

Page 33: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

The File Structure

Page 34: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Yes it’s a Bad Idea

• But what would happen if I added a PID field?

Page 35: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

Answer:

PIRATEACORN: sys_read_start: Pid=3055 Time=500170187414
PIRATEACORN: vfs_read_start: Pid=3055 Time=500170192998
PIRATEACORN: vfs_read_fop_read: Pid=3055 Time=500170196783
PIRATEACORN: seq_read_start: Pid=3055 Time=500170200492
PIRATEACORN: sys_read_end: Pid=3055 Time=500170239089
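Output in this format is easy to post-process. The sketch below is a user-space Python helper, not part of PirateAcorn: the names `parse_trace` and `elapsed_since_first` are assumptions, the regular expression is inferred from the sample output only, and the unit of the Time field is not stated in the slides, so results are in the log's own units.

```python
import re

# Assumed format, inferred from the sample output:
#   PIRATEACORN: sys_read_start: Pid=3055 Time=500170187414
TRACE_RE = re.compile(r'PIRATEACORN: (\w+): Pid=(\d+) Time=(\d+)')


def parse_trace(text):
    """Return (marker, pid, time) tuples in the order they appear."""
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in TRACE_RE.finditer(text)]


def elapsed_since_first(events, pid):
    """Pair each marker of one PID with the time elapsed since its first event."""
    mine = [(name, t) for name, p, t in events if p == pid]
    if not mine:
        return []
    start = mine[0][1]
    return [(name, t - start) for name, t in mine]
```

Applied to the five sample lines for Pid=3055, the offsets from sys_read_start are 0, 5584, 9369, 13078, and 51675 time units.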

Yes, though I can see possible issues with this implementation, the biggest being multiple threads accessing the same file
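The multiple-thread issue can be reproduced in a toy model. The sketch below is user-space Python, not kernel code: `File` stands in for the file structure, the field name `pid` is hypothetical (the slides do not show the name used), and the two functions only imitate the caller stamping the field and the callee reading it.

```python
class File:
    """Stand-in for the file structure with the extra, hypothetical PID field."""

    def __init__(self):
        self.pid = None


def vfs_read(file, caller_pid):
    # The caller stamps its PID on the shared structure before dispatching.
    file.pid = caller_pid


def seq_read(file):
    # The callee cannot reference the caller directly, so it trusts the field.
    return file.pid


def interleaved_reads(file, first_pid, second_pid):
    """Two threads share one file; the second stamps it before the first reads."""
    vfs_read(file, first_pid)
    vfs_read(file, second_pid)   # overwrites the first thread's stamp
    return seq_read(file)        # what the first thread's seq_read would see
```

With a single thread the field is reliable, but `interleaved_reads(File(), 3055, 4000)` returns 4000: the first thread's read would be logged under the second thread's PID.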

Page 36: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

The Mystery Call

• This one still has me stumped
• For several threads, such as metacity, I hit a roadblock when I trace their read control flow

Page 37: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

The Mystery Call (continued)

• The current control flow goes
  – sys_read_start
  – vfs_read_start
  – vfs_read_fop_aioread
  – do_sync_read_start
  – do_sync_read_forloop
  – do_sync_read_end
  – sys_read_end
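A flat list of marker names is easier to read when indented by call depth. The sketch below is a user-space Python helper, not part of the project: the function name `indent_trace` is hypothetical, and it assumes only the `_start`/`_end` naming seen in the marker names above. Because the listed flow has no vfs_read_end entry, the helper unwinds to the matching function instead of requiring strict pairing.

```python
def indent_trace(names):
    """Indent marker names by call depth, using the _start/_end suffixes."""
    stack, lines = [], []
    for name in names:
        if name.endswith("_start"):
            lines.append("  " * len(stack) + name)
            stack.append(name[:-len("_start")])
        elif name.endswith("_end"):
            func = name[:-len("_end")]
            # Unwind to the matching function; some functions have no _end marker.
            while stack and stack[-1] != func:
                stack.pop()
            if stack:
                stack.pop()
            lines.append("  " * len(stack) + name)
        else:
            # Intermediate markers sit inside the innermost open function.
            lines.append("  " * len(stack) + name)
    return lines
```

Printing the result with `print("\n".join(indent_trace(names)))` shows do_sync_read as the deepest function reached, which is where the trace goes quiet.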

Page 38: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

The Mystery Call (continued)

• I lose the trace at the following call in do_sync_read
  – filp->f_op->aio_read(&kiocb, &iov, 1, kiocb.ki_pos)

• This means there is an aio_read function associated with an f_op that I don’t know about

Page 39: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

It Should be Pretty Easy to Track that Down…

Page 40: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

My Current Guess

• There is a nonstandard kernel module that is installed and has its own aio_read handler

• Metacity is the GNOME window manager
  – I can see it doing some funky stuff

• When adding markers I found lots of similar examples where the control flow didn’t go as I thought it would

Page 41: Tracing the Control Flow of Read/Write Operations in the Linux Kernel

QUESTIONS / COMMENTS?