21
Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors : Prof. Shlomi Dolev Dr. Reuven Yagel

Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel

Embed Size (px)

Citation preview

Page 1: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel

Self stabilizing Linux Kernel Mechanism

Doron Mishali, Alex Plits

Supervisors:

Prof. Shlomi Dolev

Dr. Reuven Yagel

Page 2: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel

The Linux Kernel• The kernel is the central component of most

computer operating systems. Its responsibilities include managing the system's resources (the communication between hardware and software components).

• The main task of the kernel is to allow the execution of applications and support them with features such as hardware abstractions.

Page 3: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel

The Kernel (Cont.)• The kernel also responsible of the high level

scheduler, the one who decides which processes will be in the memory and which on the disk.

• The process is the main kernel abstraction, it defines which memory portions the application can access.

Page 4: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel

The SchedulerThe low level scheduler decides which of the

ready, in-memory processes are to be executed (allocated a CPU) next following a clock interrupt, an IO interrupt, an operating system call or another form of signal.

In the project we will deal with the Completely Fair Scheduler implementation of the Linux Scheduler.

Page 5: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel

RB-Tree as Scheduler data structure

The run queue is kept sorted by the Runnable threads' virtual runtimes by storing it in a Red-Black Tree, which is a variant of a binary search tree. When the scheduler decides to switch threads, it switches to the leftmost thread in the red-black tree, that is, the one with the earliest virtual runtime.

Page 6: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel

The RB-Tree

Page 7: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel

The RB-Tree (Cont.)The RB-Tree data structure must keep on the

following properties:• Every node is either black or red.• All leafs are black.• Both children of a red node are black.• Every path from a node to a leaf should have

the same count of black nodes.

Page 8: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel

The RB-Tree (Cont.)Each violation of one of those properties can

cause a crash in the next tree operation which means, possible corruption of data!

In order to keep track on these properties we are executing our mechanism which keeps a close eye on this data structure by running frequent tests on it upon a corruption detection, The program can self stabilize the system which means – Auto recovery.

Page 9: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel

Self Stabilizing• Self-stabilization is a concept of fault-tolerance in

distributed computing. • A self-stabilizing system will end up in a correct state

no matter what state it is initialized with, and no matter what execution steps it will take.

• The ability to recover without external intervention is very desirable in modern computer since it would enable them to repair errors and return to normal operations on their own. Computers can thus be made fault-tolerant.

Page 10: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel

The projectGoal

Detect & Recover from a corruption of the Scheduler’s RB-Tree

data structure.

Method • Perform series of tests on the Scheduler at the following scenarios:

Periodic (i.e. every couple of time units). On a Page Fault occurrence(which may indicate memory corruption).

The tests include legitimacy tests and memory tests.

• In case of a failure we will stabilize the system – by Auto recovering which is done by rebuilding the data structure from currently running processes which in turn will let the processes run normally on the next scheduling.

Page 11: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel

The project – Some examples

In this example one of the nodes in the tree points to some garbage in memory, A thing that definitely can happen in an operating system.

Page 12: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel

The project – Some examples

The next figure demonstrates a case when the root changed it’s color from black (mandatory on RB-tree) to red. This will probably cause a corrupt on the next insertion of the process to the tree which can end up in a wrong structure of the tree in the best case or a system crash in the worst case.

Page 13: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel

Running the recovery• After detecting the inconsistency, we run the

recovery procedure which yields the following results for the above examples:

For example 1 For example 2

Page 14: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel

Project ProcessThe project was divided into 3 stages: • In depth understanding on how the CFS

kernel scheduler works and the data structures it uses.

• Understanding the self stabilizing mechanism• “Hacking” the Linux Kernel,

changing/adding data structures and interaction with existing kernel code.

Page 15: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel

Difficulties:• Coding inside the Kernel

– Data structure's encapsulation (Modules vs. Embedded Code).– Synchronization (Spin Locks, Scheduler preemption).– Enormous amount of interconnectivity.

• Debugging the kernel – (our advice - AVOID IT IF YOU CAN)– Embedded Debugger KDB ( new versions support KGDB ).– UML (User Mode Linux) vs. VMware.– Asynchronous system ( Interrupts & Exceptions ).

Page 16: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel

Demonstration – corruption without recovery

Page 17: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel
Page 18: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel

Demonstration – corruption with recovery

Page 19: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel
Page 20: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel

References and Utilities:

S. Dolev - Self stabilization, the MIT Press, 2000.

Understanding the Linux Kernel - Daniel P. Bovet & Marco Cesati.

Kernel Hacking - Kong, Joseph. KDB – embedded kernel debugger. VMware – virtual machine for the

simulations. UML – User Mode Linux (embedded VM).

Page 21: Self stabilizing Linux Kernel Mechanism Doron Mishali, Alex Plits Supervisors: Prof. Shlomi Dolev Dr. Reuven Yagel

Q & A