Linux Locking Mechanisms

Mark Veltzer

Who am I?

● Linux kernel hacker
● Current maintainer of GNU grep(1)
● Free Source evangelist
● CTO of Hinbit
● Political philosopher (check out my book “שלטון ההמון”, at book stores near you...)
● Jazz piano player

Why locking?

● To avoid race conditions in accessing shared memory.
● These occur because of: pre-emption driven by timer interrupts (user space), multi-core (user space), interrupts in general (kernel space) and multi-core (kernel space).
● Locking is not the only way to avoid such race conditions.
● But this presentation is about locking and only about locking...
● In general locking is bad because it blocks your programs from executing and so slows them down.
● Avoid it when you can.

Avoiding locking - techniques

● Have each thread/CPU have its own data.
● Use atomic operations (hardware) instead of locking (software).
● Lock-free programming.
● RCU/COW.
● Readers/writer locks.
● Not using the shared memory model but rather the actor model for multi-processing/multi-threading.
● And many more techniques.
● Alas, we are here to talk about locking.
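To make the "atomic operations instead of locking" item concrete, here is a minimal sketch using C11 <stdatomic.h>. The names hits, worker and count_hits are invented for this example and do not come from the slides. Several threads bump one counter with a single hardware read-modify-write each, and no lock is taken.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 4
#define NITER 100000

/* Shared counter (hypothetical name); static storage starts at zero. */
static atomic_long hits;

/* Each worker increments the counter NITER times, atomically. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITER; i++)
        atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
    return NULL;
}

/* Runs NTHREADS workers to completion and returns the final count. */
long count_hits(void)
{
    pthread_t t[NTHREADS];

    atomic_store(&hits, 0);
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&hits);
}
```

Relaxed ordering is enough here only because the counter is the sole shared datum and it is read after the joins. If the counter guarded other data, stronger ordering would be needed.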

User space vs kernel space locking

● Are completely different.
● Different mechanisms, different performance considerations, different APIs.
● But ultimately they work in concert.

User space locking

User space locking mechanisms

● Are not allowed to block interrupts. Ever!
● This is derived from the definition of what a secure operating system is.
● If you have code in the kernel you can expose an API to user space to block and allow interrupts.
● This is considered a bad idea.
● First of all because it allows user space bugs to lock up your system.
● Second because it interferes with other kernel mechanisms (like watchdogs, RCU and more).
● DON'T DO IT!

User space locking primitives

● pthread spin lock
● Futex
● pthread mutex
● pthread readers/writer lock
● POSIX semaphore
● Sys V semaphore

User space spin lock - intro

● Is implemented as a simple TAS/CAS loop with CPU relaxing and a memory barrier.
● Pure user space implementation.
● DOES NOT DISABLE INTERRUPTS!
● Did I mention that it DOES NOT DISABLE INTERRUPTS?!?
● It is interesting to note that IT DOES NOT DISABLE INTERRUPTS.
● And finally note that NO INTERRUPTS ARE DISABLED.
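The TAS loop described above can be sketched in a few lines of C11. This is an illustration of the idea, not the glibc implementation. The names tas_lock, tas_lock_acquire, tas_lock_try, tas_lock_release and cpu_relax are made up for this example, and the "pause" hint is only emitted on x86.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock type: one flag, set while the lock is held. */
typedef struct { atomic_flag locked; } tas_lock;

#define TAS_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Tell the CPU we are busy-waiting (x86 only; a no-op elsewhere). */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("pause");
#endif
}

/* Spin until test-and-set reports the flag was previously clear.
 * Acquire ordering serves as the memory barrier on entry. */
void tas_lock_acquire(tas_lock *l)
{
    assert(l != NULL);
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        cpu_relax();
}

/* Single attempt: returns true if the lock was taken. */
bool tas_lock_try(tas_lock *l)
{
    assert(l != NULL);
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Release ordering publishes the critical section's writes. */
void tas_lock_release(tas_lock *l)
{
    assert(l != NULL);
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Nothing in this code touches interrupts or the scheduler, which is exactly the point of the slide: the holder can be pre-empted at any moment while others keep spinning.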

User space spin lock - issues

● The API is straightforward.
● The problem with this API is that IT DOES NOT DISABLE INTERRUPTS.
● This means that you may end up spinning for a whole time slice (~1ms) if the two racing contexts are on the same core.
● This may also happen if the two contexts are on different cores but the one holding the lock is pre-empted by some other context.
● This is really bad.
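For reference, a minimal sketch of the pthread spin lock API in use. The helper names adder and spin_sum, and the variables spin and shared_total, are invented for this example. The critical section is kept to a single increment because any pre-emption inside it makes the other thread burn CPU.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>

static pthread_spinlock_t spin;   /* hypothetical example lock */
static long shared_total;         /* data protected by the lock */

/* Adds 1 to shared_total *arg times, taking the spin lock each time. */
static void *adder(void *arg)
{
    long n = *(long *)arg;

    for (long i = 0; i < n; i++) {
        pthread_spin_lock(&spin);
        shared_total++;           /* keep the critical section tiny */
        pthread_spin_unlock(&spin);
    }
    return NULL;
}

/* Runs two adder threads and returns the resulting total. */
long spin_sum(long per_thread)
{
    pthread_t a, b;
    int rc;

    rc = pthread_spin_init(&spin, PTHREAD_PROCESS_PRIVATE);
    assert(rc == 0);
    shared_total = 0;
    rc = pthread_create(&a, NULL, adder, &per_thread);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, adder, &per_thread);
    assert(rc == 0);
    (void)rc;
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_spin_destroy(&spin);
    return shared_total;
}
```

On an ordinary desktop system this works but gives no latency guarantee, for the reasons listed above.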

User space spin lock – when to use?

● Use only when the two racing contexts are running on two different cores and are the highest priority contexts on these two cores.
● Usually this is only fulfilled on a dedicated RT patched Linux system.
● Otherwise you get periodic spinning episodes.
● Kapish?!?

Futex – Fast user space locking

● The idea is to avoid trips to the kernel in the non-contended case.
● A mutex built half in user space and half in kernel space.
● State of the lock is in user space.
● Wait list is in kernel space.
● Allows locking/unlocking without calling kernel space in the non-contended case.
● A masterpiece of Linux engineering!
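The split between user space state and a kernel wait list can be sketched as follows. This is a simplified three-state mutex in the style of Ulrich Drepper's "Futexes Are Tricky", not the glibc code. The names fmutex, fmutex_lock, fmutex_unlock and fmutex_demo are hypothetical, and the sketch assumes a Linux system where SYS_futex is available and int is 32 bits.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* state: 0 = unlocked, 1 = locked, 2 = locked and someone may be waiting */
typedef struct { atomic_int state; } fmutex;

static long futex(atomic_int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

void fmutex_lock(fmutex *m)
{
    int c = 0;

    /* Fast path: 0 -> 1 entirely in user space, no system call. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;
    /* Slow path: mark the lock contended, then sleep in the kernel. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        futex(&m->state, FUTEX_WAIT_PRIVATE, 2); /* sleeps only if state == 2 */
        c = atomic_exchange(&m->state, 2);
    }
}

void fmutex_unlock(fmutex *m)
{
    /* Fast path: 1 -> 0 means nobody waited, no system call. */
    if (atomic_fetch_sub(&m->state, 1) != 1) {
        atomic_store(&m->state, 0);
        futex(&m->state, FUTEX_WAKE_PRIVATE, 1); /* wake one waiter */
    }
}

static fmutex demo_lock;
static long demo_total;

static void *demo_worker(void *arg)
{
    long n = *(long *)arg;

    for (long i = 0; i < n; i++) {
        fmutex_lock(&demo_lock);
        demo_total++;
        fmutex_unlock(&demo_lock);
    }
    return NULL;
}

/* Two threads increment demo_total under the futex-based mutex. */
long fmutex_demo(long per_thread)
{
    pthread_t a, b;
    int rc;

    demo_total = 0;
    rc = pthread_create(&a, NULL, demo_worker, &per_thread);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, demo_worker, &per_thread);
    assert(rc == 0);
    (void)rc;
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return demo_total;
}
```

The kernel is entered only from the two futex() calls, and both sit on the contended paths. An uncontended lock/unlock pair costs two atomic instructions.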

What happens when you die with a lock held?

● Here are some suggestions:
– OS does nothing → deadlocks.
– OS releases the lock → other contexts die because of inconsistent data.
– OS releases the lock and notifies the next context locking the lock that the previous owner died → this is what Linux does (for robust locks).
● This feature of locks is called robustness.

Linux has no threads

● Do you remember that Linux has no concept of a “thread”?
● Threads are just processes which happen to share a lot of memory, created with the clone(2) system call.
● Don't tell this to user space developers in your company (they tend to freak out about this).
● This means that every locking mechanism in Linux can be used for multi-processing as well as for multi-threading.
● This is why futexes were made robust.
● Futexes are robust by doing a postmortem on dead processes and examining the locks they leave behind in order to unlock them and mark them as suspicious.

pthread_mutex

● Is nowadays just a wrapper for a futex.
● Could be used between processes (strange, but oh so true).
● Could be made robust using pthread_mutexattr_setrobust(3), which was undocumented on Linux at the time of writing (it is specified in POSIX.1-2008).
● I found the documentation for this API on MSDN, of all places... :)
● Supports recursion, two types of priority inheritance, sharing between processes, priority ceiling and more.
● Makes lousy coffee, though...
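Here is a minimal sketch of what robustness looks like from the API side, using a thread rather than a process as the dying owner to keep the example short. The names robust_mtx, die_holding_lock and robust_demo are invented for this example. The next locker gets EOWNERDEAD, repairs the protected data as it sees fit, and calls pthread_mutex_consistent(3) before unlocking.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t robust_mtx; /* hypothetical example lock */

/* Takes the lock and then terminates without ever unlocking it. */
static void *die_holding_lock(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&robust_mtx);
    return NULL;
}

/* Returns the result of locking a mutex whose owner has died. */
int robust_demo(void)
{
    pthread_mutexattr_t attr;
    pthread_t owner;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    assert(rc == 0);
    rc = pthread_mutex_init(&robust_mtx, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_create(&owner, NULL, die_holding_lock, NULL);
    assert(rc == 0);
    pthread_join(owner, NULL);          /* the owner is now dead */

    rc = pthread_mutex_lock(&robust_mtx);
    if (rc == EOWNERDEAD) {
        /* We hold the lock, but the data it protects is suspect.
         * Fix the data here, then declare the mutex usable again. */
        pthread_mutex_consistent(&robust_mtx);
    }
    pthread_mutex_unlock(&robust_mtx);
    pthread_mutex_destroy(&robust_mtx);
    return rc;
}
```

Without the robust attribute the same sequence would simply block forever in pthread_mutex_lock(), which is the "OS does nothing" option from the earlier slide.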

Pthread readers/writer lock

● Is based on the futex.
● This means good performance.
● Standard, feature-poor implementation.
● Build your own if you need more features.
● Could be used to synchronize processes and threads.
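A minimal sketch of the readers/writer API follows. The names cfg_lock, cfg_value, cfg_read, cfg_write and readers_share_writers_wait are made up for this example. The last function uses the non-blocking try variants to show, in a single thread, that a second reader is admitted while a writer is refused.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_rwlock_t cfg_lock = PTHREAD_RWLOCK_INITIALIZER;
static int cfg_value;              /* data protected by cfg_lock */

/* Readers take the lock in shared mode. */
int cfg_read(void)
{
    int v;

    pthread_rwlock_rdlock(&cfg_lock);
    v = cfg_value;
    pthread_rwlock_unlock(&cfg_lock);
    return v;
}

/* Writers take the lock in exclusive mode. */
void cfg_write(int v)
{
    pthread_rwlock_wrlock(&cfg_lock);
    cfg_value = v;
    pthread_rwlock_unlock(&cfg_lock);
}

/* Returns 1 if, with one read lock held, another read lock succeeds
 * while a write lock attempt fails with EBUSY. */
int readers_share_writers_wait(void)
{
    int second_reader, writer;
    int rc = pthread_rwlock_rdlock(&cfg_lock);

    assert(rc == 0);
    (void)rc;
    second_reader = pthread_rwlock_tryrdlock(&cfg_lock);
    writer = pthread_rwlock_trywrlock(&cfg_lock);
    if (writer == 0)               /* not expected; release if it happens */
        pthread_rwlock_unlock(&cfg_lock);
    if (second_reader == 0)
        pthread_rwlock_unlock(&cfg_lock);
    pthread_rwlock_unlock(&cfg_lock);
    return second_reader == 0 && writer == EBUSY;
}
```

To share such a lock between processes it would have to live in shared memory and be initialized with the PTHREAD_PROCESS_SHARED attribute instead of the static initializer.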

POSIX semaphores

● Based on the futex.
● Again, good performance.
● Use this and not the Sys V version unless you need the Sys V particular features.
● Could be used to synchronize both processes and threads.
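A minimal sketch of an unnamed POSIX semaphore used as a permit counter. The function name take_permits is invented for this example. It uses sem_trywait(3) so that the example never blocks, and passes 0 as the pshared argument, which means the semaphore is shared between threads of one process only.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Creates a semaphore holding `permits` permits, tries to take `wanted`
 * of them without blocking, and returns how many were granted. */
int take_permits(unsigned int permits, int wanted)
{
    sem_t sem;
    int granted = 0;
    int rc = sem_init(&sem, 0, permits);   /* 0: threads of this process */

    assert(rc == 0);
    (void)rc;
    for (int i = 0; i < wanted; i++) {
        if (sem_trywait(&sem) == 0)
            granted++;                     /* got a permit */
        else
            assert(errno == EAGAIN);       /* no permits left */
    }
    for (int i = 0; i < granted; i++)
        sem_post(&sem);                    /* hand the permits back */
    sem_destroy(&sem);
    return granted;
}
```

For use between processes the sem_t would be placed in shared memory and initialized with a nonzero pshared argument, or a named semaphore from sem_open(3) would be used instead.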

Sys V semaphore

● Reminder: Sys V is AT&T's version of UNIX dating to circa 1983. In that version important APIs like this one were first introduced into the UNIX world.
● Sys V semaphores are, however, crap.
● This is because they always go to the kernel. Even in the non-contended case.
● Do not use. Use POSIX semaphores instead.

Kernel locking

Kernel locking primitives

● Mutex
● Spinlock (3 types)
● Semaphore
● RW semaphores

Mutexes

● Go to sleep when finding the lock locked.
● This means they can only be used in contexts where you are allowed to go to sleep.
● This means passive (user) context, kernel thread or workqueue context, and threaded IRQ context (?!?).
● Not allowed in IRQ handlers or tasklets.
● Have 3 modes: interruptible, killable and uninterruptible.
● Try to use interruptible as much as possible, since an uninterruptible wait combined with a kernel bug may leave you with unkillable processes.
● Support priority inheritance under the RT patch.

Spin locks

● Most common kernel locking primitive.
● Are divided into 3 types: regular, BH and IRQ.
● Regular spin locks just turn off scheduling (pre-emption) on the current CPU (in addition to being a spin lock).
● BH ones turn off bottom half mechanisms (including tasklets) on the current CPU (in addition to being a spin lock).
● IRQ ones turn off interrupts on the local CPU (in addition to being a spin lock).
● Sleeping, waiting or doing heavy computation with spin locks held is considered reason for being banned from LKML.
● Turn into mutexes under the RT patch and then support priority inheritance.

Spin lock (irq version)

● Turning off IRQs is quite fast (on Intel, the CLI and STI instructions that clear and set the IF flag are really fast).
● Very brutal as it increases latency in real time implementations.
● Try not to access data structures from IRQ context so you won't have to use this.
● However, this is still one of the most common locking primitives.

When to use each?

● Passive vs Passive
– Use a mutex in interruptible mode (you are allowed to sleep in both).
– Or a semaphore in interruptible mode.
– Or a regular spin lock. You are in no danger of spinning for long since scheduling on the current CPU is disabled. Interrupts may come in and so may tasklets, but these are quick.
● Passive vs BH
– Use spin_lock_bh().
● Passive vs IRQ
– Use spin lock irq.
– The irq part prevents races on the current CPU.
– The spin lock part prevents races with other CPUs.
● BH vs BH
– Spinlock.
● BH vs IRQ
– Spinlock irq.
● IRQ vs IRQ
– Spinlock irq.
● See Rusty Russell's “Unreliable Guide to Locking”.

Semaphores

● semaphore.h
● Usually used as a mutex and not as a semaphore.
● The up() and down() methods do not accept a ticket count but always increase and decrease by 1.
● Ticket/permit count can be determined at creation time.
● Semaphores do not offer priority inheritance. Even under the RT patch.
● This means that any system call that uses them is unfit to be used in the critical path of a real time application.
● 3 modes of operation (like the mutex).

RW semaphores

● rwsem.h
● Offer more performance when readers outnumber writers.
● Again, do not support priority inheritance. Even under the RT patch.
● Famous is current->mm->mmap_sem (renamed mmap_lock in Linux 5.8), which protects each process's virtual memory description.
● Good reason not to use malloc(3) in real time systems: it may call brk(2) or mmap(2), which take this lock.

RW lock

● rwlock.h
● By Ingo Molnar (author of the real time patch).
● Supports priority inheritance.
● Use this instead of RW semaphores.
● Again, gives better performance when readers outnumber writers.

The RT patch

● Runs all IRQ handlers in their own threads with other interrupts enabled.
● Turns all spinlocks into mutexes to reduce latency and allow high priority tasks to get the CPU ASAP.
● If you really need a spin lock you can use the raw spinlock API, which will give you a true spinlock even under the RT patch.
