A Brief Introduction to Race Conditions

Now that you understand how scull’s memory management works, here is a scenario to consider. Two processes, A and B, both have the same scull device open for writing. Both attempt simultaneously to append data to the device. A new quantum is required for this operation to succeed, so each process allocates the required memory and stores a pointer to it in the quantum set.

The result is trouble. Because both processes see the same scull device, each will store its new memory in the same place in the quantum set. If A stores its pointer first, B will overwrite that pointer when it does its store. Thus the memory allocated by A, and the data written therein, will be lost.
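The lost update can be replayed deterministically in user space. The sketch below is not scull code: struct slot, prepare_quantum, publish_quantum, and replay_lost_update are hypothetical names, and the two processes are simulated by running the two steps of an append in the interleaved order A, B, A, B.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one pointer slot in scull's quantum set. */
struct slot {
    void *quantum;
};

/* Step 1 of an append: see that the slot is empty and allocate a quantum. */
static void *prepare_quantum(const struct slot *s, size_t size)
{
    return s->quantum == NULL ? malloc(size) : NULL;
}

/* Step 2 of an append: store the new pointer in the slot. */
static void publish_quantum(struct slot *s, void *q)
{
    if (q != NULL)
        s->quantum = q;
}

/*
 * Replay the interleaving A1, B1, A2, B2 with no locking.
 * Returns 1 if A's quantum was overwritten by B's, 0 otherwise.
 */
static int replay_lost_update(void)
{
    struct slot s = { NULL };
    void *a = prepare_quantum(&s, 64);   /* A sees an empty slot */
    void *b = prepare_quantum(&s, 64);   /* B also sees an empty slot */
    int lost;

    publish_quantum(&s, a);              /* A stores its pointer */
    publish_quantum(&s, b);              /* B overwrites it */

    lost = (a != NULL && b != NULL && s.quantum == b);

    /* The simulation still holds both pointers; a real driver would have
     * no remaining reference to A's memory at this point. */
    free(a);
    free(b);
    return lost;
}
```

Because both callers complete step 1 before either performs step 2, each believes the slot is empty, and only the last store survives.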

This situation is a classic race condition; the results vary depending on who gets there first, and usually something undesirable happens in any case. On uniprocessor Linux systems, the scull code would not have this sort of problem, because processes running kernel code are not preempted. On SMP systems, however, life is more complicated. Processes A and B could easily be running on different processors and could interfere with each other in this manner.

The Linux kernel provides several mechanisms for avoiding and managing race conditions. A full description of these mechanisms will have to wait until Chapter 9, but a beginning discussion is appropriate here.

A semaphore is a general mechanism for controlling access to resources. In its simplest form, a semaphore may be used for mutual exclusion; processes using semaphores in the mutual exclusion mode are prevented from simultaneously running the same code or accessing the same data. This sort of semaphore is often called a mutex, from “mutual exclusion.”

Semaphores in Linux are defined in <asm/semaphore.h>. They have a type of struct semaphore, and a driver should only act on them using the provided interface. In scull, one semaphore is allocated for each device, in the Scull_Dev structure. Since the devices are entirely independent of each other, there is no need to enforce mutual exclusion across multiple devices.

Semaphores must be initialized prior to use by passing a numeric argument to sema_init. For mutual exclusion applications (i.e., keeping multiple threads from accessing the same data simultaneously), the semaphore should be initialized to a value of 1, which means that the semaphore is available. The following code in scull’s module initialization function (scull_init) shows how the semaphores are initialized as part of setting up the devices.

 for (i=0; i < scull_nr_devs; i++) {
  scull_devices[i].quantum = scull_quantum;
  scull_devices[i].qset = scull_qset;
  sema_init(&scull_devices[i].sem, 1);
 }

A process wishing to enter a section of code protected by a semaphore must first ensure that no other process is already there. Whereas in classical computer science the function to obtain a semaphore is often called P, in Linux you’ll need to call down or down_interruptible. These functions test the value of the semaphore to see if it is greater than 0; if so, they decrement the semaphore and return. If the semaphore is 0, the functions will sleep and try again after some other process, which has presumably freed the semaphore, wakes them up.
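The test-and-decrement behavior can be observed in user space with POSIX semaphores, which follow the same counting model. This is only an analogy, not the kernel interface: sem_wait plays the role of down, sem_post the role of up, and sem_trywait is used here so that the second acquisition attempt reports failure instead of sleeping. The function name demo_acquire_release is hypothetical.

```c
#include <errno.h>
#include <semaphore.h>

/*
 * Walk a semaphore through acquire, contended acquire, and release.
 * Returns the final semaphore value, or -1 if any step misbehaves.
 */
static int demo_acquire_release(void)
{
    sem_t sem;
    int value = -1;

    if (sem_init(&sem, 0, 1) != 0)       /* value 1: available */
        return -1;

    if (sem_wait(&sem) != 0)             /* value > 0: decrement and return */
        return -1;
    if (sem_getvalue(&sem, &value) != 0 || value != 0)
        return -1;

    /* The value is now 0, so a blocking wait would sleep here.
     * The non-blocking variant fails with EAGAIN instead. */
    if (sem_trywait(&sem) != -1 || errno != EAGAIN)
        return -1;

    if (sem_post(&sem) != 0)             /* release: value back to 1 */
        return -1;
    if (sem_getvalue(&sem, &value) != 0)
        return -1;

    sem_destroy(&sem);
    return value;
}
```

On older C libraries the program may need to be linked with -pthread.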

The down_interruptible function can be interrupted by a signal, whereas down sleeps uninterruptibly and will not let a signal wake the process. You almost always want to allow signals; otherwise, you risk creating unkillable processes and other undesirable behavior. A complication of allowing signals, however, is that you always have to check whether the function (here down_interruptible) was interrupted. As usual, the function returns 0 for success and nonzero in case of failure. If the process is interrupted, it will not have acquired the semaphore; thus, you must not call up. A typical call to acquire a semaphore therefore looks something like this:

 if (down_interruptible (&sem))
  return -ERESTARTSYS;

The -ERESTARTSYS return value tells the system that the operation was interrupted by a signal. The kernel function that called the device method will either retry it or return -EINTR to the application, according to how signal handling has been configured by the application. Of course, your code may have to perform cleanup work before returning if interrupted in this mode.

A process that obtains a semaphore must always release it afterward. Whereas computer science calls the release function V, Linux uses up instead. A simple call like

 up (&sem);

will increment the value of the semaphore and wake up any processes that are waiting for the semaphore to become available.

Care must be taken with semaphores. The data protected by the semaphore must be clearly defined, and all code that accesses that data must obtain the semaphore first. Code that holds a semaphore, whether it was obtained with down or down_interruptible, must not call another function that also attempts to obtain that semaphore, or deadlock will result. If a routine in your driver fails to release a semaphore it holds (perhaps as a result of an error return), any further attempts to obtain that semaphore will stall. Mutual exclusion in general can be tricky, and benefits from a well-defined and methodical approach.
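One common way to avoid leaking a semaphore on an error return is to route every exit that follows a successful acquisition through a single release point. The following user-space sketch illustrates the idea with POSIX semaphores rather than the kernel interface; struct toy_dev, toy_dev_init, toy_append, and toy_sem_value are hypothetical names, and the capacity check stands in for whatever error a real driver might encounter.

```c
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical device with its own semaphore, loosely modeled on Scull_Dev. */
struct toy_dev {
    sem_t sem;
    size_t size;       /* bytes currently stored */
    size_t capacity;   /* maximum bytes allowed */
};

static int toy_dev_init(struct toy_dev *dev, size_t capacity)
{
    dev->size = 0;
    dev->capacity = capacity;
    return sem_init(&dev->sem, 0, 1);    /* 1: available */
}

/* Returns 0 on success or a negative errno value on failure. */
static int toy_append(struct toy_dev *dev, size_t count)
{
    int ret = 0;

    if (sem_wait(&dev->sem) != 0)
        return -EINTR;       /* not acquired, so there is nothing to release */

    if (count > dev->capacity - dev->size) {
        ret = -ENOSPC;
        goto out;            /* error path still passes through the release */
    }
    dev->size += count;

out:
    sem_post(&dev->sem);
    return ret;
}

/* Report the semaphore's current value, or -1 if it cannot be read. */
static int toy_sem_value(struct toy_dev *dev)
{
    int value = -1;

    sem_getvalue(&dev->sem, &value);
    return value;
}
```

After a failed call to toy_append, toy_sem_value still reports 1, so later callers are not stalled by the earlier error.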

In scull, the per-device semaphore is used to protect access to the stored data. Any code that accesses the data field of the Scull_Dev structure must first have obtained the semaphore. To avoid deadlocks, only functions that implement device methods will try to obtain the semaphore. Internal routines, such as scull_trim shown earlier, assume that the semaphore has already been obtained. As long as these invariants hold, access to the Scull_Dev data structure is safe from race conditions.
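The same division of labor can be sketched in user space: an outer routine takes the semaphore, and an inner helper assumes it is already held. The example uses POSIX threads and semaphores as stand-ins for kernel processes and struct semaphore; struct shared_dev, grow_locked, writer, and run_two_writers are hypothetical names, and the counter stands in for the device's stored data.

```c
#include <pthread.h>
#include <semaphore.h>

#define APPENDS_PER_THREAD 100000

/* Hypothetical shared device: one semaphore guarding one piece of data. */
struct shared_dev {
    sem_t sem;
    long size;
};

/* Internal helper: the caller must already hold dev->sem. */
static void grow_locked(struct shared_dev *dev)
{
    dev->size++;
}

/* Plays the part of a device method: the only layer that takes the semaphore. */
static void *writer(void *arg)
{
    struct shared_dev *dev = arg;
    int i;

    for (i = 0; i < APPENDS_PER_THREAD; i++) {
        while (sem_wait(&dev->sem) != 0)
            ;                        /* retry if a signal interrupted the wait */
        grow_locked(dev);
        sem_post(&dev->sem);
    }
    return NULL;
}

/* Run two concurrent writers and return the final size, or -1 on error. */
static long run_two_writers(void)
{
    struct shared_dev dev = { .size = 0 };
    pthread_t a, b;

    if (sem_init(&dev.sem, 0, 1) != 0)
        return -1;
    if (pthread_create(&a, NULL, writer, &dev) != 0)
        return -1;
    if (pthread_create(&b, NULL, writer, &dev) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    sem_destroy(&dev.sem);
    return dev.size;
}
```

Because every increment happens with the semaphore held, the result is always twice APPENDS_PER_THREAD. If grow_locked tried to take the semaphore itself, each writer would block on its own lock.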
