Tasklets and Bottom-Half Processing

One of the main problems with interrupt handling is how to perform longish tasks within a handler. Often a substantial amount of work must be done in response to a device interrupt, but interrupt handlers need to finish up quickly and not keep interrupts blocked for long. These two needs (work and speed) conflict with each other, leaving the driver writer in a bit of a bind.

Linux (along with many other systems) resolves this problem by splitting the interrupt handler into two halves. The so-called top half is the routine that actually responds to the interrupt—the one you register with request_irq. The bottom half is a routine that is scheduled by the top half to be executed later, at a safer time. The use of the term bottom half in the 2.4 kernel can be a bit confusing, in that it can mean either the second half of an interrupt handler or one of the mechanisms used to implement this second half, or both. When we refer to a bottom half we are speaking generally about a bottom half; the old Linux bottom-half implementation is referred to explicitly with the acronym BH.

But what is a bottom half useful for?

The big difference between the top-half handler and the bottom half is that all interrupts are enabled during execution of the bottom half—that’s why it runs at a safer time. In the typical scenario, the top half saves device data to a device-specific buffer, schedules its bottom half, and exits: this is very fast. The bottom half then performs whatever other work is required, such as awakening processes, starting up another I/O operation, and so on. This setup permits the top half to service a new interrupt while the bottom half is still working.

Every serious interrupt handler is split this way. For instance, when a network interface reports the arrival of a new packet, the handler just retrieves the data and pushes it up to the protocol layer; actual processing of the packet is performed in a bottom half.

One thing to keep in mind with bottom-half processing is that all of the restrictions that apply to interrupt handlers also apply to bottom halves. Thus, bottom halves cannot sleep, cannot access user space, and cannot invoke the scheduler.

The Linux kernel has two different mechanisms that may be used to implement bottom-half processing. Tasklets were introduced late in the 2.3 development series; they are now the preferred way to do bottom-half processing, but they are not portable to earlier kernel versions. The older bottom-half (BH) implementation exists in even very old kernels, though it is implemented with tasklets in 2.4. We’ll look at both mechanisms here. In general, device drivers writing new code should choose tasklets for their bottom-half processing if possible, though portability considerations may determine that the BH mechanism needs to be used instead.

The following discussion works, once again, with the short driver. When loaded with a module option, short can be told to do interrupt processing in a top/bottom-half mode, with either a tasklet or bottom-half handler. In this case, the top half executes quickly; it simply remembers the current time and schedules the bottom half processing. The bottom half is then charged with encoding this time and awakening any user processes that may be waiting for data.

Tasklets

We have already had an introduction to tasklets in Chapter 6, so a quick review should suffice here. Remember that tasklets are a special function that may be scheduled to run, in interrupt context, at a system-determined safe time. They may be scheduled to run multiple times, but will only run once. No tasklet will ever run in parallel with itself, since they only run once, but tasklets can run in parallel with other tasklets on SMP systems. Thus, if your driver has multiple tasklets, they must employ some sort of locking to avoid conflicting with each other.

Tasklets are also guaranteed to run on the same CPU as the function that first schedules them. An interrupt handler can thus be secure that a tasklet will not begin executing before the handler has completed. However, another interrupt can certainly be delivered while the tasklet is running, so locking between the tasklet and the interrupt handler may still be required.

Tasklets must be declared with the DECLARE_TASKLET macro:

DECLARE_TASKLET(name, function, data);

name is the name to be given to the tasklet, function is the function that is called to execute the tasklet (it takes one unsigned long argument and returns void), and data is an unsigned long value to be passed to the tasklet function.

The short driver declares its tasklet as follows:

void short_do_tasklet (unsigned long);
DECLARE_TASKLET (short_tasklet, short_do_tasklet, 0);

The function tasklet_schedule is used to schedule a tasklet for running. If short is loaded with tasklet=1, it installs a different interrupt handler that saves data and schedules the tasklet as follows:

void short_tl_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    do_gettimeofday((struct timeval *) tv_head); /* cast to stop
    'volatile' warning */
    short_incr_tv(&tv_head);
    tasklet_schedule(&short_tasklet);
    short_bh_count++; /* record that an interrupt arrived */
}

The actual tasklet routine, short_do_tasklet, will be executed shortly at the system’s convenience. As mentioned earlier, this routine performs the bulk of the work of handling the interrupt; it looks like this:

void short_do_tasklet (unsigned long unused)
{
    int savecount = short_bh_count, written;
    short_bh_count = 0; /* we have already been removed from queue */
    /*
     * The bottom half reads the tv array, filled by the top half,
     * and prints it to the circular text buffer, which is then consumed
     * by reading processes
     */

    /* First write the number of interrupts that occurred before 
       this bh */

    written = sprintf((char *)short_head,"bh after %6i\n",savecount);
    short_incr_bp(&short_head, written);

    /*
     * Then, write the time values. Write exactly 16 bytes at a time,
     * so it aligns with PAGE_SIZE
     */

    do {
        written = sprintf((char *)short_head,"%08u.%06u\n",
			(int)(tv_tail->tv_sec % 100000000),
			(int)(tv_tail->tv_usec));
	short_incr_bp(&short_head, written);
	short_incr_tv(&tv_tail);
    } while (tv_tail != tv_head);

    wake_up_interruptible(&short_queue); /* wake any reading process */
}

Among other things, this tasklet makes a note of how many interrupts have arrived since it was last called. A device like short can generate a great many interrupts in a brief period, so it is not uncommon for several to arrive before the bottom half is executed. Drivers must always be prepared for this possibility, and must be able to determine how much work there is to perform from the information left by the top half.

The BH Mechanism

Unlike tasklets, old-style BH bottom halves have been around almost as long as the Linux kernel itself. They show their age in a number of ways. For example, all BH bottom halves are predefined in the kernel, and there can be a maximum of 32 of them. Since they are predefined, bottom halves cannot be used directly by modules, but that is not actually a problem, as we will see.

Whenever some code wants to schedule a bottom half for running, it calls mark_bh. In the older BH implemention, mark_bh would set a bit in a bit mask, allowing the corresponding bottom-half handler to be found quickly at runtime. In modern kernels, it just calls tasklet_schedule to schedule the bottom-half routine for execution.

Marking bottom halves is defined in <linux/interrupt.h> as

void mark_bh(int nr);

Here, nr is the “number” of the BH to activate. The number is a symbolic constant defined in <linux/interrupt.h> that identifies the bottom half to run. The function that corresponds to each bottom half is provided by the driver that owns the bottom half. For example, when mark_bh(SCSI_BH) is called, the function being scheduled for execution is scsi_bottom_half_handler, which is part of the SCSI driver.

As mentioned earlier, bottom halves are static objects, so a modularized driver won’t be able to register its own BH. There’s no support for dynamic allocation of BH bottom halves, and it’s unlikely there ever will be. Fortunately, the immediate task queue can be used instead.

The rest of this section lists some of the most interesting bottom halves. It then describes how the kernel runs a BH bottom half, which you should understand in order to use bottom halves properly.

Several BH bottom halves declared by the kernel are interesting to look at, and a few can even be used by a driver, as introduced earlier. These are the most interesting BHs:

IMMEDIATE_BH

This is the most important bottom half for driver writers. The function being scheduled runs (with run_task_queue) the tq_immediate task queue. A driver (like a custom module) that doesn’t own a bottom half can use the immediate queue as if it were its own BH. After registering a task in the queue, the driver must mark the BH in order to have its code actually executed; how to do this was introduced in Section 6.4.3.4, in Chapter 6.

TQUEUE_BH

This BH is activated at each timer tick if a task is registered in tq_timer. In practice, a driver can implement its own BH using tq_timer. The timer queue introduced in Section 6.4.3.3 in Chapter 6 is a BH, but there’s no need to call mark_bh for it.

TIMER_BH

This BH is marked by do_timer, the function in charge of the clock tick. The function that this BH executes is the one that drives the kernel timers. There is no way to use this facility for a driver short of using add_timer.

The remaining BH bottom halves are used by specific kernel drivers. There are no entry points in them for a module, and it wouldn’t make sense for there to be any. The list of these other bottom halves is steadily shrinking as the drivers are converted to using tasklets.

Once a BH has been marked, it is executed when bh_action (kernel/softirq.c) is invoked, which happens when tasklets are run. This happens whenever a process exits from a system call or when an interrupt handler exits. Tasklets are always executed as part of the timer interrupt, so a driver can usually expect that a bottom-half routine will be executed at most 10 ms after it has been scheduled.

Writing a BH Bottom Half

It’s quite apparent from the list of available bottom halves in Section 9.5.2 that a driver implementing a bottom half should attach its code to IMMEDIATE_BH by using the immediate queue.

When IMMEDIATE_BH is marked, the function in charge of the immediate bottom half just consumes the immediate queue. If your interrupt handler queues its BH handler to tq_immediate and marks the IMMEDIATE_BH bottom half, the queued task will be called at just the right time. Because in all kernels we are interested in you can queue the same task multiple times without trashing the task queue, you can queue your bottom half every time the top-half handler runs. We’ll see this behavior in a while.

Drivers with exotic configurations—multiple bottom halves or other setups that can’t easily be handled with a plain tq_immediate—can be satisfied by using a custom task queue. The interrupt handler queues the tasks in its own queue, and when it’s ready to run them, a simple queue-consuming function is inserted into the immediate queue. See Section 6.4.4 in Chapter 6 for details.

Let’s now look at the short BH implementation. When loaded with bh=1, the module installs an interrupt handler that uses a BH bottom half:

void short_bh_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    /* cast to stop 'volatile' warning */
    do_gettimeofday((struct timeval *) tv_head);
    short_incr_tv(&tv_head);

    /* Queue the bh. Don't care about multiple enqueueing */
    queue_task(&short_task, &tq_immediate);
    mark_bh(IMMEDIATE_BH);

    short_bh_count++; /* record that an interrupt arrived */
}

As expected, this code calls queue_task without checking whether the task is already enqueued.

The BH, then, performs the rest of the work. This BH is, in fact, the same short_do_tasklet that was shown previuosly.

Here’s an example of what you see when loading short by specifying bh=1:

morgana% echo 1122334455 > /dev/shortint ; cat /dev/shortint
bh after      5
50588804.876653
50588804.876693
50588804.876720
50588804.876747
50588804.876774

The actual timings that you will see will vary, of course, depending on your particular system.

Get Linux Device Drivers, Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.