O'Reilly logo

Understanding the Linux Kernel, 3rd Edition by Marco Cesati, Daniel P. Bovet

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Exception Handling

Most exceptions issued by the CPU are interpreted by Linux as error conditions. When one of them occurs, the kernel sends a signal to the process that caused the exception to notify it of an anomalous condition. If, for instance, a process performs a division by zero, the CPU raises a "Divide error " exception, and the corresponding exception handler sends a SIGFPE signal to the current process, which then takes the necessary steps to recover or (if no signal handler is set for that signal) abort.

There are a couple of cases, however, where Linux exploits CPU exceptions to manage hardware resources more efficiently. A first case is already described in the section "Saving and Loading the FPU, MMX, and XMM Registers" in Chapter 3. The "Device not available " exception is used together with the TS flag of the cr0 register to force the kernel to load the floating point registers of the CPU with new values. A second case involves the "Page Fault " exception, which is used to defer allocating new page frames to the process until the last possible moment. The corresponding handler is complex because the exception may, or may not, denote an error condition (see the section "Page Fault Exception Handler" in Chapter 9).

Exception handlers have a standard structure consisting of three steps:

  1. Save the contents of most registers in the Kernel Mode stack (this part is coded in assembly language).

  2. Handle the exception by means of a high-level C function.

  3. Exit from the handler by means of the ret_from_exception( ) function.

To take advantage of exceptions, the IDT must be properly initialized with an exception handler function for each recognized exception. It is the job of the trap_init( ) function to insert the final values—the functions that handle the exceptions—into all IDT entries that refer to nonmaskable interrupts and exceptions. This is accomplished through the set_trap_gate( ), set_intr_gate( ), set_system_gate( ), set_system_intr_gate( ), and set_task_gate( ) functions:

    set_trap_gate(0,&divide_error);
    set_trap_gate(1,&debug);
    set_intr_gate(2,&nmi);
    set_system_intr_gate(3,&int3);
    set_system_gate(4,&overflow);
    set_system_gate(5,&bounds);
    set_trap_gate(6,&invalid_op);
    set_trap_gate(7,&device_not_available);
    set_task_gate(8,31);
    set_trap_gate(9,&coprocessor_segment_overrun);
    set_trap_gate(10,&invalid_TSS);
    set_trap_gate(11,&segment_not_present);
    set_trap_gate(12,&stack_segment);
    set_trap_gate(13,&general_protection);
    set_intr_gate(14,&page_fault);
    set_trap_gate(16,&coprocessor_error);
    set_trap_gate(17,&alignment_check);
    set_trap_gate(18,&machine_check);
    set_trap_gate(19,&simd_coprocessor_error);
    set_system_gate(128,&system_call);

The "Double fault" exception is handled by means of a task gate instead of a trap or system gate, because it denotes a serious kernel misbehavior. Thus, the exception handler that tries to print out the register values does not trust the current value of the esp register. When such an exception occurs, the CPU fetches the Task Gate Descriptor stored in the entry at index 8 of the IDT. This descriptor points to the special TSS segment descriptor stored in the 32nd entry of the GDT. Next, the CPU loads the eip and esp registers with the values stored in the corresponding TSS segment. As a result, the processor executes the doublefault_fn() exception handler on its own private stack.

Now we will look at what a typical exception handler does once it is invoked. Our description of exception handling will be a bit sketchy for lack of space. In particular we won't be able to cover:

  1. The signal codes (see Table 11-8 in Chapter 11) sent by some handlers to the User Mode processes.

  2. Exceptions that occur when the kernel is operating in MS-DOS emulation mode (vm86 mode), which must be dealt with differently.

  3. "Debug " exceptions.

Saving the Registers for the Exception Handler

Let's use handler_name to denote the name of a generic exception handler. (The actual names of all the exception handlers appear on the list of macros in the previous section.) Each exception handler starts with the following assembly language instructions:

    handler_name:
        pushl $0 /* only for some exceptions */
        pushl $do_handler_name
        jmp error_code

If the control unit is not supposed to automatically insert a hardware error code on the stack when the exception occurs, the corresponding assembly language fragment includes a pushl $0 instruction to pad the stack with a null value. Then the address of the high-level C function is pushed on the stack; its name consists of the exception handler name prefixed by do_.

The assembly language fragment labeled as error_code is the same for all exception handlers except the one for the "Device not available " exception (see the section "Saving and Loading the FPU, MMX, and XMM Registers" in Chapter 3). The code performs the following steps:

  1. Saves the registers that might be used by the high-level C function on the stack.

  2. Issues a cld instruction to clear the direction flag DF of eflags , thus making sure that autoincreases on the edi and esi registers will be used with string instructions .[*]

  3. Copies the hardware error code saved in the stack at location esp+36 in edx. Stores the value -1 in the same stack location. As we'll see in the section "Reexecution of System Calls" in Chapter 11, this value is used to separate 0x80 exceptions from other exceptions.

  4. Loads edi with the address of the high-level do_handler_name( ) C function saved in the stack at location esp+32; writes the contents of es in that stack location.

  5. Loads in the eax register the current top location of the Kernel Mode stack. This address identifies the memory cell containing the last register value saved in step 1.

  6. Loads the user data Segment Selector into the ds and es registers.

  7. Invokes the high-level C function whose address is now stored in edi.

The invoked function receives its arguments from the eax and edx registers rather than from the stack. We have already run into a function that gets its arguments from the CPU registers: the _ _switch_to( ) function, discussed in the section "Performing the Process Switch" in Chapter 3.

Entering and Leaving the Exception Handler

As already explained, the names of the C functions that implement exception handlers always consist of the prefix do_ followed by the handler name. Most of these functions invoke the do_trap() function to store the hardware error code and the exception vector in the process descriptor of current, and then send a suitable signal to that process:

    current->thread.error_code = error_code;
    current->thread.trap_no = vector;
    force_sig(sig_number, current);

The current process takes care of the signal right after the termination of the exception handler. The signal will be handled either in User Mode by the process's own signal handler (if it exists) or in Kernel Mode. In the latter case, the kernel usually kills the process (see Chapter 11). The signals sent by the exception handlers are listed in Table 4-1.

The exception handler always checks whether the exception occurred in User Mode or in Kernel Mode and, in the latter case, whether it was due to an invalid argument passed to a system call. We'll describe in the section "Dynamic Address Checking: The Fix-up Code" in Chapter 10 how the kernel defends itself against invalid arguments passed to system calls. Any other exception raised in Kernel Mode is due to a kernel bug. In this case, the exception handler knows the kernel is misbehaving. In order to avoid data corruption on the hard disks, the handler invokes the die( ) function, which prints the contents of all CPU registers on the console (this dump is called kernel oops ) and terminates the current process by calling do_exit( ) (see "Process Termination" in Chapter 3).

When the C function that implements the exception handling terminates, the code performs a jmp instruction to the ret_from_exception( ) function. This function is described in the later section "Returning from Interrupts and Exceptions."



[*] A single assembly language "string instruction," such as rep;movsb , is able to act on a whole block of data (string).

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required