
Understanding the Linux Kernel, 3rd Edition by Marco Cesati, Daniel P. Bovet


Returning from Interrupts and Exceptions

We will finish the chapter by examining the termination phase of interrupt and exception handlers. (Returning from a system call is a special case, and we shall describe it in Chapter 10.) Although the main objective is clear — namely, to resume execution of some program — several issues must be considered before doing it:

Number of kernel control paths being concurrently executed

If there is just one, the CPU must switch back to User Mode.

Pending process switch requests

If there is any request, the kernel must perform process scheduling; otherwise, control is returned to the current process.

Pending signals

If a signal is sent to the current process, it must be handled.

Single-step mode

If a debugger is tracing the execution of the current process, single-step mode must be restored before switching back to User Mode.

Virtual-8086 mode

If the CPU is in virtual-8086 mode, the current process is executing a legacy Real Mode program, thus it must be handled in a special way.

A few flags are used to keep track of pending process switch requests, of pending signals, and of single-step execution; they are stored in the flags field of the thread_info descriptor. The field stores other flags as well, but they are not related to returning from interrupts and exceptions. See Table 4-15 for a complete list of these flags.

Table 4-15. The flags field of the thread_info descriptor

Flag name            Description
TIF_SYSCALL_TRACE    System calls are being traced
TIF_NOTIFY_RESUME    Not used in the 80x86 platform
TIF_SIGPENDING       The process has pending signals
TIF_NEED_RESCHED     Scheduling must be performed
TIF_SINGLESTEP       Restore single step execution on return to User Mode
TIF_IRET             Force return from system call via iret rather than sysexit
TIF_SYSCALL_AUDIT    System calls are being audited
TIF_POLLING_NRFLAG   The idle process is polling the TIF_NEED_RESCHED flag
TIF_MEMDIE           The process is being destroyed to reclaim memory (see the section "The Out of Memory Killer" in Chapter 17)
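As a rough illustration of how such flags are tested, the following C sketch models the flags field as a 32-bit word. The bit numbers are an assumption: they follow the i386 headers of the 2.6.11-era kernels this chapter describes and may differ in other versions. The helper name ti_flag_is_set is hypothetical and is not a kernel function.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit numbers assumed from 2.6.11-era i386 headers; check your own tree. */
enum {
    TIF_SYSCALL_TRACE  = 0,
    TIF_NOTIFY_RESUME  = 1,
    TIF_SIGPENDING     = 2,
    TIF_NEED_RESCHED   = 3,
    TIF_SINGLESTEP     = 4,
    TIF_IRET           = 5,
    TIF_SYSCALL_AUDIT  = 7,
    TIF_POLLING_NRFLAG = 16,
    TIF_MEMDIE         = 17
};

/* Hypothetical helper: true if flag number `nr` is set in `flags`. */
static bool ti_flag_is_set(uint32_t flags, int nr)
{
    return ((flags >> nr) & 1u) != 0;
}
```

The assembly code in this section performs the same kind of test with testb or testl against a mask such as 1<<TIF_NEED_RESCHED.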

The kernel assembly language code that accomplishes all these things is not, technically speaking, a function, because control is never returned to the functions that invoke it. It is a piece of code with two different entry points: ret_from_intr( ) and ret_from_exception( ). As their names suggest, the kernel enters the former when terminating an interrupt handler, and it enters the latter when terminating an exception handler. We shall refer to the two entry points as functions, because this makes the description simpler.

The general flow diagram with the corresponding two entry points is illustrated in Figure 4-6. The gray boxes refer to assembly language instructions that implement kernel preemption (see Chapter 5); if you want to see what the kernel does when it is compiled without support for kernel preemption, just ignore the gray boxes. The ret_from_exception( ) and ret_from_intr( ) entry points look quite similar in the flow diagram. A difference exists only if support for kernel preemption has been selected as a compilation option: in this case, local interrupts are immediately disabled when returning from exceptions.


Figure 4-6. Returning from interrupts and exceptions

The flow diagram gives a rough idea of the steps required to resume the execution of an interrupted program. Now we will go into detail by discussing the assembly language code.

The entry points

The ret_from_intr( ) and ret_from_exception( ) entry points are essentially equivalent to the following assembly language code:

    ret_from_exception:
        cli ; missing if kernel preemption is not supported
    ret_from_intr:
        movl $-8192, %ebp ; -4096 if multiple Kernel Mode stacks are used
        andl %esp, %ebp
        movl 0x30(%esp), %eax
        movb 0x2c(%esp), %al
        testl $0x00020003, %eax
        jnz resume_userspace
        jmp resume_kernel

Recall that when returning from an interrupt, the local interrupts are disabled (see step 3 in the earlier description of handle_IRQ_event( )); thus, the cli assembly language instruction is executed only when returning from an exception.

The kernel loads the address of the thread_info descriptor of current in the ebp register (see "Identifying a Process" in Chapter 3).
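The masking step can be sketched in C as follows. This is only an illustration of the arithmetic performed by the movl/andl pair, assuming 32-bit addresses and an 8 KB Kernel Mode stack; the function name thread_info_base is hypothetical.

```c
#include <stdint.h>

#define THREAD_SIZE 8192u  /* 4096u if multiple Kernel Mode stacks are used */

/* Mirrors "movl $-8192, %ebp; andl %esp, %ebp": clearing the low 13 bits
 * of the stack pointer yields the base of the stack/thread_info area. */
static uint32_t thread_info_base(uint32_t esp)
{
    return esp & ~(THREAD_SIZE - 1u);
}
```

For instance, a stack pointer of 0xc0123f40 maps to 0xc0122000, because -8192 is 0xffffe000 when viewed as a 32-bit mask.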

Next, the values of the cs and eflags registers, which were pushed on the stack when the interrupt or the exception occurred, are used to determine whether the interrupted program was running in User Mode, or if the VM flag of eflags was set.[*] In either case, a jump is made to the resume_userspace label. Otherwise, a jump is made to the resume_kernel label.
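The combined test on cs and eflags can be modeled in C as shown here. It is a sketch under stated assumptions, not kernel code: the function name returns_to_user is hypothetical, and the two arguments stand for the values saved on the stack at offsets 0x30 (eflags) and 0x2c (cs).

```c
#include <stdint.h>
#include <stdbool.h>

#define VM_MASK  0x00020000u  /* VM flag of eflags */
#define CPL_MASK 0x00000003u  /* privilege level bits of the saved cs */

static bool returns_to_user(uint32_t saved_eflags, uint16_t saved_cs)
{
    /* movl 0x30(%esp), %eax followed by movb 0x2c(%esp), %al:
     * the low byte of eflags is overwritten by the low byte of cs. */
    uint32_t mix = (saved_eflags & ~0xffu) | (saved_cs & 0xffu);

    /* testl $0x00020003, %eax */
    return (mix & (VM_MASK | CPL_MASK)) != 0;
}
```

The result is true when the two low bits of the saved cs are nonzero (the program was not running at privilege level 0) or when the VM flag is set; only in the remaining case does the code jump to resume_kernel.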

Resuming a kernel control path

The assembly language code at the resume_kernel label is executed if the program to be resumed is running in Kernel Mode:

    resume_kernel:
        cli                 ; these three instructions are
        cmpl $0, 0x14(%ebp) ; missing if kernel preemption
        jz need_resched     ; is not supported
    restore_all:
        popl %ebx
        popl %ecx
        popl %edx
        popl %esi
        popl %edi
        popl %ebp
        popl %eax
        popl %ds
        popl %es
        addl $4, %esp
        iret

If the preempt_count field of the thread_info descriptor is zero (kernel preemption enabled), the kernel jumps to the need_resched label. Otherwise, the interrupted program is to be restarted: the code loads the registers with the values saved when the interrupt or the exception started, and yields control by executing the iret instruction.

Checking for kernel preemption

When this code is executed, none of the unfinished kernel control paths is an interrupt handler; otherwise the preempt_count field would be greater than zero. However, as stated in "Nested Execution of Exception and Interrupt Handlers" earlier in this chapter, there could be up to two kernel control paths associated with exceptions (besides the one that is terminating).

    need_resched:
        movl 0x8(%ebp), %ecx
        testb $(1<<TIF_NEED_RESCHED), %cl
        jz restore_all
        testl $0x00000200,0x30(%esp)
        jz restore_all
        call preempt_schedule_irq
        jmp need_resched

If the TIF_NEED_RESCHED flag in the flags field of current->thread_info is zero, no process switch is required, thus a jump is made to the restore_all label. A jump to the same label is also made if the kernel control path that is being resumed was running with the local interrupts disabled, that is, if the IF flag (mask 0x00000200) is clear in the saved eflags value. In this case a process switch could corrupt kernel data structures (see the section "When Synchronization Is Necessary" in Chapter 5 for more details).
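The conditions checked on the Kernel Mode path can be summarized in a small C predicate. This is a sketch only: should_preempt_kernel is a hypothetical name, the bit number of TIF_NEED_RESCHED is assumed from 2.6.11-era i386 headers, and the three arguments stand for the preempt_count field, the flags field, and the eflags value saved on the stack.

```c
#include <stdint.h>
#include <stdbool.h>

#define TIF_NEED_RESCHED 3            /* bit number, assumed */
#define IF_MASK          0x00000200u  /* IF flag of eflags */

static bool should_preempt_kernel(int preempt_count, uint32_t ti_flags,
                                  uint32_t saved_eflags)
{
    if (preempt_count != 0)
        return false;   /* resume_kernel: preemption is disabled */
    if (!(ti_flags & (1u << TIF_NEED_RESCHED)))
        return false;   /* no process switch requested */
    if (!(saved_eflags & IF_MASK))
        return false;   /* resumed path ran with interrupts disabled */
    return true;        /* call preempt_schedule_irq */
}
```

Whenever the predicate is false, the assembly code falls through or jumps to restore_all.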

If a process switch is required, the preempt_schedule_irq( ) function is invoked: it sets the PREEMPT_ACTIVE flag in the preempt_count field, temporarily sets the big kernel lock counter to -1 (see the section "The Big Kernel Lock" in Chapter 5), enables the local interrupts, and invokes schedule( ) to select another process to run. When the former process resumes, preempt_schedule_irq( ) restores the previous value of the big kernel lock counter, clears the PREEMPT_ACTIVE flag, and disables local interrupts. The schedule( ) function continues to be invoked as long as the TIF_NEED_RESCHED flag of the current process is set.
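The sequence of steps just described can be traced with a toy model in C. Everything here is an assumption made for illustration: the toy_task structure, the toy_ function names, and the numeric value of PREEMPT_ACTIVE are hypothetical stand-ins, and toy_schedule( ) merely counts invocations and clears the request instead of switching processes.

```c
#include <stdint.h>
#include <stdbool.h>

#define PREEMPT_ACTIVE   0x10000000  /* assumed value */
#define NEED_RESCHED_BIT (1u << 3)   /* assumed bit position */

struct toy_task {
    int      preempt_count;
    int      lock_depth;      /* big kernel lock counter */
    uint32_t ti_flags;
    bool     irqs_enabled;
    int      schedule_calls;
};

/* Stand-in for schedule(): counts the call and clears the request. */
static void toy_schedule(struct toy_task *t)
{
    t->schedule_calls++;
    t->ti_flags &= ~NEED_RESCHED_BIT;
}

/* One pass through the steps listed in the text. */
static void toy_preempt_schedule_irq(struct toy_task *t)
{
    int saved_lock_depth = t->lock_depth;

    t->preempt_count += PREEMPT_ACTIVE;  /* set PREEMPT_ACTIVE */
    t->lock_depth = -1;                  /* park the lock counter */
    t->irqs_enabled = true;              /* enable local interrupts */
    toy_schedule(t);
    t->lock_depth = saved_lock_depth;    /* restore the lock counter */
    t->preempt_count -= PREEMPT_ACTIVE;  /* clear PREEMPT_ACTIVE */
    t->irqs_enabled = false;             /* disable local interrupts */
}

/* Mirrors the loop formed by "jmp need_resched". */
static void toy_need_resched(struct toy_task *t)
{
    while (t->ti_flags & NEED_RESCHED_BIT)
        toy_preempt_schedule_irq(t);
}
```

After toy_need_resched( ) returns, the model is back in the state expected by restore_all: interrupts disabled, the lock counter restored, and PREEMPT_ACTIVE cleared.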

Resuming a User Mode program

If the program to be resumed was running in User Mode, a jump is made to the resume_userspace label:

    resume_userspace:
        cli
        movl 0x8(%ebp), %ecx
        andl $0x0000ff6e, %ecx
        je restore_all
        jmp work_pending

After disabling the local interrupts, a check is made on the value of the flags field of current->thread_info. If no flag except TIF_SYSCALL_TRACE, TIF_SYSCALL_AUDIT, or TIF_SINGLESTEP is set, nothing remains to be done: a jump is made to the restore_all label, thus resuming the User Mode program.
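The constant 0x0000ff6e can be derived from the three flags that are ignored at this point. The C sketch below shows the arithmetic; the bit numbers are assumed from 2.6.11-era i386 headers, and the names IGNORED_FLAGS, WORK_MASK, and has_pending_work are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed bit numbers of the three flags that do not count as work. */
enum { TIF_SYSCALL_TRACE = 0, TIF_SINGLESTEP = 4, TIF_SYSCALL_AUDIT = 7 };

#define IGNORED_FLAGS ((1u << TIF_SYSCALL_TRACE) | \
                       (1u << TIF_SINGLESTEP)    | \
                       (1u << TIF_SYSCALL_AUDIT))

/* Low 16 bits of the flags field, minus the ignored flags. */
#define WORK_MASK (0x0000ffffu & ~IGNORED_FLAGS)

/* Mirrors "andl $0x0000ff6e, %ecx; je restore_all". */
static bool has_pending_work(uint32_t ti_flags)
{
    return (ti_flags & WORK_MASK) != 0;
}
```

Under these assumptions the three ignored bits add up to 0x91, so the mask evaluates to 0xffff & ~0x91 = 0xff6e. Only the low 16 bits of the flags field are examined, so flags stored in higher bits do not trigger any work here either.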

Checking for rescheduling

The flags in the thread_info descriptor state that additional work is required before resuming the interrupted program.

    work_pending:
        testb $(1<<TIF_NEED_RESCHED), %cl
        jz work_notifysig
        call schedule
        jmp resume_userspace

If a process switch request is pending, schedule( ) is invoked to select another process to run. When the former process resumes, a jump is made back to resume_userspace.

Handling pending signals, virtual-8086 mode, and single stepping

There is other work to be done besides process switch requests:

    work_notifysig:
        movl %esp, %eax
        testl $0x00020000, 0x30(%esp)
        je 1f
        pushl %ecx
        call save_v86_state
        popl %ecx
        movl %eax, %esp
    1:
        xorl %edx, %edx
        call do_notify_resume
        jmp restore_all

If the VM control flag in the eflags register of the User Mode program is set, the save_v86_state( ) function is invoked to build up the virtual-8086 mode data structures in the User Mode address space. Then the do_notify_resume( ) function is invoked to take care of pending signals and single stepping. Finally, a jump is made to the restore_all label to resume the interrupted program.
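The order in which the User Mode path examines its pending work can be summarized in C. This is a simplified model rather than kernel code: the enum, the function name next_user_work, and the bit position of the rescheduling flag are assumptions, and the function only reports which step would run next instead of performing it.

```c
#include <stdint.h>

#define WORK_MASK        0x0000ff6eu  /* mask used at resume_userspace */
#define NEED_RESCHED_BIT (1u << 3)    /* assumed bit position */
#define VM_MASK          0x00020000u  /* VM flag of eflags */

enum work_step {
    WORK_NONE,             /* jump to restore_all */
    WORK_SCHEDULE,         /* call schedule, then recheck */
    WORK_NOTIFY,           /* call do_notify_resume */
    WORK_V86_THEN_NOTIFY   /* save_v86_state, then do_notify_resume */
};

static enum work_step next_user_work(uint32_t ti_flags, uint32_t saved_eflags)
{
    if ((ti_flags & WORK_MASK) == 0)
        return WORK_NONE;
    if (ti_flags & NEED_RESCHED_BIT)
        return WORK_SCHEDULE;
    if (saved_eflags & VM_MASK)
        return WORK_V86_THEN_NOTIFY;
    return WORK_NOTIFY;
}
```

Rescheduling takes priority over signal handling: after schedule( ) returns, the flags are read again at resume_userspace, so signals and single stepping are handled only once no process switch request remains.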

[*] When this flag is set, programs are executed in virtual-8086 mode; see the Pentium manuals for more details.
