All Interrupt Requests (IRQs) issued by I/O devices give rise to maskable interrupts. A maskable interrupt can be in two states: masked or unmasked; a masked interrupt is ignored by the control unit as long as it remains masked.
Processor-detected exceptions: Generated when the CPU detects an anomalous condition while executing an instruction. These are further divided into three groups, depending on the value of the eip register that is saved on the Kernel Mode stack when the CPU control unit raises the exception.
Faults: Can generally be corrected; once corrected, the program is allowed to restart with no loss of continuity. The saved value of eip is the address of the instruction that caused the fault, and hence that instruction can be resumed when the exception handler terminates. As we'll see in the section "Page Fault Exception Handler" in Chapter 9, resuming the same instruction is necessary whenever the handler is able to correct the anomalous condition that caused the exception.
Traps: Reported immediately following the execution of the trapping instruction; after the kernel returns control to the program, it is allowed to continue its execution with no loss of continuity. The saved value of eip is the address of the instruction that should be executed after the one that caused the trap. A trap is triggered only when there is no need to reexecute the instruction that terminated. The main use of traps is for debugging purposes. The role of the interrupt signal in this case is to notify the debugger that a specific instruction has been executed (for instance, a breakpoint has been reached within a program). Once the user has examined the data provided by the debugger, she may ask that execution of the debugged program resume, starting from the next instruction.
Aborts: A serious error occurred; the control unit is in trouble, and it may be unable to store in the eip register the precise location of the instruction causing the exception. Aborts are used to report severe errors, such as hardware failures and invalid or inconsistent values in system tables. The interrupt signal sent by the control unit is an emergency signal used to switch control to the corresponding abort exception handler. This handler has no choice but to force the affected process to terminate.
Programmed exceptions: Occur at the request of the programmer. They are triggered by int or int3 instructions; the into (check for overflow) and bound (check on address bound) instructions also give rise to a programmed exception when the condition they are checking is not true. Programmed exceptions are handled by the control unit as traps; they are often called software interrupts. Such exceptions have two common uses: to implement system calls and to notify a debugger of a specific event (see Chapter
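The difference between faults, traps, and aborts comes down to which eip value ends up on the Kernel Mode stack. The following is a minimal sketch of that rule, not a model of real hardware: the names Kind and saved_eip are hypothetical, and the trap case assumes the trapping instruction does not branch, so that the next instruction simply follows it in memory.

```python
from enum import Enum


class Kind(Enum):
    FAULT = "fault"
    TRAP = "trap"
    ABORT = "abort"


def saved_eip(kind, insn_addr, insn_len):
    """Return the eip value saved for an exception raised by one instruction.

    insn_addr is the address of the instruction, insn_len its length in bytes.
    """
    if kind is Kind.FAULT:
        # The faulting instruction is re-executed after the handler returns.
        return insn_addr
    if kind is Kind.TRAP:
        # Execution continues with the instruction after the trapping one
        # (assumption: the trapping instruction does not branch).
        return insn_addr + insn_len
    # Abort: the control unit may be unable to record a precise location.
    return None
```

With a 2-byte instruction at address 0x1000, the sketch yields 0x1000 for a fault and 0x1002 for a trap.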
Each interrupt or exception is identified by a number ranging from 0 to 255; Intel calls this 8-bit unsigned number a vector. The vectors of nonmaskable interrupts and exceptions are fixed, while those of maskable interrupts can be altered by programming the Interrupt Controller (see the next section).
Each hardware device controller capable of issuing interrupt requests usually has a single output line designated as the Interrupt ReQuest (IRQ) line.[*] All existing IRQ lines are connected to the input pins of a hardware circuit called the Programmable Interrupt Controller, which performs the following actions:
1. Monitors the IRQ lines, checking for raised signals. If two or more IRQ lines are raised, selects the one having the lower pin number.
2. If a raised signal occurs on an IRQ line:
   a. Converts the raised signal received into a corresponding vector.
   b. Stores the vector in an Interrupt Controller I/O port, thus allowing the CPU to read it via the data bus.
   c. Sends a raised signal to the processor INTR pin; that is, issues an interrupt.
   d. Waits until the CPU acknowledges the interrupt signal by writing into one of the Programmable Interrupt Controller (PIC) I/O ports; when this occurs, clears the INTR line.
3. Goes back to step 1.
The IRQ lines are sequentially numbered starting from 0; therefore, the first IRQ line is usually denoted as IRQ 0. Intel's default vector associated with IRQ n is n+32. As mentioned before, the mapping between IRQs and vectors can be modified by issuing suitable I/O instructions to the Interrupt Controller ports.
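The selection rule (lowest pin number wins) and the default IRQ-to-vector mapping can be sketched in a few lines. This is an illustrative toy, not controller code: select_irq, irq_to_vector, and VECTOR_BASE are hypothetical names, and the base is a parameter because the mapping can be reprogrammed.

```python
VECTOR_BASE = 32  # default offset described in the text: IRQ n -> vector n + 32


def select_irq(raised):
    """Return the raised IRQ line to service first, or None if none is raised.

    The line with the lowest pin number has priority.
    """
    return min(raised) if raised else None


def irq_to_vector(irq, base=VECTOR_BASE):
    """Convert an IRQ line number into its interrupt vector."""
    vector = base + irq
    if irq < 0 or vector > 255:
        raise ValueError("vector must fit in the 0..255 range")
    return vector
```

For example, if lines 4, 1, and 7 are raised together, select_irq picks line 1, which the default mapping turns into vector 33.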
Each IRQ line can be selectively disabled. Thus, the PIC can be programmed to disable IRQs. That is, the PIC can be told to stop issuing interrupts that refer to a given IRQ line, or to resume issuing them. Disabled interrupts are not lost; the PIC sends them to the CPU as soon as they are enabled again. This feature is used by most interrupt handlers, because it allows them to process IRQs of the same type serially.
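The statement that disabled interrupts are not lost can be illustrated with a small model in which a raised line stays pending until it is enabled again. ToyPIC and its methods are hypothetical names for this sketch; a real PIC is programmed through I/O ports, not method calls.

```python
class ToyPIC:
    """Toy interrupt controller with per-line disabling."""

    def __init__(self):
        self.disabled = set()  # IRQ lines currently disabled
        self.pending = set()   # IRQ lines raised but not yet delivered

    def disable(self, irq):
        self.disabled.add(irq)

    def enable(self, irq):
        self.disabled.discard(irq)

    def raise_irq(self, irq):
        # The request is remembered even if the line is disabled.
        self.pending.add(irq)

    def next_interrupt(self):
        """Deliver the pending enabled IRQ with the lowest number, if any."""
        ready = self.pending - self.disabled
        if not ready:
            return None
        irq = min(ready)
        self.pending.remove(irq)
        return irq
```

A handler that disables its own line on entry and enables it on exit therefore sees IRQs of the same type one at a time, which is the serial processing the paragraph refers to.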
Selective enabling/disabling of IRQs is not the same as global masking/unmasking of maskable interrupts. When the IF flag of the eflags register is clear, each maskable interrupt issued by the PIC is temporarily ignored by the CPU. The cli and sti assembly language instructions, respectively, clear and set that flag.
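As a rough illustration of global masking, the sketch below treats eflags as a plain integer and models cli and sti as bit operations. The function names mirror the instructions but are only Python helpers; the position of IF (bit 9) comes from Intel's documentation of eflags, not from this text.

```python
IF_MASK = 1 << 9  # IF is bit 9 of eflags (per Intel's documentation)


def cli(eflags):
    """Model of cli: clear IF, so maskable interrupts are ignored."""
    return eflags & ~IF_MASK


def sti(eflags):
    """Model of sti: set IF, so maskable interrupts are accepted again."""
    return eflags | IF_MASK


def cpu_accepts(eflags, nonmaskable=False):
    """Tell whether the CPU would accept an interrupt right now.

    Nonmaskable interrupts are not affected by the IF flag.
    """
    return nonmaskable or bool(eflags & IF_MASK)
```

Note that this is independent of the per-line disabling done in the PIC: a line may be enabled in the PIC while the CPU still ignores it because IF is clear.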
Traditional PICs are implemented by connecting "in cascade" two 8259A-style external chips. Each chip can handle up to eight different IRQ input lines. Because the INT output line of the slave PIC is connected to the IRQ 2 pin of the master PIC, the number of available IRQ lines is limited to 15.
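The count of 15 follows from two 8-line chips with one master pin given up for the cascade. The sketch below makes that arithmetic explicit; locate_irq and usable_irqs are hypothetical helpers, and numbering the slave inputs as IRQ 8 to 15 is the conventional PC arrangement, assumed here rather than stated in the text.

```python
LINES_PER_CHIP = 8
CASCADE_IRQ = 2  # master pin wired to the slave's INT output


def locate_irq(irq):
    """Return (chip, pin) for an IRQ number in a master/slave 8259A pair."""
    if not 0 <= irq < 2 * LINES_PER_CHIP:
        raise ValueError("IRQ number out of range")
    if irq < LINES_PER_CHIP:
        return ("master", irq)
    return ("slave", irq - LINES_PER_CHIP)


def usable_irqs():
    """IRQ numbers left for devices once the cascade pin is excluded."""
    return [irq for irq in range(2 * LINES_PER_CHIP) if irq != CASCADE_IRQ]
```

Here len(usable_irqs()) is 2 × 8 − 1 = 15, matching the limit given above.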
The previous description refers to PICs designed for uniprocessor systems. If the system includes a single CPU, the output line of the master PIC can be connected in a straightforward way to the INTR pin of the CPU. However, if the system includes two or more CPUs, this approach is no longer valid and more sophisticated PICs are needed.
Being able to deliver interrupts to each CPU in the system is crucial for fully exploiting the parallelism of the SMP architecture. For that reason, starting with the Pentium III, Intel introduced a new component designated as the I/O Advanced Programmable Interrupt Controller (I/O APIC). This chip is the advanced version of the old 8259A Programmable Interrupt Controller; to support old operating systems, recent motherboards include both types of chip. Moreover, all current 80×86 microprocessors include a local APIC. Each local APIC has 32-bit registers, an internal clock, a local timer device, and two additional IRQ lines, LINT 0 and LINT 1, reserved for local APIC interrupts. All local APICs are connected to an external I/O APIC, giving rise to a multi-APIC system.
Figure 4-1 illustrates in a schematic way the structure of a multi-APIC system. An APIC bus connects the "frontend" I/O APIC to the local APICs. The IRQ lines coming from the devices are connected to the I/O APIC, which therefore acts as a router with respect to the local APICs. In the motherboards of the Pentium III and earlier processors, the APIC bus was a serial three-line bus; starting with the Pentium 4, the APIC bus is implemented by means of the system bus. However, because the APIC bus and its messages are invisible to software, we won't give further details.
The I/O APIC consists of a set of 24 IRQ lines, a 24-entry Interrupt Redirection Table, programmable registers, and a message unit for sending and receiving APIC messages over the APIC bus. Unlike IRQ pins of the 8259A, interrupt priority is not related to pin number: each entry in the Redirection Table can be individually programmed to indicate the interrupt vector and priority, the destination processor, and how the processor is selected. The information in the Redirection Table is used to translate each external IRQ signal into a message to one or more local APIC units via the APIC bus.
Interrupt requests coming from external hardware devices can be distributed among the available CPUs in two ways:
The IRQ signal is delivered to the local APICs listed in the corresponding Redirection Table entry. The interrupt is delivered to one specific CPU, to a subset of CPUs, or to all CPUs at once (broadcast mode).
The IRQ signal is delivered to the local APIC of the processor that is executing the process with the lowest priority.
Every local APIC has a programmable task priority register (TPR), which is used to compute the priority of the currently running process. Intel expects the operating system kernel to update this register at each process switch.
If two or more CPUs share the lowest priority, the load is distributed between them using a technique called arbitration. Each CPU is assigned a different arbitration priority ranging from 0 (lowest) to 15 (highest) in the arbitration priority register of the local APIC.
Every time an interrupt is delivered to a CPU, its corresponding arbitration priority is automatically set to 0, while the arbitration priority of any other CPU is increased. When the arbitration priority register becomes greater than 15, it is set to the previous arbitration priority of the winning CPU increased by 1. Therefore, interrupts are distributed in a round-robin fashion among CPUs with the same task priority.[*]
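The arbitration rule can be simulated to see why it produces a round-robin pattern. The sketch below is a toy model under stated assumptions: ToyCPU and deliver_lowest_priority are hypothetical names, and the model assumes that, among the CPUs sharing the lowest task priority, the one with the highest arbitration priority wins, which the text implies but does not state outright.

```python
from dataclasses import dataclass


@dataclass
class ToyCPU:
    cpu_id: int
    task_priority: int  # value derived from the TPR
    arb_priority: int   # arbitration priority, 0 (lowest) to 15 (highest)


def deliver_lowest_priority(cpus):
    """Pick the target CPU for one interrupt and update arbitration priorities."""
    lowest = min(cpu.task_priority for cpu in cpus)
    candidates = [cpu for cpu in cpus if cpu.task_priority == lowest]
    # Assumption: the highest arbitration priority wins among the candidates.
    winner = max(candidates, key=lambda cpu: cpu.arb_priority)
    previous = winner.arb_priority
    winner.arb_priority = 0
    for cpu in cpus:
        if cpu is not winner:
            cpu.arb_priority += 1
            if cpu.arb_priority > 15:
                # Overflow rule from the text: previous winner priority plus 1.
                cpu.arb_priority = previous + 1
    return winner.cpu_id
```

With four CPUs at the same task priority and initial arbitration priorities 0, 1, 2, and 3, successive interrupts go to CPUs 3, 2, 1, 0 and then the cycle repeats.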
Besides distributing interrupts among processors, the multi-APIC system allows CPUs to generate interprocessor interrupts. When a CPU wishes to send an interrupt to another CPU, it stores the interrupt vector and the identifier of the target's local APIC in the Interrupt Command Register (ICR) of its own local APIC. A message is then sent via the APIC bus to the target's local APIC, which therefore issues a corresponding interrupt to its own CPU.
Many of the current uniprocessor systems include an I/O APIC chip, which may be configured in two distinct ways:
As a standard 8259A-style external PIC connected to the CPU. The local APIC is disabled and the two local IRQ lines, LINT 0 and LINT 1, are configured, respectively, as the INTR and NMI pins.
As a standard external I/O APIC. The local APIC is enabled, and all external interrupts are received through the I/O APIC.
The 80×86 microprocessors issue roughly 20 different exceptions.[*] The kernel must provide a dedicated exception handler for each exception type. For some exceptions, the CPU control unit also generates a hardware error code and pushes it on the Kernel Mode stack before starting the exception handler.
The following list gives the vector, the name, the type, and a brief description of the exceptions found in 80×86 processors. Additional information may be found in the Intel technical documentation.
0 - "Divide error" (fault): Raised when a program issues an integer division by 0.
1 - "Debug" (trap or fault): Raised when the TF flag of eflags is set (quite useful to implement single-step execution of a debugged program) or when the address of an instruction or operand falls within the range of an active debug register (see the section "Hardware Context" in Chapter 3).
2 - Not used: Reserved for nonmaskable interrupts (those that use the NMI pin).
6 - "Invalid opcode" (fault): The CPU execution unit has detected an invalid opcode (the part of the machine instruction that determines the operation performed).
7 - "Device not available" (fault): An ESCAPE, MMX, or SSE/SSE2 instruction has been executed with the TS flag of cr0 set (see the section "Saving and Loading the FPU, MMX, and XMM Registers" in Chapter 3).
8 - "Double fault" (abort): Normally, when the CPU detects an exception while trying to call the handler for a prior exception, the two exceptions can be handled serially. In a few cases, however, the processor cannot handle them serially, so it raises this exception.
9 - "Coprocessor segment overrun" (abort): Problems with the external mathematical coprocessor (applies only to old 80386 microprocessors).
10 - "Invalid TSS" (fault): The CPU has attempted a context switch to a process having an invalid Task State Segment.
11 - "Segment not present" (fault): A reference was made to a segment not present in memory (one in which the Segment-Present flag of the Segment Descriptor was cleared).
12 - "Stack segment fault" (fault): The instruction attempted to exceed the stack segment limit, or the segment identified by ss is not present in memory.
13 - "General protection" (fault): One of the protection rules in the protected mode of the 80×86 has been violated.
14 - "Page Fault" (fault): The addressed page is not present in memory, the corresponding Page Table entry is null, or a violation of the paging protection mechanism has occurred.
16 - "Floating-point error" (fault): The floating-point unit integrated into the CPU chip has signaled an error condition, such as numeric overflow or division by 0.[*]
17 - "Alignment check" (fault): The address of an operand is not correctly aligned (for instance, the address of a long integer is not a multiple of 4).
18 - "Machine check" (abort): A machine-check mechanism has detected a CPU or bus error.
19 - "SIMD floating point exception" (fault): The SSE or SSE2 unit integrated in the CPU chip has signaled an error condition on a floating-point operation.
The values from 20 to 31 are reserved by Intel for future development. As illustrated in Table 4-1, each exception is handled by a specific exception handler (see the section "Exception Handling" later in this chapter), which usually sends a Unix signal to the process that caused the exception.
Table 4-1. Signals sent by the exception handlers
Device not available
Coprocessor segment overrun
Segment not present
Stack segment fault
SIMD floating point
A system table called Interrupt Descriptor Table (IDT) associates each interrupt or exception vector with the address of the corresponding interrupt or exception handler. The IDT must be properly initialized before the kernel enables interrupts.
The IDT format is similar to that of the GDT and the LDTs examined in Chapter 2. Each entry corresponds to an interrupt or an exception vector and consists of an 8-byte descriptor. Thus, a maximum of 256 × 8 = 2048 bytes are required to store the IDT.
The idtr CPU register allows the IDT to be located anywhere in memory: it specifies both the IDT base linear address and its limit (maximum length). It must be initialized before enabling interrupts by means of the lidt assembly language instruction.
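To make the base-plus-limit pair concrete, the sketch below builds the 6-byte memory image that lidt reads on a 32-bit processor: a 16-bit limit followed by a 32-bit base. The helper name idtr_image and the example base address are hypothetical; the convention that the limit holds the table size minus 1 comes from Intel's documentation rather than from this text.

```python
import struct

IDT_ENTRIES = 256
DESCRIPTOR_SIZE = 8  # bytes per IDT descriptor


def idtr_image(base, entries=IDT_ENTRIES):
    """Return the 6-byte operand for lidt: 16-bit limit, then 32-bit base."""
    # Per Intel's convention the limit is the offset of the last valid byte.
    limit = entries * DESCRIPTOR_SIZE - 1
    # "<HI" = little-endian, unsigned 16-bit, unsigned 32-bit, no padding.
    return struct.pack("<HI", limit, base)
```

For a full table of 256 descriptors the limit is 256 × 8 − 1 = 2047, whatever base address is chosen.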
The IDT may include three types of descriptors; Figure 4-2 illustrates the meaning of the 64 bits included in each of them. In particular, the value of the Type field encoded in bits 40–43 identifies the descriptor type.
The descriptors are:
Task gate: Includes the TSS selector of the process that must replace the current one when an interrupt signal occurs.
Interrupt gate: Includes the Segment Selector and the offset inside the segment of an interrupt or exception handler. While transferring control to the proper segment, the processor clears the IF flag, thus disabling further maskable interrupts.
Trap gate: Similar to an interrupt gate, except that while transferring control to the proper segment, the processor does not modify the IF flag.
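A short encoder and decoder can show how the Type field in bits 40–43 distinguishes the three descriptors. This is a sketch, not kernel code: make_gate and decode_gate are hypothetical helpers, the descriptor is handled as a 64-bit integer, and the positions of the fields other than Type (offset halves, selector, DPL, present bit), as well as the Type values themselves, follow Intel's documented 32-bit gate layout rather than anything stated in this text. For a task gate the offset field is unused.

```python
GATE_TYPES = {
    0b0101: "task gate",
    0b1110: "interrupt gate",
    0b1111: "trap gate",
}


def make_gate(gate_type, selector, offset, dpl):
    """Pack a 64-bit gate descriptor with the present bit set."""
    return (
        (offset & 0xFFFF)                    # bits 0-15: offset, low half
        | ((selector & 0xFFFF) << 16)        # bits 16-31: Segment Selector
        | ((gate_type & 0xF) << 40)          # bits 40-43: Type
        | ((dpl & 0x3) << 45)                # bits 45-46: DPL
        | (1 << 47)                          # bit 47: present
        | (((offset >> 16) & 0xFFFF) << 48)  # bits 48-63: offset, high half
    )


def decode_gate(desc):
    """Unpack the fields of a 64-bit gate descriptor."""
    return {
        "type": GATE_TYPES.get((desc >> 40) & 0xF, "unknown"),
        "selector": (desc >> 16) & 0xFFFF,
        "offset": (desc & 0xFFFF) | (((desc >> 48) & 0xFFFF) << 16),
        "dpl": (desc >> 45) & 0x3,
        "present": bool((desc >> 47) & 1),
    }
```

Encoding an interrupt gate and decoding it again returns the same selector, offset, and DPL, which is a quick way to check that the bit positions are consistent.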
After executing an instruction, the cs and eip pair of registers contains the logical address of the next instruction to be executed. Before dealing with that instruction, the control unit checks whether an interrupt or an exception occurred while the control unit executed the previous instruction. If one occurred, the control unit does the following:
Determines the vector i (0 ≤ i ≤ 255) associated with the interrupt or the exception.
Reads the ith entry of the IDT referred to by the idtr register (we assume in the following description that the entry contains an interrupt or a trap gate).
Gets the base address of the GDT from the gdtr register and looks in the GDT to read the Segment Descriptor identified by the selector in the IDT entry. This descriptor specifies the base address of the segment that includes the interrupt or exception handler.
Makes sure the interrupt was issued by an authorized source. First, it compares the Current Privilege Level (CPL), which is stored in the two least significant bits of the cs register, with the Descriptor Privilege Level (DPL) of the Segment Descriptor included in the GDT. Raises a "General protection" exception if the CPL is numerically lower than the DPL, because the interrupt handler cannot have a lower privilege than the program that caused the interrupt. For programmed exceptions, makes a further security check: compares the CPL with the DPL of the gate descriptor included in the IDT and raises a "General protection" exception if the DPL is numerically lower than the CPL. This last check makes it possible to prevent access by user applications to specific trap or interrupt gates.
Checks whether a change of privilege level is taking place — that is, if CPL is different from the selected Segment Descriptor's DPL. If so, the control unit must start using the stack that is associated with the new privilege level. It does this by performing the following steps:
Loads the ss and esp registers with the proper values for the stack segment and stack pointer associated with the new privilege level. These values are found in the TSS (see the section "Task State Segment").
In the new stack, it saves the previous values of ss and esp, which define the logical address of the stack associated with the old privilege level.
If a fault has occurred, it loads cs and eip with the logical address of the instruction that caused the exception so that it can be executed again.
Saves the contents of eflags, cs, and eip on the stack.
If the exception carries a hardware error code, it saves it on the stack.
Loads cs and eip, respectively, with the Segment Selector and the Offset fields of the Gate Descriptor stored in the ith entry of the IDT. These values define the logical address of the first instruction of the interrupt or exception handler.
The last step performed by the control unit is equivalent to a jump to the interrupt or exception handler. In other words, the instruction processed by the control unit after dealing with the interrupt signal is the first instruction of the selected handler.
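The privilege checks and the resulting stack contents can be summarized in a small model. This is a sketch under stated assumptions, not a description of the hardware: GeneralProtection and build_handler_frame are hypothetical names, register values are passed in a dictionary, and the frame is returned as a list in push order. It assumes that the control unit pushes eflags, cs, and eip before any error code, which are the values that iret later restores.

```python
class GeneralProtection(Exception):
    """Stands in for the "General protection" exception."""


def build_handler_frame(cpl, handler_dpl, gate_dpl, regs,
                        programmed=False, error_code=None):
    """Return the values pushed on the handler's stack, in push order.

    cpl: privilege level of the interrupted code (0 = most privileged).
    handler_dpl: DPL of the handler's Segment Descriptor.
    gate_dpl: DPL of the gate descriptor in the IDT.
    """
    # The handler cannot be less privileged than the interrupted program.
    if cpl < handler_dpl:
        raise GeneralProtection("handler less privileged than interrupted code")
    # Extra check for programmed exceptions only.
    if programmed and gate_dpl < cpl:
        raise GeneralProtection("gate not accessible from this privilege level")

    frame = []
    if cpl != handler_dpl:
        # Privilege change: the old stack location is saved on the new stack.
        frame.append(("ss", regs["ss"]))
        frame.append(("esp", regs["esp"]))
    frame.append(("eflags", regs["eflags"]))
    frame.append(("cs", regs["cs"]))
    frame.append(("eip", regs["eip"]))
    if error_code is not None:
        frame.append(("error_code", error_code))
    return frame
```

When User Mode code (CPL 3) is interrupted by a handler at level 0, the frame holds ss, esp, eflags, cs, and eip, plus the error code if there is one; when the interrupted code already runs at level 0, ss and esp are not saved.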
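After the interrupt or exception is processed, the corresponding handler must relinquish control to the interrupted process by issuing the iret instruction, which forces the control unit to:
Load the cs, eip, and eflags registers with the values saved on the stack. If a hardware error code has been pushed in the stack on top of the eip contents, it must be popped before executing iret.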
Check whether the CPL of the handler is equal to the value contained in the two least significant bits of cs (this means the interrupted process was running at the same privilege level as the handler). If so, iret concludes execution; otherwise, go to the next step.
Load the ss and esp registers from the stack and return to the stack associated with the old privilege level.
Examine the contents of the ds, es, fs, and gs segment registers; if any of them contains a selector that refers to a Segment Descriptor whose DPL value is lower than CPL, clear the corresponding segment register. The control unit does this to forbid User Mode programs that run with a CPL equal to 3 from using segment registers previously used by kernel routines (with a DPL equal to 0). If these registers were not cleared, malicious User Mode programs could exploit them in order to access the kernel address space.
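The return path can be modeled in the same spirit. In this sketch the stack is a Python list with its top at the end, toy_iret is a hypothetical name, descriptor_dpl is a hypothetical callback that maps a selector to the DPL of its Segment Descriptor, and any error code is assumed to have been popped by the handler already. Selector values in the usage are made up for illustration.

```python
def toy_iret(stack, handler_cpl, segment_regs, descriptor_dpl):
    """Model the register restores done on return from a handler.

    stack: saved values, top of stack last: [ss, esp,] eflags, cs, eip.
    segment_regs: dict with the ds, es, fs, and gs selectors.
    Returns (restored registers, segment registers after the check).
    """
    regs = {}
    regs["eip"] = stack.pop()
    regs["cs"] = stack.pop()
    regs["eflags"] = stack.pop()

    new_cpl = regs["cs"] & 0x3  # CPL is in the two low bits of cs
    if new_cpl == handler_cpl:
        # Same privilege level: nothing else to restore.
        return regs, dict(segment_regs)

    regs["esp"] = stack.pop()
    regs["ss"] = stack.pop()

    cleaned = {}
    for name, selector in segment_regs.items():
        # Clear selectors that refer to segments more privileged than the
        # code being resumed; a selector of 0 is already clear.
        if selector != 0 and descriptor_dpl(selector) < new_cpl:
            cleaned[name] = 0
        else:
            cleaned[name] = selector
    return regs, cleaned
```

A segment register left pointing at a DPL 0 segment is zeroed when control goes back to CPL 3 code, while selectors for DPL 3 segments are kept.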
[*] More sophisticated devices use several IRQ lines. For instance, a PCI card can use up to four IRQ lines.
[*] The Pentium 4 local APIC doesn't have an arbitration priority register; the arbitration mechanism is hidden in the bus arbitration circuitry. The Intel manuals state that if the operating system kernel does not regularly update the task priority registers, performance may be suboptimal because interrupts might always be serviced by the same CPU.
[*] The exact number depends on the processor model.
[*] The 80×86 microprocessors also generate this exception when performing a signed division whose result cannot be stored as a signed integer (for instance, a division between -2,147,483,648 and -1).