Chapter 16. Physical Layout of the Kernel Source

So far, we’ve talked about the Linux kernel from the perspective of writing device drivers. Once you begin playing with the kernel, however, you may find that you want to “understand it all.” In fact, you may find yourself passing whole days navigating through the source code and grepping your way through the source tree to uncover the relationships among the different parts of the kernel.

This kind of “heavy grepping” is one of the tasks your authors perform quite often, and it is an efficient way to retrieve information from the source code. Nowadays you can also use Internet resources to understand the kernel source tree; some of them are listed in the preface. But despite these resources, wise use of grep,[62] less, and possibly ctags or etags can still be the best way to extract information from the kernel sources.

In our opinion, acquiring a bit of a knowledge base before sitting down in front of your preferred shell prompt can be helpful. Therefore, this chapter presents a quick overview of the Linux kernel source files based on version 2.4.2. If you’re interested in other versions, some of the descriptions may not apply literally. Whole sections may be missing (like the drivers/media directory that was introduced in 2.4.0-test6 by moving various preexisting drivers to this new directory). We hope the following information is useful, even if not authoritative, for browsing other versions of the kernel.

Every pathname is given relative to the source root (usually /usr/src/linux), while filenames with no directory component are assumed to reside in the “current” directory—the one being discussed. Header files (when named with < and > angle brackets) are given relative to the include directory of the source tree. We won’t dissect the Documentation directory, as its role is self-explanatory.

Booting the Kernel

The usual way to look at a program is to start where execution begins. As far as Linux is concerned, it’s hard to tell where execution begins—it depends on how you define “begins.”

The architecture-independent starting point is start_kernel in init/main.c. This function is invoked from architecture-specific code, to which it never returns. It is in charge of spinning the wheel and can thus be considered the “mother of all functions,” the first breath in the computer’s life. Before start_kernel, there was chaos.

By the time start_kernel is invoked, the processor has been initialized, protected mode[63] has been entered, the processor is executing at the highest privilege level (sometimes called supervisor mode), and interrupts are disabled. The start_kernel function is in charge of initializing all the kernel data structures. It does this by calling external functions to perform subtasks, since each setup function is defined in the appropriate kernel subsystem.

The first function called by start_kernel, after acquiring the kernel lock and printing the Linux banner string, is setup_arch. This allows platform-specific C-language code to run; setup_arch receives a pointer to the local command_line pointer in start_kernel, so it can make it point to the real (platform-dependent) location where the command line is stored. As the next step, start_kernel passes the command line to parse_options (defined in the same init/main.c file) so that the boot options can be honored.
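The pointer handoff between start_kernel and setup_arch is easier to see in code. What follows is a user-space sketch, not kernel source. The name setup_arch comes from the text, but the buffer arch_cmdline, the helper boot_sequence (standing in for the relevant fragment of start_kernel), and the sample command line are hypothetical stand-ins for whatever storage a real platform uses.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the platform-dependent place where the boot
 * loader left the command line; a real port finds it in firmware data. */
static char arch_cmdline[] = "root=/dev/hda1 video=vesa";

/* Sketch of setup_arch: it receives the address of the caller's local
 * pointer and redirects that pointer to the real command-line storage. */
static void setup_arch(char **cmdline_p)
{
    *cmdline_p = arch_cmdline;
}

/* Hypothetical fragment of start_kernel: the local pointer is NULL until
 * setup_arch fills it in, after which it can be handed to the parser. */
static const char *boot_sequence(void)
{
    char *command_line = NULL;

    setup_arch(&command_line);
    assert(command_line != NULL);
    printf("Kernel command line: %s\n", command_line);
    return command_line;
}
```

Passing &command_line rather than command_line is what lets the architecture code change where the caller's pointer points; the generic code never needs to know where the platform keeps the string.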

Command-line parsing is performed by calling handler functions associated with each kernel argument (for example, video= is associated with video_setup). Each function usually ends up setting variables that are used later, when the associated facility is initialized. The internal organization of command-line parsing is similar to the init calls mechanism, described later.

After parsing, start_kernel activates the various basic functionalities of the system. This includes setting up interrupt tables, activating the timer interrupt, and initializing the console and memory management. All of this is performed by functions declared elsewhere in platform-specific code. The function continues by initializing less basic kernel subsystems, including buffer management, signal handling, and file and inode management.

Finally, start_kernel forks the init kernel thread (which gets 1 as a process ID) and executes the idle function (again, defined in architecture-specific code).

The initial boot sequence can thus be summarized as follows:

  1. System firmware or a boot loader arranges for the kernel to be placed at the proper address in memory. This code is usually external to Linux source code.

  2. Architecture-specific assembly code performs very low-level tasks, like initializing memory and setting up CPU registers so that C code can run flawlessly. This includes selecting a stack area and setting the stack pointer accordingly. The amount of such code varies from platform to platform; it can range from a few dozen lines up to a few thousand lines.

  3. start_kernel is called. It acquires the kernel lock, prints the banner, and calls setup_arch.

  4. Architecture-specific C-language code completes low-level initialization and retrieves a command line for start_kernel to use.

  5. start_kernel parses the command line and calls the handlers associated with the keyword it identifies.

  6. start_kernel initializes basic facilities and forks the init thread.

It is the task of the init thread to perform all other initialization. The thread is part of the same init/main.c file, and the bulk of the initialization (init) calls are performed by do_basic_setup. The function initializes all bus subsystems that it finds (PCI, SBus, and so on). It then invokes do_initcalls; device driver initialization is performed as part of the initcall processing.

The idea of init calls was added in version 2.3.13 and is not available in older kernels; it is designed to avoid hairy #ifdef conditionals all over the initialization code. Every optional kernel feature (device driver or whatever) must be initialized only if configured in the system, so each call to an initialization function used to be surrounded by #ifdef CONFIG_FEATURE and #endif. With init calls, each optional feature declares its own initialization function; the compilation process then places a reference to the function in a special ELF section. At boot time, do_initcalls scans the ELF section to invoke all the relevant initialization functions.
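The init-call mechanism can be imitated in an ordinary user-space program. This sketch assumes GCC or Clang with GNU ld or lld on an ELF system; those linkers automatically define __start_SECNAME and __stop_SECNAME symbols for any section whose name is a valid C identifier. The macro MY_INITCALL, the section name initcalls, and the functions foo_init, bar_init, and run_initcalls are hypothetical. The kernel's real macros live in <linux/init.h>, and its section boundaries are set in vmlinux.lds rather than by linker defaults.

```c
#include <assert.h>
#include <stdio.h>

typedef int (*initcall_t)(void);

/* Place a pointer to fn in the "initcalls" section. "used" keeps the
 * compiler from discarding the otherwise unreferenced variable. */
#define MY_INITCALL(fn) \
    static initcall_t initcall_##fn \
    __attribute__((used, section("initcalls"))) = fn

/* Provided by GNU ld / lld for the "initcalls" section (an assumption
 * about the toolchain, not something the kernel relies on). */
extern initcall_t __start_initcalls[];
extern initcall_t __stop_initcalls[];

static int initialized;

/* Two hypothetical "drivers"; neither is named by the caller below. */
static int foo_init(void) { initialized++; printf("foo_init\n"); return 0; }
static int bar_init(void) { initialized++; printf("bar_init\n"); return 0; }

MY_INITCALL(foo_init);
MY_INITCALL(bar_init);

/* Walk the section and call every entry, in the spirit of do_initcalls. */
static int run_initcalls(void)
{
    int count = 0;

    for (initcall_t *call = __start_initcalls; call < __stop_initcalls; call++) {
        assert(*call != NULL);
        (*call)();
        count++;
    }
    return count;
}
```

run_initcalls never mentions foo_init or bar_init by name, which is what lets a driver register itself without anyone editing the caller. Only drivers that are actually compiled and linked in contribute entries, so no #ifdef is needed at the call site.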

The same idea is applied to command-line arguments. Each driver that can receive a command-line argument at boot time defines a data structure that associates the argument with a function. A pointer to the data structure is placed into a separate ELF section, so parse_options can scan this section for each command-line option and invoke the associated driver function, if a match is found. The remaining arguments end up in either the environment or the command line of the init process. All the magic for init calls and ELF sections is part of <linux/init.h>.
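The option table can be collected the same way. The sketch below follows the description above by storing a pointer to each option structure in a dedicated section (here called setup_ptrs). The structure layout, the MY_SETUP macro, the video_mode variable, and check_setup are hypothetical illustrations, not the definitions in <linux/init.h>; video_setup reuses the name mentioned earlier in this chapter. It assumes a GNU-compatible toolchain that provides __start_ and __stop_ symbols for the section.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical analogue of the structure each driver defines. */
struct setup_entry {
    const char *str;         /* option prefix, e.g. "video=" */
    int (*fn)(char *value);  /* handler; receives the text after the prefix */
};

/* Define the structure, then drop a pointer to it into "setup_ptrs". */
#define MY_SETUP(s, fn) \
    static const struct setup_entry setup_##fn = { s, fn }; \
    static const struct setup_entry *setup_ptr_##fn \
    __attribute__((used, section("setup_ptrs"))) = &setup_##fn

extern const struct setup_entry *__start_setup_ptrs[];
extern const struct setup_entry *__stop_setup_ptrs[];

/* Set at parse time, read later when the video facility initializes. */
static char video_mode[32] = "default";

static int video_setup(char *value)
{
    snprintf(video_mode, sizeof(video_mode), "%s", value);
    return 1; /* option consumed */
}
MY_SETUP("video=", video_setup);

/* For one command-line word, scan the section for a matching prefix. */
static int check_setup(char *arg)
{
    assert(arg != NULL);
    for (const struct setup_entry **p = __start_setup_ptrs;
         p < __stop_setup_ptrs; p++) {
        size_t n = strlen((*p)->str);
        if (strncmp(arg, (*p)->str, n) == 0)
            return (*p)->fn(arg + n);
    }
    return 0; /* no match: would be passed on to init */
}
```

Storing pointers rather than the structures themselves keeps every section entry the same pointer size and alignment, so walking the section as an array is safe even if the compiler pads larger objects.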

Unfortunately, this init call idea works only when no ordering is required across the various initialization functions, so a few #ifdefs are still present in init/main.c.

It’s interesting to see how the idea of init calls and its application to the list of command-line arguments helped reduce the amount of conditional compilation in the code:

morgana% grep -c ifdef linux-2.[024]/init/main.c
linux-2.0/init/main.c:120
linux-2.2/init/main.c:246
linux-2.4/init/main.c:35

Despite the huge addition of new features over time, the amount of conditional compilation dropped significantly in 2.4 with the adoption of init calls. Another advantage of this technique is that device driver maintainers don’t need to patch main.c every time they add support for a new command-line argument. The addition of new features to the kernel has been greatly facilitated by this technique, and there are no more hairy cross references all over the boot code. But as a side effect, 2.4 can’t be compiled into older file formats that are less flexible than ELF. For this reason, uClinux[64] developers switched from COFF to ELF while porting their system from 2.0 to 2.4.

Another side effect of extensive use of ELF sections is that the final pass in compiling the kernel is not a conventional link pass as it used to be. Every platform now defines exactly how to link the kernel image (the vmlinux file) by means of an ldscript file; the file is called vmlinux.lds in the source tree of each platform. Use of ld scripts is described in the standard documentation for the binutils package.

There is yet another advantage to putting the initialization code into a special section. Once initialization is complete, that code is no longer needed. Since this code has been isolated, the kernel is able to dump it and reclaim the memory it occupies.



[62] Usually, find and xargs are needed to build a command line for grep. Although not trivial, proficient use of Unix tools is outside of the scope of this book.

[63] This concept only makes sense on the x86 architecture. More mature architectures don’t find themselves in a limited backward-compatible mode when they power up.

[64] uClinux is a version of the Linux kernel that can run on processors without an MMU. This is typical in the embedded world, and several M68k and ARM processors have no hardware memory management. uClinux stands for microcontroller Linux, since it’s meant to run on microcontrollers rather than full-fledged computers.
