Chapter 10. Judicious Use of Data Types

Before we go on to more advanced topics, we need to stop for a quick note on portability issues. Modern versions of the Linux kernel are highly portable, running on several very different architectures. Given the multiplatform nature of Linux, drivers intended for serious use should be portable as well.

But a core issue with kernel code is being able both to access data items of known length (for example, filesystem data structures or registers on device boards) and to exploit the capabilities of different processors (32-bit and 64-bit architectures, and possibly 16 bit as well).

Several of the problems encountered by kernel developers while porting x86 code to new architectures have been related to incorrect data typing. Adherence to strict data typing and compiling with the -Wall -Wstrict-prototypes flags can prevent most bugs.

Data types used by kernel data are divided into three main classes: standard C types such as int, explicitly sized types such as u32, and types used for specific kernel objects, such as pid_t. We are going to see when and how each of the three typing classes should be used. The final sections of the chapter talk about some other typical problems you might run into when porting driver code from the x86 to other platforms, and introduce the generalized support for linked lists exported by recent kernel headers.

If you follow the guidelines we provide, your driver should compile and run even on platforms on which you are unable to test it.

Use of Standard C Types

Although most programmers are accustomed to freely using standard types like int and long, writing device drivers requires some care to avoid typing conflicts and obscure bugs.

The problem is that you can’t use the standard types when you need “a two-byte filler” or “something representing a four-byte string” because the normal C data types are not the same size on all architectures. To show the data size of the various C types, the datasize program has been included in the sample files provided on the O’Reilly FTP site, in the directory misc-progs. This is a sample run of the program on a PC (the last four types shown are introduced in the next section):

morgana% misc-progs/datasize
arch   Size:  char  shor   int  long   ptr long-long  u8 u16 u32 u64
i686            1     2     4     4     4     8        1   2   4   8

The program can be used to show that long integers and pointers feature a different size on 64-bit platforms, as demonstrated by running the program on different Linux computers:

arch   Size:  char  shor   int  long   ptr long-long  u8 u16 u32 u64
i386            1     2     4     4     4     8        1   2   4   8
alpha           1     2     4     8     8     8        1   2   4   8
armv4l          1     2     4     4     4     8        1   2   4   8
ia64            1     2     4     8     8     8        1   2   4   8
m68k            1     2     4     4     4     8        1   2   4   8
mips            1     2     4     4     4     8        1   2   4   8
ppc             1     2     4     4     4     8        1   2   4   8
sparc           1     2     4     4     4     8        1   2   4   8
sparc64         1     2     4     4     4     8        1   2   4   8

It’s interesting to note that the user space of Linux-sparc64 runs 32-bit code, so pointers are 32 bits wide in user space, even though they are 64 bits wide in kernel space. This can be verified by loading the kdatasize module (available in the directory misc-modules within the sample files). The module reports size information at load time using printk and returns an error (so there’s no need to unload it):

kernel: arch   Size:  char short int long  ptr long-long u8 u16 u32 u64
kernel: sparc64         1    2    4    8    8     8       1   2   4   8

Although you must be careful when mixing different data types, sometimes there are good reasons to do so. One such situation is for memory addresses, which are special as far as the kernel is concerned. Although conceptually addresses are pointers, memory administration is better accomplished by using an unsigned integer type; the kernel treats physical memory like a huge array, and a memory address is just an index into the array. Furthermore, a pointer is easily dereferenced; when dealing directly with memory addresses you almost never want to dereference them in this manner. Using an integer type prevents this dereferencing, thus avoiding bugs. Therefore, addresses in the kernel are unsigned long, exploiting the fact that pointers and long integers are always the same size, at least on all the platforms currently supported by Linux.

The C99 standard defines the intptr_t and uintptr_t types for an integer variable which can hold a pointer value. These types are almost unused in the 2.4 kernel, but it would not be surprising to see them show up more often as a result of future development work.

Get Linux Device Drivers, Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.