In the next few sections, we’ll look at the various operations a driver can perform on the devices it manages. An open device is identified internally by a file structure, and the kernel uses the file_operations structure to access the driver’s functions. The structure, defined in <linux/fs.h>, is essentially a table of function pointers. Each file is associated with its own set of functions (by including a field called f_op that points to a file_operations structure). The operations are mostly in charge of implementing the system calls and are thus named open, read, and so on. We can consider the file to be an “object” and the functions operating on it to be its “methods,” using object-oriented programming terminology to denote actions declared by an object to act on itself. This is the first sign of object-oriented programming we see in the Linux kernel, and we’ll see more in later chapters.
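To make the “object with methods” idea concrete, here is a minimal user-space sketch of dispatch through a per-file operations table. The types mock_file and mock_file_operations, the driver function hello_read, and the caller mock_sys_read are all hypothetical and drastically simplified; the real struct file and struct file_operations in <linux/fs.h> have many more members and different signatures.

```c
/* User-space mock of method dispatch through f_op.
 * All names here are hypothetical stand-ins, not kernel types. */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

struct mock_file;

/* The "method table": one function pointer per operation. */
struct mock_file_operations {
    ssize_t (*read)(struct mock_file *filp, char *buf, size_t count);
};

/* The "object": carries a pointer to its own method table. */
struct mock_file {
    const struct mock_file_operations *f_op;
    const char *data;   /* hypothetical per-device state */
};

/* One driver's implementation of the read "method". */
static ssize_t hello_read(struct mock_file *filp, char *buf, size_t count)
{
    size_t len = strlen(filp->data);

    if (count > len)
        count = len;
    memcpy(buf, filp->data, count);
    return (ssize_t)count;
}

static const struct mock_file_operations hello_fops = { .read = hello_read };

/* What the system-call layer does: call the method through f_op,
 * without knowing which driver sits behind the file. */
static ssize_t mock_sys_read(struct mock_file *filp, char *buf, size_t count)
{
    return filp->f_op->read(filp, buf, count);
}
```

Calling mock_sys_read on a mock_file whose f_op points to hello_fops ends up in hello_read; pointing f_op at a different table changes the behavior without touching the caller, which is the whole point of the indirection.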
Conventionally, a file_operations structure or a pointer to one is called fops (or some variation thereof); we’ve already seen one such pointer as an argument to the register_chrdev call. Each field in the structure must point to the function in the driver that implements a specific operation, or be left NULL for unsupported operations. The exact behavior of the kernel when a NULL pointer is specified is different for each function, as the list later in this section shows.
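Because each operation has its own fallback, a caller has to check the pointer before using it. The following user-space sketch models two of the behaviors described in the list later in this section (a missing open succeeds silently, a missing read yields -EINVAL). The mock types and the mock_do_open and mock_do_read helpers are hypothetical; this is not the kernel’s actual code.

```c
/* Hypothetical model of per-operation NULL fallbacks. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

struct mock_file;

struct mock_fops {
    int (*open)(struct mock_file *filp);
    ssize_t (*read)(struct mock_file *filp, char *buf, size_t count);
};

struct mock_file {
    const struct mock_fops *f_op;
};

/* Missing open: opening succeeds, but the driver is never notified. */
static int mock_do_open(struct mock_file *filp)
{
    if (filp->f_op->open == NULL)
        return 0;
    return filp->f_op->open(filp);
}

/* Missing read: the system call fails with -EINVAL. */
static ssize_t mock_do_read(struct mock_file *filp, char *buf, size_t count)
{
    if (filp->f_op->read == NULL)
        return -EINVAL;
    return filp->f_op->read(filp, buf, count);
}

/* A driver that implements neither operation leaves both fields NULL. */
static const struct mock_fops empty_fops = { .open = NULL, .read = NULL };
```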
The file_operations structure has been slowly getting bigger as new functionality is added to the kernel. The addition of new operations can, of course, create portability problems for device drivers. Instantiations of the structure in each driver used to be declared using standard (positional) C syntax, and new operations were normally added to the end of the structure; a simple recompilation of the drivers would place a NULL value for that operation, thus selecting the default behavior, which was usually what you wanted.
Since then, kernel developers have switched to a “tagged” initialization format that allows initialization of structure fields by name, thus circumventing most problems with changed data structures. The field: form of tagged initialization used in the kernel, however, is not standard C but a (useful) extension specific to the GNU compiler; C99 standardizes a similar form, .field = value, known as designated initializers. We will look at an example of tagged structure initialization shortly.
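The difference matters once a structure changes shape. In the user-space sketch below, a hypothetical operations table gains a new member in the middle rather than at the end: the named (tagged) initializer stays correct, while the positional one silently shifts a function into the wrong slot. The code uses the C99 .field = value spelling so that it compiles with any modern C compiler; a comment shows the GNU field: form found in 2.4-era kernel code. All names are illustrative.

```c
/* Hypothetical illustration: positional vs. named initialization. */
#include <assert.h>
#include <stddef.h>

/* "Version 2" of a table whose "version 1" was {open, release}:
 * the new member flush was inserted between the old ones. */
struct ops_v2 {
    int (*open)(void);
    int (*flush)(void);     /* new in "version 2" */
    int (*release)(void);
};

static int my_open(void)    { return 1; }
static int my_release(void) { return 2; }

/* Named initialization: still correct after the layout changed.
 * GNU/2.4 kernel spelling: { open: my_open, release: my_release, }
 * Members not named (here, flush) are implicitly NULL. */
static const struct ops_v2 tagged = {
    .open    = my_open,
    .release = my_release,
};

/* Positional initialization written against the old {open, release}
 * layout: my_release now lands in the flush slot, release stays NULL. */
static const struct ops_v2 positional = { my_open, my_release };
```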
The following list introduces all the operations that an application can invoke on a device. We’ve tried to keep the list brief so it can be used as a reference, merely summarizing each operation and the default kernel behavior when a NULL pointer is used. You can skip over this list on your first reading and return to it later.
The rest of the chapter, after describing another important data structure (the file, which actually includes a pointer to its own file_operations), explains the role of the most important operations and offers hints, caveats, and real code examples. We defer discussion of the more complex operations to later chapters because we aren’t ready to dig into topics like memory management, blocking operations, and asynchronous notification quite yet.
The following list shows what operations appear in struct file_operations for the 2.4 series of kernels, in the order in which they appear. Although there are minor differences between 2.4 and earlier kernels, they will be dealt with later in this chapter, so we are just sticking to 2.4 for a while. The return value of each operation is 0 for success or a negative error code to signal an error, unless otherwise noted.
- loff_t (*llseek) (struct file *, loff_t, int);
  The llseek method is used to change the current read/write position in a file, and the new position is returned as a (non-negative) return value. The loff_t type is a “long offset” and is at least 64 bits wide even on 32-bit platforms. Errors are signaled by a negative return value. If the function is not specified for the driver, a seek relative to end-of-file fails, while other seeks succeed by modifying the position counter in the file structure (described in Section 3.4 later in this chapter).
- ssize_t (*read) (struct file *, char *, size_t, loff_t *);
  Used to retrieve data from the device. A null pointer in this position causes the read system call to fail with -EINVAL (“Invalid argument”). A non-negative return value represents the number of bytes successfully read (the return value is a “signed size” type, usually the native integer type for the target platform).
- ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
  Sends data to the device. If missing, -EINVAL is returned to the program calling the write system call. The return value, if non-negative, represents the number of bytes successfully written.
- int (*readdir) (struct file *, void *, filldir_t);
  This field should be NULL for device files; it is used for reading directories and is only useful to filesystems.
- unsigned int (*poll) (struct file *, struct poll_table_struct *);
  The poll method is the back end of two system calls, poll and select, both used to inquire whether a device is readable or writable or in some special state. Either system call can block until a device becomes readable or writable. If a driver doesn’t define its poll method, the device is assumed to be both readable and writable, and in no special state. The return value is a bit mask describing the status of the device.
- int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
  The ioctl system call offers a way to issue device-specific commands (like formatting a track of a floppy disk, which is neither reading nor writing). Additionally, a few ioctl commands are recognized by the kernel without referring to the fops table. If the device doesn’t offer an ioctl entry point, the system call returns an error for any request that isn’t predefined (-ENOTTY, “No such ioctl for device”). If the device method returns a non-negative value, the same value is passed back to the calling program to indicate successful completion.
- int (*mmap) (struct file *, struct vm_area_struct *);
  mmap is used to request a mapping of device memory to a process’s address space. If the device doesn’t implement this method, the mmap system call returns -ENODEV.
- int (*open) (struct inode *, struct file *);
  Though this is always the first operation performed on the device file, the driver is not required to declare a corresponding method. If this entry is NULL, opening the device always succeeds, but your driver isn’t notified.
- int (*flush) (struct file *);
  The flush operation is invoked when a process closes its copy of a file descriptor for a device; it should execute (and wait for) any outstanding operations on the device. This must not be confused with the fsync operation requested by user programs. Currently, flush is used only in the network file system (NFS) code. If flush is NULL, it is simply not invoked.
- int (*release) (struct inode *, struct file *);
  This operation is invoked when the file structure is being released. Like open, release can be missing.[18]
- int (*fsync) (struct file *, struct dentry *, int);
  This method is the back end of the fsync system call, which a user calls to flush any pending data. If not implemented in the driver, the system call returns -EINVAL.
- int (*fasync) (int, struct file *, int);
  This operation is used to notify the device of a change in its FASYNC flag. Asynchronous notification is an advanced topic and is described in Chapter 5. The field can be NULL if the driver doesn’t support asynchronous notification.
- int (*lock) (struct file *, int, struct file_lock *);
  The lock method is used to implement file locking; locking is an indispensable feature for regular files but is almost never implemented by device drivers.
- ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *);
  ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *);
  These methods, added late in the 2.3 development cycle, implement scatter/gather read and write operations. Applications occasionally need to do a single read or write operation involving multiple memory areas; these system calls allow them to do so without forcing extra copy operations on the data.
- struct module *owner;
  This field isn’t a method like everything else in the file_operations structure. Instead, it is a pointer to the module that “owns” this structure; it is used by the kernel to maintain the module’s usage count.
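The return-value conventions in the list above (a non-negative byte count or file position on success, a negative error code on failure) can be illustrated with a user-space model of a small memory-backed device. This is a sketch under several assumptions: the memdev_* names and the 16-byte size are hypothetical, long long stands in for loff_t, the struct file argument is omitted, and returning -ENOSPC when the device is full is one reasonable choice rather than a rule. A real driver’s read and write methods must also move data with copy_to_user and copy_from_user rather than memcpy, because the buffer lives in user space.

```c
/* Hypothetical user-space model of read/write/llseek conventions. */
#include <assert.h>
#include <errno.h>
#include <stdio.h>       /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <string.h>
#include <sys/types.h>   /* ssize_t */

#define MEMDEV_SIZE 16            /* hypothetical device size */

static char memdev[MEMDEV_SIZE];  /* hypothetical device memory */

/* read: copy up to count bytes from *pos, advance *pos, and return the
 * number of bytes transferred; 0 means end of device. */
static ssize_t memdev_read(char *buf, size_t count, long long *pos)
{
    if (*pos >= MEMDEV_SIZE)
        return 0;
    if (count > (size_t)(MEMDEV_SIZE - *pos))
        count = (size_t)(MEMDEV_SIZE - *pos);   /* short read */
    memcpy(buf, memdev + *pos, count);          /* kernel: copy_to_user */
    *pos += (long long)count;
    return (ssize_t)count;
}

/* write: same shape; a full device reports -ENOSPC (an assumption). */
static ssize_t memdev_write(const char *buf, size_t count, long long *pos)
{
    if (*pos >= MEMDEV_SIZE)
        return -ENOSPC;
    if (count > (size_t)(MEMDEV_SIZE - *pos))
        count = (size_t)(MEMDEV_SIZE - *pos);   /* short write */
    memcpy(memdev + *pos, buf, count);          /* kernel: copy_from_user */
    *pos += (long long)count;
    return (ssize_t)count;
}

/* llseek: return the new (non-negative) position or a negative error. */
static long long memdev_llseek(long long *pos, long long off, int whence)
{
    long long newpos;

    switch (whence) {
    case SEEK_SET: newpos = off;               break;
    case SEEK_CUR: newpos = *pos + off;        break;
    case SEEK_END: newpos = MEMDEV_SIZE + off; break;
    default:       return -EINVAL;
    }
    if (newpos < 0)
        return -EINVAL;
    *pos = newpos;
    return newpos;
}
```

Note that reading at the end of the device returns 0 (end of file) rather than an error, and a short read or write is reported simply by returning fewer bytes than were requested.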
The scull device driver implements only the most important device methods, and uses the tagged format to declare its file_operations structure:
struct file_operations scull_fops = {
    llseek:  scull_llseek,
    read:    scull_read,
    write:   scull_write,
    ioctl:   scull_ioctl,
    open:    scull_open,
    release: scull_release,
};
This declaration uses the tagged structure initialization syntax described earlier. This syntax is preferred because it makes drivers more portable across changes in the definitions of the structures and arguably makes the code more compact and readable. Because tagged initialization does not depend on the order of the members, kernel developers are free to reorder them; in some cases, substantial performance improvements have been realized by placing frequently accessed members in the same hardware cache line.
It is also necessary to set the owner field of the file_operations structure. In kernel code, you will often see owner initialized with the rest of the structure, using the tagged syntax as follows:
owner: THIS_MODULE,
That approach works, but only on 2.4 kernels. A more portable approach is to use the SET_MODULE_OWNER macro, which is defined in <linux/module.h>. scull performs this initialization as follows:
SET_MODULE_OWNER(&scull_fops);
This macro works on any structure that has an owner field; we will encounter this field again in other contexts later in the book.
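As far as we know, the 2.4 definition of SET_MODULE_OWNER is a one-line macro that assigns THIS_MODULE to (some_struct)->owner, which is why it works on any structure with an owner field. The user-space sketch below mimics that shape with hypothetical MOCK_ names so it can be compiled and tested outside the kernel; it is not the kernel header, and the mock module and structure types are invented for illustration.

```c
/* Hypothetical user-space mimic of SET_MODULE_OWNER; not kernel code. */
#include <assert.h>
#include <stddef.h>

struct mock_module { const char *name; };

/* Stand-in for the kernel's THIS_MODULE. */
static struct mock_module this_module = { "scull" };
#define MOCK_THIS_MODULE (&this_module)

/* Assign through any pointer whose target has a member named owner;
 * do { } while (0) makes the macro behave as a single statement. */
#define MOCK_SET_MODULE_OWNER(some_struct) \
    do { (some_struct)->owner = MOCK_THIS_MODULE; } while (0)

/* Two unrelated structure types that both have an owner field. */
struct mock_file_operations { struct mock_module *owner; int (*open)(void); };
struct mock_other_ops       { int flags; struct mock_module *owner; };

static struct mock_file_operations mock_fops;   /* zero-initialized */
static struct mock_other_ops mock_other;

static void mock_init(void)
{
    MOCK_SET_MODULE_OWNER(&mock_fops);
    MOCK_SET_MODULE_OWNER(&mock_other);
}
```

Because the macro is purely textual, the position of owner within each structure does not matter; the compiler resolves the member name separately for each type.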
[18] Note that release isn’t invoked every time a process calls close. Whenever a file structure is shared (for example, after a fork or a dup), release won’t be invoked until all copies are closed. If you need to flush pending data when any copy is closed, you should implement the flush method.
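The rule in this footnote can be modeled with a reference count: every close triggers flush, but release runs only when the last descriptor sharing the file structure is closed. The sketch below is a simplified, hypothetical user-space model (mock_file, mock_dup, and mock_close are not kernel functions); it only counts calls so that the behavior can be checked.

```c
/* Hypothetical model of when flush and release are invoked. */
#include <assert.h>

struct mock_file {
    int refcount;        /* descriptors sharing this file structure */
    int flush_calls;     /* counters so the behavior can be observed */
    int release_calls;
};

/* fork() or dup(): one more descriptor refers to the same file. */
static void mock_dup(struct mock_file *f)
{
    f->refcount++;
}

/* close(): flush runs on every close; release runs only when the
 * last reference goes away. */
static void mock_close(struct mock_file *f)
{
    f->flush_calls++;             /* driver's flush method */
    if (--f->refcount == 0)
        f->release_calls++;       /* driver's release method */
}
```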