File Operations

In the next few sections, we’ll look at the various operations a driver can perform on the devices it manages. An open device is identified internally by a file structure, and the kernel uses the file_operations structure to access the driver’s functions. The structure, defined in <linux/fs.h>, is an array of function pointers. Each file is associated with its own set of functions (by including a field called f_op that points to a file_operations structure). The operations are mostly in charge of implementing the system calls and are thus named open, read, and so on. We can consider the file to be an “object” and the functions operating on it to be its “methods,” using object-oriented programming terminology to denote actions declared by an object to act on itself. This is the first sign of object-oriented programming we see in the Linux kernel, and we’ll see more in later chapters.

Conventionally, a file_operations structure or a pointer to one is called fops (or some variation thereof); we’ve already seen one such pointer as an argument to the register_chrdev call. Each field in the structure must point to the function in the driver that implements a specific operation, or be left NULL for unsupported operations. The exact behavior of the kernel when a NULL pointer is specified is different for each function, as the list later in this section shows.

The file_operations structure has been slowly getting bigger as new functionality is added to the kernel. The addition of new operations can, of course, create portability problems for device drivers. Instantiations of the structure in each driver used to be declared using standard C syntax, and new operations were normally added to the end of the structure; a simple recompilation of the drivers would place a NULL value for that operation, thus selecting the default behavior, usually what you wanted.

Since then, kernel developers have switched to a “tagged” initialization format that allows initialization of structure fields by name, thus circumventing most problems with changed data structures. The tagged initialization, however, is not standard C but a (useful) extension specific to the GNU compiler. We will look at an example of tagged structure initialization shortly.

The following list introduces all the operations that an application can invoke on a device. We’ve tried to keep the list brief so it can be used as a reference, merely summarizing each operation and the default kernel behavior when a NULL pointer is used. You can skip over this list on your first reading and return to it later.

The rest of the chapter, after describing another important data structure (the file, which actually includes a pointer to its own file_operations), explains the role of the most important operations and offers hints, caveats, and real code examples. We defer discussion of the more complex operations to later chapters because we aren’t ready to dig into topics like memory management, blocking operations, and asynchronous notification quite yet.

The following list shows what operations appear in struct file_operations for the 2.4 series of kernels, in the order in which they appear. Although there are minor differences between 2.4 and earlier kernels, they will be dealt with later in this chapter, so we are just sticking to 2.4 for a while. The return value of each operation is 0 for success or a negative error code to signal an error, unless otherwise noted.

loff_t (*llseek) (struct file *, loff_t, int);

The llseek method is used to change the current read/write position in a file, and the new position is returned as a (positive) return value. The loff_t is a “long offset” and is at least 64 bits wide even on 32-bit platforms. Errors are signaled by a negative return value. If the function is not specified for the driver, a seek relative to end-of-file fails, while other seeks succeed by modifying the position counter in the file structure (described in Section 3.4 later in this chapter).

ssize_t (*read) (struct file *, char *, size_t, loff_t *);

Used to retrieve data from the device. A null pointer in this position causes the read system call to fail with -EINVAL (“Invalid argument”). A non-negative return value represents the number of bytes successfully read (the return value is a “signed size” type, usually the native integer type for the target platform).

ssize_t (*write) (struct file *, const char *, size_t, loff_t *);

Sends data to the device. If missing, -EINVAL is returned to the program calling the write system call. The return value, if non-negative, represents the number of bytes successfully written.

int (*readdir) (struct file *, void *, filldir_t);

This field should be NULL for device files; it is used for reading directories, and is only useful to filesystems.

unsigned int (*poll) (struct file *, struct poll_table_struct *);

The poll method is the back end of two system calls, poll and select, both used to inquire if a device is readable or writable or in some special state. Either system call can block until a device becomes readable or writable. If a driver doesn’t define its poll method, the device is assumed to be both readable and writable, and in no special state. The return value is a bit mask describing the status of the device.

int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);

The ioctl system call offers a way to issue device-specific commands (like formatting a track of a floppy disk, which is neither reading nor writing). Additionally, a few ioctl commands are recognized by the kernel without referring to the fops table. If the device doesn’t offer an ioctl entry point, the system call returns an error for any request that isn’t predefined (-ENOTTY, “No such ioctl for device”). If the device method returns a non-negative value, the same value is passed back to the calling program to indicate successful completion.

int (*mmap) (struct file *, struct vm_area_struct *);

mmap is used to request a mapping of device memory to a process’s address space. If the device doesn’t implement this method, the mmap system call returns -ENODEV.

int (*open) (struct inode *, struct file *);

Though this is always the first operation performed on the device file, the driver is not required to declare a corresponding method. If this entry is NULL, opening the device always succeeds, but your driver isn’t notified.

int (*flush) (struct file *);

The flush operation is invoked when a process closes its copy of a file descriptor for a device; it should execute (and wait for) any outstanding operations on the device. This must not be confused with the fsync operation requested by user programs. Currently, flush is used only in the network file system (NFS) code. If flush is NULL, it is simply not invoked.

int (*release) (struct inode *, struct file *);

This operation is invoked when the file structure is being released. Like open, release can be missing.[18]

int (*fsync) (struct inode *, struct dentry *, int);

This method is the back end of the fsync system call, which a user calls to flush any pending data. If not implemented in the driver, the system call returns -EINVAL.

int (*fasync) (int, struct file *, int);

This operation is used to notify the device of a change in its FASYNC flag. Asynchronous notification is an advanced topic and is described in Chapter 5. The field can be NULL if the driver doesn’t support asynchronous notification.

int (*lock) (struct file *, int, struct file_lock *);

The lock method is used to implement file locking; locking is an indispensable feature for regular files, but is almost never implemented by device drivers.

ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *); , ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *);

These methods, added late in the 2.3 development cycle, implement scatter/gather read and write operations. Applications occasionally need to do a single read or write operation involving multiple memory areas; these system calls allow them to do so without forcing extra copy operations on the data.

struct module *owner;

This field isn’t a method like everything else in the file_operations structure. Instead, it is a pointer to the module that “owns” this structure; it is used by the kernel to maintain the module’s usage count.

The scull device driver implements only the most important device methods, and uses the tagged format to declare its file_operations structure:

struct file_operations scull_fops = {
 llseek:  scull_llseek,
 read:  scull_read,
 write:  scull_write,
 ioctl:  scull_ioctl,
 open:  scull_open,
 release: scull_release,
};

This declaration uses the tagged structure initialization syntax, as we described earlier. This syntax is preferred because it makes drivers more portable across changes in the definitions of the structures, and arguably makes the code more compact and readable. Tagged initialization allows the reordering of structure members; in some cases, substantial performance improvements have been realized by placing frequently accessed members in the same hardware cache line.

It is also necessary to set the owner field of the file_operations structure. In some kernel code, you will often see owner initialized with the rest of the structure, using the tagged syntax as follows:

 owner: THIS_MODULE,

That approach works, but only on 2.4 kernels. A more portable approach is to use the SET_MODULE_OWNER macro, which is defined in <linux/module.h>. scull performs this initialization as follows:

 SET_MODULE_OWNER(&scull_fops);

This macro works on any structure that has an owner field; we will encounter this field again in other contexts later in the book.



[18] Note that release isn’t invoked every time a process calls close. Whenever a file structure is shared (for example, after a fork or a dup), release won’t be invoked until all copies are closed. If you need to flush pending data when any copy is closed, you should implement the flush method.

Get Linux Device Drivers, Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.