Understanding Linux Network Internals

Chapter 4. Notification Chains

The kernel’s many subsystems are heavily interdependent, so an event detected or generated by one of them could be of interest to others. To fulfill the need for interaction, Linux uses so-called notification chains.

In this chapter, we will see:

How notification chains are declared and what chains are defined by the networking code
How a kernel subsystem can register to a notification chain
How a kernel subsystem generates a notification on a chain

Note that notification chains are used only between kernel subsystems. Notifications between kernel and user space rely on other mechanisms, such as those introduced in Chapter 3.

Reasons for Notification Chains

Suppose we had the Linux router RT shown in Figure 4-1, with four interfaces. The figure shows the relationship between the router and five networks, along with a simplified version of its routing table.

Let’s look at some examples of the topology in Figure 4-1. Network A is directly connected to RT on interface eth0. Network F is not directly connected to RT, but RT’s eth3 is directly connected to another router that has an interface with address IP1, and that second router knows how to reach network F. The other cases are similar. In short, some networks are directly connected and others can be reached only with the help of one or more additional routers.

For a detailed description of how the routing code handles this situation, refer to Part VII. In this chapter, we will concentrate on the role of notification chains. Suppose that interface eth3 went down, due to a break in the network, an administrative command (such as ifconfig eth3 down), or a hardware failure. Networks D, E, and F would become unreachable by RT (and by systems in A, B, and C relying on RT for their connections) and should be removed from the routing table. Who is going to tell the routing subsystem about that interface failure? A notification chain.

Figure 4-1. Example of a Linux router

Figure 4-2 shows a slightly more complicated example where the routing subsystem interacts with dynamic routing protocols—protocols that can adjust the routing table or tables^[*] to the network topology and therefore cope with interface failures when the topology allows it (i.e., when there are redundant paths).

Figure 4-2. Example of a Linux router with dynamic routing protocols

In Figure 4-2, RT could reach network F by passing through either network A or network E. E was chosen initially because of its smaller cost,^[†] but now that E is no longer reachable, the routing subsystem should update the route for network F to go through network A. The basis for such a decision could include local host events, such as device registration and unregistration, as well as complex factors in router configuration and the routing protocols used. In any case, the routing subsystem that manages the tables must be informed of the relevant information by some other subsystem, demonstrating the need for notification chains.

Overview

A notification chain is simply a list of functions to execute when a given event occurs. Each function lets one other subsystem know about an event that occurred within, or was detected by, the subsystem calling the function.

Thus, for each notification chain there is a passive side (the notified) and an active side (the notifier), as in the so-called publish-and-subscribe model:

The notified are the subsystems that ask to be notified about the event and that provide a callback function to invoke.
The notifier is the subsystem that experiences an event and calls the callback function.

The functions executed are chosen by the notified subsystems. It is never up to the owner of the chain (the subsystem that generates the notifications) to decide what functions to execute. The owner simply defines the list; any kernel subsystem can register a callback function with that chain to receive the notification.

The use of notification chains makes the source code easier to write and maintain. Imagine how a generic routine might notify external subsystems about an event without using notification chains:

if (subsystem_X_enabled) {
    do_something_1();
}
if (subsystem_Y_enabled) {
    do_something_2();
}
if (subsystem_Z_enabled) {
    do_something_3();
}
/* ... one more clause for every other interested subsystem ... */

In other words, a conditional clause would have to be included for every possible subsystem that might be interested in an event, and the maintainer of this subsystem would have to add a new clause every time somebody else added a subsystem to the kernel.

No subsystem maintainer is expected to keep track of every subsystem added to the kernel. However, each subsystem maintainer should know:

The kinds of events from other subsystems he is interested in
The kinds of events he knows about and that other subsystems may be interested in

Thus, notification chains allow each subsystem to share the occurrence of an event with others, without having to know what the others are and why they are interested.
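To make the contrast with the hard-coded if clauses concrete, here is a minimal user-space sketch in C of the publish-and-subscribe idea. The names subscriber, subscribe, publish_event, and the two subsystem callbacks are hypothetical, and the structure is deliberately simpler than the kernel's notifier_block. The point is that the publishing code only walks a list, so adding a subscriber never requires editing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified subscriber record: one callback plus a link. */
struct subscriber {
    void (*callback)(unsigned long event);
    struct subscriber *next;
};

/* List head owned by the publishing subsystem. */
static struct subscriber *event_list;

/* Any subsystem can add itself; the publisher's code never changes. */
static void subscribe(struct subscriber *s)
{
    assert(s != NULL && s->callback != NULL);
    s->next = event_list;
    event_list = s;
}

/* Replaces the hard-coded "if (subsystem_X_enabled)" clauses. */
static void publish_event(unsigned long event)
{
    for (struct subscriber *s = event_list; s != NULL; s = s->next)
        s->callback(event);
}

/* Two hypothetical subscribers that just record what they saw. */
static unsigned long last_seen_by_x, last_seen_by_y;
static void subsystem_x_cb(unsigned long event) { last_seen_by_x = event; }
static void subsystem_y_cb(unsigned long event) { last_seen_by_y = event; }

static struct subscriber sub_x = { .callback = subsystem_x_cb };
static struct subscriber sub_y = { .callback = subsystem_y_cb };
```

After subscribe(&sub_x), subscribe(&sub_y), and publish_event(42), both subscribers have recorded event 42, although publish_event names neither of them.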

Defining a Chain

The elements of the notification chain’s list are of type notifier_block, whose definition is the following:

struct notifier_block
{
    int (*notifier_call)(struct notifier_block *self, unsigned long, void *);
    struct notifier_block *next;
    int priority;
};

notifier_call is the function to execute, next is used to link together the elements of the list, and priority represents the priority of the function. Functions with higher priority are executed first. In practice, however, almost all registrations leave priority out of the notifier_block definition, so it gets the default value of 0 and the execution order depends only on the registration order (i.e., it is semirandom). The return values of notifier_call are listed in the upcoming section, “Notifying Events on a Chain.”
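The default priority of 0 follows from C's initialization rules: members left out of a designated initializer are zero-initialized. The user-space sketch below assumes the notifier_block layout shown above; the callback my_event and the two block names are hypothetical.

```c
#include <stddef.h>

struct notifier_block {
    int (*notifier_call)(struct notifier_block *self, unsigned long, void *);
    struct notifier_block *next;
    int priority;
};

/* Hypothetical callback; returns 0, which is NOTIFY_DONE in the kernel. */
static int my_event(struct notifier_block *self, unsigned long val, void *v)
{
    (void)self;
    (void)val;
    (void)v;
    return 0;
}

/* The common style: priority omitted, so it is zero-initialized to 0. */
static struct notifier_block plain_nb = {
    .notifier_call = my_event,
};

/* Explicit priority: this block would be placed ahead of priority-0 blocks. */
static struct notifier_block urgent_nb = {
    .notifier_call = my_event,
    .priority      = 10,
};
```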

Common names for the chains themselves (that is, the list heads, which are pointers to notifier_block) are xxx_chain, xxx_notifier_chain, and xxx_notifier_list.

Registering with a Chain

When a kernel component is interested in the events of a given notification chain, it can register itself with the general function notifier_chain_register. The kernel also provides a set of wrappers around notifier_chain_register, some of which are shown in Table 4-1.

Table 4-1 lists the main APIs and the associated wrappers used to register with and unregister from three chains: inetaddr_chain, inet6addr_chain, and netdev_chain.

Table 4-1. Main functions and wrappers for a few chains

Operation	Chain	Function prototype or wrapper
Registration	(any)	`int notifier_chain_register(struct notifier_block **list, struct notifier_block *n)`
	`inetaddr_chain`	`register_inetaddr_notifier`
	`inet6addr_chain`	`register_inet6addr_notifier`
	`netdev_chain`	`register_netdevice_notifier`
Unregistration	(any)	`int notifier_chain_unregister(struct notifier_block **nl, struct notifier_block *n)`
	`inetaddr_chain`	`unregister_inetaddr_notifier`
	`inet6addr_chain`	`unregister_inet6addr_notifier`
	`netdev_chain`	`unregister_netdevice_notifier`
Notification	(any)	`int notifier_call_chain(struct notifier_block **n, unsigned long val, void *v)`

For each chain, the notifier_block instances are inserted into a list sorted by priority. Elements with the same priority are kept in insertion order: a new element goes after the existing elements that have the same priority.

Accesses to the notification chains are protected by the notifier_lock lock. The use of a single lock for all the notification chains is not a big constraint and does not affect performance, because subsystems usually register their notifier_call functions only at boot time or at module load time, and from that moment on access the lists in a read-only manner (that is, shared).

Because the notifier_chain_register function is called to insert callbacks into all lists, it requires that the list be specified as an input parameter. However, this function is rarely called directly; chain-specific wrappers are used instead.

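int notifier_chain_register(struct notifier_block **list, struct notifier_block *n)
{
    write_lock(&notifier_lock);            /* exclusive access while the list changes */
    while (*list)
    {
        /* Stop at the first element with a strictly lower priority, so that
         * n ends up after every existing element of equal priority. */
        if (n->priority > (*list)->priority)
            break;
        list = &((*list)->next);
    }
    n->next = *list;                       /* link n in front of that element */
    *list = n;
    write_unlock(&notifier_lock);
    return 0;
}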
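To see the ordering rule in action, the following user-space sketch reuses the insertion loop of notifier_chain_register but drops notifier_lock, since it runs in a single thread. The chain head demo_chain, the blocks a through d, and the helper chain_order are hypothetical. Registering a, b, c, and d with priorities 0, 10, 0, and 10 should produce the order b, d, a, c: higher priorities first, ties in registration order.

```c
#include <assert.h>
#include <stddef.h>

struct notifier_block {
    int (*notifier_call)(struct notifier_block *self, unsigned long, void *);
    struct notifier_block *next;
    int priority;
};

/* Same insertion logic as the kernel listing, minus notifier_lock. */
static int notifier_chain_register(struct notifier_block **list,
                                   struct notifier_block *n)
{
    assert(list != NULL && n != NULL);
    /* Skip every element whose priority is >= n's priority... */
    while (*list) {
        if (n->priority > (*list)->priority)
            break;
        list = &((*list)->next);
    }
    /* ...then link n in front of the first lower-priority element. */
    n->next = *list;
    *list = n;
    return 0;
}

static struct notifier_block *demo_chain;   /* hypothetical chain head */
static struct notifier_block a = { .priority = 0 };
static struct notifier_block b = { .priority = 10 };
static struct notifier_block c = { .priority = 0 };
static struct notifier_block d = { .priority = 10 };

/* Copy the chain order into out[]; returns the number of elements copied. */
static size_t chain_order(struct notifier_block *head,
                          struct notifier_block **out, size_t max)
{
    size_t i = 0;
    for (; head != NULL && i < max; head = head->next)
        out[i++] = head;
    return i;
}
```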

Notifying Events on a Chain

Notifications are generated with notifier_call_chain, defined in kernel/sys.c. This function simply invokes, in order of priority, all the callback routines registered against the chain. Note that callback routines are executed in the context of the process that calls notifier_call_chain. A callback routine could, however, be implemented so that it queues the notification somewhere and wakes up a process that will look at it.

int notifier_call_chain(struct notifier_block **n, unsigned long val, void *v)
{
    int ret = NOTIFY_DONE;                 /* returned as-is if the chain is empty */
    struct notifier_block *nb = *n;

    while (nb)
    {
        ret = nb->notifier_call(nb, val, v);
        if (ret & NOTIFY_STOP_MASK)        /* set by NOTIFY_BAD and NOTIFY_STOP */
        {
            return ret;
        }
        nb = nb->next;
    }
    return ret;                            /* value from the last callback invoked */
}
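The deferral strategy mentioned above, where a callback queues the notification and leaves the real work for later, can be sketched in user space as follows. The fixed-size event_queue, deferring_cb, and drain_events are hypothetical, and notifier_call_chain is copied from the listing above. In the kernel, the callback would also wake up the process that drains the queue; this single-threaded sketch only marks that spot with a comment.

```c
#include <stddef.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

struct notifier_block {
    int (*notifier_call)(struct notifier_block *self, unsigned long, void *);
    struct notifier_block *next;
    int priority;
};

static int notifier_call_chain(struct notifier_block **n, unsigned long val, void *v)
{
    int ret = NOTIFY_DONE;
    for (struct notifier_block *nb = *n; nb != NULL; nb = nb->next) {
        ret = nb->notifier_call(nb, val, v);
        if (ret & NOTIFY_STOP_MASK)
            break;
    }
    return ret;
}

/* Hypothetical ring buffer filled by the callback and drained later. */
#define QUEUE_LEN 8
static unsigned long event_queue[QUEUE_LEN];
static size_t q_head, q_tail;   /* q_tail - q_head == number of queued events */

/* Runs in the notifier's context, so it does only cheap work: enqueue. */
static int deferring_cb(struct notifier_block *self, unsigned long val, void *v)
{
    (void)self;
    (void)v;
    if (q_tail - q_head == QUEUE_LEN)
        return NOTIFY_DONE;               /* queue full: event dropped in this sketch */
    event_queue[q_tail++ % QUEUE_LEN] = val;
    /* A kernel version would wake up the process that drains the queue here. */
    return NOTIFY_OK;
}

/* Runs later, outside the notifier's context: hands queued events to out[]. */
static size_t drain_events(unsigned long *out, size_t max)
{
    size_t n = 0;
    while (q_head != q_tail && n < max)
        out[n++] = event_queue[q_head++ % QUEUE_LEN];
    return n;
}

static struct notifier_block deferring_nb = { .notifier_call = deferring_cb };
static struct notifier_block *demo_chain = &deferring_nb;
```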

These are the meanings of notifier_call_chain’s three input parameters:

n: Notification chain.
val: Event type. The chain itself identifies a class of events; val unequivocally identifies an event type within that class (e.g., NETDEV_REGISTER).
v: Input parameter that can be used by the handlers registered by the various clients. This can be used in different ways under different circumstances. For instance, when a new network device is registered with the kernel, the associated notification uses v to identify the net_device data structure.
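How a callback typically uses val and v can be sketched as follows: it switches on the event type and casts v to whatever the chain passes for that event. Everything here is a hypothetical user-space stand-in: fake_net_device replaces struct net_device, and DEMO_NETDEV_REGISTER and DEMO_NETDEV_UNREGISTER replace the kernel's NETDEV_* constants, whose real values are not assumed.

```c
#include <stddef.h>
#include <string.h>

#define NOTIFY_DONE 0x0000
#define NOTIFY_OK   0x0001

/* Hypothetical event codes for this sketch only. */
enum { DEMO_NETDEV_REGISTER = 1, DEMO_NETDEV_UNREGISTER = 2 };

/* Hypothetical stand-in for struct net_device. */
struct fake_net_device {
    char name[16];
};

struct notifier_block {
    int (*notifier_call)(struct notifier_block *self, unsigned long, void *);
    struct notifier_block *next;
    int priority;
};

static int registered_devices;
static char last_name[16];

static int demo_netdev_event(struct notifier_block *self, unsigned long val, void *v)
{
    struct fake_net_device *dev = v;      /* what v points to depends on the chain */
    (void)self;
    switch (val) {
    case DEMO_NETDEV_REGISTER:
        registered_devices++;
        strncpy(last_name, dev->name, sizeof(last_name) - 1);
        return NOTIFY_OK;
    case DEMO_NETDEV_UNREGISTER:
        registered_devices--;
        return NOTIFY_OK;
    default:
        return NOTIFY_DONE;               /* not an event this subscriber handles */
    }
}

static struct notifier_block demo_nb = { .notifier_call = demo_netdev_event };
```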

The callback routines called by notifier_call_chain can return any of the NOTIFY_XXX values defined in include/linux/notifier.h:

NOTIFY_OK: Notification was processed correctly.
NOTIFY_DONE: Not interested in the notification.^[*]
NOTIFY_BAD: Something went wrong. Stop calling the callback routines for this event.
NOTIFY_STOP: Routine invoked correctly. However, no further callbacks need to be called for this event.
NOTIFY_STOP_MASK: This flag is checked by notifier_call_chain to see whether to stop invoking the callback routines, or keep going. Both NOTIFY_BAD and NOTIFY_STOP include this flag in their definitions.

notifier_call_chain captures and returns the return value received by the last callback routine invoked. This is true regardless of whether all the callbacks have been invoked, or one of them interrupted the loop due to a return value of NOTIFY_BAD or NOTIFY_STOP.
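The stop-mask check and the rule that the last callback's return value is passed back can be exercised with a small user-space sketch. The NOTIFY_* values mirror the definitions in include/linux/notifier.h; the callbacks and the two hand-built chains are hypothetical. The first chain is vetoed by NOTIFY_BAD before its third callback runs. The second runs to completion and reports NOTIFY_DONE, the value of its last callback, even though its first callback returned NOTIFY_OK.

```c
#include <stddef.h>

/* Return codes mirroring include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

struct notifier_block {
    int (*notifier_call)(struct notifier_block *self, unsigned long, void *);
    struct notifier_block *next;
    int priority;
};

static int notifier_call_chain(struct notifier_block **n, unsigned long val, void *v)
{
    int ret = NOTIFY_DONE;
    for (struct notifier_block *nb = *n; nb != NULL; nb = nb->next) {
        ret = nb->notifier_call(nb, val, v);
        if (ret & NOTIFY_STOP_MASK)
            break;                        /* NOTIFY_BAD or NOTIFY_STOP */
    }
    return ret;
}

static int calls[3];   /* calls[0]: cb_ok, calls[1]: cb_bad, calls[2]: cb_done */

static int cb_ok(struct notifier_block *s, unsigned long val, void *v)
{
    (void)s; (void)val; (void)v;
    calls[0]++;
    return NOTIFY_OK;
}

static int cb_bad(struct notifier_block *s, unsigned long val, void *v)
{
    (void)s; (void)val; (void)v;
    calls[1]++;
    return NOTIFY_BAD;
}

static int cb_done(struct notifier_block *s, unsigned long val, void *v)
{
    (void)s; (void)val; (void)v;
    calls[2]++;
    return NOTIFY_DONE;
}

/* Chain 1: ok -> bad -> done (the veto stops it before cb_done). */
static struct notifier_block done_nb = { .notifier_call = cb_done };
static struct notifier_block bad_nb  = { .notifier_call = cb_bad, .next = &done_nb };
static struct notifier_block ok_nb   = { .notifier_call = cb_ok,  .next = &bad_nb };
static struct notifier_block *vetoing_chain = &ok_nb;

/* Chain 2: ok -> done (runs to the end). */
static struct notifier_block done2_nb = { .notifier_call = cb_done };
static struct notifier_block ok2_nb   = { .notifier_call = cb_ok, .next = &done2_nb };
static struct notifier_block *plain_chain = &ok2_nb;
```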

Note that it is possible for notifier_call_chain to be called for the same notification chain on different CPUs at the same time. It is the responsibility of the callback functions to take care of mutual exclusion and serialization where needed.
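Because the loop in notifier_call_chain takes no lock around each call, a callback that updates shared state has to protect that state itself. The user-space sketch below does so with a C11 atomic counter; in kernel code, an atomic_t or a spinlock would play the same role. counting_cb and events_seen are hypothetical, and the sketch is exercised from a single thread, so it illustrates the pattern rather than proving its safety.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NOTIFY_OK 0x0001

struct notifier_block {
    int (*notifier_call)(struct notifier_block *self, unsigned long, void *);
    struct notifier_block *next;
    int priority;
};

/* Shared state touched by the callback on every CPU that runs the chain. */
static atomic_ulong events_seen;

static int counting_cb(struct notifier_block *self, unsigned long val, void *v)
{
    (void)self;
    (void)val;
    (void)v;
    /* An atomic read-modify-write, so concurrent invocations do not lose updates. */
    atomic_fetch_add(&events_seen, 1);
    return NOTIFY_OK;
}

static struct notifier_block counting_nb = { .notifier_call = counting_cb };
```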

Notification Chains for the Networking Subsystems

The kernel defines at least 10 different notification chains. Here we are interested in the ones that are used to signal events of particular importance to the networking code. The main ones are:

inetaddr_chain: Sends notifications about the insertion, removal, and change of an Internet Protocol Version 4 (IPv4) address on a local interface. Chapter 23 describes when such notifications are generated. Internet Protocol Version 6 (IPv6) uses a similar chain (inet6addr_chain).
netdev_chain: Sends notifications about the registration status of network devices. Chapter 8 describes when such notifications are generated.

The purposes and uses of these chains, and of the other chains used by the networking subsystems, are described in the chapters about the subsystems that generate their notifications.

The networking code can register to notifications generated by other kernel components, too. For example, some NIC device drivers register with the reboot_notifier_list chain, which is a chain that warns when the system is about to reboot.

Wrappers

Most notification chains come with a set of wrappers used to register to them and unregister from them. For example, this is the wrapper used to register to netdev_chain:

int register_netdevice_notifier(struct notifier_block *nb)
{
        return notifier_chain_register(&netdev_chain, nb);
}
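The wrapper pattern, together with an unregistration routine that the chapter does not list, might look like this in a user-space sketch. foo_chain and the foo wrappers are hypothetical, and returning -ENOENT for a block that was never registered is a choice made for this sketch. Keeping the chain head static means other code can reach it only through the wrappers.

```c
#include <errno.h>
#include <stddef.h>

struct notifier_block {
    int (*notifier_call)(struct notifier_block *self, unsigned long, void *);
    struct notifier_block *next;
    int priority;
};

/* Priority-ordered insertion, as in the chapter (lock omitted). */
static int notifier_chain_register(struct notifier_block **list,
                                   struct notifier_block *n)
{
    while (*list) {
        if (n->priority > (*list)->priority)
            break;
        list = &((*list)->next);
    }
    n->next = *list;
    *list = n;
    return 0;
}

/* Unlink n from the chain; -ENOENT if it is not on the chain. */
static int notifier_chain_unregister(struct notifier_block **nl,
                                     struct notifier_block *n)
{
    while (*nl) {
        if (*nl == n) {
            *nl = n->next;
            return 0;
        }
        nl = &((*nl)->next);
    }
    return -ENOENT;
}

/* Hypothetical chain, private to its owning subsystem. */
static struct notifier_block *foo_chain;

/* Wrappers: callers never need the chain head itself. */
int register_foo_notifier(struct notifier_block *nb)
{
    return notifier_chain_register(&foo_chain, nb);
}

int unregister_foo_notifier(struct notifier_block *nb)
{
    return notifier_chain_unregister(&foo_chain, nb);
}

/* Number of blocks currently on foo_chain (for inspection only). */
static int foo_chain_length(void)
{
    int len = 0;
    for (struct notifier_block *nb = foo_chain; nb != NULL; nb = nb->next)
        len++;
    return len;
}
```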

Common names for wrappers include [un]register_xxx_notifier, xxx_[un]register_notifier, and xxx_[un]register.

Examples

Registrations to notification chains usually take place when the interested kernel component is initialized. For example, the following excerpt from net/ipv4/fib_frontend.c shows ip_fib_init, which is the initialization routine used by the routing code that is described in the section "Routing Subsystem Initialization" in Chapter 32:

static struct notifier_block fib_inetaddr_notifier = {
    .notifier_call = fib_inetaddr_event,
};
 
static struct notifier_block fib_netdev_notifier = {
    .notifier_call = fib_netdev_event,
};
 
void __init ip_fib_init(void)
{
    /* ... other initialization elided ... */
    register_netdevice_notifier(&fib_netdev_notifier);
    register_inetaddr_notifier(&fib_inetaddr_notifier);
}

The routing code registers to both of the chains introduced in the earlier section, “Notification Chains for the Networking Subsystems.” The routing tables are affected both by changes to locally configured IP addresses and by changes to the registration status of local devices.
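The effect described for Figure 4-1, where routes through eth3 must go away when eth3 goes down, can be mimicked with a user-space sketch of a netdev-style subscriber. This is not the implementation of fib_netdev_event: the routing table, the event codes, fake_net_device, and demo_fib_netdev_event are all hypothetical, and the notifier block is linked into the chain statically instead of through register_netdevice_notifier.

```c
#include <stddef.h>
#include <string.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

/* Hypothetical event codes standing in for the kernel's NETDEV_* values. */
enum { DEMO_NETDEV_UP = 1, DEMO_NETDEV_DOWN = 2 };

/* Hypothetical stand-in for struct net_device. */
struct fake_net_device {
    const char *name;
};

struct notifier_block {
    int (*notifier_call)(struct notifier_block *self, unsigned long, void *);
    struct notifier_block *next;
    int priority;
};

/* Hypothetical routing table: destination network -> output device. */
struct demo_route {
    char net;
    const char *dev;
    int valid;
};

static struct demo_route routes[] = {
    { 'A', "eth0", 1 },
    { 'D', "eth3", 1 },
    { 'E', "eth3", 1 },
    { 'F', "eth3", 1 },
};

/* On DEMO_NETDEV_DOWN, invalidate every route leaving through that device. */
static int demo_fib_netdev_event(struct notifier_block *self,
                                 unsigned long event, void *ptr)
{
    struct fake_net_device *dev = ptr;
    (void)self;
    if (event != DEMO_NETDEV_DOWN)
        return NOTIFY_DONE;
    for (size_t i = 0; i < sizeof(routes) / sizeof(routes[0]); i++)
        if (routes[i].valid && strcmp(routes[i].dev, dev->name) == 0)
            routes[i].valid = 0;
    return NOTIFY_OK;
}

static struct notifier_block demo_fib_netdev_notifier = {
    .notifier_call = demo_fib_netdev_event,
};

/* Statically linked chain head; kernel code would call register_netdevice_notifier(). */
static struct notifier_block *demo_netdev_chain = &demo_fib_netdev_notifier;

/* Same loop as notifier_call_chain in this chapter. */
static int notifier_call_chain(struct notifier_block **n, unsigned long val, void *v)
{
    int ret = NOTIFY_DONE;
    for (struct notifier_block *nb = *n; nb != NULL; nb = nb->next) {
        ret = nb->notifier_call(nb, val, v);
        if (ret & NOTIFY_STOP_MASK)
            break;
    }
    return ret;
}

/* 1 if a valid route to the given network remains, else 0. */
static int reachable(char net)
{
    for (size_t i = 0; i < sizeof(routes) / sizeof(routes[0]); i++)
        if (routes[i].net == net)
            return routes[i].valid;
    return 0;
}
```

Raising DEMO_NETDEV_DOWN for a device named "eth3" leaves only the route to network A valid, matching the outcome described for networks D, E, and F; a DEMO_NETDEV_UP event is ignored with NOTIFY_DONE.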

Tuning via /proc Filesystem

There is no file of interest in /proc as far as this chapter is concerned.

Functions and Variables Featured in This Chapter

Table 4-2 summarizes the functions and data structures introduced in this chapter.

Table 4-2. Functions, macros, and data structures used for notification chains

Name	Description
Functions and macros
`notifier_chain_register` + wrappers	Registers a callback handler with a notification chain.
`notifier_chain_unregister` + wrappers	Unregisters a callback handler from a notification chain.
`notifier_call_chain`	Sends out all the notifications about events in a specific class.
Data structure
`struct notifier_block`	Defines the handler for a notification. It includes the callback function to invoke.

Files and Directories Featured in This Chapter

Figure 4-3 lists the files referred to in this chapter.

Figure 4-3. Files related to notification chains

^[*]It is possible to have multiple routing tables at the same time. We will cover this feature in Chapter 31.

^[†]The cost of a link is one of the metrics that routing protocols can use to compare links and choose among them. See Chapter 30.

^[*]This return value is sometimes improperly used in place of NOTIFY_OK.


Understanding Linux Network Internals by Christian Benvenuti