7.4. Kernel Timers
Whenever you need to schedule an action to happen later, without blocking the current process until that time arrives, kernel timers are the tool for you. These timers are used to schedule execution of a function at a particular time in the future, based on the clock tick, and can be used for a variety of tasks; for example, polling a device by checking its state at regular intervals when the hardware can't fire interrupts. Other typical uses of kernel timers are turning off the floppy motor or finishing another lengthy shutdown operation. In such cases, delaying the return from close would impose an unnecessary (and surprising) cost on the application program. Finally, the kernel itself uses the timers in several situations, including the implementation of schedule_timeout.
A kernel timer is a data structure that instructs the kernel to execute a user-defined function with a user-defined argument at a user-defined time. The implementation resides in <linux/timer.h> and kernel/timer.c and is described in detail in Section 7.4.2.
The functions scheduled to run almost certainly do not run while the process that registered them is executing. They are, instead, run asynchronously. Until now, everything we have done in our sample drivers has run in the context of a process executing system calls. When a timer runs, however, the process that scheduled it could be asleep, executing on a different processor, or quite possibly has exited altogether.
This asynchronous execution resembles what happens when a hardware interrupt happens (which is discussed in detail in Chapter 10). In fact, kernel timers are run as the result of a "software interrupt." When running in this sort of atomic context, your code is subject to a number of constraints. Timer functions must be atomic in all the ways we discussed in Chapter 5, but there are some additional issues brought about by the lack of a process context. We will introduce these constraints now; they will be seen again in several places in later chapters. Repetition is called for because the rules for atomic contexts must be followed assiduously, or the system will find itself in deep trouble.
A number of actions require the context of a process in order to be executed. When you are outside of process context (i.e., in interrupt context), you must observe the following rules:
•No access to user space is allowed. Because there is no process context, there is no path to the user space associated with any particular process.
•The current pointer is not meaningful in atomic mode and cannot be used since the relevant code has no connection with the process that has been interrupted.
•No sleeping or scheduling may be performed. Atomic code may not call schedule or a form of wait_event, nor may it call any other function that could sleep. For example, calling kmalloc(..., GFP_KERNEL) is against the rules. Semaphores also must not be used since they can sleep.
Kernel code can tell if it is running in interrupt context by calling the function in_interrupt( ), which takes no parameters and returns nonzero if the processor is currently running in interrupt context, either hardware interrupt or software interrupt.
A function related to in_interrupt( ) is in_atomic( ). Its return value is nonzero whenever scheduling is not allowed; this includes hardware and software interrupt contexts as well as any time when a spinlock is held. In the latter case, current may be valid, but access to user space is forbidden, since it can cause scheduling to happen. Whenever you are using in_interrupt( ), you should really consider whether in_atomic( ) is what you actually mean. Both functions are declared in <asm/hardirq.h>.
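As a quick illustration of how these checks can be used, consider memory allocation flags. The following is a minimal sketch (the helper name is hypothetical, not part of any sample module) that falls back to GFP_ATOMIC when the caller may not sleep; real code usually passes the proper flag explicitly rather than guessing at run time:

#include <asm/hardirq.h>     /* in_interrupt( ) and in_atomic( ) */
#include <linux/slab.h>      /* kmalloc */

/* Hypothetical helper: pick allocation flags that are legal in this context. */
static void *my_alloc(size_t size)
{
    if (in_atomic( ))
        return kmalloc(size, GFP_ATOMIC);   /* never sleeps; may fail under memory pressure */
    return kmalloc(size, GFP_KERNEL);       /* may sleep, so process context only */
}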
One other important feature of kernel timers is that a task can reregister itself to run again at a later time. This is possible because each timer_list structure is unlinked from the list of active timers before being run and can, therefore, be immediately re-linked elsewhere. Although rescheduling the same task over and over might appear to be a pointless operation, it is sometimes useful. For example, it can be used to implement the polling of devices.
It's also worth knowing that in an SMP system, the timer function is executed by the same CPU that registered it, to achieve better cache locality whenever possible. Therefore, a timer that reregisters itself always runs on the same CPU.
An important feature of timers that should not be forgotten, though, is that they are a potential source of race conditions, even on uniprocessor systems. This is a direct result of their being asynchronous with other code. Therefore, any data structures accessed by the timer function should be protected from concurrent access, either by being atomic types or by using spinlocks.
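As a minimal sketch of such protection (the structure and function names are hypothetical, not part of jit), a driver sharing a counter with a timer function could use a spinlock, taking the _bh variant in process context because timer functions run as a software interrupt:

#include <linux/spinlock.h>
#include <linux/timer.h>

struct my_dev {
    spinlock_t lock;
    unsigned long events;        /* touched by both the timer and process context */
    struct timer_list timer;
};

/* The timer function already runs in atomic context; the plain lock is enough. */
static void my_timer_fn(unsigned long arg)
{
    struct my_dev *dev = (struct my_dev *)arg;

    spin_lock(&dev->lock);
    dev->events++;
    spin_unlock(&dev->lock);
}

/* Process-context users must block the timer softirq while holding the lock,
   or the timer could fire in between and deadlock even on a uniprocessor. */
static unsigned long my_read_events(struct my_dev *dev)
{
    unsigned long n;

    spin_lock_bh(&dev->lock);
    n = dev->events;
    spin_unlock_bh(&dev->lock);
    return n;
}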
7.4.1. The Timer API
The kernel provides drivers with a number of functions to declare, register, and remove kernel timers. The following excerpt shows the basic building blocks:
#include <linux/timer.h>
struct timer_list {
/* ... */
unsigned long expires;
void (*function)(unsigned long);
unsigned long data;
};
void init_timer(struct timer_list *timer);
struct timer_list TIMER_INITIALIZER(_function, _expires, _data);
void add_timer(struct timer_list * timer);
int del_timer(struct timer_list * timer);
The data structure includes more fields than the ones shown, but those three are the ones that are meant to be accessed from outside the timer code itself. The expires field represents the jiffies value when the timer is expected to run; at that time, the function function is called with data as an argument. If you need to pass multiple items in the argument, you can bundle them as a single data structure and pass a pointer cast to unsigned long, a safe practice on all supported architectures and pretty common in memory management (as discussed in Chapter 15). The expires value is not a jiffies_64 item because timers are not expected to expire very far in the future, and 64-bit operations are slow on 32-bit platforms.
The structure must be initialized before use. This step ensures that all the fields are properly set up, including the ones that are opaque to the caller. Initialization can be performed by calling init_timer or assigning TIMER_INITIALIZER to a static structure, according to your needs. After initialization, you can change the three public fields before calling add_timer. To disable a registered timer before it expires, call del_timer.
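To make the sequence concrete, here is a minimal sketch (with hypothetical names) that bundles the timer's context into a structure, passes a pointer to it through the data field as described above, and arms and later disables the timer:

#include <linux/timer.h>
#include <linux/jiffies.h>

struct my_ctx {                                  /* several items bundled for the timer */
    struct timer_list timer;
    int count;
};

static void my_timer_fn(unsigned long arg)
{
    struct my_ctx *ctx = (struct my_ctx *)arg;   /* undo the cast made at registration */
    ctx->count++;                                /* runs once, in atomic context */
}

static void my_arm_timer(struct my_ctx *ctx)
{
    init_timer(&ctx->timer);                     /* set up the opaque fields */
    ctx->timer.function = my_timer_fn;           /* the three public fields */
    ctx->timer.data = (unsigned long)ctx;
    ctx->timer.expires = jiffies + HZ;           /* about one second from now */
    add_timer(&ctx->timer);
}

static void my_disarm_timer(struct my_ctx *ctx)
{
    del_timer(&ctx->timer);                      /* harmless if the timer already ran */
}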
The jit module includes a sample file, /proc/jitimer (for "just in timer"), that returns one header line and six data lines. The data lines represent the current environment where the code is running; the first one is generated by the read file operation and the others by a timer. The following output was recorded while compiling a kernel:
phon% cat /proc/jitimer
time delta inirq pid cpu command
33565837 0 0 1269 0 cat
33565847 10 1 1271 0 sh
33565857 10 1 1273 0 cpp0
33565867 10 1 1273 0 cpp0
33565877 10 1 1274 0 cc1
33565887 10 1 1274 0 cc1
In this output, the time field is the value of jiffies when the code runs, delta is the change in jiffies since the previous line, inirq is the Boolean value returned by in_interrupt, pid and command refer to the current process, and cpu is the number of the CPU being used (always 0 on uniprocessor systems).
If you read /proc/jitimer while the system is unloaded, you'll find that the context of the timer is process 0, the idle task, which is called "swapper" mainly for historical reasons.
The timer used to generate /proc/jitimer data is run every 10 jiffies by default, but you can change the value by setting the tdelay (timer delay) parameter when loading the module.
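The parameter is exported with the usual module_param mechanism; the following is a sketch of what such a declaration looks like (the exact type used in the module's source may differ):

#include <linux/module.h>
#include <linux/moduleparam.h>

static int tdelay = 10;            /* interval between timer runs, in jiffies */
module_param(tdelay, int, 0);

Loading the module with, for example, insmod jit.ko tdelay=20 doubles the interval between data lines.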
The following code excerpt shows the part of jit related to the jitimer timer. When a process attempts to read our file, we set up the timer as follows:
unsigned long j = jiffies;
/* fill the data for our timer function */
data->prevjiffies = j;
data->buf = buf2;
data->loops = JIT_ASYNC_LOOPS;
/* register the timer */
data->timer.data = (unsigned long)data;
data->timer.function = jit_timer_fn;
data->timer.expires = j + tdelay; /* parameter */
add_timer(&data->timer);
/* wait for the buffer to fill */
wait_event_interruptible(data->wait, !data->loops);
The actual timer function looks like this:
void jit_timer_fn(unsigned long arg)
{
struct jit_data *data = (struct jit_data *)arg;
unsigned long j = jiffies;
data->buf += sprintf(data->buf, "%9li %3li %i %6i %i %s\n",
j, j - data->prevjiffies, in_interrupt( ) ? 1 : 0,
current->pid, smp_processor_id( ), current->comm);
if (--data->loops) {
data->timer.expires += tdelay;
data->prevjiffies = j;
add_timer(&data->timer);
} else {
wake_up_interruptible(&data->wait);
}
}
The timer API includes a few more functions than the ones introduced above. The following set completes the list of kernel offerings:
int mod_timer(struct timer_list *timer, unsigned long expires);
Updates the expiration time of a timer, a common task for which a timeout timer is used (again, the motor-off floppy timer is a typical example). mod_timer can be called on inactive timers as well, where you normally use add_timer.
int del_timer_sync(struct timer_list *timer);
Works like del_timer, but also guarantees that when it returns, the timer function is not running on any CPU. del_timer_sync is used to avoid race conditions on SMP systems and is the same as del_timer in UP kernels. This function should be preferred over del_timer in most situations. This function can sleep if it is called from a nonatomic context but busy waits in other situations. Be very careful about calling del_timer_sync while holding locks; if the timer function attempts to obtain the same lock, the system can deadlock. If the timer function reregisters itself, the caller must first ensure that this reregistration will not happen; this is usually accomplished by setting a "shutting down" flag, which is checked by the timer function. A sketch of this pattern follows this list.
int timer_pending(const struct timer_list * timer);
Returns true or false to indicate whether the timer is currently scheduled to run by reading one of the opaque fields of the structure.
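The following sketch (with hypothetical names, not part of jit) shows these pieces working together: a polling timer rearms itself with mod_timer until a "shutting down" flag is set, the teardown path waits with del_timer_sync, and timer_pending reports whether the poll is currently armed:

#include <linux/timer.h>
#include <linux/jiffies.h>

struct my_dev {
    struct timer_list timer;
    int shutting_down;           /* set before del_timer_sync to stop reregistration */
    unsigned long period;        /* polling interval, in jiffies */
};

static void my_poll_fn(unsigned long arg)
{
    struct my_dev *dev = (struct my_dev *)arg;

    /* ... poll the hardware here ... */

    if (!dev->shutting_down)
        mod_timer(&dev->timer, jiffies + dev->period);  /* rearm for the next pass */
}

static int my_poll_is_armed(struct my_dev *dev)
{
    return timer_pending(&dev->timer);   /* nonzero if the timer is scheduled to run */
}

static void my_poll_shutdown(struct my_dev *dev)
{
    dev->shutting_down = 1;              /* forbid further reregistration */
    del_timer_sync(&dev->timer);         /* wait for any running instance to finish */
}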
7.4.2. The Implementation of Kernel Timers
Although you won't need to know how kernel timers are implemented in order to use them, the implementation is interesting, and a look at its internals is worthwhile.
The implementation of the timers has been designed to meet the following requirements and assumptions:
•Timer management must be as lightweight as possible.
•The design should scale well as the number of active timers increases.
•Most timers expire within a few seconds or minutes at most, while timers with long delays are pretty rare.
•A timer should run on the same CPU that registered it.
The solution devised by kernel developers is based on a per-CPU data structure. The timer_list structure includes a pointer to that data structure in its base field. If base is NULL, the timer is not scheduled to run; otherwise, the pointer tells which data structure (and, therefore, which CPU) runs it. Per-CPU data items are described in Section 8.5.
Whenever kernel code registers a timer (via add_timer or mod_timer), the operation is eventually performed by internal_add_timer (in kernel/timer.c) which, in turn, adds the new timer to a double-linked list of timers within a "cascading table" associated to the current CPU.
The cascading table works like this: if the timer expires in the next 0 to 255 jiffies, it is added to one of the 256 lists devoted to short-range timers using the least significant bits of the expires field. If it expires farther in the future (but before 16,384 jiffies), it is added to one of 64 lists based on bits 9-14 of the expires field. For timers expiring even farther, the same trick is used for bits 15-20, 21-26, and 27-31. Timers with an expire field pointing still farther in the future (something that can happen only on 64-bit platforms) are hashed with a delay value of 0xffffffff, and timers with expires in the past are scheduled to run at the next timer tick. (A timer that is already expired may sometimes be registered in high-load situations, especially if you run a preemptible kernel.)
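To make the hashing concrete, here is a small sketch (not the kernel's actual code; bit positions are counted from zero here) that reports which group of lists, and which list within the group, would receive a timer expiring at a given jiffies value:

/* Sketch only: mirrors the 256/64/64/64/64 grouping described above. */
static void which_bucket(unsigned long expires, unsigned long now,
                         int *group, int *index)
{
    unsigned long delta = expires - now;

    if (delta < 256) {                     /* one of the 256 short-range lists */
        *group = 1;  *index = expires & 255;
    } else if (delta < (1UL << 14)) {      /* before 16,384 jiffies */
        *group = 2;  *index = (expires >> 8) & 63;
    } else if (delta < (1UL << 20)) {
        *group = 3;  *index = (expires >> 14) & 63;
    } else if (delta < (1UL << 26)) {
        *group = 4;  *index = (expires >> 20) & 63;
    } else {                               /* very far in the future */
        *group = 5;  *index = (expires >> 26) & 63;
    }
}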
When __run_timers is fired, it executes all pending timers for the current timer tick. If jiffies is currently a multiple of 256, the function also rehashes one of the next-level lists of timers into the 256 short-term lists, possibly cascading one or more of the other levels as well, according to the bit representation of jiffies.
This approach, while exceedingly complex at first sight, performs very well both with few timers and with a large number of them. The time required to manage each active timer is independent of the number of timers already registered and is limited to a few logic operations on the binary representation of its expires field. The only cost associated with this implementation is the memory for the 512 list heads (256 short-term lists and 4 groups of 64 more lists), that is, 4 KB of storage.
The function __run_timers, as shown by /proc/jitimer, is run in atomic context. In addition to the limitations we already described, this brings in an interesting feature: the timer expires at just the right time, even if you are not running a preemptible kernel, and the CPU is busy in kernel space. You can see what happens when you read /proc/jitbusy in the background and /proc/jitimer in the foreground. Although the system appears to be locked solid by the busy-waiting system call, the kernel timers still work fine.
Keep in mind, however, that a kernel timer is far from perfect, as it suffers from jitter and other artifacts induced by hardware interrupts, as well as other timers and other asynchronous tasks. While a timer associated with simple digital I/O can be enough for simple tasks like running a stepper motor or other amateur electronics, it is usually not suitable for production systems in industrial environments. For such tasks, you'll most likely need to resort to a real-time kernel extension.