7.3. Delaying Execution


Device drivers often need to delay the execution of a particular piece of code for a period of time, usually to allow the hardware to accomplish some task. In this section we cover a number of different techniques for achieving delays. The circumstances of each situation determine which technique is best to use; we go over them all, and point out the advantages and disadvantages of each.

One important thing to consider is how the delay you need compares with the clock tick, considering the range of HZ across the various platforms. Delays that are reliably longer than the clock tick, and don't suffer from its coarse granularity, can make use of the system clock. Very short delays typically must be implemented with software loops. In between these two cases lies a gray area. In this chapter, we use the phrase "long" delay to refer to a multiple-jiffy delay, which can be as low as a few milliseconds on some platforms, but is still long as seen by the CPU and the kernel.

The following sections talk about the different delays by taking a somewhat long path from various intuitive but inappropriate solutions to the right solution. We chose this path because it allows a more in-depth discussion of kernel issues related to timing. If you are eager to find the right code, just skim through the section.

7.3.1. Long Delays

Occasionally a driver needs to delay execution for relatively long periods: more than one clock tick. There are a few ways of accomplishing this sort of delay; we start with the simplest technique, then proceed to the more advanced techniques.

7.3.1.1 Busy waiting

If you want to delay execution by a multiple of the clock tick, allowing some slack in the value, the easiest (though not recommended) implementation is a loop that monitors the jiffy counter. The busy-waiting implementation usually looks like the following code, where j1 is the value of jiffies at the expiration of the delay:

while (time_before(jiffies, j1))
    cpu_relax();

 

The call to cpu_relax invokes an architecture-specific way of saying that you're not doing much with the processor at the moment. On many systems it does nothing at all; on symmetric multithreaded ("hyperthreaded") systems, it may yield the core to the other thread. In any case, this approach should definitely be avoided whenever possible. We show it here because on occasion you might want to run this code to better understand the internals of other code.

So let's look at how this code works. The loop is guaranteed to work because jiffies is declared as volatile by the kernel headers and, therefore, is fetched from memory any time some C code accesses it. Although technically correct (in that it works as designed), this busy loop severely degrades system performance. If you didn't configure your kernel for preemptive operation, the loop completely locks the processor for the duration of the delay; the scheduler never preempts a process that is running in kernel space, and the computer looks completely dead until time j1 is reached. The problem is less serious if you are running a preemptive kernel, because, unless the code is holding a lock, some of the processor's time can be recovered for other uses. Busy waits are still expensive on preemptive systems, however.

Still worse, if interrupts happen to be disabled when you enter the loop, jiffies won't be updated, and the while condition remains true forever. Running a preemptive kernel won't help either, and you'll be forced to hit the big red button.
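
For reference, the loop above might be packaged into a small helper as in the following sketch, with j1 computed as the current jiffies value plus HZ for a one-second delay; the helper name is ours, not part of the kernel or of the sample code:

#include <linux/jiffies.h>    /* jiffies, HZ, time_before() */
#include <linux/sched.h>      /* normally pulls in cpu_relax() via the architecture headers */

/* Busy-wait until roughly one second has elapsed (illustrative only). */
static void busy_wait_one_second(void)
{
    unsigned long j1 = jiffies + HZ;    /* expiration time: now plus one second */

    while (time_before(jiffies, j1))
        cpu_relax();                    /* hint that we are merely spinning */
}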

This implementation of delaying code is available, like the following ones, in the jit module. The /proc/jit* files created by the module delay a whole second each time you read a line of text, and lines are guaranteed to be 20 bytes each. If you want to test the busy-wait code, you can read /proc/jitbusy, which busy-loops for one second for each line it returns.


Be sure to read, at most, one line (or a few lines) at a time from /proc/jitbusy. The simplified kernel mechanism to register /proc files invokes the read method over and over to fill the data buffer the user requested. Therefore, a command such as cat /proc/jitbusy, if it reads 4 KB at a time, freezes the computer for 205 seconds.

 

The suggested command to read /proc/jitbusy is dd bs=20 < /proc/jitbusy, optionally specifying the number of blocks as well. Each 20-byte line returned by the file represents the value the jiffy counter had before and after the delay. This is a sample run on an otherwise unloaded computer:

phon% dd bs=20 count=5 < /proc/jitbusy
  1686518   1687518
  1687519   1688519
  1688520   1689520
  1689520   1690520
  1690521   1691521

 

All looks good: delays are exactly one second (1000 jiffies), and the next read system call starts immediately after the previous one is over. But let's see what happens on a system with a large number of CPU-intensive processes running (and a nonpreemptive kernel):

phon% dd bs=20 count=5 < /proc/jitbusy
  1911226   1912226
  1913323   1914323
  1919529   1920529
  1925632   1926632
  1931835   1932835

 

Here, each read system call delays exactly one second, but the kernel can take more than 5 seconds before scheduling the dd process so it can issue the next system call. That's expected in a multitasking system; CPU time is shared between all running processes, and a CPU-intensive process has its dynamic priority reduced. (A discussion of scheduling policies is outside the scope of this book.)

The test under load shown above has been performed while running the load50 sample program. This program forks a number of processes that do nothing, but do it in a CPU-intensive way. The program is part of the sample files accompanying this book, and forks 50 processes by default, although the number can be specified on the command line. In this chapter, and elsewhere in the book, the tests with a loaded system have been performed with load50 running in an otherwise idle computer.

If you repeat the command while running a preemptible kernel, you'll find no noticeable difference on an otherwise idle CPU and the following behavior under load:

phon% dd bs=20 count=5 < /proc/jitbusy
 14940680  14942777
 14942778  14945430
 14945431  14948491
 14948492  14951960
 14951961  14955840

 

Here, there is no significant delay between the end of a system call and the beginning of the next one, but the individual delays are far longer than one second: up to almost 4 seconds in the example shown, and increasing over time. These values demonstrate that the process has been interrupted during its delay, scheduling other processes. The gap between system calls is not the only scheduling option for this process, so no special delay can be seen there.

7.3.1.2 Yielding the processor

As we have seen, busy waiting imposes a heavy load on the system as a whole; we would like to find a better technique. The first change that comes to mind is to explicitly release the CPU when we're not interested in it. This is accomplished by calling the schedule function, declared in <linux/sched.h>:

while (time_before(jiffies, j1)) {
    schedule();
}

 

This loop can be tested by reading /proc/jitsched as we read /proc/jitbusy above. However, it still isn't optimal. The current process does nothing but release the CPU, but it remains in the run queue. If it is the only runnable process, it actually runs (it calls the scheduler, which selects the same process, which calls the scheduler, which . . . ). In other words, the load of the machine (the average number of running processes) is at least one, and the idle task (process number 0, also called swapper for historical reasons) never runs. Though this issue may seem irrelevant, running the idle task when the computer is idle relieves the processor's workload, decreasing its temperature and increasing its lifetime, as well as the duration of the batteries if the computer happens to be your laptop. Moreover, since the process is actually executing during the delay, it is accountable for all the time it consumes.

The behavior of /proc/jitsched is actually similar to running /proc/jitbusy under a preemptive kernel. This is a sample run, on an unloaded system:

phon% dd bs=20 count=5 < /proc/jitsched
  1760205   1761207
  1761209   1762211
  1762212   1763212
  1763213   1764213
  1764214   1765217

 

It's interesting to note that each read sometimes ends up waiting a few clock ticks more than requested. This problem gets worse and worse as the system gets busy, and the driver could end up waiting longer than expected. Once a process releases the processor with schedule, there are no guarantees that the process will get the processor back anytime soon. Therefore, calling schedule in this manner is not a safe solution to the driver's needs, in addition to being bad for the computing system as a whole. If you test jitsched while running load50, you can see that the delay associated with each line is extended by a few seconds, because other processes are using the CPU when the timeout expires.

7.3.1.3 Timeouts

The suboptimal delay loops shown up to now work by watching the jiffy counter without telling anyone. But the best way to implement a delay, as you may imagine, is usually to ask the kernel to do it for you. There are two ways of setting up jiffy-based timeouts, depending on whether your driver is waiting for other events or not.

If your driver uses a wait queue to wait for some other event, but you also want to be sure that it runs within a certain period of time, it can use wait_event_timeout or wait_event_interruptible_timeout:

#include <linux/wait.h>
long wait_event_timeout(wait_queue_head_t q, condition, long timeout);
long wait_event_interruptible_timeout(wait_queue_head_t q,
                      condition, long timeout);

 

These functions sleep on the given wait queue, but they return after the timeout (expressed in jiffies) expires. Thus, they implement a bounded sleep that does not go on forever. Note that the timeout value represents the number of jiffies to wait, not an absolute time value. The value is represented by a signed number, because it sometimes is the result of a subtraction, although the functions complain through a printk statement if the provided timeout is negative. If the timeout expires, the functions return 0; if the process is awakened by another event, it returns the remaining delay expressed in jiffies. The return value is never negative, even if the delay is greater than expected because of system load.
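
As an illustration, a driver that really does have an event to wait for might use the macro along these lines; the structure, field, and function names here are hypothetical, made up for the example:

#include <linux/wait.h>
#include <linux/jiffies.h>    /* HZ */
#include <linux/errno.h>

/* A hypothetical device whose interrupt handler sets data_ready
 * and then calls wake_up(&dev->wq). */
struct my_dev {
    wait_queue_head_t wq;
    int data_ready;
};

static int my_wait_for_data(struct my_dev *dev)
{
    /* Sleep until data_ready becomes true or half a second (HZ/2 jiffies) passes. */
    long left = wait_event_timeout(dev->wq, dev->data_ready, HZ / 2);

    if (left == 0)
        return -ETIMEDOUT;    /* timeout expired with the condition still false */
    return 0;                 /* woken in time; "left" jiffies were still remaining */
}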

The /proc/jitqueue file shows a delay based on wait_event_interruptible_timeout, although the module has no event to wait for, and uses 0 as a condition:

wait_queue_head_t wait;
init_waitqueue_head(&wait);
wait_event_interruptible_timeout(wait, 0, delay);

 

The observed behavior, when reading /proc/jitqueue, is nearly optimal, even under load:

phon% dd bs=20 count=5 < /proc/jitqueue
  2027024   2028024
  2028025   2029025
  2029026   2030026
  2030027   2031027
  2031028   2032028

 

Since the reading process (dd above) is not in the run queue while waiting for the timeout, you see no difference in behavior whether the code is run in a preemptive kernel or not.

wait_event_timeout and wait_event_interruptible_timeout were designed with a hardware driver in mind, where execution could be resumed in either of two ways: either somebody calls wake_up on the wait queue, or the timeout expires. This doesn't apply to jitqueue, as nobody ever calls wake_up on the wait queue (after all, no other code even knows about it), so the process always wakes up when the timeout expires. To accommodate this very situation, where you want to delay execution waiting for no specific event, the kernel offers the schedule_timeout function so you can avoid declaring and using a superfluous wait queue head:

#include <linux/sched.h>
signed long schedule_timeout(signed long timeout);

 

Here, timeout is the number of jiffies to delay. The return value is 0 unless the function returns before the given timeout has elapsed (in response to a signal). schedule_timeout requires that the caller first set the current process state, so a typical call looks like:

set_current_state(TASK_INTERRUPTIBLE);
schedule_timeout(delay);

 

The previous lines (from /proc/jitschedto) cause the process to sleep until the given time has passed. Since wait_event_interruptible_timeout relies on schedule_timeout internally, we won't bother showing the numbers jitschedto returns, because they are the same as those of jitqueue. Once again, it is worth noting that an extra time interval could pass between the expiration of the timeout and when your process is actually scheduled to execute.

In the example just shown, the first line calls set_current_state to set things up so that the scheduler won't run the current process again until the timeout places it back in TASK_RUNNING state. To achieve an uninterruptible delay, use TASK_UNINTERRUPTIBLE instead. If you forget to change the state of the current process, a call to schedule_timeout behaves like a call to schedule (i.e., the jitsched behavior), setting up a timer that is not used.
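
Wrapped in a helper, the interruptible variant of this pattern might look like the sketch below; the function name and the quarter-second value are illustrative only:

#include <linux/sched.h>      /* set_current_state, schedule_timeout, TASK_INTERRUPTIBLE */
#include <linux/jiffies.h>    /* HZ */

static signed long quarter_second_nap(void)
{
    set_current_state(TASK_INTERRUPTIBLE);
    /* Returns 0 if the full timeout elapsed, or the number of jiffies
     * still remaining if a signal woke the process up early. */
    return schedule_timeout(HZ / 4);
}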

If you want to play with the four jit files under different system situations or different kernels, or try other ways to delay execution, you may want to configure the amount of the delay when loading the module by setting the delay module parameter.

7.3.2. Short Delays

When a device driver needs to deal with latencies in its hardware, the delays involved are usually a few dozen microseconds at most. In this case, relying on the clock tick is definitely not the way to go.

The kernel functions ndelay, udelay, and mdelay serve well for short delays, delaying execution for the specified number of nanoseconds, microseconds, or milliseconds respectively.[2] Their prototypes are:

[2] The u in udelay represents the Greek letter mu and stands for micro.

#include <linux/delay.h>
void ndelay(unsigned long nsecs);
void udelay(unsigned long usecs);
void mdelay(unsigned long msecs);

 

The actual implementations of the functions are in <asm/delay.h>, being architecture-specific, and sometimes build on an external function. Every architecture implements udelay, but the other functions may or may not be defined; if they are not, <linux/delay.h> offers a default version based on udelay. In all cases, the delay achieved is at least the requested value but could be more; actually, no platform currently achieves nanosecond precision, although several offer submicrosecond precision. Delaying more than the requested value is usually not a problem, as short delays in a driver are usually needed to wait for the hardware, and the requirement is to wait for at least a given time lapse.

The implementation of udelay (and possibly ndelay too) uses a software loop based on the processor speed calculated at boot time, using the integer variable loops_per_jiffy. If you want to look at the actual code, however, be aware that the x86 implementation is quite complex because of the different timing sources it uses, based on what CPU type is running the code.

To avoid integer overflows in loop calculations, udelay and ndelay impose an upper bound on the value passed to them. If your module fails to load and displays an unresolved symbol, __bad_udelay, it means you called udelay with too large an argument. Note, however, that the compile-time check can be performed only on constant values and that not all platforms implement it. As a general rule, if you are trying to delay for thousands of nanoseconds, you should be using udelay rather than ndelay; similarly, millisecond-scale delays should be done with mdelay and not one of the finer-grained functions.
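
As a quick illustration of that rule, a hypothetical "let the hardware settle" helper might combine the three functions as follows; the specific values and the function name are arbitrary examples, not requirements of any real device:

#include <linux/delay.h>

static void settle_hardware(void)
{
    ndelay(100);    /* 100 ns: may be rounded up where nanosecond precision is unavailable */
    udelay(15);     /* 15 us: a typical short wait for a device register to settle */
    mdelay(2);      /* 2 ms: preferred over udelay(2000) */
}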

It's important to remember that the three delay functions are busy-waiting; other tasks can't be run during the time lapse. Thus, they replicate, though on a different scale, the behavior of jitbusy. These functions should only be used when there is no practical alternative.

There is another way of achieving millisecond (and longer) delays that does not involve busy waiting. The file <linux/delay.h> declares these functions:

void msleep(unsigned int millisecs);
unsigned long msleep_interruptible(unsigned int millisecs);
void ssleep(unsigned int seconds);

 

The first two functions put the calling process to sleep for the given number of millisecs. A call to msleep is uninterruptible; you can be sure that the process sleeps for at least the given number of milliseconds. If your driver is sitting on a wait queue and you want a wakeup to break the sleep, use msleep_interruptible. The return value from msleep_interruptible is normally 0; if, however, the process is awakened early, the return value is the number of milliseconds remaining in the originally requested sleep period. A call to ssleep puts the process into an uninterruptible sleep for the given number of seconds.
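
A short sketch of how these calls might appear in driver code follows; the helper name and the choice to return -EINTR after an early wakeup are our own illustrative decisions, not something the API requires:

#include <linux/delay.h>
#include <linux/errno.h>

static int pause_between_steps(void)
{
    msleep(20);                       /* at least 20 ms, cannot be interrupted */

    if (msleep_interruptible(20))     /* nonzero return: a wakeup cut the sleep short */
        return -EINTR;

    ssleep(1);                        /* one full second, uninterruptible */
    return 0;
}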

In general, if you can tolerate longer delays than requested, you should use schedule_timeout, msleep, or ssleep.
