8.3. get_free_page and Friends

If a module needs to allocate big chunks of memory, it is usually better to use a page-oriented technique. Requesting whole pages also has other advantages, which are introduced in Chapter 15.

To allocate pages, the following functions are available:

get_zeroed_page(unsigned int flags);

Returns a pointer to a new page and fills the page with zeros.

__get_free_page(unsigned int flags);

Similar to get_zeroed_page, but doesn't clear the page.

__get_free_pages(unsigned int flags, unsigned int order);

Allocates and returns a pointer to the first byte of a memory area that is potentially several (physically contiguous) pages long but doesn't zero the area.

The flags argument works in the same way as with kmalloc; usually either GFP_KERNEL or GFP_ATOMIC is used, perhaps with the addition of the __GFP_DMA flag (for memory that can be used for ISA direct-memory-access operations) or __GFP_HIGHMEM when high memory can be used.[2] order is the base-two logarithm of the number of pages you are requesting or freeing (i.e., log2N). For example, order is 0 if you want one page and 3 if you request eight pages. If order is too big (no contiguous area of that size is available), the page allocation fails. The get_order function, which takes an integer argument, can be used to extract the order from a size (that must be a power of two) for the hosting platform. The maximum allowed value for order is 10 or 11 (corresponding to 1024 or 2048 pages), depending on the architecture. The chances of an order-10 allocation succeeding on anything other than a freshly booted system with a lot of memory are small, however.
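To make the relationship between a size and its order concrete, here is a minimal user-space sketch of the calculation. It is not the kernel's get_order; the names sketch_get_order, SKETCH_PAGE_SHIFT, and SKETCH_PAGE_SIZE are hypothetical, and a 4096-byte page is assumed. Unlike the power-of-two restriction stated above, this sketch simply rounds any size up to the next order.

```c
#include <stddef.h>

/* Assumption: 4096-byte pages. The kernel uses the platform's PAGE_SIZE. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/*
 * Hypothetical user-space analog of get_order: return the smallest order
 * such that (SKETCH_PAGE_SIZE << order) >= size.
 */
static int sketch_get_order(unsigned long size)
{
    int order = 0;
    unsigned long pages;

    if (size == 0)
        return 0;                   /* choice made for this sketch only */

    /* Number of pages needed, minus one. */
    pages = (size - 1) >> SKETCH_PAGE_SHIFT;

    /* Count how many doublings of one page cover that many pages. */
    while (pages) {
        order++;
        pages >>= 1;
    }
    return order;
}
```

With the assumed page size, a request for 4096 bytes yields order 0 and a request for 32768 bytes (eight pages) yields order 3, matching the examples in the text.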

[2] Although alloc_pages (described shortly) should really be used for allocating high-memory pages, for reasons we can't really get into until Chapter 15.

If you are curious, /proc/buddyinfo tells you how many blocks of each order are available for each memory zone on the system.
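If you would rather read those numbers from a program, the following user-space sketch parses the file one line at a time. It assumes each line has the layout "Node N, zone NAME" followed by one free-block count per order, starting at order 0. The function names and the SKETCH_MAX_ORDERS limit are hypothetical, chosen only for this illustration.

```c
#include <stdio.h>

#define SKETCH_MAX_ORDERS 16    /* assumed upper bound on counts per line */

/*
 * Parse one line of the assumed form "Node N, zone NAME c0 c1 c2 ...".
 * Stores the zone name and the free-block count for each order, and
 * returns how many counts were read, or -1 if the header does not match.
 */
static int sketch_parse_buddy_line(const char *line, char zone[32],
                                   unsigned long counts[SKETCH_MAX_ORDERS])
{
    int node, consumed = 0, n = 0;

    if (sscanf(line, "Node %d, zone %31s%n", &node, zone, &consumed) < 2)
        return -1;
    line += consumed;

    while (n < SKETCH_MAX_ORDERS) {
        int used = 0;

        if (sscanf(line, "%lu%n", &counts[n], &used) != 1)
            break;              /* no more numbers on this line */
        line += used;
        n++;
    }
    return n;
}

/* Print the free blocks per order for every zone; returns 0 on success. */
static int sketch_dump_buddyinfo(const char *path)
{
    char line[512], zone[32];
    unsigned long counts[SKETCH_MAX_ORDERS];
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        int n = sketch_parse_buddy_line(line, zone, counts);

        for (int order = 0; order < n; order++)
            printf("%-8s order %2d: %lu free blocks (%lu pages)\n",
                   zone, order, counts[order], counts[order] << order);
    }
    fclose(f);
    return 0;
}
```

Calling sketch_dump_buddyinfo("/proc/buddyinfo") from a small main function prints one line per zone and order. A zero in the high-order columns is a quick hint that a large __get_free_pages request is likely to fail.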

When a program is done with the pages, it can free them with one of the following functions. The first function is a macro that falls back on the second:

void free_page(unsigned long addr);
void free_pages(unsigned long addr, unsigned long order);

If you try to free a different number of pages from what you allocated, the memory map becomes corrupted, and the system gets in trouble at a later time.

It's worth stressing that __get_free_pages and the other functions can be called at any time, subject to the same rules we saw for kmalloc. The functions can fail to allocate memory in certain circumstances, particularly when GFP_ATOMIC is used. Therefore, the program calling these allocation functions must be prepared to handle an allocation failure.

Although kmalloc(GFP_KERNEL) sometimes fails when there is no available memory, the kernel does its best to fulfill allocation requests. Therefore, it's easy to degrade system responsiveness by allocating too much memory. For example, you can bring the computer down by pushing too much data into a scull device; the system starts crawling while it tries to swap out as much as possible in order to fulfill the kmalloc request. Since every resource is being sucked up by the growing device, the computer is soon rendered unusable; at that point, you can no longer even start a new process to try to deal with the problem. We don't address this issue in scull, since it is just a sample module and not a real tool to put into a multiuser system. As a programmer, you must be careful nonetheless, because a module is privileged code and can open new security holes in the system (the most likely is a denial-of-service hole like the one just outlined).

8.3.1. A scull Using Whole Pages: scullp

In order to test page allocation for real, we have released the scullp module together with other sample code. It is a reduced scull, just like scullc introduced earlier.

Memory quanta allocated by scullp are whole pages or page sets: the scullp_order variable defaults to 0 but can be changed at either compile or load time.

The following lines show how it allocates memory:

/* Here's the allocation of a single quantum */
if (!dptr->data[s_pos]) {
    dptr->data[s_pos] =
        (void *)__get_free_pages(GFP_KERNEL, dptr->order);
    if (!dptr->data[s_pos])
        goto nomem;
    memset(dptr->data[s_pos], 0, PAGE_SIZE << dptr->order);
}

The code to deallocate memory in scullp looks like this:

/* This code frees a whole quantum-set */
for (i = 0; i < qset; i++)
    if (dptr->data[i])
        free_pages((unsigned long)(dptr->data[i]),
                dptr->order);

At the user level, the perceived difference is primarily a speed improvement and better memory use, because there is no internal fragmentation of memory. We ran some tests copying 4 MB from scull0 to scull1 and then from scullp0 to scullp1; the results showed a slight improvement in kernel-space processor usage.

The performance improvement is not dramatic, because kmalloc is designed to be fast. The main advantage of page-level allocation isn't actually speed, but rather more efficient memory usage. Allocating by pages wastes no memory, whereas using kmalloc wastes an unpredictable amount of memory because of allocation granularity.
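The effect of allocation granularity can be illustrated with a little arithmetic. The sketch below is a user-space model, not kmalloc itself: it assumes, purely for illustration, that requests are rounded up to power-of-two size classes starting at 32 bytes, and the function names are hypothetical.

```c
#include <stddef.h>

/*
 * Assumption for this sketch: size classes are powers of two, starting
 * at 32 bytes. Returns the class a request would be served from.
 */
static size_t sketch_size_class(size_t request)
{
    size_t cls = 32;

    while (cls < request)
        cls <<= 1;
    return cls;
}

/* Bytes lost to rounding when `request` bytes come from a size class. */
static size_t sketch_wasted_bytes(size_t request)
{
    return sketch_size_class(request) - request;
}
```

Under that assumption, a 4000-byte request loses 96 bytes and a 5000-byte request loses 3192 bytes, while a request for exactly 4096 bytes (one page on many platforms) loses nothing. That is why a quantum sized in whole pages avoids this kind of waste.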

But the biggest advantage of the __get_free_page functions is that the pages obtained are completely yours, and you could, in theory, assemble the pages into a linear area by appropriate tweaking of the page tables. For example, you can allow a user process to mmap memory areas obtained as single unrelated pages. We discuss this kind of operation in Chapter 15, where we show how scullp offers memory mapping, something that scull cannot offer.

8.3.2. The alloc_pages Interface

For completeness, we introduce another interface for memory allocation, even though we will not be prepared to use it until after Chapter 15. For now, suffice it to say that struct page is an internal kernel structure that describes a page of memory. As we will see, there are many places in the kernel where it is necessary to work with page structures; they are especially useful in any situation where you might be dealing with high memory, which does not have a constant address in kernel space.

The real core of the Linux page allocator is a function called alloc_pages_node:

struct page *alloc_pages_node(int nid, unsigned int flags,
                              unsigned int order);

This function also has two variants (which are simply macros); these are the versions that you will most likely use:

struct page *alloc_pages(unsigned int flags, unsigned int order);
struct page *alloc_page(unsigned int flags);

The core function, alloc_pages_node, takes three arguments. nid is the NUMA node ID[3] whose memory should be allocated, flags is the usual GFP_ allocation flags, and order is the size of the allocation. The return value is a pointer to the first of (possibly many) page structures describing the allocated memory, or, as usual, NULL on failure.

[3] NUMA (nonuniform memory access) computers are multiprocessor systems where memory is "local" to specific groups of processors ("nodes"). Access to local memory is faster than access to nonlocal memory. On such systems, allocating memory on the correct node is important. Driver authors do not normally have to worry about NUMA issues, however.

alloc_pages simplifies the situation by allocating the memory on the current NUMA node (it calls alloc_pages_node with the return value from numa_node_id as the nid parameter). And, of course, alloc_page omits the order parameter and allocates a single page.

To release pages allocated in this manner, you should use one of the following:

void __free_page(struct page *page);
void __free_pages(struct page *page, unsigned int order);
void free_hot_page(struct page *page);
void free_cold_page(struct page *page);

If you have specific knowledge of whether a single page's contents are likely to be resident in the processor cache, you should communicate that to the kernel with free_hot_page (for cache-resident pages) or free_cold_page. This information helps the memory allocator optimize its use of memory across the system.