15.2. The mmap Device Operation
Memory mapping is one of the most interesting features of modern Unix systems. As far as drivers are concerned, memory mapping can be implemented to provide user programs with direct access to device memory.
A definitive example of mmap usage can be seen by looking at a subset of the virtual memory areas for the X Window System server:
cat /proc/731/maps
000a0000-000c0000 rwxs 000a0000 03:01 282652 /dev/mem
000f0000-00100000 r-xs 000f0000 03:01 282652 /dev/mem
00400000-005c0000 r-xp 00000000 03:01 1366927 /usr/X11R6/bin/Xorg
006bf000-006f7000 rw-p 001bf000 03:01 1366927 /usr/X11R6/bin/Xorg
2a95828000-2aa58a8000 rw-s fcc00000 03:01 282652 /dev/mem
2a958a8000-2a9d8a8000 rw-s e8000000 03:01 282652 /dev/mem
...
The full list of the X server's VMAs is lengthy, but most of the entries are not of interest here. We do see, however, four separate mappings of /dev/mem, which give some insight into how the X server works with the video card. The first mapping is at a0000, which is the standard location for video RAM in the 640-KB ISA hole. Further down, we see a large mapping at e8000000, an address which is above the highest RAM address on the system. This is a direct mapping of the video memory on the adapter.
These regions can also be seen in /proc/iomem:
000a0000-000bffff : Video RAM area
000c0000-000ccfff : Video ROM
000d1000-000d1fff : Adapter ROM
000f0000-000fffff : System ROM
d7f00000-f7efffff : PCI Bus #01
e8000000-efffffff : 0000:01:00.0
fc700000-fccfffff : PCI Bus #01
fcc00000-fcc0ffff : 0000:01:00.0
Mapping a device means associating a range of user-space addresses to device memory. Whenever the program reads or writes in the assigned address range, it is actually accessing the device. In the X server example, using mmap allows quick and easy access to the video card's memory. For a performance-critical application like this, direct access makes a large difference.
As you might suspect, not every device lends itself to the mmap abstraction; it makes no sense, for instance, for serial ports and other stream-oriented devices. Another limitation of mmap is that mapping is PAGE_SIZE grained. The kernel can manage virtual addresses only at the level of page tables; therefore, the mapped area must be a multiple of PAGE_SIZE and must live in physical memory starting at an address that is a multiple of PAGE_SIZE. The kernel forces size granularity by making a region slightly bigger if its size isn't a multiple of the page size.
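The page-granularity rule can be illustrated with a little arithmetic. The following is a minimal user-space sketch, not kernel code; page_round_up and page_aligned are hypothetical helper names introduced here for illustration, and the page size is passed in as a parameter rather than taken from PAGE_SIZE.

```c
#include <stddef.h>

/*
 * Hypothetical helper: round len up to the next multiple of page_size.
 * Assumes page_size is a power of two.
 */
size_t page_round_up(size_t len, size_t page_size)
{
    return (len + page_size - 1) & ~(page_size - 1);
}

/* Hypothetical helper: nonzero if addr sits exactly on a page boundary. */
int page_aligned(unsigned long addr, size_t page_size)
{
    return (addr & (page_size - 1)) == 0;
}
```

With 4-KB pages, a request for 4097 bytes grows to 8192, which mirrors the "slightly bigger" region described above. A user program can obtain the actual page size with sysconf(_SC_PAGESIZE).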
These limits are not a big constraint for drivers, because the program accessing the device is device dependent anyway. Since the program must know about how the device works, the programmer is not unduly bothered by the need to see to details like page alignment. A bigger constraint exists when ISA devices are used on some non-x86 platforms, because their hardware view of ISA may not be contiguous. For example, some Alpha computers see ISA memory as a scattered set of 8-bit, 16-bit, or 32-bit items, with no direct mapping. In such cases, you can't use mmap at all. The inability to perform direct mapping of ISA addresses to Alpha addresses is due to the incompatible data transfer specifications of the two systems. Whereas early Alpha processors could issue only 32-bit and 64-bit memory accesses, ISA can do only 8-bit and 16-bit transfers, and there's no way to transparently map one protocol onto the other.
There are sound advantages to using mmap when it's feasible to do so. For instance, we have already looked at the X server, which transfers a lot of data to and from video memory; mapping the graphic display to user space dramatically improves the throughput, as opposed to an lseek/write implementation. Another typical example is a program controlling a PCI device. Most PCI peripherals map their control registers to a memory address, and a high-performance application might prefer to have direct access to the registers instead of repeatedly having to call ioctl to get its work done.
The mmap method is part of the file_operations structure and is invoked when the mmap system call is issued. With mmap, the kernel performs a good deal of work before the actual method is invoked, and, therefore, the prototype of the method is quite different from that of the system call. This is unlike calls such as ioctl and poll, where the kernel does not do much before calling the method.
The system call is declared as follows (as described in the mmap(2) manual page):
mmap (caddr_t addr, size_t len, int prot, int flags, int fd, off_t offset)
On the other hand, the file operation is declared as:
int (*mmap) (struct file *filp, struct vm_area_struct *vma);
The filp argument in the method is the same as that introduced in Chapter 3, while vma contains the information about the virtual address range that is used to access the device. Therefore, much of the work has been done by the kernel; to implement mmap, the driver only has to build suitable page tables for the address range and, if necessary, replace vma->vm_ops with a new set of operations.
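Seen from user space, the system call side looks roughly like the sketch below. It is an illustration only: read_via_mmap is a hypothetical helper name, and it maps an ordinary file rather than a device node so that it can be tried without special hardware. The point it shows is that the offset handed to mmap must be page aligned, so the caller rounds it down and skips the leftover bytes itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy len bytes, starting at byte offset off of the
 * file at path, into buf by reading them through a shared read-only mapping.
 * The range must lie within the file. Returns 0 on success, -1 on failure.
 */
int read_via_mmap(const char *path, off_t off, size_t len, void *buf)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t aligned = off & ~((off_t)page - 1); /* mmap needs a page-aligned offset */
    size_t skew = (size_t)(off - aligned);    /* bytes to skip in the first page */
    void *map;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    map = mmap(NULL, len + skew, PROT_READ, MAP_SHARED, fd, aligned);
    close(fd);                                /* the mapping survives the close */
    if (map == MAP_FAILED)
        return -1;
    memcpy(buf, (const char *)map + skew, len);
    munmap(map, len + skew);
    return 0;
}
```

When the same call is made on a device node, the kernel turns the aligned offset into vma->vm_pgoff and the length into the span between vma->vm_start and vma->vm_end before the driver's mmap method runs.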
There are two ways of building the page tables: doing it all at once with a function called remap_pfn_range or doing it a page at a time via the nopage VMA method. Each method has its advantages and limitations. We start with the "all at once" approach, which is simpler. From there, we add the complications needed for a real-world implementation.
15.2.1. Using remap_pfn_range
The job of building new page tables to map a range of physical addresses is handled by remap_pfn_range and io_remap_page_range, which have the following prototypes:
int remap_pfn_range(struct vm_area_struct *vma,
                    unsigned long virt_addr, unsigned long pfn,
                    unsigned long size, pgprot_t prot);
int io_remap_page_range(struct vm_area_struct *vma,
                        unsigned long virt_addr, unsigned long phys_addr,
                        unsigned long size, pgprot_t prot);
The value returned by the function is the usual 0 or a negative error code. Let's look at the exact meaning of the function's arguments:
vma
The virtual memory area into which the page range is being mapped.
virt_addr
The user virtual address where remapping should begin. The function builds page tables for the virtual address range between virt_addr and virt_addr+size.
pfn
The page frame number corresponding to the physical address to which the virtual address should be mapped. The page frame number is simply the physical address right-shifted by PAGE_SHIFT bits. For most uses, the vm_pgoff field of the VMA structure contains exactly the value you need. The function affects physical addresses from (pfn<<PAGE_SHIFT) to (pfn<<PAGE_SHIFT)+size.
size
The dimension, in bytes, of the area being remapped.
prot
The "protection" requested for the new VMA. The driver can (and should) use the value found in vma->vm_page_prot.
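The relationship between a physical address and its page frame number can be checked with a few lines of ordinary C. This is a user-space sketch under the assumption of 4-KB pages; SKETCH_PAGE_SHIFT, phys_to_pfn, and pfn_to_phys are hypothetical names used here in place of the kernel's PAGE_SHIFT and its own helpers.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4-KB pages */

/* Page frame number of a physical address: the address shifted right. */
uint64_t phys_to_pfn(uint64_t phys)
{
    return phys >> SKETCH_PAGE_SHIFT;
}

/* First physical address covered by a page frame: the inverse shift. */
uint64_t pfn_to_phys(uint64_t pfn)
{
    return pfn << SKETCH_PAGE_SHIFT;
}
```

Under that assumption, the video memory mapped at e8000000 in the X server listing corresponds to page frame number 0xe8000, which is the value the driver would find in vma->vm_pgoff.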
The arguments to remap_pfn_range are fairly straightforward, and most of them are already provided to you in the VMA when your mmap method is called. You may be wondering why there are two functions, however. The first (remap_pfn_range) is intended for situations where pfn refers to actual system RAM, while io_remap_page_range should be used when phys_addr points to I/O memory. In practice, the two functions are identical on every architecture except the SPARC, and you see remap_pfn_range used in most situations. In the interest of writing portable drivers, however, you should use the variant of remap_pfn_range that is suited to your particular situation.
One other complication has to do with caching: usually, references to device memory should not be cached by the processor. Often the system BIOS sets things up properly, but it is also possible to disable caching of specific VMAs via the protection field. Unfortunately, disabling caching at this level is highly processor dependent. The curious reader may wish to look at the pgprot_noncached function from drivers/char/mem.c to see what's involved. We won't discuss the topic further here.
15.2.2. A Simple Implementation
If your driver needs to do a simple, linear mapping of device memory into a user address space, remap_pfn_range is almost all you really need to do the job. The following code is derived from drivers/char/mem.c and shows how this task is performed in a typical module called simple (Simple Implementation Mapping Pages with Little Enthusiasm):
static int simple_remap_mmap(struct file *filp, struct vm_area_struct *vma)
{
    if (remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
                        vma->vm_end - vma->vm_start,
                        vma->vm_page_prot))
        return -EAGAIN;
    vma->vm_ops = &simple_remap_vm_ops;
    simple_vma_open(vma);
    return 0;
}
As you can see, remapping memory is just a matter of calling remap_pfn_range to create the necessary page tables.
15.2.3. Adding VMA Operations
As we have seen, the vm_area_struct structure contains a set of operations that may be applied to the VMA. Now we look at providing those operations in a simple way. In particular, we provide open and close operations for our VMA. These operations are called whenever a process opens or closes the VMA; in particular, the open method is invoked anytime a process forks and creates a new reference to the VMA. The open and close VMA methods are called in addition to the processing performed by the kernel, so they need not reimplement any of the work done there. They exist as a way for drivers to do any additional processing that they may require.
As it turns out, a simple driver such as simple need not do any extra processing in particular. So we have created open and close methods, which print a message to the system log informing the world that they have been called. Not particularly useful, but it does allow us to show how these methods can be provided, and see when they are invoked.
To this end, we override the default vma->vm_ops with operations that call printk:
void simple_vma_open(struct vm_area_struct *vma)
{
    printk(KERN_NOTICE "Simple VMA open, virt %lx, phys %lx\n",
           vma->vm_start, vma->vm_pgoff << PAGE_SHIFT);
}
void simple_vma_close(struct vm_area_struct *vma)
{
    printk(KERN_NOTICE "Simple VMA close.\n");
}
static struct vm_operations_struct simple_remap_vm_ops = {
    .open = simple_vma_open,
    .close = simple_vma_close,
};
To make these operations active for a specific mapping, it is necessary to store a pointer to simple_remap_vm_ops in the vm_ops field of the relevant VMA. This is usually done in the mmap method. If you turn back to the simple_remap_mmap example, you see these lines of code:
vma->vm_ops = &simple_remap_vm_ops;
simple_vma_open(vma);
Note the explicit call to simple_vma_open. Since the open method is not invoked on the initial mmap, we must call it explicitly if we want it to run.
15.2.4. Mapping Memory with nopage
Although remap_pfn_range works well for many, if not most, driver mmap implementations, sometimes it is necessary to be a little more flexible. In such situations, an implementation using the nopage VMA method may be called for.
One situation in which the nopage approach is useful can be brought about by the mremap system call, which is used by applications to change the bounding addresses of a mapped region. As it happens, the kernel does not notify drivers directly when a mapped VMA is changed by mremap. If the VMA is reduced in size, the kernel can quietly flush out the unwanted pages without telling the driver. If, instead, the VMA is expanded, the driver eventually finds out by way of calls to nopage when mappings must be set up for the new pages, so there is no need to perform a separate notification. The nopage method, therefore, must be implemented if you want to support the mremap system call. Here, we show a simple implementation of nopage for the simple device.
The nopage method, remember, has the following prototype:
struct page *(*nopage)(struct vm_area_struct *vma,
                       unsigned long address, int *type);
When a user process attempts to access a page in a VMA that is not present in memory, the associated nopage function is called. The address parameter contains the virtual address that caused the fault, rounded down to the beginning of the page. The nopage function must locate and return the struct page pointer that refers to the page the user wanted. This function must also take care to increment the usage count for the page it returns by calling the get_page macro:
get_page(struct page *pageptr);
This step is necessary to keep the reference counts correct on the mapped pages. The kernel maintains this count for every page; when the count goes to 0, the kernel knows that the page may be placed on the free list. When a VMA is unmapped, the kernel decrements the usage count for every page in the area. If your driver does not increment the count when adding a page to the area, the usage count becomes 0 prematurely, and the integrity of the system is compromised.
The nopage method should also store the type of fault in the location pointed to by the type argument, but only if that argument is not NULL. In device drivers, the proper value for type will invariably be VM_FAULT_MINOR.
If you are using nopage, there is usually very little work to be done when mmap is called; our version looks like this:
static int simple_nopage_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
    if (offset >= __pa(high_memory) || (filp->f_flags & O_SYNC))
        vma->vm_flags |= VM_IO;
    vma->vm_flags |= VM_RESERVED;
    vma->vm_ops = &simple_nopage_vm_ops;
    simple_vma_open(vma);
    return 0;
}
The main thing mmap has to do is to replace the default (NULL) vm_ops pointer with our own operations. The nopage method then takes care of "remapping" one page at a time and returning the address of its struct page structure. Because we are just implementing a window onto physical memory here, the remapping step is simple: we only need to locate and return a pointer to the struct page for the desired address. Our nopage method looks like the following:
struct page *simple_vma_nopage(struct vm_area_struct *vma,
                               unsigned long address, int *type)
{
    struct page *pageptr;
    unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
    unsigned long physaddr = address - vma->vm_start + offset;
    unsigned long pageframe = physaddr >> PAGE_SHIFT;
    if (!pfn_valid(pageframe))
        return NOPAGE_SIGBUS;
    pageptr = pfn_to_page(pageframe);
    get_page(pageptr);
    if (type)
        *type = VM_FAULT_MINOR;
    return pageptr;
}
Since, once again, we are simply mapping main memory here, the nopage function need only find the correct struct page for the faulting address and increment its reference count. Therefore, the required sequence of events is to calculate the desired physical address, and turn it into a page frame number by right-shifting it PAGE_SHIFT bits. Since user space can give us any address it likes, we must ensure that we have a valid page frame; the pfn_valid function does that for us. If the address is out of range, we return NOPAGE_SIGBUS, which causes a bus signal to be delivered to the calling process. Otherwise, pfn_to_page gets the necessary struct page pointer; we can increment its reference count (with a call to get_page) and return it.
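The address arithmetic in that sequence can be tried outside the kernel. The sketch below is a user-space model only: fault_to_pfn and SKETCH_PAGE_SHIFT are hypothetical names, 4-KB pages are assumed, and the three parameters stand in for vma->vm_start, vma->vm_pgoff, and the faulting address passed to nopage.

```c
#define SKETCH_PAGE_SHIFT 12 /* assumption: 4-KB pages */

/*
 * Model of the calculation done in a nopage method: turn a faulting
 * virtual address inside a VMA into the page frame number it refers to.
 */
unsigned long fault_to_pfn(unsigned long vm_start, unsigned long vm_pgoff,
                           unsigned long address)
{
    unsigned long offset = vm_pgoff << SKETCH_PAGE_SHIFT;  /* mapping start, in bytes */
    unsigned long physaddr = address - vm_start + offset;  /* physical target */

    return physaddr >> SKETCH_PAGE_SHIFT;
}
```

For a VMA that starts at 0x40000000 and was mapped with a page offset of 0xa0 (physical address a0000), a fault three pages in lands on page frame 0xa3. In the driver, that number is what gets handed to pfn_valid and pfn_to_page.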
The nopage method normally returns a pointer to a struct page. If, for some reason, a normal page cannot be returned (e.g., the requested address is beyond the device's memory region), NOPAGE_SIGBUS can be returned to signal the error; that is what the simple code above does. nopage can also return NOPAGE_OOM to indicate failures caused by resource limitations.
Note that this implementation works for ISA memory regions but not for those on the PCI bus. PCI memory is mapped above the highest system memory, so there are no entries in the system memory map for those addresses. Because there is no struct page to return a pointer to, nopage cannot be used in these situations; you must use remap_pfn_range instead.
If the nopage method is left NULL, kernel code that handles page faults maps the zero page to the faulting virtual address. The zero page is a copy-on-write page that reads as 0 and that is used, for example, to map the BSS segment. Any process referencing the zero page sees exactly that: a page filled with zeroes. If the process writes to the page, it ends up modifying a private copy. Therefore, if a process extends a mapped region by calling mremap, and the driver hasn't implemented nopage, the process ends up with zero-filled memory instead of a segmentation fault.
15.2.5. Remapping Specific I/O Regions
All the examples we've seen so far are reimplementations of /dev/mem; they remap physical addresses into user space. The typical driver, however, wants to map only the small address range that applies to its peripheral device, not all memory. In order to map to user space only a subset of the whole memory range, the driver needs only to play with the offsets. The following does the trick for a driver mapping a region of simple_region_size bytes, beginning at physical address simple_region_start (which should be page-aligned):
unsigned long off = vma->vm_pgoff << PAGE_SHIFT;
unsigned long physical = simple_region_start + off;
unsigned long vsize = vma->vm_end - vma->vm_start;
unsigned long psize = simple_region_size - off;
if (vsize > psize)
    return -EINVAL; /* spans too high */
/* remap_pfn_range expects a page frame number, not a physical address */
remap_pfn_range(vma, vma->vm_start, physical >> PAGE_SHIFT, vsize,
                vma->vm_page_prot);
In addition to calculating the offsets, this code introduces a check that reports an error when the program tries to map more memory than is available in the I/O region of the target device. In this code, psize is the physical I/O size that is left after the offset has been specified, and vsize is the requested size of virtual memory; the function refuses to map addresses that extend beyond the allowed memory range.
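One detail worth watching in a check of this kind is that psize is computed with unsigned arithmetic, so an offset larger than the region would wrap around to a very large value and let the request through. The sketch below is a user-space model of the same test with that case handled first; check_region_mapping and SKETCH_PAGE_SHIFT are hypothetical names, 4-KB pages are assumed, and the extra guard is a suggestion rather than part of the driver code shown above.

```c
#include <errno.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4-KB pages */

/*
 * Model of the size check: region_size is the size of the device region in
 * bytes, vm_pgoff the page offset requested by the caller, vsize the length
 * of the requested mapping. Returns 0 if the request fits, -EINVAL if not.
 */
int check_region_mapping(unsigned long region_size, unsigned long vm_pgoff,
                         unsigned long vsize)
{
    unsigned long off = vm_pgoff << SKETCH_PAGE_SHIFT;

    if (off >= region_size)
        return -EINVAL; /* starts past the end; avoids wraparound below */
    if (vsize > region_size - off)
        return -EINVAL; /* spans too high */
    return 0;
}
```

With a 64-KB region, a one-page request at page offset 15 is accepted, while the same request at page offset 16 or beyond is refused.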
Note that the user process can always use mremap to extend its mapping, possibly past the end of the physical device area. If your driver fails to define a nopage method, it is never notified of this extension, and the additional area maps to the zero page. As a driver writer, you may well want to prevent this sort of behavior; mapping the zero page onto the end of your region is not an explicitly bad thing to do, but it is highly unlikely that the programmer wanted that to happen.
The simplest way to prevent extension of the mapping is to implement a simple nopage method that always causes a bus signal to be sent to the faulting process. Such a method would look like this:
struct page *simple_nopage(struct vm_area_struct *vma,
                           unsigned long address, int *type)
{ return NOPAGE_SIGBUS; /* send a SIGBUS */ }
As we have seen, the nopage method is called only when the process dereferences an address that is within a known VMA but for which there is currently no valid page table entry. If we have used remap_pfn_range to map the entire device region, the nopage method shown here is called only for references outside of that region. Thus, it can safely return NOPAGE_SIGBUS to signal an error. Of course, a more thorough implementation of nopage could check to see whether the faulting address is within the device area, and perform the remapping if that is the case. Once again, however, nopage does not work with PCI memory areas, so extension of PCI mappings is not possible.
15.2.6. Remapping RAM
An interesting limitation of remap_pfn_range is that it gives access only to reserved pages and physical addresses above the top of physical memory. In Linux, a page of physical addresses is marked as "reserved" in the memory map to indicate that it is not available for memory management. On the PC, for example, the range between 640 KB and 1 MB is marked as reserved, as are the pages that host the kernel code itself. Reserved pages are locked in memory and are the only ones that can be safely mapped to user space; this limitation is a basic requirement for system stability.
Therefore, remap_pfn_range won't allow you to remap conventional addresses, which include the ones you obtain by calling get_free_page. Instead, it maps in the zero page. Everything appears to work, with the exception that the process sees private, zero-filled pages rather than the remapped RAM that it was hoping for. Nonetheless, the function does everything that most hardware drivers need it to do, because it can remap high PCI buffers and ISA memory.
The limitations of remap_pfn_range can be seen by running mapper, one of the sample programs in misc-progs in the files provided on O'Reilly's FTP site. mapper is a simple tool that can be used to quickly test the mmap system call; it maps read-only parts of a file specified by command-line options and dumps the mapped region to standard output. The following session, for instance, shows that /dev/mem doesn't map the physical page located at address 64 KB; instead, we see a page full of zeros (the host computer in this example is a PC, but the result would be the same on other platforms):
morgana.root# ./mapper /dev/mem 0x10000 0x1000 | od -Ax -t x1
mapped "/dev/mem" from 65536 to 69632
000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
001000
The inability of remap_pfn_range to deal with RAM suggests that memory-based devices like scull can't easily implement mmap, because its device memory is conventional RAM, not I/O memory. Fortunately, a relatively easy workaround is available to any driver that needs to map RAM into user space; it uses the nopage method that we have seen earlier.
15.2.6.1 Remapping RAM with the nopage method
The way to map real RAM to user space is to use vm_ops->nopage to deal with page faults one at a time. A sample implementation is part of the scullp module, introduced in Chapter 8.
scullp is a page-oriented char device. Because it is page oriented, it can implement mmap on its memory. The code implementing memory mapping uses some of the concepts introduced in Section 15.1.
Before examining the code, let's look at the design choices that affect the mmap implementation in scullp:
•scullp doesn't release device memory as long as the device is mapped. This is a matter of policy rather than a requirement, and it is different from the behavior of scull and similar devices, which are truncated to a length of 0 when opened for writing. Refusing to free a mapped scullp device allows a process to overwrite regions actively mapped by another process, so you can test and see how processes and device memory interact. To avoid releasing a mapped device, the driver must keep a count of active mappings; the vmas field in the device structure is used for this purpose.
•Memory mapping is performed only when the scullp order parameter (set at module load time) is 0. The parameter controls how __get_free_pages is invoked (see Section 8.3). The zero-order limitation (which forces pages to be allocated one at a time, rather than in larger groups) is dictated by the internals of __get_free_pages, the allocation function used by scullp. To maximize allocation performance, the Linux kernel maintains a list of free pages for each allocation order, and only the reference count of the first page in a cluster is incremented by get_free_pages and decremented by free_pages. The mmap method is disabled for a scullp device if the allocation order is greater than zero, because nopage deals with single pages rather than clusters of pages. scullp simply does not know how to properly manage reference counts for pages that are part of higher-order allocations. (Return to Section 8.3.1 if you need a refresher on scullp and the memory allocation order value.)
The zero-order limitation is mostly intended to keep the code simple. It is possible to correctly implement mmap for multipage allocations by playing with the usage count of the pages, but it would only add to the complexity of the example without introducing any interesting information.
Code that is intended to map RAM according to the rules just outlined needs to implement the open, close, and nopage VMA methods; it also needs to access the memory map to adjust the page usage counts.
This implementation of scullp_mmap is very short, because it relies on the nopage function to do all the interesting work:
int scullp_mmap(struct file *filp, struct vm_area_struct *vma)
{
    struct inode *inode = filp->f_dentry->d_inode;
    /* refuse to map if order is not 0 */
    if (scullp_devices[iminor(inode)].order)
        return -ENODEV;
    /* don't do anything here: "nopage" will fill the holes */
    vma->vm_ops = &scullp_vm_ops;
    vma->vm_flags |= VM_RESERVED;
    vma->vm_private_data = filp->private_data;
    scullp_vma_open(vma);
    return 0;
}
The purpose of the if statement is to avoid mapping devices whose allocation order is not 0. scullp's operations are stored in the vm_ops field, and a pointer to the device structure is stashed in the vm_private_data field. At the end, vm_ops->open is called to update the count of active mappings for the device.
open and close simply keep track of the mapping count and are defined as follows:
void scullp_vma_open(struct vm_area_struct *vma)
{
    struct scullp_dev *dev = vma->vm_private_data;
    dev->vmas++;
}
void scullp_vma_close(struct vm_area_struct *vma)
{
    struct scullp_dev *dev = vma->vm_private_data;
    dev->vmas--;
}
Most of the work is then performed by nopage. In the scullp implementation, the address parameter to nopage is used to calculate an offset into the device; the offset is then used to look up the correct page in the scullp memory tree:
struct page *scullp_vma_nopage(struct vm_area_struct *vma,
                               unsigned long address, int *type)
{
    unsigned long offset;
    struct scullp_dev *ptr, *dev = vma->vm_private_data;
    struct page *page = NOPAGE_SIGBUS;
    void *pageptr = NULL; /* default to "missing" */
    down(&dev->sem);
    offset = (address - vma->vm_start) + (vma->vm_pgoff << PAGE_SHIFT);
    if (offset >= dev->size) goto out; /* out of range */
    /*
     * Now retrieve the scullp device from the list, then the page.
     * If the device has holes, the process receives a SIGBUS when
     * accessing the hole.
     */
    offset >>= PAGE_SHIFT; /* offset is a number of pages */
    for (ptr = dev; ptr && offset >= dev->qset;) {
        ptr = ptr->next;
        offset -= dev->qset;
    }
    if (ptr && ptr->data) pageptr = ptr->data[offset];
    if (!pageptr) goto out; /* hole or end-of-file */
    page = virt_to_page(pageptr);
    /* got it, now increment the count */
    get_page(page);
    if (type)
        *type = VM_FAULT_MINOR;
out:
    up(&dev->sem);
    return page;
}
scullp uses memory obtained with get_free_pages. That memory is addressed using logical addresses, so all scullp_nopage has to do to get a struct page pointer is to call virt_to_page.
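The list walk inside the nopage method can be modeled in user space to see how a page index is resolved. The sketch below is a simplified stand-in and not the scullp source: struct sketch_dev and sketch_lookup are hypothetical names, the structure keeps only the two fields the walk needs, and qset is passed as a parameter instead of being read from the device.

```c
#include <stddef.h>

/* Simplified stand-in for struct scullp_dev: only what the walk uses. */
struct sketch_dev {
    void **data;             /* array of qset page pointers, or NULL */
    struct sketch_dev *next; /* next item in the list */
};

/*
 * Return the page pointer stored for page number index, or NULL when the
 * index falls in a hole or past the end of the list.
 */
void *sketch_lookup(struct sketch_dev *dev, unsigned long index,
                    unsigned long qset)
{
    struct sketch_dev *ptr = dev;

    while (ptr && index >= qset) { /* skip whole list items */
        ptr = ptr->next;
        index -= qset;
    }
    if (ptr && ptr->data)
        return ptr->data[index];
    return NULL;
}
```

A NULL result here corresponds to the case in which the driver leaves page set to NOPAGE_SIGBUS, so the faulting process receives a bus signal instead of a page.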
The scullp device now works as expected, as you can see in this sample output from the mapper utility. Here, we send a directory listing of /dev (which is long) to the scullp device and then use the mapper utility to look at pieces of that listing with mmap:
morgana% ls -l /dev > /dev/scullp
morgana% ./mapper /dev/scullp 0 140
mapped "/dev/scullp" from 0 (0x00000000) to 140 (0x0000008c)
total 232
crw------- 1 root root 10, 10 Sep 15 07:40 adbmouse
crw-r--r-- 1 root root 10, 175 Sep 15 07:40 agpgart
morgana% ./mapper /dev/scullp 8192 200
mapped "/dev/scullp" from 8192 (0x00002000) to 8392 (0x000020c8)
d0h1494
brw-rw---- 1 root floppy 2, 92 Sep 15 07:40 fd0h1660
brw-rw---- 1 root floppy 2, 20 Sep 15 07:40 fd0h360
brw-rw---- 1 root floppy 2, 12 Sep 15 07:40 fd0H360
15.2.7. Remapping Kernel Virtual Addresses
Although it's rarely necessary, it's interesting to see how a driver can map a kernel virtual address to user space using mmap. A true kernel virtual address, remember, is an address returned by a function such as vmalloc; that is, a virtual address mapped in the kernel page tables. The code in this section is taken from scullv, which is the module that works like scullp but allocates its storage through vmalloc.
Most of the scullv implementation is like the one we've just seen for scullp, except that there is no need to check the order parameter that controls memory allocation. The reason for this is that vmalloc allocates its pages one at a time, because single-page allocations are far more likely to succeed than multipage allocations. Therefore, the allocation order problem doesn't apply to vmalloced space.
Beyond that, there is only one difference between the nopage implementations used by scullp and scullv. Remember that scullp, once it found the page of interest, would obtain the corresponding struct page pointer with virt_to_page. That function does not work with kernel virtual addresses, however. Instead, you must use vmalloc_to_page. So the final part of the scullv version of nopage looks like:
    /*
     * After scullv lookup, "pageptr" is now the address of the page
     * needed by the current process. Since it's a vmalloc address,
     * turn it into a struct page.
     */
    page = vmalloc_to_page(pageptr);
    /* got it, now increment the count */
    get_page(page);
    if (type)
        *type = VM_FAULT_MINOR;
out:
    up(&dev->sem);
    return page;
Based on this discussion, you might also want to map addresses returned by ioremap to user space. That would be a mistake, however; addresses from ioremap are special and cannot be treated like normal kernel virtual addresses. Instead, you should use remap_pfn_range to remap I/O memory areas into user space.