.. SPDX-License-Identifier: GPL-2.0+

=======
IOMMUFD
=======

:Author: Jason Gunthorpe
:Author: Kevin Tian

Overview
========

IOMMUFD is the user API to control the IOMMU subsystem as it relates to managing
IO page tables from userspace using file descriptors. It is intended to be
general and consumable by any driver that wants to expose DMA to userspace.
These drivers are eventually expected to deprecate any internal IOMMU logic
they may already or historically implement (e.g. vfio_iommu_type1.c).

At a minimum, iommufd provides universal support for managing I/O address spaces
and I/O page tables for all IOMMUs, with room in the design to add non-generic
features to cater to specific hardware functionality.

In this context the capitalized form (IOMMUFD) refers to the subsystem, while
the lowercase form (iommufd) refers to the file descriptors created via
/dev/iommu for use by userspace.

Key Concepts
============

User Visible Objects
--------------------

The following IOMMUFD objects are exposed to userspace:

- IOMMUFD_OBJ_IOAS, representing an I/O address space (IOAS), allowing map/unmap
  of user space memory into ranges of I/O Virtual Address (IOVA).

  The IOAS is a functional replacement for the VFIO container, and like the VFIO
  container it copies an IOVA map to a list of iommu_domains held within it.

- IOMMUFD_OBJ_DEVICE, representing a device that is bound to iommufd by an
  external driver.

- IOMMUFD_OBJ_HW_PAGETABLE, representing an actual hardware I/O page table
  (i.e. a single struct iommu_domain) managed by the iommu driver.

  The IOAS has a list of HW_PAGETABLES that share the same IOVA mapping and
  it will synchronize its mapping with each member HW_PAGETABLE.

All user-visible objects are destroyed via the IOMMU_DESTROY uAPI.

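The synchronization rule above can be illustrated with a small userspace toy
model. It is not kernel code: ``toy_ioas``, ``toy_domain`` and the fixed-size
arrays are hypothetical stand-ins for iommufd_ioas and iommu_domain, and a
"mapping" is reduced to an (IOVA, PFN, page count) triple. It shows every map
being pushed into each member domain, and a domain that joins later being
replayed with the existing map first.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   /*
    * Toy model, not kernel code: the IOAS keeps the authoritative IOVA map
    * and mirrors every change into each member domain (HW_PAGETABLE).
    */
   #define MAX_MAPS    16
   #define MAX_DOMAINS 4

   struct toy_map {
       uint64_t iova, pfn, npages;
   };

   struct toy_domain {                 /* stands in for one iommu_domain */
       struct toy_map ptes[MAX_MAPS];
       size_t nr;
   };

   struct toy_ioas {
       struct toy_map maps[MAX_MAPS];  /* the IOVA map itself */
       size_t nr_maps;
       struct toy_domain *domains[MAX_DOMAINS];
       size_t nr_domains;
   };

   static void domain_install(struct toy_domain *d, const struct toy_map *m)
   {
       assert(d->nr < MAX_MAPS);
       d->ptes[d->nr++] = *m;
   }

   /* Record the mapping in the IOAS, then program it into every domain. */
   static int toy_ioas_map(struct toy_ioas *ioas, uint64_t iova, uint64_t pfn,
                           uint64_t npages)
   {
       struct toy_map m = { iova, pfn, npages };

       if (ioas->nr_maps == MAX_MAPS)
           return -1;
       ioas->maps[ioas->nr_maps++] = m;
       for (size_t i = 0; i < ioas->nr_domains; i++)
           domain_install(ioas->domains[i], &m);
       return 0;
   }

   /* A domain joining later is replayed with the existing map first. */
   static int toy_ioas_add_domain(struct toy_ioas *ioas, struct toy_domain *d)
   {
       if (ioas->nr_domains == MAX_DOMAINS)
           return -1;
       for (size_t i = 0; i < ioas->nr_maps; i++)
           domain_install(d, &ioas->maps[i]);
       ioas->domains[ioas->nr_domains++] = d;
       return 0;
   }

In this sketch the IOAS map is the single source of truth, so a domain added
after mappings already exist ends up with the same contents as one that was
present from the start.
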
The diagram below shows the relationship between user-visible objects and
kernel datastructures (external to iommufd), with numbers referring to the
operations that create the objects and links::

   _________________________________________________________
  |                         iommufd                         |
  |       [1]                                               |
  |  _________________                                      |
  | |                 |                                     |
  | |                 |                                     |
  | |                 |                                     |
  | |                 |                                     |
  | |                 |                                     |
  | |                 |                                     |
  | |                 |        [3]                 [2]      |
  | |                 |    ____________         __________  |
  | |      IOAS       |<--|            |<------|          | |
  | |                 |   |HW_PAGETABLE|       |  DEVICE  | |
  | |                 |   |____________|       |__________| |
  | |                 |         |                   |       |
  | |                 |         |                   |       |
  | |                 |         |                   |       |
  | |                 |         |                   |       |
  | |                 |         |                   |       |
  | |_________________|         |                   |       |
  |         |                   |                   |       |
  |_________|___________________|___________________|_______|
            |                   |                   |
            |              _____v______      _______v_____
            | PFN storage |            |    |             |
            |------------>|iommu_domain|    |struct device|
                          |____________|    |_____________|

1. IOMMUFD_OBJ_IOAS is created via the IOMMU_IOAS_ALLOC uAPI. An iommufd can
   hold multiple IOAS objects. IOAS is the most generic object and does not
   expose interfaces that are specific to single IOMMU drivers. All operations
   on the IOAS must operate equally on each of the iommu_domains inside it.

2. IOMMUFD_OBJ_DEVICE is created when an external driver calls the IOMMUFD kAPI
   to bind a device to an iommufd. The driver is expected to implement a set of
   ioctls to allow userspace to initiate the binding operation. Successful
   completion of this operation establishes the desired DMA ownership over the
   device. The driver must also set the driver_managed_dma flag and must not
   touch the device until this operation succeeds.

3. IOMMUFD_OBJ_HW_PAGETABLE is created when an external driver calls the IOMMUFD
   kAPI to attach a bound device to an IOAS. Similarly, the external driver uAPI
   allows userspace to initiate the attaching operation. If a compatible
   pagetable already exists, it is reused for the attachment. Otherwise a new
   pagetable object and iommu_domain are created. Successful completion of this
   operation sets up the linkages among IOAS, device and iommu_domain. Once
   this completes, the device can perform DMA.

   Every iommu_domain inside the IOAS is also represented to userspace as a
   HW_PAGETABLE object.

   .. note::

      Future IOMMUFD updates will provide an API to create and manipulate the
      HW_PAGETABLE directly.

Because binding claims DMA ownership, a device can be bound to only one iommufd,
and it can be attached to at most one IOAS object (PASID is not supported yet).

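The bind and attach rules in steps 2 and 3 can be summarized with a toy state
model. All names here are hypothetical, and "compatible" is approximated as
"created for the same IOMMU instance", which is an assumption made for the
sketch; the real decision is up to the kernel and the IOMMU driver. The model
enforces bind-before-attach, a single iommufd per device, at most one IOAS per
device, and reuse of an existing compatible HW_PAGETABLE before creating a new
one.

.. code-block:: c

   #include <assert.h>
   #include <errno.h>
   #include <stddef.h>

   /* Toy model with hypothetical names, not the iommufd kAPI. */
   #define MAX_HWPT 4

   struct toy_hwpt {               /* one HW_PAGETABLE / iommu_domain */
       int iommu_id;               /* assumed compatibility key */
       int users;
   };

   struct toy_ioas {
       struct toy_hwpt hwpts[MAX_HWPT];
       size_t nr_hwpts;
   };

   struct toy_device {
       int iommu_id;               /* IOMMU instance serving this device */
       int bound_fd;               /* -1 while not bound to any iommufd */
       struct toy_ioas *ioas;      /* at most one attached IOAS */
       struct toy_hwpt *hwpt;
   };

   /* Binding claims DMA ownership, so a second bind is refused. */
   static int toy_bind(struct toy_device *dev, int iommufd)
   {
       assert(iommufd >= 0);
       if (dev->bound_fd != -1)
           return -EBUSY;
       dev->bound_fd = iommufd;
       return 0;
   }

   /* Reuse a compatible HW_PAGETABLE if one exists, else create a new one. */
   static int toy_attach(struct toy_device *dev, struct toy_ioas *ioas)
   {
       struct toy_hwpt *hwpt = NULL;

       if (dev->bound_fd == -1)
           return -EINVAL;         /* must bind before attaching */
       if (dev->ioas)
           return -EBUSY;          /* at most one IOAS per device */

       for (size_t i = 0; i < ioas->nr_hwpts; i++) {
           if (ioas->hwpts[i].iommu_id == dev->iommu_id) {
               hwpt = &ioas->hwpts[i];
               break;
           }
       }
       if (!hwpt) {
           if (ioas->nr_hwpts == MAX_HWPT)
               return -ENOMEM;
           hwpt = &ioas->hwpts[ioas->nr_hwpts++];
           hwpt->iommu_id = dev->iommu_id;
           hwpt->users = 0;
       }
       hwpt->users++;
       dev->ioas = ioas;
       dev->hwpt = hwpt;
       return 0;                   /* from here on the device may DMA */
   }
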
Kernel Datastructure
--------------------

User visible objects are backed by the following datastructures:

- iommufd_ioas for IOMMUFD_OBJ_IOAS.
- iommufd_device for IOMMUFD_OBJ_DEVICE.
- iommufd_hw_pagetable for IOMMUFD_OBJ_HW_PAGETABLE.

Several terms are used when looking at these datastructures:

- Automatic domain - refers to an iommu domain created automatically when
  attaching a device to an IOAS object. This is compatible with the semantics of
  VFIO type1.

- Manual domain - refers to an iommu domain designated by the user as the
  target pagetable to be attached to by a device. Though currently there are
  no uAPIs to directly create such a domain, the datastructures and algorithms
  are ready to handle that use case.

- In-kernel user - refers to something like a VFIO mdev that is using the
  IOMMUFD access interface to access the IOAS. This starts by creating an
  iommufd_access object that is similar to the domain binding a physical device
  would do. The access object then allows converting IOVA ranges into
  struct page * lists, or doing direct reads/writes to an IOVA.

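For an in-kernel user, a direct read/write to an IOVA has to split the copy
wherever it crosses a page boundary, since consecutive IOVA pages may be backed
by unrelated PFNs. The sketch below models that with a hypothetical
``toy_access`` covering one linear, page-aligned IOVA range and a small
simulated memory array; it is not the iommufd_access API, and a 4 KiB page size
is assumed.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   /* Toy model, not kernel code. */
   #define TOY_PAGE_SIZE 4096u
   #define TOY_NPAGES    4u

   /* Simulated memory: pfn N is backing[N]. */
   static uint8_t backing[TOY_NPAGES][TOY_PAGE_SIZE];

   /* One linear IOVA range [iova, iova + npages * PAGE) mapped to pfns[]. */
   struct toy_access {
       uint64_t iova;
       size_t npages;
       unsigned int pfns[TOY_NPAGES];
   };

   /* Translate one IOVA to (pfn, offset); false if it is not mapped. */
   static bool toy_translate(const struct toy_access *a, uint64_t iova,
                             unsigned int *pfn, size_t *off)
   {
       if (iova < a->iova ||
           iova - a->iova >= (uint64_t)a->npages * TOY_PAGE_SIZE)
           return false;
       *pfn = a->pfns[(iova - a->iova) / TOY_PAGE_SIZE];
       *off = (iova - a->iova) % TOY_PAGE_SIZE;
       return true;
   }

   /* Copy len bytes to/from an IOVA, one page-sized chunk at a time. */
   static int toy_access_rw(const struct toy_access *a, uint64_t iova,
                            void *buf, size_t len, bool write)
   {
       uint8_t *p = buf;

       assert(a->npages <= TOY_NPAGES);
       while (len) {
           unsigned int pfn;
           size_t off, chunk;

           if (!toy_translate(a, iova, &pfn, &off))
               return -1;
           assert(pfn < TOY_NPAGES);
           chunk = TOY_PAGE_SIZE - off;    /* bytes left in this page */
           if (chunk > len)
               chunk = len;
           if (write)
               memcpy(&backing[pfn][off], p, chunk);
           else
               memcpy(p, &backing[pfn][off], chunk);
           p += chunk;
           iova += chunk;
           len -= chunk;
       }
       return 0;
   }
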
iommufd_ioas serves as the metadata datastructure to manage how IOVA ranges are
mapped to memory pages, and is composed of:

- struct io_pagetable holding the IOVA map
- struct iopt_area instances representing populated portions of IOVA
- struct iopt_pages representing the storage of PFNs
- struct iommu_domain representing the IO page table in the IOMMU
- struct iopt_pages_access representing in-kernel users of PFNs
- struct xarray pinned_pfns holding a list of pages pinned by in-kernel users

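The IOVA-to-area-to-pages indirection listed above can be sketched as follows.
Hypothetical ``toy_`` types stand in for io_pagetable, iopt_area and iopt_pages.
The areas are kept in a sorted array searched with binary search, a
simplification chosen for brevity rather than the kernel's actual index
structure, and a 4 KiB page size is assumed.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Toy model, not kernel code. */
   #define TOY_PAGE_SHIFT 12

   struct toy_pages {              /* like iopt_pages: linear PFN array */
       const uint64_t *pfns;
       size_t npages;
   };

   struct toy_area {               /* like iopt_area: one populated range */
       uint64_t iova_first, iova_last;   /* inclusive; start page aligned */
       struct toy_pages *pages;
       size_t page_offset;               /* first pages->pfns[] index used */
   };

   struct toy_iopt {               /* like io_pagetable: areas sorted by IOVA */
       struct toy_area *areas;
       size_t nr_areas;
   };

   /* Binary search for the area containing iova; NULL if it is a hole. */
   static const struct toy_area *toy_iopt_find(const struct toy_iopt *iopt,
                                               uint64_t iova)
   {
       size_t lo = 0, hi = iopt->nr_areas;

       while (lo < hi) {
           size_t mid = lo + (hi - lo) / 2;
           const struct toy_area *a = &iopt->areas[mid];

           if (iova < a->iova_first)
               hi = mid;
           else if (iova > a->iova_last)
               lo = mid + 1;
           else
               return a;
       }
       return NULL;
   }

   /* Resolve an IOVA to its PFN through area -> pages. 0 on success. */
   static int toy_iopt_iova_to_pfn(const struct toy_iopt *iopt, uint64_t iova,
                                   uint64_t *pfn)
   {
       const struct toy_area *a = toy_iopt_find(iopt, iova);
       size_t idx;

       if (!a)
           return -1;
       idx = a->page_offset + ((iova - a->iova_first) >> TOY_PAGE_SHIFT);
       assert(idx < a->pages->npages);
       *pfn = a->pages->pfns[idx];
       return 0;
   }

In the usage below, two areas point into the same ``toy_pages`` at different
offsets, which is the kind of sharing the following paragraphs describe.
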
Each iopt_pages represents a logical linear array of full PFNs. The PFNs are
ultimately derived from userspace VAs via an mm_struct. Once they have been
pinned, the PFNs are stored in IOPTEs of an iommu_domain or inside the
pinned_pfns xarray if they have been pinned through an iommufd_access.

PFNs have to be copied between all combinations of storage locations, depending
on what domains are present and what kinds of in-kernel "software access" users
exist. The mechanism ensures that a page is pinned only once.

An io_pagetable is composed of iopt_areas pointing at iopt_pages, along with a
list of iommu_domains that mirror the IOVA to PFN map.

Multiple io_pagetables, through their iopt_areas, can share a single
iopt_pages, which avoids multi-pinning and double accounting of page
consumption.

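The pin-once rule and the shared iopt_pages can be modeled with a per-page user
count: a page is pinned and charged only when its first user appears, and
uncharged when its last user goes away, regardless of whether the users are
domains, in-kernel accesses, or areas of different io_pagetables. This is a
hypothetical sketch; in particular it does not model how PFNs are copied
between domain IOPTEs and the pinned_pfns xarray.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /*
    * Toy iopt_pages, not kernel code: each index counts how many storage
    * users (domains, in-kernel accesses, areas in other io_pagetables)
    * currently need that page.
    */
   #define TOY_NPAGES 8

   struct toy_pages {
       unsigned int users[TOY_NPAGES];
       unsigned long real_pins;    /* times we would have pinned via mm */
       unsigned long accounted;    /* pages currently charged to the user */
   };

   /* Take a reference on [start, start + n); pin only pages going 0 -> 1. */
   static void toy_pages_get(struct toy_pages *p, size_t start, size_t n)
   {
       assert(start + n <= TOY_NPAGES);
       for (size_t i = start; i < start + n; i++) {
           if (p->users[i]++ == 0) {
               p->real_pins++;
               p->accounted++;
           }
       }
   }

   /* Drop a reference; unaccount when the last user of a page goes away. */
   static void toy_pages_put(struct toy_pages *p, size_t start, size_t n)
   {
       assert(start + n <= TOY_NPAGES);
       for (size_t i = start; i < start + n; i++) {
           assert(p->users[i] > 0);
           if (--p->users[i] == 0)
               p->accounted--;
       }
   }

With one domain covering pages 0-3, a second io_pagetable covering pages 2-5
and an in-kernel access on page 3, this model pins and charges six pages in
total rather than nine.
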
iommufd_ioas is sharable between subsystems, e.g. VFIO and VDPA, as long as
devices managed by different subsystems are bound to the same iommufd.

IOMMUFD User API
================

.. kernel-doc:: include/uapi/linux/iommufd.h

IOMMUFD Kernel API
==================

The IOMMUFD kAPI is device-centric, with group-related tricks managed behind the
scenes. This allows the external drivers calling such kAPI to implement a simple
device-centric uAPI for connecting their devices to an iommufd, instead of
explicitly imposing the group semantics in their uAPI as VFIO does.

.. kernel-doc:: drivers/iommu/iommufd/device.c
   :export:

.. kernel-doc:: drivers/iommu/iommufd/main.c
   :export:

VFIO and IOMMUFD
----------------

Connecting a VFIO device to iommufd can be done in two ways.

The first is a VFIO-compatible way: directly implementing the /dev/vfio/vfio
container IOCTLs by mapping them into io_pagetable operations. Doing so allows
the use of iommufd in legacy VFIO applications by symlinking /dev/vfio/vfio to
/dev/iommu or by extending VFIO to SET_CONTAINER using an iommufd instead of a
container fd.

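The compatibility approach boils down to translating each container ioctl into
the equivalent io_pagetable operation. The sketch below does this for a type1
MAP_DMA request. The structures and flag values are local ``toy_`` mirrors
written to resemble ``struct vfio_iommu_type1_dma_map`` and
``struct iommu_ioas_map``; check them against include/uapi/linux/vfio.h and
include/uapi/linux/iommufd.h before relying on them. The sketch also assumes
that a type1 caller always chooses its own IOVA, so the fixed-IOVA flag is
always set.

.. code-block:: c

   #include <assert.h>
   #include <errno.h>
   #include <stdint.h>

   /* Local toy mirrors of the two uAPI layouts (assumed, see lead-in). */
   #define TOY_VFIO_DMA_MAP_FLAG_READ  (1u << 0)
   #define TOY_VFIO_DMA_MAP_FLAG_WRITE (1u << 1)

   #define TOY_IOAS_MAP_FIXED_IOVA     (1u << 0)
   #define TOY_IOAS_MAP_WRITEABLE      (1u << 1)
   #define TOY_IOAS_MAP_READABLE       (1u << 2)

   struct toy_vfio_dma_map {
       uint32_t argsz, flags;
       uint64_t vaddr, iova, size;
   };

   struct toy_ioas_map {
       uint32_t size, flags, ioas_id, reserved;
       uint64_t user_va, length, iova;
   };

   /* Translate a container MAP_DMA request into an IOAS map request. */
   static int toy_compat_map_dma(const struct toy_vfio_dma_map *in,
                                 uint32_t ioas_id, struct toy_ioas_map *out)
   {
       const uint32_t known = TOY_VFIO_DMA_MAP_FLAG_READ |
                              TOY_VFIO_DMA_MAP_FLAG_WRITE;

       assert(in && out);
       if (in->argsz < sizeof(*in) || (in->flags & ~known) || !in->size)
           return -EINVAL;

       *out = (struct toy_ioas_map){
           .size = sizeof(*out),
           .ioas_id = ioas_id,
           .user_va = in->vaddr,
           .length = in->size,
           .iova = in->iova,
           /* type1 callers always pick the IOVA themselves */
           .flags = TOY_IOAS_MAP_FIXED_IOVA,
       };
       if (in->flags & TOY_VFIO_DMA_MAP_FLAG_READ)
           out->flags |= TOY_IOAS_MAP_READABLE;
       if (in->flags & TOY_VFIO_DMA_MAP_FLAG_WRITE)
           out->flags |= TOY_IOAS_MAP_WRITEABLE;
       return 0;
   }

Rejecting unknown flags up front, as this sketch does, keeps a legacy caller
from silently getting different semantics than it asked for.
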
The second approach directly extends VFIO to support a new set of device-centric
user APIs based on the aforementioned IOMMUFD kernel API. It requires userspace
changes but better matches the IOMMUFD API semantics, and makes it easier to
support new iommufd features than the first approach.

Currently both approaches are still work-in-progress.

There are still a few gaps to be resolved to catch up with VFIO type1, as
documented in iommufd_vfio_check_extension().

Future TODOs
============

Currently IOMMUFD supports only kernel-managed I/O page tables, similar to VFIO
type1. New features on the radar include:

 - Binding iommu_domains to PASID/SSID
 - Userspace page tables, for ARM, x86 and S390
 - Kernel-bypassed invalidation of user page tables
 - Re-use of the KVM page table in the IOMMU
 - Dirty page tracking in the IOMMU
 - Runtime increase/decrease of IOPTE size
 - PRI support with faults resolved in userspace