[Deepin-Kernel-SIG] [linux 6.6-y] [Upstream] mm: swap: check for stable address space before operating on the VMA #1278
Conversation
mainline inclusion
from mainline-v6.18-rc1
category: bugfix
It is possible to hit a zero entry while traversing the vmas in unuse_mm()
called from swapoff path and accessing it causes the OOPS:
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000446--> Loading the memory from offset 0x40 on the
XA_ZERO_ENTRY as address.
Mem abort info:
ESR = 0x0000000096000005
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x05: level 1 translation fault
The issue is manifested from the below race between the fork() on a
process and swapoff:
fork(dup_mmap()) swapoff(unuse_mm)
--------------- -----------------
1) Identical mtree is built using
__mt_dup().
2) copy_pte_range()-->
copy_nonpresent_pte():
The dst mm is added into the
mmlist to be visible to the
swapoff operation.
3) Fatal signal is sent to the parent
process(which is the current during the
fork) thus skip the duplication of the
vmas and mark the vma range with
XA_ZERO_ENTRY as a marker for this process
that helps during exit_mmap().
4) swapoff is tried on the
'mm' added to the 'mmlist' as
part of the 2.
5) unuse_mm(), that iterates
through the vma's of this 'mm'
will hit the non-NULL zero entry
and operating on this zero entry
as a vma is resulting into the
oops.
The proper fix would be around not exposing this partially-valid tree to
others when dropping the mmap lock, which is being solved with [1]. A
simpler solution would be checking for MMF_UNSTABLE, as it is set if
mm_struct is not fully initialized in dup_mmap().
Thanks to Liam/Lorenzo/David for all the suggestions in fixing this
issue.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Fixes: d240629 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Charan Teja Kalla <[email protected]>
Suggested-by: David Hildenbrand <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Barry Song <[email protected]>
Cc: Chris Li <[email protected]>
Cc: Kairui Song <[email protected]>
Cc: Kemeng Shi <[email protected]>
Cc: Liam Howlett <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Nhat Pham <[email protected]>
Cc: Peng Zhang <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 1367da7)
Signed-off-by: Wentao Guan <[email protected]>
Reviewer's Guide

This patch backports a check for a stable address space in unuse_mm() to prevent dereferencing zero entries in the VMA list during swapoff, implementing an early exit under mmap_read_lock when the mm_struct is not fully initialized.

Sequence diagram for swapoff and fork race condition handling:

```mermaid
sequenceDiagram
    participant swapoff
    participant unuse_mm
    participant mm_struct
    participant fork
    participant dup_mmap
    participant __mt_dup
    participant XA_ZERO_ENTRY
    swapoff->>unuse_mm: Call unuse_mm(mm)
    unuse_mm->>mm_struct: Acquire mmap_read_lock
    unuse_mm->>mm_struct: check_stable_address_space(mm)
    alt mm_struct is unstable
        unuse_mm-->>swapoff: Early exit, skip VMA iteration
    else mm_struct is stable
        unuse_mm->>mm_struct: Iterate VMA list
        unuse_mm->>XA_ZERO_ENTRY: May hit zero entry (previously caused OOPS)
        unuse_mm-->>swapoff: Complete VMA operations
    end
    mm_struct->>unuse_mm: Release mmap_read_lock
```

Class diagram for mm_struct and VMA iteration changes:

```mermaid
classDiagram
    class mm_struct {
        +mmap_read_lock()
        +mmap_read_unlock()
        +check_stable_address_space()
    }
    class unuse_mm {
        +int unuse_mm(mm_struct *mm, unsigned int type)
    }
    class VMA_ITERATOR {
    }
    class vma {
        +anon_vma
        +is_vm_hugetlb_page()
    }
    mm_struct <|-- unuse_mm
    unuse_mm o-- VMA_ITERATOR
    VMA_ITERATOR o-- vma
```
deepin pr auto review

Let me review this code change in detail.

Suggested improved code:

```c
static int unuse_mm(struct mm_struct *mm, unsigned int type)
{
	struct vm_area_struct *vma;
	int ret = 0;

	VMA_ITERATOR(vmi, mm, 0);

	mmap_read_lock(mm);
	if (check_stable_address_space(mm)) {
		ret = -EINVAL; /* return an explicit error code */
		goto unlock;
	}
	for_each_vma(vmi, vma) {
		if (vma->anon_vma && !is_vm_hugetlb_page(vma)) {
			ret = unuse_vma(vma, type);
			if (ret)
				goto unlock;
		}
		cond_resched();
	}
unlock:
	mmap_read_unlock(mm);
	return ret;
}
```

Main improvements: these changes improve the code's maintainability and robustness while preserving its original performance characteristics.
Hey there - I've reviewed your changes - here's some feedback:
- Move the check_stable_address_space(mm) call to before mmap_read_lock(mm) so you avoid taking the lock on an unstable mm and simplify control flow.
- Add a brief comment above the stable address space check to explain why you must skip the VMA loop for a partially initialized mm_struct.
- Rename the 'unlock' label to something like 'out_unlock' to follow existing kernel label naming conventions.
Pull Request Overview
This PR adds an early address space stability check to the unuse_mm() function in the swap subsystem. The check is performed after acquiring the mmap read lock but before iterating through VMAs to prevent operations on memory regions that are being reaped by the OOM killer.
Key Changes:
- Added a check_stable_address_space(mm) check after mmap_read_lock()
- Added an unlock label to enable early exit from the function
```c
	VMA_ITERATOR(vmi, mm, 0);

	mmap_read_lock(mm);
	if (check_stable_address_space(mm))
```
Copilot AI · Nov 6, 2025
The return value of check_stable_address_space(mm) is not captured in ret before jumping to unlock. This causes the function to return 0 (success) instead of the error code (VM_FAULT_SIGBUS) when the address space is unstable. Other callers of this function (in mm/huge_memory.c and mm/memory.c) properly assign the return value to ret. Change to: ret = check_stable_address_space(mm); if (ret) goto unlock;
```diff
-	if (check_stable_address_space(mm))
+	ret = check_stable_address_space(mm);
+	if (ret)
```
No: check_stable_address_space(mm) returns a vm_fault_t, while ret is expected to carry a negative errno such as -ENOMEM back out to the swapoff syscall. In unuse_mm(), discovering an unstable mm just means skipping it; it is NOT an error, so there is no need to pass the value up.
Merged commit aac1193 into deepin-community:linux-6.6.y
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lanlanxiyiji. The full list of commands accepted by this bot can be found here.
We backported d240629 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
in #672, so our 6.6 tree needs this fix.