Skip to content

Conversation

@opsiff
Copy link
Member

@opsiff opsiff commented Nov 6, 2025

We backport d240629 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
in #672 , so our 6.6 need the fix.

mainline inclusion
from mainline-v6.18-rc1
category: bugfix

It is possible to hit a zero entry while traversing the vmas in unuse_mm() called from swapoff path and accessing it causes the OOPS:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000446--> Loading the memory from offset 0x40 on the XA_ZERO_ENTRY as address.
Mem abort info:
ESR = 0x0000000096000005
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x05: level 1 translation fault

The issue is manifested from the below race between the fork() on a process and swapoff:
fork(dup_mmap()) swapoff(unuse_mm)


  1. Identical mtree is built using
    __mt_dup().

  2. copy_pte_range()-->
    copy_nonpresent_pte():
    The dst mm is added into the
    mmlist to be visible to the
    swapoff operation.

  3. Fatal signal is sent to the parent
    process(which is the current during the
    fork) thus skip the duplication of the
    vmas and mark the vma range with
    XA_ZERO_ENTRY as a marker for this process
    that helps during exit_mmap().

     		     4) swapoff is tried on the
     			'mm' added to the 'mmlist' as
     			part of the 2.
    
     		     5) unuse_mm(), that iterates
     			through the vma's of this 'mm'
     			will hit the non-NULL zero entry
     			and operating on this zero entry
     			as a vma is resulting into the
     			oops.
    

The proper fix would be around not exposing this partially-valid tree to others when droping the mmap lock, which is being solved with [1]. A simpler solution would be checking for MMF_UNSTABLE, as it is set if mm_struct is not fully initialized in dup_mmap().

Thanks to Liam/Lorenzo/David for all the suggestions in fixing this issue.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Fixes: d240629 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")

Suggested-by: David Hildenbrand [email protected]
Cc: Baoquan He [email protected]
Cc: Barry Song [email protected]
Cc: Chris Li [email protected]
Cc: Kairui Song [email protected]
Cc: Kemeng Shi [email protected]
Cc: Liam Howlett [email protected]
Cc: Lorenzo Stoakes [email protected]
Cc: Nhat Pham [email protected]
Cc: Peng Zhang [email protected]
Cc: [email protected]

(cherry picked from commit 1367da7)

Summary by Sourcery

Bug Fixes:

  • Add check_stable_address_space(mm) early in unuse_mm() to avoid operating on partially initialized address spaces during swapoff

mainline inclusion
from mainline-v6.18-rc1
category: bugfix

It is possible to hit a zero entry while traversing the vmas in unuse_mm()
called from swapoff path and accessing it causes the OOPS:

Unable to handle kernel NULL pointer dereference at virtual address
0000000000000446--> Loading the memory from offset 0x40 on the
XA_ZERO_ENTRY as address.
Mem abort info:
  ESR = 0x0000000096000005
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x05: level 1 translation fault

The issue is manifested from the below race between the fork() on a
process and swapoff:
fork(dup_mmap())			swapoff(unuse_mm)
---------------                         -----------------
1) Identical mtree is built using
   __mt_dup().

2) copy_pte_range()-->
	copy_nonpresent_pte():
       The dst mm is added into the
    mmlist to be visible to the
    swapoff operation.

3) Fatal signal is sent to the parent
process(which is the current during the
fork) thus skip the duplication of the
vmas and mark the vma range with
XA_ZERO_ENTRY as a marker for this process
that helps during exit_mmap().

				     4) swapoff is tried on the
					'mm' added to the 'mmlist' as
					part of the 2.

				     5) unuse_mm(), that iterates
					through the vma's of this 'mm'
					will hit the non-NULL zero entry
					and operating on this zero entry
					as a vma is resulting into the
					oops.

The proper fix would be around not exposing this partially-valid tree to
others when droping the mmap lock, which is being solved with [1].  A
simpler solution would be checking for MMF_UNSTABLE, as it is set if
mm_struct is not fully initialized in dup_mmap().

Thanks to Liam/Lorenzo/David for all the suggestions in fixing this
issue.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Fixes: d240629 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Charan Teja Kalla <[email protected]>
Suggested-by: David Hildenbrand <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Barry Song <[email protected]>
Cc: Chris Li <[email protected]>
Cc: Kairui Song <[email protected]>
Cc: Kemeng Shi <[email protected]>
Cc: Liam Howlett <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Nhat Pham <[email protected]>
Cc: Peng Zhang <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 1367da7)
Signed-off-by: Wentao Guan <[email protected]>
@sourcery-ai
Copy link

sourcery-ai bot commented Nov 6, 2025

Reviewer's guide (collapsed on small PRs)

Reviewer's Guide

This patch backports a check for a stable address space in unuse_mm to prevent dereferencing zero entries in the VMA list during swapoff, implementing an early exit under mmap_read_lock when the mm_struct is not fully initialized.

Sequence diagram for swapoff and fork race condition handling

sequenceDiagram
participant swapoff
participant unuse_mm
participant mm_struct
participant fork
participant dup_mmap
participant __mt_dup
participant XA_ZERO_ENTRY
swapoff->>unuse_mm: Call unuse_mm(mm)
unuse_mm->>mm_struct: Acquire mmap_read_lock
unuse_mm->>mm_struct: check_stable_address_space(mm)
alt mm_struct is unstable
unuse_mm-->>swapoff: Early exit, skip VMA iteration
else mm_struct is stable
unuse_mm->>mm_struct: Iterate VMA list
unuse_mm->>XA_ZERO_ENTRY: May hit zero entry (previously caused OOPS)
unuse_mm-->>swapoff: Complete VMA operations
end
mm_struct->>unuse_mm: Release mmap_read_lock
Loading

Class diagram for mm_struct and VMA iteration changes

classDiagram
class mm_struct {
  +mmap_read_lock()
  +mmap_read_unlock()
  +check_stable_address_space()
}
class unuse_mm {
  +int unuse_mm(mm_struct *mm, unsigned int type)
}
class VMA_ITERATOR {
}
class vma {
  +anon_vma
  +is_vm_hugetlb_page()
}
mm_struct <|-- unuse_mm
unuse_mm o-- VMA_ITERATOR
VMA_ITERATOR o-- vma
Loading

File-Level Changes

Change Details Files
Add stable address space check and early exit in unuse_mm
  • Invoke check_stable_address_space(mm) immediately after acquiring mmap_read_lock
  • Branch to a new unlock label on unstable address space to skip VMA iteration
  • Introduce unlock label before mmap_read_unlock to unify the exit path
mm/swapfile.c

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@deepin-ci-robot
Copy link

deepin pr auto review

我来对这个代码修改进行详细审查:

  1. 语法逻辑分析:
  • 代码在语法上是正确的,使用了goto语句进行错误处理
  • 在遍历VMA之前添加了check_stable_address_space()检查
  • 使用了统一的unlock标签来处理mmap_read_unlock()的释放
  1. 代码质量改进建议:
  • goto的使用虽然合理,但可以考虑重构为更清晰的控制流
  • 建议在goto unlock之前添加注释说明跳转原因
  • 可以考虑将check_stable_address_space()的检查结果记录在日志中,便于调试
  1. 代码性能分析:
  • 添加的检查会带来轻微的性能开销,但这是必要的稳定性检查
  • cond_resched()的位置保持不变,确保了良好的调度性能
  • 没有引入额外的循环或复杂计算,性能影响最小
  1. 代码安全分析:
  • 添加的address space稳定性检查提高了安全性
  • 确保了在异常情况下也能正确释放mmap_read_lock
  • 避免了在不稳定的地址空间上执行危险操作

改进后的代码建议:

static int unuse_mm(struct mm_struct *mm, unsigned int type)
{
    struct vm_area_struct *vma;
    int ret = 0;
    VMA_ITERATOR(vmi, mm, 0);

    mmap_read_lock(mm);
    if (check_stable_address_space(mm)) {
        ret = -EINVAL;  /* 添加明确的错误码 */
        goto unlock;
    }
    
    for_each_vma(vmi, vma) {
        if (vma->anon_vma && !is_vm_hugetlb_page(vma)) {
            ret = unuse_vma(vma, type);
            if (ret)
                goto unlock;
        }

        cond_resched();
    }
unlock:
    mmap_read_unlock(mm);
    return ret;
}

主要改进点:

  1. 添加了明确的错误返回码-EINVAL
  2. 保持了原有的goto结构,但使其更清晰
  3. 代码结构更加清晰,便于维护
  4. 错误处理更加统一和明确

这些改动提高了代码的可维护性和健壮性,同时保持了原有的性能特征。

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Move the check_stable_address_space(mm) call to before mmap_read_lock(mm) so you avoid taking the lock on an unstable mm and simplify control flow.
  • Add a brief comment above the stable address space check to explain why you must skip the VMA loop for a partially initialized mm_struct.
  • Rename the 'unlock' label to something like 'out_unlock' to follow existing kernel label naming conventions.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Move the check_stable_address_space(mm) call to before mmap_read_lock(mm) so you avoid taking the lock on an unstable mm and simplify control flow.
- Add a brief comment above the stable address space check to explain why you must skip the VMA loop for a partially initialized mm_struct.
- Rename the 'unlock' label to something like 'out_unlock' to follow existing kernel label naming conventions.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds an early address space stability check to the unuse_mm() function in the swap subsystem. The check is performed after acquiring the mmap read lock but before iterating through VMAs to prevent operations on memory regions that are being reaped by the OOM killer.

Key Changes:

  • Added check_stable_address_space(mm) check after mmap_read_lock()
  • Added an unlock label to enable early exit from the function

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

VMA_ITERATOR(vmi, mm, 0);

mmap_read_lock(mm);
if (check_stable_address_space(mm))
Copy link

Copilot AI Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value of check_stable_address_space(mm) is not captured in ret before jumping to unlock. This causes the function to return 0 (success) instead of the error code (VM_FAULT_SIGBUS) when the address space is unstable. Other callers of this function (in mm/huge_memory.c and mm/memory.c) properly assign the return value to ret. Change to: ret = check_stable_address_space(mm); if (ret) goto unlock;

Suggested change
if (check_stable_address_space(mm))
ret = check_stable_address_space(mm);
if (ret)

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NO,check_stable_address_space(mm) return vm_fault type, and ret expect such as -ENOMEM to return out to syscall swapoff. In unuse_mm() ,discover unstable mm just skip it,NOT ERROR,no need to pass the value.

@lanlanxiyiji lanlanxiyiji merged commit aac1193 into deepin-community:linux-6.6.y Nov 7, 2025
18 of 19 checks passed
@deepin-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lanlanxiyiji
Once this PR has been reviewed and has the lgtm label, please ask for approval from opsiff. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants