Conversation

@opsiff
Member

@opsiff opsiff commented Mar 19, 2025

Summary by Sourcery

This pull request introduces the ability to duplicate a maple tree, which is used in the kernel's memory management. It also fixes a number of bugs related to memory management during forking and execve, and improves the performance of forking by using the new maple tree duplication functionality.

New Features:

  • Adds the ability to duplicate a maple tree using mtree_dup().

Tests:

  • Adds tests for mtree_dup().

@sourcery-ai

sourcery-ai bot commented Mar 19, 2025

Reviewer's Guide by Sourcery

This pull request introduces a new function, mtree_dup, to duplicate an entire maple tree. It also includes several bug fixes and enhancements related to memory management and error handling during the duplication process. Additionally, the pull request integrates the maple tree duplication functionality into the fork() system call, improving the efficiency of memory management when creating new processes.

Sequence diagram for fork() with mtree_dup

sequenceDiagram
  participant fork()
  participant dup_mmap()
  participant __mt_dup()
  participant ksm_fork()
  participant khugepaged_fork()

  fork()->>dup_mmap(): dup_mmap(mm, oldmm)
  dup_mmap()->>__mt_dup(): __mt_dup(&oldmm->mm_mt, &mm->mm_mt, GFP_KERNEL)
  alt __mt_dup fails
    __mt_dup()-->>dup_mmap(): Return error
    dup_mmap()-->>fork(): Return error
  else __mt_dup succeeds
    __mt_dup()-->>dup_mmap(): Return success
    dup_mmap()->>ksm_fork(): ksm_fork(mm, oldmm)
    dup_mmap()->>khugepaged_fork(): khugepaged_fork(mm, oldmm)
    dup_mmap()-->>fork(): Return result
  end

File-Level Changes

Change Details Files
Introduces mtree_dup and __mt_dup functions to duplicate an entire maple tree, offering a more efficient way to copy trees compared to element-by-element insertion.
  • Implements mtree_dup for duplicating maple trees with locking.
  • Implements __mt_dup for duplicating maple trees without locking.
  • Adds mas_dup_build to build a new maple tree from a source tree.
  • Adds mas_copy_node to copy a maple node and replace the parent.
  • Adds mas_dup_alloc to allocate child nodes for a maple node.
  • Adds mas_dup_free to free an incomplete duplication of a tree.
lib/maple_tree.c
include/linux/maple_tree.h
Integrates maple tree duplication into the fork() system call to improve memory management efficiency during process creation.
  • Replaces the original VMA-based memory duplication logic in dup_mmap with __mt_dup for maple tree duplication.
  • Handles potential memory allocation failures during maple tree duplication in dup_mmap.
  • Introduces vma_iter_clear_gfp to clear a range of entries in the VMA iterator.
  • Introduces XA_ZERO_ENTRY to mark failure points during mmap duplication.
  • Adjusts KSM and khugepaged integration within dup_mmap to accommodate the new duplication mechanism.
  • Updates exit_mmap to handle potentially incomplete VMA duplication due to errors during dup_mmap.
kernel/fork.c
mm/mmap.c
include/linux/mm.h
mm/memory.c
Adds comprehensive tests for the new mtree_dup function, covering various scenarios including full tree duplication, normal duplication, and memory allocation failure.
  • Introduces compare_node to compare two maple nodes.
  • Introduces compare_tree to compare two maple trees.
  • Introduces build_full_tree to build a full tree for testing.
  • Introduces check_mtree_dup to test the mtree_dup function.
  • Modifies check_forking to use __mt_dup.
  • Adds tests for memory allocation failures during duplication.
tools/testing/radix-tree/maple.c
lib/test_maple_tree.c
Adds a test to verify that KSM stays enabled for the process after fork() followed by exec().
  • Adds ksm_fork_exec_child to check whether KSM is enabled in the child process.
  • Adds test_prctl_fork_exec to verify KSM remains enabled across fork() and exec().
tools/testing/selftests/mm/ksm_functional_tests.c
Adjusts KSM integration with execve.
  • Calls ksm_execve with the mmap write lock held.
  • Adds ksm_execve to check if KSM is enabled for the process after execve.
fs/exec.c
include/linux/ksm.h
Fixes memory freeing in kmem_cache when bulk allocation fails.
  • Fixes memory freeing in kmem_cache when bulk allocation fails.
tools/testing/radix-tree/linux.c
Fixes userfaultfd context handling on fork failure.
  • Adds dup_userfaultfd_fail to handle userfaultfd context on fork failure.
fs/userfaultfd.c
include/linux/userfaultfd_k.h
Fixes a race condition in uprobes when registering for each VMA.
  • Adds check_stable_address_space to check if the address space is stable before registering uprobes.
kernel/events/uprobes.c


@deepin-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from opsiff. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@sourcery-ai sourcery-ai bot left a comment

Hey @opsiff - I've reviewed your changes - here's some feedback:

Overall Comments:

  • The compare_tree function could be simplified by comparing the ma_root members directly after comparing ma_flags.
  • Consider adding a comment to mas_dup_build explaining why the error code is stored in mas->node.
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 2 issues found
  • 🟢 Documentation: all looks good


}

static inline void dup_userfaultfd_complete(struct list_head *l)
{

suggestion (bug_risk): Inline stub for dup_userfaultfd_fail may conflict.

There is a non-inline implementation of dup_userfaultfd_fail in fs/userfaultfd.c. The inline stub in the header might override or conflict with that implementation. Consider consolidating to a single definition to avoid potential link-time or behavioral inconsistencies.

Suggested implementation:

Make sure the non-inline definition (implementation) of dup_userfaultfd_fail in fs/userfaultfd.c is maintained and that no duplicate declarations exist.

* Compares two nodes except for the addresses stored in the nodes.
* Returns zero if they are the same, otherwise returns non-zero.
*/
static int __init compare_node(struct maple_enode *enode_a,

issue (complexity): Consider refactoring the compare_node and build_full_tree functions into smaller helper functions to improve code readability and reduce complexity by removing the need for goto statements and simplifying control flow.

Consider refactoring the heavily interleaved control flow (especially the use of goto statements) by breaking the functions into smaller, well‐named helper functions. This can reduce complexity without reverting functionality. For example:

  1. In compare_node():
    Separate the leaf and internal node comparisons into helper functions. For instance:

    static int compare_leaf_node(const struct maple_node *a, const struct maple_node *b) {
        return memcmp(a, b, sizeof(struct maple_node));
    }
    
    static int compare_internal_node(struct maple_node *a, struct maple_node *b, enum maple_type type) {
        void **slots_a = ma_slots(a, type);
        void **slots_b = ma_slots(b, type);
        for (int i = 0; i < mt_slots[type]; i++) {
            if (!slots_a[i] && !slots_b[i])
                break;
            if (!slots_a[i] || !slots_b[i]) {
                pr_err("The number of slots is different.\n");
                return -1;
            }
            ((unsigned long *)slots_a)[i] &= MAPLE_NODE_MASK;
            ((unsigned long *)slots_b)[i] &= MAPLE_NODE_MASK;
        }
        return memcmp(a, b, sizeof(struct maple_node));
    }

    Then your main function becomes more straightforward:

    static int __init compare_node(struct maple_enode *enode_a, struct maple_enode *enode_b)
    {
        // ... [pre-comparison initialization unchanged]
        if (ma_is_leaf(type))
            return compare_leaf_node(&a, &b);
        else
            return compare_internal_node(&a, &b, type);
    }
  2. In build_full_tree():
    Replace goto usage by extracting loops into helper functions. For example, create a helper to perform the “store” operation:

    static inline int store_subtree(struct ma_state *mas, struct maple_tree *mt) {
        mas_subtree_max_range(mas);
        unsigned long step = mas->last - mas->index;
        if (step < 1)
            return -1;
        mas->last = mas->index + step / 2;
        mas_store_gfp(mas, xa_mk_value(5), GFP_KERNEL);
        return 0;
    }

    Then restructure the main loop around a single unlock exit point:

    static __init int build_full_tree(struct maple_tree *mt, unsigned int flags, int height)
    {
        MA_STATE(mas, mt, 0, 0);
        int ret = 0, cnt = 1;
        enum maple_type type;
    
        mt_init_flags(mt, flags);
        mtree_insert_range(mt, 0, ULONG_MAX, xa_mk_value(5), GFP_KERNEL);
        mtree_lock(mt);
    
        while (1) {
            mas_set(&mas, 0);
            if (mt_height(mt) < height) {
                mas.max = ULONG_MAX;
            } else {
                while (1) {
                    mas_dfs_preorder(&mas);
                    if (mas_is_none(&mas)) 
                        goto unlock;
                    type = mte_node_type(mas.node);
                    if (mas_data_end(&mas) + 1 < mt_slots[type]) {
                        mas_set(&mas, mas.min);
                        break;
                    }
                }
            }
            ret = store_subtree(&mas, mt);
            if (ret)
                goto unlock;
            ++cnt;
            // Optionally add a condition to break from the infinite loop.
        }
    unlock:
        mtree_unlock(mt);
        MT_BUG_ON(mt, mt_height(mt) != height);
        return ret;
    }

These small, focused refactorings encapsulate the convoluted parts into helper functions, making the overall flow clearer and easier to maintain.

* Author: Peng Zhang <[email protected]>
*/

/*

issue (complexity): Consider refactoring the DFS duplication logic to improve clarity by isolating traversal steps into smaller, focused helpers or using a recursive approach where appropriate to reduce cognitive complexity without changing functionality

The DFS duplication logic mixes control flow (including a while (1) loop, manual ascent/descent management, etc.) and helper functions that can make it hard to follow. One way to reduce cognitive complexity without changing functionality is to isolate the traversal steps in small, focused helpers or even switch to a recursive approach when possible.

For example, consider splitting the non-leaf handling and leaf processing into separate inline helpers. This refactor could look like:

static inline int mas_handle_leaf(struct ma_state *mas, struct ma_state *new_mas)
{
    /* When at a leaf and not at the final node, ascend until a subtree change is needed */
    do {
        mas_ascend(mas);
        mas_ascend(new_mas);
    } while (mas->offset == mas_data_end(mas));

    mas->offset++;
    new_mas->offset++;
    return 0; // You can also return an error code if needed
}

And then in the main duplication routine, refactor the DFS loop to improve clarity:

static inline void mas_dup_build(struct ma_state *mas, struct ma_state *new_mas, gfp_t gfp)
{
    struct maple_node *node;
    struct maple_pnode *parent = NULL;
    struct maple_enode *root;
    enum maple_type type;

    if (unlikely(mt_attr(mas->tree) != mt_attr(new_mas->tree)) ||
        unlikely(!mtree_empty(new_mas->tree))) {
        mas_set_err(mas, -EINVAL);
        return;
    }

    root = mas_start(mas);
    if (mas_is_ptr(mas) || mas_is_none(mas))
        goto set_new_tree;

    node = mt_alloc_one(gfp);
    if (!node) {
        new_mas->node = MAS_NONE;
        mas_set_err(mas, -ENOMEM);
        return;
    }

    type = mte_node_type(mas->node);
    root = mt_mk_node(node, type);
    new_mas->node = root;
    new_mas->min = 0;
    new_mas->max = ULONG_MAX;
    root = mte_mk_root(root);

    while (1) {
        mas_copy_node(mas, new_mas, parent);
        if (!mte_is_leaf(mas->node)) {
            /* Allocate child nodes for non-leaf nodes */
            mas_dup_alloc(mas, new_mas, gfp);
            if (unlikely(mas_is_err(mas)))
                return;
        } else {
            /* Process leaf: if duplication complete, break; otherwise, adjust state */
            if (mas->max == ULONG_MAX)
                break;
            mas_handle_leaf(mas, new_mas);
        }

        mas_descend(mas);
        parent = ma_parent_ptr(mte_to_node(new_mas->node));
        mas_descend(new_mas);
        mas->offset = 0;
        new_mas->offset = 0;
    }

    /* Special handling for root parent update */
    mte_to_node(root)->parent = ma_parent_ptr(mas_tree_parent(new_mas));

set_new_tree:
    new_mas->tree->ma_flags = mas->tree->ma_flags;
    rcu_assign_pointer(new_mas->tree->ma_root, root);
}

Actionable Steps:

  1. Extract leaf handling: Isolate the logic for processing leaf nodes (e.g., the ascent loop and offset adjustments) into its own helper function as shown above.
  2. Refactor the DFS loop: Use the helper(s) to break up the lengthy while (1) loop into logically separated steps. This makes the control flow easier to follow.
  3. Consider a recursive approach: Where safe (e.g., tree depth is bounded), a recursive helper could replace the manual state management. Ensure to measure stack usage if you choose that route.

These changes keep the functionality intact while reducing the interleaving of complex control flow and inline operations.

@opsiff opsiff force-pushed the linux-6.6.y-2025-03-19-maple-tree branch from 31550d7 to 497b83e Compare March 19, 2025 03:37
howlett and others added 19 commits March 24, 2025 18:06
…_bulk()

mainline inclusion
from mainline-v6.7-rc1
category: bugfix

The bulk allocation iterates through an array, but each entry is given
enough memory for the entire bulk allocation instead of a single array
element.  Only allocate an array element of the size set in the kmem_cache.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: cc86e0c ("radix tree test suite: add support for slab bulk APIs")
Signed-off-by: Liam R. Howlett <[email protected]>
Reported-by: Christophe JAILLET <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.8-rc1
category: performance

Patch series "Introduce __mt_dup() to improve the performance of fork()", v7.

This series introduces __mt_dup() to improve the performance of fork().
During the duplication process of mmap, all VMAs are traversed and
inserted one by one into the new maple tree, causing the maple tree to be
rebalanced multiple times.  Balancing the maple tree is a costly
operation.  To duplicate VMAs more efficiently, mtree_dup() and __mt_dup()
are introduced for the maple tree.  They can efficiently duplicate a maple
tree.

Here are some algorithmic details about {mtree,__mt}_dup().  We perform a
DFS pre-order traversal of all nodes in the source maple tree.  During
this process, we fully copy the nodes from the source tree to the new
tree.  This involves memory allocation, and when encountering a new node,
if it is a non-leaf node, all its child nodes are allocated at once.

This idea was originally from Liam R.  Howlett's Maple Tree Work email,
and I added some of my own ideas to implement it.  Some previous
discussions can be found in [1].  For a more detailed analysis of the
algorithm, please refer to the logs for patch [3/10] and patch [10/10].

There is a "spawn" benchmark in byte-unixbench[2], which can be used to test
the performance of fork().  I modified it slightly to make it work with
different numbers of VMAs.

Below are the test results.  The first row shows the number of VMAs.  The
second and third rows show the number of fork() calls per ten seconds,
corresponding to next-20231006 and this patchset, respectively.  The test
results were obtained with CPU binding to avoid scheduler load balancing
that could cause unstable results.  The results still fluctuate somewhat,
but they are consistently better than the original performance.

VMAs:          21     121   221    421    821    1621   3221   6421   12821  25621  51221
next-20231006: 112100 76261 54227  34035  20195  11112  6017   3161   1606   802    393
this patchset: 114558 83067 65008  45824  28751  16072  8922   4747   2436   1233   599
Improvement:   2.19%  8.92% 19.88% 34.64% 42.37% 44.64% 48.28% 50.17% 51.68% 53.74% 52.42%

Thanks to Liam and Matthew for the review.

This patch (of 10):

Add two helpers:
1. mt_free_one(), used to free a maple node.
2. mt_attr(), used to obtain the attributes of maple tree.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peng Zhang <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Mike Christie <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 4f2267b)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.8-rc1
category: performance

In some cases, nested locks may be needed, so {mtree,mas}_lock_nested is
introduced.  For example, when duplicating a maple tree, we need to hold
the locks of both trees, in which case nested locks are needed.

At the same time, add the definition of spin_lock_nested() in tools for
testing.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peng Zhang <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Mike Christie <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit b2472ef)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.8-rc1
category: performance

Introduce interfaces __mt_dup() and mtree_dup(), which are used to
duplicate a maple tree.  They duplicate a maple tree in Depth-First Search
(DFS) pre-order traversal.  It uses memcpy() to copy nodes in the source
tree and allocate new child nodes in non-leaf nodes.  The new node is
exactly the same as the source node except for all the addresses stored in
it.  It will be faster than traversing all elements in the source tree and
inserting them one by one into the new tree.  The time complexity of these
two functions is O(n).

The difference between __mt_dup() and mtree_dup() is that mtree_dup()
handles locks internally.

Analysis of the average time complexity of this algorithm:

For simplicity, let's assume that the maximum branching factor of all
non-leaf nodes is 16 (in allocation mode, it is 10), and the tree is a
full tree.

Under the given conditions, if there is a maple tree with n elements, the
number of its leaves is n/16.  From bottom to top, the number of nodes in
each level is 1/16 of the number of nodes in the level below.  So the
total number of nodes in the entire tree is given by the sum of n/16 +
n/16^2 + n/16^3 + ...  + 1.  This is a geometric series, and it has log(n)
terms with base 16.  According to the formula for the sum of a geometric
series, the sum of this series can be calculated as (n-1)/15.  Each node
has only one parent node pointer, which can be considered as an edge.  In
total, there are (n-1)/15-1 edges.

This algorithm consists of two operations:

1. Traversing all nodes in DFS order.
2. For each node, making a copy and performing necessary modifications
   to create a new node.

For the first part, DFS traversal will visit each edge twice.  Let
T(descend) represent the cost of taking one step downwards, and T(ascend)
represent the cost of taking one step upwards.  Both of them are constants
(although mas_ascend() may not be, as it contains a loop, we ignore that
here and treat it as a constant).  So the time spent on the first part can
be represented as ((n-1)/15-1) * (T(ascend) + T(descend)).

For the second part, each node will be copied, and the cost of copying a
node is denoted as T(copy_node).  For each non-leaf node, it is necessary
to reallocate all child nodes, and the cost of this operation is denoted
as T(dup_alloc).  The behavior behind memory allocation is complex and not
specific to the maple tree operation.  Here, we assume that the time
required for a single allocation is constant.  Since the size of a node is
fixed, both of these symbols are also constants.  We can calculate that
the time spent on the second part is ((n-1)/15) * T(copy_node) + ((n-1)/15
- n/16) * T(dup_alloc).

Adding both parts together, the total time spent by the algorithm can be
represented as:

((n-1)/15) * (T(ascend) + T(descend) + T(copy_node) + T(dup_alloc)) -
n/16 * T(dup_alloc) - (T(ascend) + T(descend))

Let C1 = T(ascend) + T(descend) + T(copy_node) + T(dup_alloc)
Let C2 = T(dup_alloc)
Let C3 = T(ascend) + T(descend)

Finally, the expression can be simplified as:
((16 * C1 - 15 * C2) / (15 * 16)) * n - (C1 / 15 + C3).

This is a linear function, so the average time complexity is O(n).

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peng Zhang <[email protected]>
Suggested-by: Liam R. Howlett <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Mike Christie <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit fd32e4e)
Signed-off-by: Wentao Guan <[email protected]>
…vior.

mainline inclusion
from mainline-v6.8-rc1
category: bugfix

When kmem_cache_alloc_bulk() fails to allocate, leave the freed pointers
in the array.  This enables a more accurate simulation of the kernel's
behavior and allows for testing potential double-free scenarios.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peng Zhang <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Mike Christie <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 46c99e2)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.8-rc1
category: performance

Add test for mtree_dup().

Test by duplicating different maple trees and then comparing the two
trees.  Includes tests for duplicating full trees and memory allocation
failures on different nodes.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peng Zhang <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Mike Christie <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit a2587a7)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.8-rc1
category: performance

Introduce the new interface mtree_dup() in the documentation.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peng Zhang <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Mike Christie <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 9bc1d3c)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.8-rc1
category: performance

Skip other tests when BENCH is enabled so that performance can be measured
in user space.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peng Zhang <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Mike Christie <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit f670fa1)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.8-rc1
category: performance

Update check_forking() and bench_forking() to use __mt_dup() to duplicate
the maple tree.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peng Zhang <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Mike Christie <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 446e186)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.8-rc1
category: performance

When destroying a maple tree, preserve its attributes and then turn it into
an empty tree.  This allows it to be reused without needing to be
reinitialized.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peng Zhang <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Mike Christie <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 8e50d32)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.8-rc1
category: performance

In dup_mmap(), using __mt_dup() to duplicate the old maple tree and then
directly replacing the entries of VMAs in the new maple tree can result in
better performance.  __mt_dup() uses DFS pre-order to duplicate the maple
tree, so it is efficient.

The average time complexity of __mt_dup() is O(n), where n is the number
of VMAs.  The proof of the time complexity is provided in the commit log
that introduces __mt_dup().  After duplicating the maple tree, each
element is traversed and replaced (ignoring the cases of deletion, which
are rare).  Since it is only a replacement operation for each element,
this process is also O(n).

Analyzing the exact time complexity of the previous algorithm is
challenging because each insertion can involve appending to a node,
pushing data to adjacent nodes, or even splitting nodes.  The frequency of
each action is difficult to calculate.  The worst-case scenario for a
single insertion is when the tree undergoes splitting at every level.  If
we consider each insertion as the worst-case scenario, we can determine
that the upper bound of the time complexity is O(n*log(n)), although this
is a loose upper bound.  However, based on the test data, it appears that
the actual time complexity is likely to be O(n).

As the entire maple tree is duplicated using __mt_dup(), if dup_mmap()
fails, there will be a portion of VMAs that have not been duplicated in
the maple tree.  To handle this, we mark the failure point with
XA_ZERO_ENTRY.  In exit_mmap(), if this marker is encountered, stop
releasing VMAs that have not been duplicated after this point.
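The failure-marker idea can be illustrated with a flat array of slots. This is a simplified sketch under stated assumptions: `TOY_ZERO_ENTRY` is a made-up stand-in for `XA_ZERO_ENTRY`, and the real marker lives inside the maple tree rather than an array:

```c
#include <stddef.h>

#define TOY_ZERO_ENTRY ((void *)0x1)	/* stand-in for XA_ZERO_ENTRY */

/*
 * On a failed duplication, slots past the failure point were never
 * copied; marking the failing slot tells teardown where to stop.
 */
static void toy_mark_failure(void **slots, size_t failed_idx)
{
	slots[failed_idx] = TOY_ZERO_ENTRY;
}

/*
 * Release duplicated entries, stopping at the failure marker so we
 * never touch entries that were never duplicated.  Returns how many
 * entries were released.
 */
static size_t toy_teardown(void **slots, size_t n)
{
	size_t released = 0;

	for (size_t i = 0; i < n; i++) {
		if (slots[i] == TOY_ZERO_ENTRY)
			break;
		released++;	/* real code would free the VMA here */
	}
	return released;
}
```

This mirrors the contract in `exit_mmap()`: iteration over the duplicated tree stops as soon as the zero-entry marker is seen.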

There is a "spawn" test in byte-unixbench[1], which can be used to measure
the performance of fork().  I modified it slightly to make it work with
different numbers of VMAs.

Below are the test results.  The first row shows the number of VMAs.  The
second and third rows show the number of fork() calls per ten seconds,
corresponding to next-20231006 and this patchset, respectively.  The
test results were obtained with CPU binding to avoid scheduler load
balancing that could cause unstable results.  There are still some
fluctuations in the test results, but at least they are better than the
original performance.

21     121   221    421    821    1621   3221   6421   12821  25621  51221
112100 76261 54227  34035  20195  11112  6017   3161   1606   802    393
114558 83067 65008  45824  28751  16072  8922   4747   2436   1233   599
2.19%  8.92% 19.88% 34.64% 42.37% 44.64% 48.28% 50.17% 51.68% 53.74% 52.42%

[1] https://github.com/kdlucas/byte-unixbench/tree/master

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peng Zhang <[email protected]>
Suggested-by: Liam R. Howlett <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Mike Christie <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit d240629)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.12-rc6
category: bugfix

Patch series "fork: do not expose incomplete mm on fork".

During fork we may place the virtual memory address space into an
inconsistent state before the fork operation is complete.

In addition, we may encounter an error during the fork operation that
indicates that the virtual memory address space is invalidated.

As a result, we should not be exposing it in any way to external machinery
that might interact with the mm or VMAs, machinery that is not designed to
deal with incomplete state.

We specifically update the fork logic to defer khugepaged and ksm to the
end of the operation and only to be invoked if no error arose, and
disallow uffd from observing fork events should an error have occurred.

This patch (of 2):

Currently on fork we expose the virtual address space of a process to
userland unconditionally if uffd is registered in VMAs, regardless of
whether an error arose in the fork.

This is performed in dup_userfaultfd_complete() which is invoked
unconditionally, and performs two duties - invoking registered handlers
for the UFFD_EVENT_FORK event via dup_fctx(), and clearing down
userfaultfd_fork_ctx objects established in dup_userfaultfd().

This is problematic, because the virtual address space may not yet be
correctly initialised if an error arose.

The change in commit d240629 ("fork: use __mt_dup() to duplicate
maple tree in dup_mmap()") makes this more pertinent as we may be in a
state where entries in the maple tree are not yet consistent.

We address this by, on fork error, ensuring that we roll back state that
we would otherwise expect to clean up through the event being handled by
userland and perform the memory freeing duty otherwise performed by
dup_userfaultfd_complete().

We do this by implementing a new function, dup_userfaultfd_fail(), which
performs the same loop, only decrementing reference counts.

Note that we perform mmgrab() on the parent and child mm's, however
userfaultfd_ctx_put() will mmdrop() this once the reference count drops to
zero, so we will avoid memory leaks correctly here.
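The rollback loop can be sketched as follows. This is a hedged toy model, not the real `dup_userfaultfd_fail()`: `toy_ctx` and its plain integer refcount are hypothetical stand-ins for `userfaultfd_fork_ctx`/`userfaultfd_ctx` and their atomic reference counts:

```c
/* Hypothetical context object; the real userfaultfd_ctx holds more. */
struct toy_ctx {
	int refcount;
};

/*
 * On fork success, each context is handed to userland, which drops
 * the reference when it handles the UFFD_EVENT_FORK event.  On fork
 * failure that handoff never happens, so the failure path must walk
 * the same list and drop the references itself.
 */
static void toy_dup_fail(struct toy_ctx **list, int n)
{
	for (int i = 0; i < n; i++)
		list[i]->refcount--;	/* real code: userfaultfd_ctx_put() */
}
```

Once a refcount reaches zero, the real `userfaultfd_ctx_put()` also performs the `mmdrop()`, which is why no extra mm cleanup is needed here.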

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/d3691d58bb58712b6fb3df2be441d175bd3cdf07.1729014377.git.lorenzo.stoakes@oracle.com
Fixes: d240629 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Lorenzo Stoakes <[email protected]>
Reported-by: Jann Horn <[email protected]>
Reviewed-by: Jann Horn <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit f64e67e)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.7-rc1
category: bugfix

Patch series "mm/ksm: add fork-exec support for prctl", v4.

A process can enable KSM with the prctl system call.  When the process is
forked the KSM flag is inherited by the child process.  However if the
process is executing an exec system call directly after the fork, the KSM
setting is cleared.  This patch series addresses this problem.

1) Change the mask in coredump.h for execing a new process
2) Add a new test case in ksm_functional_tests

This patch (of 2):

Today we have two ways to enable KSM:

1) madvise system call
   This allows KSM to be enabled for a memory region for a long time.

2) prctl system call
   This is a recent addition to enable KSM for the complete process.
   In addition when a process is forked, the KSM setting is inherited.

This change only affects the second case.

One of the use cases for (2) was to support the ability to enable
KSM for cgroups. This allows systemd to enable KSM for the seed
process. By enabling it in the seed process all child processes inherit
the setting.

This works correctly when the process is forked. However, it does not
support the fork/exec workflow.

From the previous cover letter:

....
Use case 3:
With the madvise call sharing opportunities are only enabled for the
current process: it is a workload-local decision. A considerable number
of sharing opportunities may exist across multiple workloads or jobs
(if they are part of the same security domain). Only a higher-level
entity like a job scheduler or container can know for certain if it is
running one or more instances of a job. That job scheduler, however,
doesn't have the necessary internal workload knowledge to make targeted
madvise calls.
....

In addition it can also be a bit surprising that fork keeps the KSM
setting and fork/exec does not.
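The fix boils down to which mm flag bits survive exec. A minimal sketch of the mask logic, with made-up `TOY_*` bit values standing in for the real `MMF_*` definitions in `coredump.h`:

```c
/* Hypothetical flag bits; the kernel's live in coredump.h. */
#define TOY_VM_MERGE_ANY	(1u << 3)
#define TOY_OTHER_FLAG		(1u << 1)

/*
 * Flags preserved across exec.  Before the fix, the merge-any bit
 * was absent from this mask, so execve() silently dropped the KSM
 * setting that fork() had inherited.
 */
#define TOY_EXEC_KEEP_MASK	(TOY_OTHER_FLAG | TOY_VM_MERGE_ANY)

static unsigned int toy_exec_flags(unsigned int old_flags)
{
	return old_flags & TOY_EXEC_KEEP_MASK;
}
```

With the merge-any bit in the keep mask, a process that enabled KSM via prctl keeps the setting through both fork and fork/exec.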

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Stefan Roesch <[email protected]>
Fixes: d7597f5 ("mm: add new api to enable ksm per process")
Reviewed-by: David Hildenbrand <[email protected]>
Reported-by: Carl Klemm <[email protected]>
Tested-by: Carl Klemm <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 3c6f33b)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.7-rc1
category: bugfix

This adds a new test case to the ksm functional tests to make sure that
the KSM setting is inherited by the child process when doing a fork/exec.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Stefan Roesch <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Carl Klemm <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 0374af1)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.10-rc1
category: bugfix

Patch series "mm/ksm: fix ksm exec support for prctl", v4.

commit 3c6f33b ("mm/ksm: support fork/exec for prctl") inherits
MMF_VM_MERGE_ANY flag when a task calls execve().  However, it doesn't
create the mm_slot, so ksmd will not try to scan this task.  The first
patch fixes the issue.

The second patch refactors in preparation for the third patch.  The third
patch extends the ksm selftests to verify that deduplication really
happens after fork/exec inherits the KSM setting.

This patch (of 3):

commit 3c6f33b ("mm/ksm: support fork/exec for prctl") inherits
MMF_VM_MERGE_ANY flag when a task calls execve().  However, it doesn't
create the mm_slot, so ksmd will not try to scan this task.

To fix it, allocate and add the mm_slot to ksm_mm_head in __bprm_mm_init()
when the mm has MMF_VM_MERGE_ANY flag.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 3c6f33b ("mm/ksm: support fork/exec for prctl")
Signed-off-by: Jinjiang Tu <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Nanyong Sun <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Stefan Roesch <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 3a9e567)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.10-rc1
category: misc

Since commit 3c6f33b ("mm/ksm: support fork/exec for prctl"), when a
child process is forked, the MMF_VM_MERGE_ANY flag will be inherited in
mm_init(). So, it's unnecessary to set the flag in ksm_fork().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jinjiang Tu <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Nanyong Sun <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Stefan Roesch <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 7edea4c)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.12-rc6
category: bugfix

There is no reason to invoke these hooks early against an mm that is in an
incomplete state.

The change in commit d240629 ("fork: use __mt_dup() to duplicate
maple tree in dup_mmap()") makes this more pertinent as we may be in a
state where entries in the maple tree are not yet consistent.

Their placement early in dup_mmap() appears to have mattered only for
early error checking, and since triggering that error would require a very
small allocation to fail (in practice 'too small to fail'), it could only
occur under the most dire circumstances, in which case the fork would fail
or be OOM-killed anyway.

Since both khugepaged and KSM tracking are there to provide optimisations
to memory performance rather than critical functionality, it doesn't
really matter all that much if, under such dire memory pressure, we fail
to register an mm with these.

As a result, we follow the example of commit d2081b2 ("mm:
khugepaged: make khugepaged_enter() void function") and make ksm_fork() a
void function also.

We only expose the mm to these functions once we are done with them and
only if no error occurred in the fork operation.

Link: https://lkml.kernel.org/r/e0cb8b840c9d1d5a6e84d4f8eff5f3f2022aa10c.1729014377.git.lorenzo.stoakes@oracle.com
Fixes: d240629 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Lorenzo Stoakes <[email protected]>
Reported-by: Jann Horn <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Reviewed-by: Jann Horn <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 985da55)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.13-rc4
category: bugfix

If dup_mmap() encounters an issue, currently uprobe is able to access the
relevant mm via the reverse mapping (in build_map_info()), and if we are
very unlucky with a race window, observe invalid XA_ZERO_ENTRY state which
we establish as part of the fork error path.

This occurs because uprobe_write_opcode() invokes anon_vma_prepare() which
in turn invokes find_mergeable_anon_vma() that uses a VMA iterator,
invoking vma_iter_load() which uses the advanced maple tree API and thus
is able to observe XA_ZERO_ENTRY entries added to dup_mmap() in commit
d240629 ("fork: use __mt_dup() to duplicate maple tree in
dup_mmap()").

This change was made on the assumption that only process tear-down code
would actually observe (and make use of) these values.  However this very
unlikely but still possible edge case with uprobes exists and
unfortunately does make these observable.

The uprobe operation prevents races against the dup_mmap() operation via
the dup_mmap_sem semaphore, which is acquired via uprobe_start_dup_mmap()
and dropped via uprobe_end_dup_mmap(), and held across
register_for_each_vma() prior to invoking build_map_info() which does the
reverse mapping lookup.

Currently these are acquired and dropped within dup_mmap(), which exposes
the race window prior to error handling in the invoking dup_mm() which
tears down the mm.

We can avoid all this by just moving the invocation of
uprobe_start_dup_mmap() and uprobe_end_dup_mmap() up a level to dup_mm()
and only release this lock once the dup_mmap() operation succeeds or clean
up is done.

This means that the uprobe code can never observe an incompletely
constructed mm and resolves the issue in this case.
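The structural change is simply widening the critical section so error cleanup happens under the same lock. A user-space sketch with a pthread mutex standing in for `dup_mmap_sem` (function names here are hypothetical):

```c
#include <pthread.h>

static pthread_mutex_t toy_dup_sem = PTHREAD_MUTEX_INITIALIZER;

/* Stand-ins for the duplication and its error cleanup. */
static int toy_dup_mmap(int fail) { return fail ? -1 : 0; }
static void toy_teardown_mm(void) { }

/*
 * Caller-level locking, as in the fixed dup_mm(): the lock is held
 * across both the duplication and any error cleanup, so an observer
 * serialized on the same lock can never see the half-built mm.
 */
static int toy_dup_mm(int fail)
{
	int ret;

	pthread_mutex_lock(&toy_dup_sem);
	ret = toy_dup_mmap(fail);
	if (ret)
		toy_teardown_mm();	/* still under the lock */
	pthread_mutex_unlock(&toy_dup_sem);
	return ret;
}
```

Before the fix, the lock/unlock pair lived inside the `toy_dup_mmap()` analogue, leaving a window between unlock and teardown in which the uprobe side could win the race.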

Link: https://lkml.kernel.org/r/[email protected]
Fixes: d240629 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Lorenzo Stoakes <[email protected]>
Reported-by: [email protected]
Closes: https://lore.kernel.org/all/[email protected]/
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peng Zhang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 8ac662f)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.14-rc1
category: bugfix

If a memory allocation fails during dup_mmap(), the maple tree can be left
in an unsafe state for other iterators besides the exit path.  All the
locks are dropped before the exit_mmap() call (in mm/mmap.c), but the
incomplete mm_struct can be reached through (at least) the rmap finding
the vmas which have a pointer back to the mm_struct.

Up to this point, there have been no issues with being able to find an
mm_struct that was only partially initialised.  Syzbot was able to make
the incomplete mm_struct fail with recent forking changes, so it has been
proven unsafe to use the mm_struct that hasn't been initialised, as
referenced in the link below.

Although 8ac662f ("fork: avoid inappropriate uprobe access to
invalid mm") fixed the uprobe access, it does not completely remove the
race.

This patch sets the MMF_OOM_SKIP to avoid the iteration of the vmas on the
oom side (even though this is extremely unlikely to be selected as an oom
victim in the race window), and sets MMF_UNSTABLE to avoid other potential
users from using a partially initialised mm_struct.

When registering vmas for uprobe, skip the vmas in an mm that is marked
unstable.  Modifying a vma in an unstable mm may cause issues if the mm
isn't fully initialised.

Link: https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: d240629 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Liam R. Howlett <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Peng Zhang <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 64c37e1)
Signed-off-by: Wentao Guan <[email protected]>
Conflicts:
	kernel/events/uprobes.c
@opsiff opsiff force-pushed the linux-6.6.y-2025-03-19-maple-tree branch from 497b83e to 08dcc40 on March 24, 2025 at 10:06
@opsiff
Member Author

opsiff commented Mar 24, 2025

@sourcery-ai review

@sourcery-ai sourcery-ai bot left a comment

Hey @opsiff - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider adding a comment describing the locking strategy for mtree_dup.
  • The compare_tree function in tools/testing/radix-tree/maple.c could be improved by using a MA_STATE for both trees simultaneously, instead of calling mas_dfs_preorder repeatedly.
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good


* Compares two nodes except for the addresses stored in the nodes.
* Returns zero if they are the same, otherwise returns non-zero.
*/
static int __init compare_node(struct maple_enode *enode_a,
issue (complexity): Consider refactoring compare_node and build_full_tree to use early returns and clearer loop constructs instead of gotos to improve readability and reduce complexity.

In several functions (particularly in `compare_node` and `build_full_tree`), the complexity is increased by nested conditions and nonlocal jumps with `goto`. Consider refactoring those sections to use early returns or clearer loop constructs. For example:

**In `compare_node`:**
Replace the use of `goto cmp` with an early exit for leaf nodes. For example, change

```c
if (ma_is_leaf(type))
    goto cmp;
```

to:

```c
if (ma_is_leaf(type))
    return memcmp(&a, &b, sizeof(struct maple_node));
```

and then do the rest of the comparisons normally. This flattens the control flow.

**In `build_full_tree`:**
You can refactor the inner `while (1)` and `goto store`/`goto unlock` by using flags or breaking out of the loops. For example, refactor like this:

```c
static __init int build_full_tree(struct maple_tree *mt, unsigned int flags, int height)
{
    MA_STATE(mas, mt, 0, 0);
    unsigned long step;
    int ret = 0, cnt = 1;
    enum maple_type type;

    mt_init_flags(mt, flags);
    mtree_insert_range(mt, 0, ULONG_MAX, xa_mk_value(5), GFP_KERNEL);
    mtree_lock(mt);

    while (1) {
        mas_set(&mas, 0);

        if (mt_height(mt) < height) {
            mas.max = ULONG_MAX;
            /* Process store step */
        } else {
            bool found = false;
            while (1) {
                mas_dfs_preorder(&mas);
                if (mas_is_none(&mas)) {
                    ret = 0;
                    goto unlock;
                }
                type = mte_node_type(mas.node);
                if (mas_data_end(&mas) + 1 < mt_slots[type]) {
                    mas_set(&mas, mas.min);
                    found = true;
                    break;
                }
            }
            if (!found)
                break;  /* terminate if processing is done */
        }

        mas_subtree_max_range(&mas);
        step = mas.last - mas.index;
        if (step < 1) {
            ret = -1;
            break;
        }
        step /= 2;
        mas.last = mas.index + step;
        mas_store_gfp(&mas, xa_mk_value(5), GFP_KERNEL);
        ++cnt;
    }
unlock:
    mtree_unlock(mt);
    MT_BUG_ON(mt, mt_height(mt) != height);
    return ret;
}
```

This refactoring uses a flag (`found`) and `break` statements rather than interleaving `goto` labels, which helps to document the different stages more clearly.

These small changes can greatly improve readability while preserving all the functionality.

@opsiff opsiff merged commit 7246256 into deepin-community:linux-6.6.y Mar 24, 2025
5 of 7 checks passed
@opsiff
Member Author

opsiff commented Mar 24, 2025

/test-ok

