Skip to content

Conversation

@Avenger-285714
Copy link
Member

Update kernel base to 6.6.80.

Handle ("perf/x86/intel: Fix ARCH_PERFMON_NUM_COUNTER_LEAF") because of b3b443c ("perf/x86/intel: Add common intel_pmu_init_hybrid").

@sourcery-ai
Copy link

sourcery-ai bot commented Mar 5, 2025

Reviewer's Guide by Sourcery

The pull request updates the kernel base to version 6.6.80 and includes various changes across multiple subsystems, including file system, drivers, and architecture-specific code. Key changes involve updates to the XFS file system, improvements in device drivers, and enhancements in architecture-specific implementations.

Sequence diagram for md_check_recovery

sequenceDiagram
  participant md_check_recovery
  participant mddev
  participant md_bitmap_write_all
  participant queue_work
  participant md_misc_wq
  participant md_start_sync

  md_check_recovery->>mddev: md_check_recovery(mddev)
  activate md_check_recovery
  alt bitmap && need_write_bitmap
    md_check_recovery->>md_bitmap_write_all: md_bitmap_write_all(mddev->bitmap)
    activate md_bitmap_write_all
    md_bitmap_write_all-->>md_check_recovery: return
    deactivate md_bitmap_write_all
  end
  md_check_recovery->>queue_work: queue_work(md_misc_wq, &mddev->sync_work)
  activate queue_work
  queue_work->>md_misc_wq: Queue mddev->sync_work
  deactivate queue_work
  md_misc_wq->>md_start_sync: md_start_sync(mddev)
  activate md_start_sync
  md_start_sync-->>md_misc_wq: return
  deactivate md_start_sync
  md_check_recovery-->>mddev: return
  deactivate md_check_recovery
Loading

File-Level Changes

Change Details Files
Update XFS file system to handle attribute operations and inode management.
  • Removed unused function xfs_attr_leaf_try_add.
  • Added functions xfs_attr_save_rmt_blk and xfs_attr_restore_rmt_blk for managing remote block information.
  • Modified xfs_attr_leaf_addname to improve attribute addition logic.
  • Updated xfs_attr3_leaf_add to return a boolean indicating success or need for further action.
fs/xfs/libxfs/xfs_attr.c
fs/xfs/libxfs/xfs_attr_leaf.c
fs/xfs/libxfs/xfs_attr_leaf.h
Enhance md (multiple device) driver for better synchronization and management.
  • Introduced __mddev_put function for better resource management.
  • Added new work_struct sync_work for managing synchronization tasks.
  • Updated mddev_put to use __mddev_put for cleanup.
drivers/md/md.c
drivers/md/md.h
Implement changes in the Bluetooth driver for improved firmware handling.
  • Added qca_filename_has_extension function to check file extensions.
  • Implemented qca_get_alt_nvm_file to handle alternative NVM file loading.
  • Updated qca_download_firmware to retry with default firmware if board-specific file is missing.
drivers/bluetooth/btqca.c
Refactor and optimize TCP and network-related functionalities.
  • Introduced tcp_cleanup_skb and tcp_add_receive_queue for better skb management.
  • Refactored tcp_read_sock to separate logic for acknowledgment handling.
  • Enhanced strparser to support custom read_sock callback.
net/core/flow_dissector.c
net/core/skmsg.c
net/ipv4/tcp.c
net/ipv4/tcp_input.c
net/strparser/strparser.c
Add new PHY drivers for CIX USB and USBDP.
  • Added new driver files for CIX USB2, USB3, and USBDP PHYs.
  • Implemented initialization and configuration logic for PHYs.
  • Integrated with typec switch and mux for orientation and mode management.
drivers/phy/cix/phy-cix-usb2.c
drivers/phy/cix/phy-cix-usb3.c
drivers/phy/cix/phy-cix-usbdp.c
drivers/phy/cix/phy-cix-usbdp.h

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!
  • Generate a plan of action for an issue: Comment @sourcery-ai plan on
    an issue to generate a plan of action for it.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Avenger-285714 - I've reviewed your changes - here's some feedback:

Overall Comments:

  • The diff contains a mix of bug fixes and new features, so it might be helpful to categorize the changes for easier review.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 4 issues found
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

return rc;
}

static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider extracting common parts of the flush and stats-update logic into helper functions to reduce multiple branches and duplicated counter updates without changing functionality, such as consolidating the flush logic by splitting the hcall trigger and error handling into a helper that’s parameterized by the flush mode and extracting the tx_stats updates into one helper function with an enum for the tx_type.

Consider extracting the common parts of the flush and stats‐update logic into helper functions. This can reduce the multiple branches and duplicated counter updates without changing functionality. For example, you might extract the tx_stats updates into one helper:

```c
enum tx_type {
    TX_INDIRECT,
    TX_DIRECT
};

static inline void update_tx_stats(struct ibmvnic_adapter *adapter, int queue_num,
                                   enum tx_type type, unsigned int packets, unsigned int bytes)
{
    adapter->tx_stats_buffers[queue_num].bytes += bytes;
    if (type == TX_DIRECT)
        adapter->tx_stats_buffers[queue_num].direct_packets += packets;
    else
        adapter->tx_stats_buffers[queue_num].batched_packets += packets;
}

Then within ibmvnic_xmit update counters with a single call:

// Replace separate tx_dpackets/tx_bpackets updates with:
update_tx_stats(adapter, queue_num, /* TX_DIRECT or TX_INDIRECT */ selected_type, tx_packets, skblen);

Similarly, consider consolidating the flush logic by splitting the hcall trigger and error handling into a helper that’s parameterized by the flush mode. This refactoring centralizes the routing decisions and avoids duplicated checks.

static int flush_subcrq(struct ibmvnic_adapter *adapter,
                        struct ibmvnic_sub_crq_queue *tx_scrq, bool indirect)
{
    int rc;
    if (indirect)
        rc = send_subcrq_indirect(adapter, tx_scrq->handle,
                                  (u64)tx_scrq->ind_buf.indir_dma,
                                  (u64)tx_scrq->ind_buf.index);
    else
        rc = send_subcrq_direct(adapter, tx_scrq->handle,
                                (u64 *)tx_scrq->ind_buf.indir_arr);

    if (rc)
        ibmvnic_tx_scrq_clean_buffer(adapter, tx_scrq);
    else
        tx_scrq->ind_buf.index = 0;

    return rc;
}

Then update callers to use flush_subcrq so that the flush path is unified.

These focused refactorings help keep the functionality while reducing branching complexity and duplication.

return !strchr(suffix, '/');
}

static bool qca_get_alt_nvm_file(char *filename, size_t max_size)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider simplifying the alternate NVM file name generation by merging the extension check and replacement into one function using the “%.*s” formatter to avoid extra helper functions and temporary buffers .

Consider simplifying the alternate NVM file name generation in
`qca_get_alt_nvm_file()` by removing the extra helper and temporary buffer.
For example, you can merge the extension check and replacement into one
function using the “%.*s” formatter:

static bool qca_get_alt_nvm_file(char *filename, size_t max_size)
{
    char *dot = strrchr(filename, '.');
    if (!dot || dot == filename || !*(dot + 1))
        return false;
    /* Return if already using the default ".bin" extension */
    if (strcmp(dot, ".bin") == 0)
        return false;
    snprintf(filename, max_size, "%.*s.bin", (int)(dot - filename), filename);
    return true;
}

This reduces the number of helper functions and temporary buffers while keeping
all functionality intact.

* Returns 0 if the entry was added, 1 if a further split is needed or a
* negative error number otherwise.
*/
int
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider using error codes and an out parameter to signal whether a split is still needed, instead of mixing boolean return values with error codes in functions like xfs_attr3_leaf_add and its callers, to improve clarity and maintainability of the code.

Consider unifying the error‐reporting style in these routines. In several functions (like `xfs_attr3_leaf_add` and its caller wrappers) we now mix returning a boolean (to signal “entry added” vs. “needs further split”) with error codes, which makes the caller logic non‐obvious.

One approach is to always return an error code and use an out parameter for any additional signal (for example, whether a split is still needed). For instance:

```c
/* New signature with an extra "added" out parameter */
int xfs_attr3_leaf_add(struct xfs_buf *bp, struct xfs_da_args *args, bool *added) {
    struct xfs_attr_leafblock *leaf = bp->b_addr;
    struct xfs_attr3_icleaf_hdr ichdr;
    int entsize, error;

    trace_xfs_attr_leaf_add(args);
    xfs_attr3_leaf_hdr_from_disk(args->geo, &ichdr, leaf);
    ASSERT(args->index >= 0 && args->index <= ichdr.count);
    entsize = xfs_attr_leaf_newentsize(args, NULL);

    // Compute space etc.
    if (/* not enough space */) {
        *added = false;
        return -ENOSPC;
    }

    error = xfs_attr3_leaf_add_work(bp, &ichdr, args, 0);
    if (error)
        return error;

    xfs_attr3_leaf_hdr_to_disk(args->geo, leaf, &ichdr);
    xfs_trans_log_buf(args->trans, bp,
        XFS_DA_LOGRANGE(leaf, &leaf->hdr, xfs_attr3_leaf_hdr_size(leaf)));

    /* Use the out-param to signal whether the entry was added or we need a further split */
    *added = /* condition depending on available free space */;
    return 0;
}

Callers then check the error code first and use the out parameter to detect if the split is still required:

bool added;
error = xfs_attr3_leaf_add(bp, args, &added);
if (error)
    return error;
if (!added)
    return 1;  // or a defined constant indicating a further split is needed

This approach keeps error codes and success signals separate and avoids mixing booleans with negative error numbers. Adjust similar functions (e.g. xfs_attr3_leaf_add_work) accordingly to maintain consistency while preserving all functionality.

return 0;
}

static int udphy_dp_phy_set_voltages(struct cix_udphy *udphy,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider creating a helper function to centralize the register write/update sequences, reducing code duplication and improving readability by accepting parameters for the target lane's register base and looping over the lanes to select the appropriate lane numbers by checking the flip flag, which preserves functionality while improving maintainability and reducing redundancy..

You can reduce duplication by centralizing the nearly identical register write/update sequences into a helper that accepts parameters for the target lane’s register base. For example, in the udphy_dp_phy_set_voltages() function, you could replace the repeated blocks with something like:

static void set_lane_voltage(struct cix_udphy *udphy, int lane,
                             const phy_voltage_conf_t *cfg)
{
    /* Map lane to register offsets: adjust naming/index as needed */
    const unsigned int acya_reg[]  = { TX_DIAG_ACYA_LANE0, TX_DIAG_ACYA_LANE1,
                                       TX_DIAG_ACYA_LANE2, TX_DIAG_ACYA_LANE3 };
    const unsigned int txcc_ctrl[] = { TX_TXCC_CTRL_LANE0, TX_TXCC_CTRL_LANE1,
                                       TX_TXCC_CTRL_LANE2, TX_TXCC_CTRL_LANE3 };
    const unsigned int tx_drv[]    = { DRV_DIAG_TX_DRV_LANE0, DRV_DIAG_TX_DRV_LANE1,
                                       DRV_DIAG_TX_DRV_LANE2, DRV_DIAG_TX_DRV_LANE3 };
    const unsigned int txcc_mult[] = { TX_TXCC_MGNFS_MULT_000_LANE0, TX_TXCC_MGNFS_MULT_000_LANE1,
                                       TX_TXCC_MGNFS_MULT_000_LANE2, TX_TXCC_MGNFS_MULT_000_LANE3 };
    const unsigned int txcc_cpost[] = { TX_TXCC_CPOST_MULT_00_LANE0, TX_TXCC_CPOST_MULT_00_LANE1,
                                        TX_TXCC_CPOST_MULT_00_LANE2, TX_TXCC_CPOST_MULT_00_LANE3 };

    /* Trigger update, write values, then clear update */
    regmap_update_bits(udphy->phy_regmap, acya_reg[lane], BIT(0), 0x1);
    regmap_write(udphy->phy_regmap, txcc_ctrl[lane], cfg->tx_txcc_ctrl);
    regmap_write(udphy->phy_regmap, tx_drv[lane], cfg->drv_diag_tx_drv);
    regmap_write(udphy->phy_regmap, txcc_mult[lane], cfg->tx_txcc_mgnfc_mult_000);
    regmap_write(udphy->phy_regmap, txcc_cpost[lane], cfg->tx_txcc_cpost_mult_00);
    regmap_update_bits(udphy->phy_regmap, acya_reg[lane], BIT(0), 0x0);
}

Then, in your voltage setting function, you can loop over the lanes and select the appropriate lane numbers by checking the flip flag. For example:

static int udphy_dp_phy_set_voltages(struct cix_udphy *udphy,
                                     struct phy_configure_opts_dp *dp)
{
    u32 index = dp->voltage[0] * 4 + dp->pre[0];
    const phy_voltage_conf_t *cfg = &volt_cfg[index];

    switch (dp->lanes) {
    case 1:
        set_lane_voltage(udphy, udphy->flip ? 1 : 2, cfg);
        break;
    case 2:
        if (!udphy->flip) {
            set_lane_voltage(udphy, 2, cfg);
            set_lane_voltage(udphy, 3, cfg);
        } else {
            set_lane_voltage(udphy, 0, cfg);
            set_lane_voltage(udphy, 1, cfg);
        }
        break;
    case 4:
        for (int i = 0; i < 4; i++)
            set_lane_voltage(udphy, i, cfg);
        break;
    default:
        break;
    }
    return 0;
}

This approach reduces repetition and improves readability while preserving functionality. Adjust lane-to-register mapping as needed for your hardware.

ctmarinas and others added 25 commits March 5, 2025 16:40
PROT_MTE (memory tagging extensions) is not supported on all user mmap()
types for various reasons (memory attributes, backing storage, CoW
handling). The arm64 arch_validate_flags() function checks whether the
VM_MTE_ALLOWED flag has been set for a vma during mmap(), usually by
arch_calc_vm_flag_bits().

Linux prior to 6.13 does not support PROT_MTE hugetlb mappings. This was
added by commit 25c17c4 ("hugetlb: arm64: add mte support").
However, earlier kernels inadvertently set VM_MTE_ALLOWED on
(MAP_ANONYMOUS | MAP_HUGETLB) mappings by only checking for
MAP_ANONYMOUS.

Explicitly check MAP_HUGETLB in arch_calc_vm_flag_bits() and avoid
setting VM_MTE_ALLOWED for such mappings.

Fixes: 9f34193 ("arm64: mte: Add PROT_MTE support to mmap() and mprotect()")
Cc: <[email protected]> # 5.10.x-6.12.x
Reported-by: Naresh Kamboju <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 14cc006)
commit 6d2db12 upstream.

Protect against developers passing stupid limits when refactoring the
RT code once again.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit ed6282d)
commit 05aba19 upstream.

Actually use the inumber validator to check the argument passed in here.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 629e6a3)
commit de55149 upstream.

While refactoring code, I noticed that when xfs_iroot_realloc tries to
shrink a bmbt root block, it allocates a smaller new block and then
copies "records" and pointers to the new block.  However, bmbt root
blocks cannot ever be leaves, which means that it's not technically
correct to copy records.  We /should/ be copying keys.

Note that this has never resulted in actual memory corruption because
sizeof(bmbt_rec) == (sizeof(bmbt_key) + sizeof(bmbt_ptr)).  However,
this will no longer be true when we start adding realtime rmap stuff,
so fix this now.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit a6790b5)
commit 77bfe1b upstream.

Fix a typo in comments.

Signed-off-by: Andrew Kreimer <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 3e2f7c2)
commit 90a71da upstream.

The background blockgc scanner runs on a 5m interval by default and
trims preallocation (post-eof and cow fork) from inodes that are
otherwise idle. Idle effectively means that iolock can be acquired
without blocking and that the inode has no dirty pagecache or I/O in
flight.

This simple mechanism and heuristic has worked fairly well for
post-eof speculative preallocations. Support for reflink and COW
fork preallocations came sometime later and plugged into the same
mechanism, with similar heuristics. Some recent testing has shown
that COW fork preallocation may be notably more sensitive to blockgc
processing than post-eof preallocation, however.

For example, consider an 8GB reflinked file with a COW extent size
hint of 1MB. A worst case fully randomized overwrite of this file
results in ~8k extents of an average size of ~1MB. If the same
workload is interrupted a couple times for blockgc processing
(assuming the file goes idle), the resulting extent count explodes
to over 100k extents with an average size <100kB. This is
significantly worse than ideal and essentially defeats the COW
extent size hint mechanism.

While this particular test is instrumented, it reflects a fairly
reasonable pattern in practice where random I/Os might spread out
over a large period of time with varying periods of (in)activity.
For example, consider a cloned disk image file for a VM or container
with long uptime and variable and bursty usage. A background blockgc
scan that races and processes the image file when it happens to be
clean and idle can have a significant effect on the future
fragmentation level of the file, even when still in use.

To help combat this, update the heuristic to skip cowblocks inodes
that are currently opened for write access during non-sync blockgc
scans. This allows COW fork preallocations to persist for as long as
possible unless otherwise needed for functional purposes (i.e. a
sync scan), the file is idle and closed, or the inode is being
evicted from cache. While here, update the comments to help
distinguish performance oriented heuristics from the logic that
exists to maintain functional correctness.

Suggested-by: Darrick Wong <[email protected]>
Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit f56db9c)
commit 4390f01 upstream.

fallocate unshare mode explicitly breaks extent sharing. When a
command completes, it checks the data fork for any remaining shared
extents to determine whether the reflink inode flag and COW fork
preallocation can be removed. This logic doesn't consider in-core
pagecache and I/O state, however, which means we can unsafely remove
COW fork blocks that are still needed under certain conditions.

For example, consider the following command sequence:

xfs_io -fc "pwrite 0 1k" -c "reflink <file> 0 256k 1k" \
	-c "pwrite 0 32k" -c "funshare 0 1k" <file>

This allocates a data block at offset 0, shares it, and then
overwrites it with a larger buffered write. The overwrite triggers
COW fork preallocation, 32 blocks by default, which maps the entire
32k write to delalloc in the COW fork. All but the shared block at
offset 0 remains hole mapped in the data fork. The unshare command
redirties and flushes the folio at offset 0, removing the only
shared extent from the inode. Since the inode no longer maps shared
extents, unshare purges the COW fork before the remaining 28k may
have written back.

This leaves dirty pagecache backed by holes, which writeback quietly
skips, thus leaving clean, non-zeroed pagecache over holes in the
file. To verify, fiemap shows holes in the first 32k of the file and
reads return different data across a remount:

$ xfs_io -c "fiemap -v" <file>
<file>:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   ...
   1: [8..511]:        hole               504
   ...
$ xfs_io -c "pread -v 4k 8" <file>
00001000:  cd cd cd cd cd cd cd cd  ........
$ umount <mnt>; mount <dev> <mnt>
$ xfs_io -c "pread -v 4k 8" <file>
00001000:  00 00 00 00 00 00 00 00  ........

To avoid this problem, make unshare follow the same rules used for
background cowblock scanning and never purge the COW fork for inodes
with dirty pagecache or in-flight I/O.

Fixes: 46afb06 ("xfs: only flush the unshared range in xfs_reflink_unshare")
Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 7b5b119)
commit b1c649d upstream.

[backport: dependency of a5f7334 and b3f4e84]

xfs_attr_leaf_try_add is only called by xfs_attr_leaf_addname, and
merging the two will simplify a following error handling fix.

To facilitate this move the remote block state save/restore helpers up in
the file so that they don't need forward declarations now.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 3d58507)
commit 346c1d4 upstream.

[backport: dependency of a5f7334 and b3f4e84]

xfs_attr3_leaf_add only has two potential return values, indicating if the
entry could be added or not.  Replace the errno return with a bool so that
ENOSPC from it can't easily be confused with a real ENOSPC.

Remove the return value from the xfs_attr3_leaf_add_work helper entirely,
as it always return 0.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 702e1ac)
commit a5f7334 upstream.

xfs_attr3_leaf_split propagates the need for an extra btree split as
-ENOSPC to it's only caller, but the same return value can also be
returned from xfs_da_grow_inode when it fails to find free space.

Distinguish the two cases by returning 1 for the extra split case instead
of overloading -ENOSPC.

This can be triggered relatively easily with the pending realtime group
support and a file system with a lot of small zones that use metadata
space on the main device.  In this case every about 5-10th run of
xfs/538 runs into the following assert:

	ASSERT(oldblk->magic == XFS_ATTR_LEAF_MAGIC);

in xfs_attr3_leaf_split caused by an allocation failure.  Note that
the allocation failure is caused by another bug that will be fixed
subsequently, but this commit at least sorts out the error handling.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 512a911)
…addname

commit b3f4e84 upstream.

Just like xfs_attr3_leaf_split, xfs_attr_node_try_addname can return
-ENOSPC both for an actual failure to allocate a disk block, but also
to signal the caller to convert the format of the attr fork.  Use magic
1 to ask for the conversion here as well.

Note that unlike the similar issue in xfs_attr3_leaf_split, this one was
only found by code review.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit f37a5f0)
commit 865469c upstream.

[backport: dependency of 6aac770]

Userdata and metadata allocations end up in the same allocation helpers.
Remove the separate xfs_bmap_alloc_userdata function to make this more
clear.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 41e7f8f)
commit b611fdd upstream.

Exact minlen allocations only exist as an error injection tool for debug
builds.  Currently this is implemented using ifdefs, which means the code
isn't even compiled for non-XFS_DEBUG builds.  Enhance the compile test
coverage by always building the code and use the compilers' dead code
elimination to remove it from the generated binary instead.

The only downside is that the alloc_minlen_only field is unconditionally
added to struct xfs_alloc_args now, but by moving it around and packing
it tightly this doesn't actually increase the size of the structure.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 479e112)
commit 405ee87 upstream.

[backport: dependency of 6aac770]

xfs_bmap_exact_minlen_extent_alloc duplicates the args setup in
xfs_bmap_btalloc.  Switch to call it from xfs_bmap_btalloc after
doing the basic setup.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit a8a80b7)
commit 6aac770 upstream.

Currently the debug-only xfs_bmap_exact_minlen_extent_alloc allocation
variant fails to drop into the lowmode last resort allocator, and
thus can sometimes fail allocations for which the caller has a
transaction block reservation.

Fix this by using xfs_bmap_btalloc_low_space to do the actual allocation.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 9716ff8)
commit 20195d0 upstream.

Use !try_cmpxchg instead of cmpxchg (*ptr, old, new) != old in
xlog_cil_insert_pcp_aggregate().  x86 CMPXCHG instruction returns
success in ZF flag, so this change saves a compare after cmpxchg.

Also, try_cmpxchg implicitly assigns old *ptr value to "old" when
cmpxchg fails. There is no need to re-read the value in the loop.

Note that the value from *ptr should be read using READ_ONCE to
prevent the compiler from merging, refetching or reordering the read.

No functional change intended.

Signed-off-by: Uros Bizjak <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Chandan Babu R <[email protected]>
Cc: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit b5d917a)
commit f6225ee upstream.

The definition of xfs_attr_use_log_assist() has been removed since
commit d9c61cc ("xfs: move xfs_attr_use_log_assist out of xfs_log.c").
So, Remove the empty declartion in header files.

Signed-off-by: Zhang Zekun <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 5a9e3db)
commit 82742f8 upstream.

[backport: dependency of 6a18765]

Currently only the new agcount is passed to xfs_initialize_perag, which
requires lookups of existing AGs to skip them and complicates error
handling.  Also pass the previous agcount so that the range that
xfs_initialize_perag operates on is exactly defined.  That way the
extra lookups can be avoided, and error handling can clean up the
exact range from the old count to the last added perag structure.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit bb305f8)
…fers

commit 6a18765 upstream.

Primary superblock buffers that change the file system geometry after a
growfs operation can affect the operation of later CIL checkpoints that
make use of the newly added space and allocation groups.

Apply the changes to the in-memory structures as part of recovery pass 2,
to ensure recovery works fine for such cases.

In the future we should apply the logic to other updates such as features
bits as well.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit a9c1eba)
commit b882b0f upstream.

XFS currently does not support reducing the agcount, so error out if
a logged sb buffer tries to shrink the agcount.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 5a9f827)
commit 069cf5e upstream.

[backport: uses kmem_zalloc instead of kzalloc]

__GFP_RETRY_MAYFAIL increases the likelyhood of allocations to fail,
which isn't really helpful during log recovery.  Remove the flag and
stick to the default GFP_KERNEL policies.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 7a2c246)
commit 4a201dc upstream.

Currently log recovery never updates the in-core perag values for the
last allocation group when they were grown by growfs.  This leads to
btree record validation failures for the alloc, ialloc or finotbt
trees if a transaction references this new space.

Found by Brian's new growfs recovery stress test.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit c904df6)
commit 3ef2268 upstream.

Recently, we found that the CPU spent a lot of time in
xfs_alloc_ag_vextent_size when the filesystem has millions of fragmented
spaces.

The reason is that we conducted much extra searching for extents that
could not yield a better result, and these searches would cost a lot of
time when there were millions of extents to search through. Even if we
get the same result length, we don't switch our choice to the new one,
so we can definitely terminate the search early.

Since the result length cannot exceed the found length, when the found
length equals the best result length we already have, we can conclude
the search.

We did a test in that filesystem:
[root@localhost ~]# xfs_db -c freesp /dev/vdb
   from      to extents  blocks    pct
      1       1     215     215   0.01
      2       3  994476 1988952  99.99

Before this patch:
 0)               |  xfs_alloc_ag_vextent_size [xfs]() {
 0) * 15597.94 us |  }

After this patch:
 0)               |  xfs_alloc_ag_vextent_size [xfs]() {
 0)   19.176 us    |  }

Signed-off-by: Chi Zhiling <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 1fe5c2a)
commit 81a1e1c upstream.

Directly return the error from xfs_bmap_longest_free_extent instead
of breaking from the loop and handling it there, and use a done
label to directly jump to the exist when we found a suitable perag
structure to reduce the indentation level and pag/max_pag check
complexity in the tail of the function.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 067ee59)
commit 2a492ff upstream.

Extsize should only be allowed to be set on files with no data in it.
For this, we check if the files have extents but miss to check if
delayed extents are present. This patch adds that check.

While we are at it, also refactor this check into a helper since
it's used in some other places as well like xfs_inactive() or
xfs_ioctl_setattr_xflags()

**Without the patch (SUCCEEDS)**

$ xfs_io -c 'open -f testfile' -c 'pwrite 0 1024' -c 'extsize 65536'

wrote 1024/1024 bytes at offset 0
1 KiB, 1 ops; 0.0002 sec (4.628 MiB/sec and 4739.3365 ops/sec)

**With the patch (FAILS as expected)**

$ xfs_io -c 'open -f testfile' -c 'pwrite 0 1024' -c 'extsize 65536'

wrote 1024/1024 bytes at offset 0
1 KiB, 1 ops; 0.0002 sec (4.628 MiB/sec and 4739.3365 ops/sec)
xfs_io: FS_IOC_FSSETXATTR testfile: Invalid argument

Fixes: e94af02 ("[XFS] fix old xfs_setattr mis-merge from irix; mostly harmless esp if not using xfs rt")
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: John Garry <[email protected]>
Signed-off-by: Ojaswin Mujoo <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Catherine Hoang <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit b887d2f)
Haoxiang Li and others added 26 commits March 5, 2025 16:40
commit 878e7b1 upstream.

Add check for the return value of nfp_app_ctrl_msg_alloc() in
nfp_bpf_cmsg_alloc() to prevent null pointer dereference.

Fixes: ff3d43f ("nfp: bpf: implement helpers for FW map ops")
Cc: [email protected]
Signed-off-by: Haoxiang Li <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 29ccb1e)
commit d8d99c3 upstream.

The nullity of sps->cstream should be checked similarly as it is done in
sof_set_stream_data_offset() function.
Assuming that it is not NULL if sps->stream is NULL is incorrect and can
lead to NULL pointer dereference.

Fixes: 090349a ("ASoC: SOF: Add support for compress API for stream data/offset")
Cc: [email protected]
Reported-by: Curtis Malainey <[email protected]>
Closes: thesofproject/linux#5214
Signed-off-by: Peter Ujfalusi <[email protected]>
Reviewed-by: Daniel Baluta <[email protected]>
Reviewed-by: Ranjani Sridharan <[email protected]>
Reviewed-by: Bard Liao <[email protected]>
Reviewed-by: Curtis Malainey <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 2b3878b)
commit a8c9a45 upstream.

If 'micfil->quality' received from micfil_quality_set() somehow ends
up with an unpredictable value, switch() operator will fail to
initialize local variable qsel before regmap_update_bits() tries
to utilize it.

While it is unlikely, play it safe and enable a default case that
returns -EINVAL error.

Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.

Fixes: bea1d61 ("ASoC: fsl_micfil: rework quality setting")
Cc: [email protected]
Signed-off-by: Nikita Zhandarovich <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit afa500d)
…dig_out_ctls()

commit 822b7ec upstream.

Check the return value of snd_ctl_rename_id() in
snd_hda_create_dig_out_ctls(). Ensure that failures
are properly handled.

[ Note: the error cannot happen practically because the only error
  condition in snd_ctl_rename_id() is the missing ID, but this is a
  rename, hence it must be present.  But for the code consistency,
  it's safer to have always the proper return check -- tiwai ]

Fixes: 5c219a3 ("ALSA: hda: Fix kctl->id initialization")
Cc: [email protected] # 6.4+
Signed-off-by: Wentao Liang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit edcb866)
commit 6d1f866 upstream.

Allows the LED on the dedicated mute button on the HP ProBook 450 G4
laptop to change colour correctly.

Signed-off-by: John Veness <[email protected]>
Cc: <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 868f622)
commit 46c7b90 upstream.

The spcm->stream[substream->stream].substream is set during open and was
left untouched. After the first PCM stream it will never be NULL and we
have code which checks for substream NULLity as indication if the stream is
active or not.
For the compressed cstream pointer the same has been done, this change will
correct the handling of PCM streams.

Fixes: 090349a ("ASoC: SOF: Add support for compress API for stream data/offset")
Cc: [email protected]
Reported-by: Curtis Malainey <[email protected]>
Closes: thesofproject/linux#5214
Signed-off-by: Peter Ujfalusi <[email protected]>
Reviewed-by: Daniel Baluta <[email protected]>
Reviewed-by: Ranjani Sridharan <[email protected]>
Reviewed-by: Bard Liao <[email protected]>
Reviewed-by: Curtis Malainey <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit f69d2cd)
commit 56d5f3e upstream.

In [1] it was reported that the acct(2) system call can be used to
trigger NULL deref in cases where it is set to write to a file that
triggers an internal lookup. This can e.g., happen when pointing acc(2)
to /sys/power/resume. At the point the where the write to this file
happens the calling task has already exited and called exit_fs(). A
lookup will thus trigger a NULL-deref when accessing current->fs.

Reorganize the code so that the the final write happens from the
workqueue but with the caller's credentials. This preserves the
(strange) permission model and has almost no regression risk.

This api should stop to exist though.

Link: https://lore.kernel.org/r/[email protected] [1]
Link: https://lore.kernel.org/r/[email protected]
Fixes: 1da177e ("Linux-2.6.12-rc2")
Reported-by: Zicheng Qu <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 5c928e1)
commit 890ed45 upstream.

There's no point in allowing anything kernel internal nor procfs or
sysfs.

Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Fixes: 1da177e ("Linux-2.6.12-rc2")
Reviewed-by: Amir Goldstein <[email protected]>
Reported-by: Zicheng Qu <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 28d23f1)
…ment

commit 2ede647 upstream.

Add a sanity check to madvise_dontneed_free() to address a corner case in
madvise where a race condition causes the current vma being processed to
be backed by a different page size.

During a madvise(MADV_DONTNEED) call on a memory region registered with a
userfaultfd, there's a period of time where the process mm lock is
temporarily released in order to send a UFFD_EVENT_REMOVE and let
userspace handle the event.  During this time, the vma covering the
current address range may change due to an explicit mmap done concurrently
by another thread.

If, after that change, the memory region, which was originally backed by
4KB pages, is now backed by hugepages, the end address is rounded down to
a hugepage boundary to avoid data loss (see "Fixes" below).  This rounding
may cause the end address to be truncated to the same address as the
start.

Make this corner case follow the same semantics as in other similar cases
where the requested region has zero length (ie.  return 0).

This will make madvise_walk_vmas() continue to the next vma in the range
(this time holding the process mm lock) which, due to the prev pointer
becoming stale because of the vma change, will be the same hugepage-backed
vma that was just checked before.  The next time madvise_dontneed_free()
runs for this vma, if the start address isn't aligned to a hugepage
boundary, it'll return -EINVAL, which is also in line with the madvise
api.

From userspace perspective, madvise() will return EINVAL because the start
address isn't aligned according to the new vma alignment requirements
(hugepage), even though it was correctly page-aligned when the call was
issued.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8ebe0a5 ("mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs")
Signed-off-by: Ricardo Cañuelo Navarro <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Florent Revest <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 91f0e57)
commit 2b9df00 upstream.

Replace dma_request_channel() with dma_request_chan_by_mask() and use
helper functions to return proper error code instead of fixed -EBUSY.

Fixes: ec4ba01 ("mtd: rawnand: Add new Cadence NAND driver to MTD subsystem")
Cc: [email protected]
Signed-off-by: Niravkumar L Rabara <[email protected]>
Signed-off-by: Miquel Raynal <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit fcae111)
commit d76d22b upstream.

Remap the slave DMA I/O resources to enhance driver portability.
Using a physical address causes DMA translation failure when the
ARM SMMU is enabled.

Fixes: ec4ba01 ("mtd: rawnand: Add new Cadence NAND driver to MTD subsystem")
Cc: [email protected]
Signed-off-by: Niravkumar L Rabara <[email protected]>
Signed-off-by: Miquel Raynal <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit ad93934)
commit f37d135 upstream.

dma_map_single is using physical/bus device (DMA) but dma_unmap_single
is using framework device(NAND controller), which is incorrect.
Fixed dma_unmap_single to use correct physical/bus device.

Fixes: ec4ba01 ("mtd: rawnand: Add new Cadence NAND driver to MTD subsystem")
Cc: [email protected]
Signed-off-by: Niravkumar L Rabara <[email protected]>
Signed-off-by: Miquel Raynal <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 8380ebc)
commit 860ca5e upstream.

Add check for the return value of cifs_buf_get() and cifs_small_buf_get()
in receive_encrypted_standard() to prevent null pointer dereference.

Fixes: eec04ea ("smb: client: fix OOB in receive_encrypted_standard()")
Cc: [email protected]
Signed-off-by: Haoxiang Li <[email protected]>
Signed-off-by: Steve French <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 9e5d99a)
commit c158647 upstream.

The previous implementation incorrectly configured the cmn_interrupt_2_enable
register for interrupt handling. Using cmn_interrupt_2_enable to configure
Tag, Data RAM ECC interrupts would lead to issues like double handling of the
interrupts (EL1 and EL3) as cmn_interrupt_2_enable is meant to be configured
for interrupts which needs to be handled by EL3.

EL1 LLCC EDAC driver needs to use cmn_interrupt_0_enable register to configure
Tag, Data RAM ECC interrupts instead of cmn_interrupt_2_enable.

Fixes: 2745065 ("drivers: edac: Add EDAC driver support for QCOM SoCs")
Signed-off-by: Komal Bajaj <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit ae2661f)
commit 57b76be upstream.

The function tracer should record the preemption level at the point when
the function is invoked. If the tracing subsystem decrement the
preemption counter it needs to correct this before feeding the data into
the trace buffer. This was broken in the commit cited below while
shifting the preempt-disabled section.

Use tracing_gen_ctx_dec() which properly subtracts one from the
preemption counter on a preemptible kernel.

Cc: [email protected]
Cc: Wander Lairson Costa <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/[email protected]
Fixes: ce5e480 ("ftrace: disable preemption when recursion locked")
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Tested-by: Wander Lairson Costa <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit ac35a1d)
commit 8eb4b09 upstream.

Check if a function is already in the manager ops of a subops. A manager
ops contains multiple subops, and if two or more subops are tracing the
same function, the manager ops only needs a single entry in its hash.

Cc: [email protected]
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Link: https://lore.kernel.org/[email protected]
Fixes: 4f554e9 ("ftrace: Add ftrace_set_filter_ips function")
Tested-by: Heiko Carstens <[email protected]>
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 59bdc12)
commit 4dbc1d1 upstream.

When profile rollback fails in mlx5e_netdev_change_profile, the netdev
profile var is left set to NULL. Avoid a crash when unloading the driver
by not calling profile->cleanup in such a case.

This was encountered while testing, with the original trigger that
the wq rescuer thread creation got interrupted (presumably due to
Ctrl+C-ing modprobe), which gets converted to ENOMEM (-12) by
mlx5e_priv_init, the profile rollback also fails for the same reason
(signal still active) so the profile is left as NULL, leading to a crash
later in _mlx5e_remove.

 [  732.473932] mlx5_core 0000:08:00.1: E-Switch: Unload vfs: mode(OFFLOADS), nvfs(2), necvfs(0), active vports(2)
 [  734.525513] workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
 [  734.557372] mlx5_core 0000:08:00.1: mlx5e_netdev_init_profile:6235:(pid 6086): mlx5e_priv_init failed, err=-12
 [  734.559187] mlx5_core 0000:08:00.1 eth3: mlx5e_netdev_change_profile: new profile init failed, -12
 [  734.560153] workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
 [  734.589378] mlx5_core 0000:08:00.1: mlx5e_netdev_init_profile:6235:(pid 6086): mlx5e_priv_init failed, err=-12
 [  734.591136] mlx5_core 0000:08:00.1 eth3: mlx5e_netdev_change_profile: failed to rollback to orig profile, -12
 [  745.537492] BUG: kernel NULL pointer dereference, address: 0000000000000008
 [  745.538222] #PF: supervisor read access in kernel mode
<snipped>
 [  745.551290] Call Trace:
 [  745.551590]  <TASK>
 [  745.551866]  ? __die+0x20/0x60
 [  745.552218]  ? page_fault_oops+0x150/0x400
 [  745.555307]  ? exc_page_fault+0x79/0x240
 [  745.555729]  ? asm_exc_page_fault+0x22/0x30
 [  745.556166]  ? mlx5e_remove+0x6b/0xb0 [mlx5_core]
 [  745.556698]  auxiliary_bus_remove+0x18/0x30
 [  745.557134]  device_release_driver_internal+0x1df/0x240
 [  745.557654]  bus_remove_device+0xd7/0x140
 [  745.558075]  device_del+0x15b/0x3c0
 [  745.558456]  mlx5_rescan_drivers_locked.part.0+0xb1/0x2f0 [mlx5_core]
 [  745.559112]  mlx5_unregister_device+0x34/0x50 [mlx5_core]
 [  745.559686]  mlx5_uninit_one+0x46/0xf0 [mlx5_core]
 [  745.560203]  remove_one+0x4e/0xd0 [mlx5_core]
 [  745.560694]  pci_device_remove+0x39/0xa0
 [  745.561112]  device_release_driver_internal+0x1df/0x240
 [  745.561631]  driver_detach+0x47/0x90
 [  745.562022]  bus_remove_driver+0x84/0x100
 [  745.562444]  pci_unregister_driver+0x3b/0x90
 [  745.562890]  mlx5_cleanup+0xc/0x1b [mlx5_core]
 [  745.563415]  __x64_sys_delete_module+0x14d/0x2f0
 [  745.563886]  ? kmem_cache_free+0x1b0/0x460
 [  745.564313]  ? lockdep_hardirqs_on_prepare+0xe2/0x190
 [  745.564825]  do_syscall_64+0x6d/0x140
 [  745.565223]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
 [  745.565725] RIP: 0033:0x7f1579b1288b

Fixes: 3ef14e4 ("net/mlx5e: Separate between netdev objects and mlx5e profiles initialization")
Signed-off-by: Cosmin Ratiu <[email protected]>
Reviewed-by: Dragos Tatulea <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Jianqi Ren <[email protected]>
Signed-off-by: He Zhe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit d6fe973)
commit f2d87a7 upstream.

Commit ac61978 ("md: use separate work_struct for md_start_sync()")
use a new sync_work to replace del_work, however, stop_sync_thread() and
__md_stop_writes() was trying to wait for sync_thread to be done, hence
they should switch to use sync_work as well.

Noted that md_start_sync() from sync_work will grab 'reconfig_mutex',
hence other contex can't held the same lock to flush work, and this will
be fixed in later patches.

Fixes: ac61978 ("md: use separate work_struct for md_start_sync()")
Signed-off-by: Yu Kuai <[email protected]>
Acked-by: Xiao Ni <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 4b79bee)
commit f9cfe7e upstream.

Commit cf1b6d4 ("md: simplify md_seq_ops") introduce following
regressions:

1) If list all_mddevs is emptly, personalities and unused devices won't
   be showed to user anymore.
2) If seq_file buffer overflowed from md_seq_show(), then md_seq_start()
   will be called again, hence personalities will be showed to user
   again.
3) If seq_file buffer overflowed from md_seq_stop(), seq_read_iter()
   doesn't handle this, hence unused devices won't be showed to user.

Fix above problems by printing personalities and unused devices in
md_seq_show().

Fixes: cf1b6d4 ("md: simplify md_seq_ops")
Cc: [email protected] # v6.7+
Signed-off-by: Yu Kuai <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 8fab939)
… plus lts

commit a6a7cba upstream.

In general the delay should be added by the PHY instead of the MAC,
and this improves network stability on some boards which seem to
need different delay.

Fixes: 387b3bb ("arm64: dts: rockchip: Add Xunlong OrangePi R1 Plus LTS")
Cc: [email protected] # 6.6+
Signed-off-by: Tianling Shen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Stuebner <[email protected]>
[Fix conflicts due to missing dtsi conversion]
Signed-off-by: Tianling Shen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit be2778b)
commit 47a973f upstream.

The EAX of the CPUID Leaf 023H enumerates the mask of valid sub-leaves.
To tell the availability of the sub-leaf 1 (enumerate the counter mask),
perf should check the bit 1 (0x2) of EAS, rather than bit 0 (0x1).

The error is not user-visible on bare metal. Because the sub-leaf 0 and
the sub-leaf 1 are always available. However, it may bring issues in a
virtualization environment when a VMM only enumerates the sub-leaf 0.

Introduce the cpuid35_e?x to replace the macros, which makes the
implementation style consistent.

Fixes: eb467aa ("perf/x86/intel: Support Architectural PerfMon Extension leaf")
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
[ The patch is not exactly the same as the upstream patch. Because in the 6.6
  stable kernel, the umask2/eq enumeration is not supported. The number of
  counters is used rather than the counter mask. But the change is
  straightforward, which utilizes the structured union to replace the macros
  when parsing the CPUID enumeration. It also fixed a wrong macros. ]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit ad75c8e)
…_link

commit 584db20 upstream.

Patch series "nilfs2: Folio conversions for directory paths".

This series applies page->folio conversions to nilfs2 directory
operations.  This reduces hidden compound_head() calls and also converts
deprecated kmap calls to kmap_local in the directory code.

Although nilfs2 does not yet support large folios, Matthew has done his
best here to include support for large folios, which will be needed for
devices with large block sizes.

This series corresponds to the second half of the original post [1], but
with two complementary patches inserted at the beginning and some
adjustments, to prevent a kmap_local constraint violation found during
testing with highmem mapping.

[1] https://lkml.kernel.org/r/[email protected]

I have reviewed all changes and tested this for regular and small block
sizes, both on machines with and without highmem mapping.  No issues
found.

This patch (of 17):

In a few directory operations, the call to nilfs_put_page() for a page
obtained using nilfs_find_entry() or nilfs_dotdot() is hidden in
nilfs_set_link() and nilfs_delete_entry(), making it difficult to track
page release and preventing change of its call position.

By moving nilfs_put_page() out of these functions, this makes the page
get/put correspondence clearer and makes it easier to swap
nilfs_put_page() calls (and kunmap calls within them) when modifying
multiple directory entries simultaneously in nilfs_rename().

Also, update comments for nilfs_set_link() and nilfs_delete_entry() to
reflect changes in their behavior.

To make nilfs_put_page() visible from namei.c, this moves its definition
to nilfs.h and replaces existing equivalents to use it, but the exposure
of that definition is temporary and will be removed on a later kmap ->
kmap_local conversion.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Stable-dep-of: ee70999 ("nilfs2: handle errors that nilfs_prepare_chunk() may return")
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 944a4f8)
commit 8cf57c6 upstream.

In nilfs_rename(), calls to nilfs_put_page() to release pages obtained
with nilfs_find_entry() or nilfs_dotdot() are alternated in the normal
path.

When replacing the kernel memory mapping method from kmap to
kmap_local_{page,folio}, this violates the constraint on the calling order
of kunmap_local().

Swap the order of nilfs_put_page calls where the kmap sections of multiple
pages overlap so that they are nested, allowing direct replacement of
nilfs_put_page() -> unmap_and_put_page().

Without this reordering, that replacement will cause a kernel WARNING in
kunmap_local_indexed() on architectures with high memory mapping.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Stable-dep-of: ee70999 ("nilfs2: handle errors that nilfs_prepare_chunk() may return")
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 35dcb8a)
commit ee70999 upstream.

Patch series "nilfs2: fix issues with rename operations".

This series fixes BUG_ON check failures reported by syzbot around rename
operations, and a minor behavioral issue where the mtime of a child
directory changes when it is renamed instead of moved.

This patch (of 2):

The directory manipulation routines nilfs_set_link() and
nilfs_delete_entry() rewrite the directory entry in the folio/page
previously read by nilfs_find_entry(), so error handling is omitted on the
assumption that nilfs_prepare_chunk(), which prepares the buffer for
rewriting, will always succeed for these.  And if an error is returned, it
triggers the legacy BUG_ON() checks in each routine.

This assumption is wrong, as proven by syzbot: the buffer layer called by
nilfs_prepare_chunk() may call nilfs_get_block() if necessary, which may
fail due to metadata corruption or other reasons.  This has been there all
along, but improved sanity checks and error handling may have made it more
reproducible in fuzzing tests.

Fix this issue by adding missing error paths in nilfs_set_link(),
nilfs_delete_entry(), and their caller nilfs_rename().

[[email protected]: adjusted for page/folio conversion]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=32c3706ebf5d95046ea1
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=1097e95f134f37d9395c
Fixes: 2ba466d ("nilfs2: directory entry operations")
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 7891ac3)
commit 318e8c3 upstream.

In [1] the meaning of the synthetic IBPB flags has been redefined for a
better separation of concerns:
 - ENTRY_IBPB     -- issue IBPB on entry only
 - IBPB_ON_VMEXIT -- issue IBPB on VM-Exit only
and the Retbleed mitigations have been updated to match this new
semantics.

Commit [2] was merged shortly before [1], and their interaction was not
handled properly. This resulted in IBPB not being triggered on VM-Exit
in all SRSO mitigation configs requesting an IBPB there.

Specifically, an IBPB on VM-Exit is triggered only when
X86_FEATURE_IBPB_ON_VMEXIT is set. However:

 - X86_FEATURE_IBPB_ON_VMEXIT is not set for "spec_rstack_overflow=ibpb",
   because before [1] having X86_FEATURE_ENTRY_IBPB was enough. Hence,
   an IBPB is triggered on entry but the expected IBPB on VM-exit is
   not.

 - X86_FEATURE_IBPB_ON_VMEXIT is not set also when
   "spec_rstack_overflow=ibpb-vmexit" if X86_FEATURE_ENTRY_IBPB is
   already set.

   That's because before [1] this was effectively redundant. Hence, e.g.
   a "retbleed=ibpb spec_rstack_overflow=bpb-vmexit" config mistakenly
   reports the machine still vulnerable to SRSO, despite an IBPB being
   triggered both on entry and VM-Exit, because of the Retbleed selected
   mitigation config.

 - UNTRAIN_RET_VM won't still actually do anything unless
   CONFIG_MITIGATION_IBPB_ENTRY is set.

For "spec_rstack_overflow=ibpb", enable IBPB on both entry and VM-Exit
and clear X86_FEATURE_RSB_VMEXIT which is made superfluous by
X86_FEATURE_IBPB_ON_VMEXIT. This effectively makes this mitigation
option similar to the one for 'retbleed=ibpb', thus re-order the code
for the RETBLEED_MITIGATION_IBPB option to be less confusing by having
all features enabling before the disabling of the not needed ones.

For "spec_rstack_overflow=ibpb-vmexit", guard this mitigation setting
with CONFIG_MITIGATION_IBPB_ENTRY to ensure UNTRAIN_RET_VM sequence is
effectively compiled in. Drop instead the CONFIG_MITIGATION_SRSO guard,
since none of the SRSO compile cruft is required in this configuration.
Also, check only that the required microcode is present to effectively
enabled the IBPB on VM-Exit.

Finally, update the KConfig description for CONFIG_MITIGATION_IBPB_ENTRY
to list also all SRSO config settings enabled by this guard.

Fixes: 864bcaa ("x86/cpu/kvm: Provide UNTRAIN_RET_VM") [1]
Fixes: d893832 ("x86/srso: Add IBPB on VMEXIT") [2]
Reported-by: Yosry Ahmed <[email protected]>
Signed-off-by: Patrick Bellasi <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 60ba9b8)
Link: https://lore.kernel.org/r/[email protected]
Tested-by: FLorian Fainelli <[email protected]>
Tested-by: Peter Schneider <[email protected]>
Tested-by: Mark Brown <[email protected]>
Tested-by: Shuah Khan <[email protected]>
Tested-by: Jon Hunter <[email protected]>
Tested-by: Ron Economos <[email protected]>
Tested-by: Harshit Mogalapalli <[email protected]>
Tested-by: Linux Kernel Functional Testing <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 568e253)
@deepin-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: opsiff

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.