Drop codegen support of gather (but not takeAlongAxis)#5907

Merged
naoyam merged 4 commits into main from gather
Feb 4, 2026

naoyam (Collaborator) commented Jan 31, 2026

Gather allows the output to be smaller than the input in the non-gathered dimensions. This complicates indexing; it is not yet supported by TensorIndexer and is handled only by the legacy indexer. Note that takeAlongAxis, which is a restricted case of gather, is supported.

The motivation is to remove the legacy indexer. This is the only remaining fallback case.

One way to support it is to decompose it into a takeAlongAxis and slice. For now, this PR disables codegen of gather and delegates to ExprEval.
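
The decomposition mentioned above can be illustrated with a small, self-contained sketch. It assumes 2-D integer data and gathering along dimension 1. The helpers `gatherDim1`, `takeAlongAxisDim1`, `sliceRows`, and `gatherViaSlice` are hypothetical names written for this example, not nvFuser APIs, and slicing the input before the lookup is only one possible reading of "a takeAlongAxis and slice".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Reference gather along dim 1: out[i][j] = input[i][index[i][j]].
// The index may have fewer rows than the input (the "non-exact" case).
Matrix gatherDim1(const Matrix& input, const Matrix& index) {
  assert(index.size() <= input.size());
  Matrix out(index.size());
  for (std::size_t i = 0; i < index.size(); ++i) {
    for (int col : index[i]) {
      out[i].push_back(input[i].at(static_cast<std::size_t>(col)));
    }
  }
  return out;
}

// takeAlongAxis along dim 1: same lookup, but the row counts must match.
Matrix takeAlongAxisDim1(const Matrix& input, const Matrix& index) {
  assert(index.size() == input.size());
  Matrix out(index.size());
  for (std::size_t i = 0; i < index.size(); ++i) {
    for (int col : index[i]) {
      out[i].push_back(input[i].at(static_cast<std::size_t>(col)));
    }
  }
  return out;
}

// Keep only the first `rows` rows of the input.
Matrix sliceRows(const Matrix& input, std::size_t rows) {
  assert(rows <= input.size());
  return Matrix(
      input.begin(), input.begin() + static_cast<std::ptrdiff_t>(rows));
}

// Non-exact gather expressed as a slice followed by takeAlongAxis.
Matrix gatherViaSlice(const Matrix& input, const Matrix& index) {
  return takeAlongAxisDim1(sliceRows(input, index.size()), index);
}
```

With a 3x3 input and a 2x2 index, both `gatherDim1` and `gatherViaSlice` read only the first two rows of the input, so the slice makes the sizes exact before the lookup.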

Note that the cross-entropy benchmark does use gather rather than takeAlongAxis. There's a pending change needed in Thunder; see #3924 (comment). While this is a perf regression, at this point I think it's more important to remove the large technical debt.

In a follow-up PR, I'll remove the legacy indexer. This PR just inserts an assertion that no fallback is necessary, which should hold given the scheduler changes.

naoyam (Collaborator, Author) commented Jan 31, 2026

!test

github-actions bot commented Jan 31, 2026

Review updated until commit 475f7b3

Description

  • Disable codegen support for gather operations while keeping takeAlongAxis support

  • Add runtime checks to reject non-exact gather operations during scheduling

  • Move gather operations to fall back to ExprEval instead of compiled kernels

  • Update tests to use takeAlongAxis instead of gather and remove unsupported test cases

Changes walkthrough

Relevant files

Error handling

csrc/id_model/indexing.cpp (+2/-0): Add TensorIndexer support validation
  • Added NVF_ERROR assertion to validate fusion support by TensorIndexer
  • Ensures only supported fusions proceed with indexing operations

Enhancement

csrc/scheduler/expr_eval_sched.cpp (+1/-0): Disable compile-time scheduling for GatherOp
  • Added GatherOp to list of unsupported compile-time schedulable operations
  • Forces gather operations to use ExprEval fallback instead of compiled kernels

csrc/scheduler/registry.cpp (+10/-0): Reject non-exact gather operations
  • Added check to reject scheduling when non-exact gather operations exist
  • Prevents compilation of fusions with unsupported gather patterns

Tests

tests/cpp/test_gather.cpp (+1/-134): Remove unsupported gather tests
  • Updated test to expect ExprEval scheduler instead of Reduction scheduler
  • Removed "GatherIterGoupedReduction" test case
  • Removed "SameTvUsedAsLookupAndIndex" test case

tests/cpp/test_persistent_buffer.cpp (+3/-1): Replace gather with takeAlongAxis
  • Replaced gather operation with takeAlongAxis in test
  • Updated comment to explain the change

tests/cpp/test_reduction.cpp (+1/-1): Update cross-entropy test to use takeAlongAxis
  • Replaced gather operation with takeAlongAxis in cross-entropy test
  • Maintains test functionality with supported operation

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review
    Potential Runtime Failures

    The added NVF_ERROR assertion will cause runtime failures for any fusion containing gather operations. While this is intentional to prevent fallback to legacy indexer, it may break existing user code unexpectedly. Consider if a deprecation path or clearer error message would be appropriate.

    NVF_ERROR(isSupported(id_model.fusion()));
    Removed Test Coverage

    Two test cases were completely removed: "GatherIterGoupedReduction" and "SameTvUsedAsLookupAndIndex". These tests covered important edge cases like grouped reductions on GatherScatter iteration types and using the same TV as both lookup and index. The removal reduces test coverage for gather functionality.

    Performance Regression Risk

    The PR acknowledges a performance regression for cross-entropy benchmark that still uses gather operations. The comment mentions a pending change in Thunder, but there's no timeline or fallback plan provided. This could significantly impact performance for users relying on cross-entropy operations.

    // Support of non-exact gather was dropped when the legacy indexer was
    // deprecated
    if (std::ranges::any_of(
            ir_utils::getOpsOfType<GatherOp>(fusion),
            [](GatherOp* gather) { return !gather->exactSizes(); })) {
      scheduler_debug_utils::canScheduleRejectReason(
          scheduler_type, "Non-exact gather ops");
      return false;
    }
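
The rejection check quoted above can be tried in isolation with a minimal sketch. `MockGatherOp` and `canScheduleCompiled` are hypothetical stand-ins written for this example (only `exactSizes()` mirrors a name from the snippet), and `std::any_of` is used instead of `std::ranges::any_of` so that the sketch does not require C++20.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a gather op; only exactSizes() is modeled.
struct MockGatherOp {
  bool exact;
  bool exactSizes() const { return exact; }
};

// Returns false, and records a reason, if any gather op is non-exact.
// An empty op list is accepted because any_of is false on an empty range.
bool canScheduleCompiled(
    const std::vector<const MockGatherOp*>& gather_ops,
    std::string* reject_reason) {
  assert(reject_reason != nullptr);
  const bool has_non_exact = std::any_of(
      gather_ops.begin(), gather_ops.end(), [](const MockGatherOp* op) {
        return !op->exactSizes();
      });
  if (has_non_exact) {
    *reject_reason = "Non-exact gather ops";
    return false;
  }
  return true;
}
```

A single non-exact gather anywhere in the list is enough to reject the whole fusion; exact gathers (the takeAlongAxis case) pass through this particular check.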

    naoyam (Collaborator, Author) commented Feb 3, 2026

    !test

    @naoyam naoyam requested a review from jjsjann123 February 3, 2026 18:19
    @naoyam naoyam marked this pull request as ready for review February 3, 2026 18:20
    greptile-apps bot (Contributor) commented Feb 3, 2026

    Greptile Overview

    Greptile Summary

    This PR removes codegen support for non-exact gather operations by routing them to ExprEval/ATen evaluation, preparing for legacy indexer removal.

    Key Changes:

    • Added registry-level rejection for non-exact gather ops across all schedulers
    • Routed all GatherOp (including exact-sized takeAlongAxis) to ExprEval scheduler
    • Added assertion in TensorIndexer constructor to validate no unsupported ops slip through
    • Converted test usages of gather to takeAlongAxis where applicable
    • Removed 2 tests that specifically tested non-exact gather functionality (GatherIterGoupedReduction, SameTvUsedAsLookupAndIndex)
    • Updated test expectations to reflect ExprEval scheduling instead of PointWise

    Impact:
    The PR intentionally introduces a performance regression for gather/takeAlongAxis operations by delegating them to ATen evaluation. This is acknowledged as necessary technical debt reduction to enable removal of the legacy indexer. Note that while takeAlongAxis (exact-sized gather) is supported by TensorIndexer, it's still routed through ExprEval for simplicity.

    Confidence Score: 5/5

    • Safe to merge with intentional performance regression for gather operations
    • Changes are well-structured and aligned with stated goal of removing legacy indexer. All modifications consistently enforce the new gather handling policy across scheduler registry, ExprEval routing, and TensorIndexer validation. Test updates properly reflect new behavior. The performance regression is explicitly acknowledged and justified as necessary for technical debt reduction.
    • No files require special attention

    Important Files Changed

    Filename Overview
    csrc/id_model/indexing.cpp Added assertion to ensure TensorIndexer only handles supported fusions (exact-sized gather ops)
    csrc/scheduler/expr_eval_sched.cpp Added GatherOp to list of operations delegated to ExprEval scheduler
    csrc/scheduler/registry.cpp Added rejection logic for non-exact gather operations across all schedulers

    Sequence Diagram

    sequenceDiagram
        participant User
        participant Registry as Scheduler Registry
        participant ExprEval as ExprEval Scheduler
        participant TensorIndexer
        participant LegacyIndexer as Legacy Indexer (to be removed)
        
        User->>Registry: Schedule fusion with GatherOp
        alt Non-exact gather (exactSizes() == false)
            Registry-->>User: Reject: "Non-exact gather ops"
            Note over Registry: checkCanSchedule returns false
            Registry->>ExprEval: Delegate to ExprEval
            ExprEval-->>User: Use ATen evaluation
        else Exact gather (takeAlongAxis)
            Registry->>TensorIndexer: Attempt scheduling
            TensorIndexer->>TensorIndexer: isSupported check
            alt isSupported fails
                TensorIndexer-->>Registry: NVF_ERROR assertion
                Note over TensorIndexer: Should not happen after<br/>scheduler changes
            else isSupported passes
                TensorIndexer-->>User: Compile with TensorIndexer
            end
        end
        
        Note over LegacyIndexer: Legacy indexer will be<br/>removed in follow-up PR
    

    greptile-apps bot (Contributor) left a comment

    3 files reviewed, no comments

    - // Test grouped reduction on IterType::GatherScatter
    - TEST_F(GatherTest, GatherIterGoupedReduction) {
    + // Codegen support of non-exact gather dropped
    + TEST_F(GatherTest, DISABLED_GatherIterGoupedReduction) {
    Collaborator

    Do we still plan to support this later? Wondering if we should remove the tests instead.

    Collaborator Author

    Let me remove it

    naoyam (Collaborator, Author) commented Feb 4, 2026

    !build

    greptile-apps bot (Contributor) left a comment

    3 files reviewed, 1 comment

    // TODO: remove IndexPutAccumulateOp
    if (exprs.front()
            ->isOneOf<
                GatherOp,
    Contributor

    Adding GatherOp here routes ALL gather operations (including exact-sized takeAlongAxis) to ExprEval/ATen evaluation. The PR title says "but not takeAlongAxis", suggesting exact gather should still be compiled. Consider filtering to only accept non-exact gather:

    Suggested change
    - GatherOp,
    + !exprs.front()->isa<GatherOp>() || !exprs.front()->as<GatherOp>()->exactSizes() ? GatherOp : void,

    Or clarify if the performance regression for takeAlongAxis is intentional.
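
As written, the suggested line would not compile: a template argument list takes types, so a runtime condition cannot choose between `GatherOp` and `void` there. A minimal sketch of the runtime filter the comment seems to be aiming for is given here, using mock classes (`MockExpr`, `MockGather`, `MockAdd`, and `isNonExactGather` are all hypothetical and are not nvFuser's IR or API).

```cpp
#include <cassert>

// Mock IR nodes for illustration only; these are not nvFuser's classes.
struct MockExpr {
  virtual ~MockExpr() = default;
};

struct MockGather : MockExpr {
  explicit MockGather(bool exact) : exact_(exact) {}
  bool exactSizes() const { return exact_; }

 private:
  bool exact_;
};

struct MockAdd : MockExpr {};

// True only for a gather whose sizes are not exact. Exact gathers
// (the takeAlongAxis case) and all other ops return false.
bool isNonExactGather(const MockExpr* expr) {
  assert(expr != nullptr);
  const auto* gather = dynamic_cast<const MockGather*>(expr);
  return gather != nullptr && !gather->exactSizes();
}
```

A predicate of this shape would sit next to the type-list check rather than inside it. Whether exact gathers should be excluded from the ExprEval route at all is the question the comment leaves to the author.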

    naoyam (Collaborator, Author) commented Feb 4, 2026

    !build

    3 similar comments

    @naoyam naoyam merged commit 1eacd9b into main Feb 4, 2026
    18 checks passed
    @naoyam naoyam deleted the gather branch February 4, 2026 18:46