Conversation

@cuichenx (Contributor) commented Jan 5, 2026

THD sink attention is supported in cuDNN 9.18.0

Description

Enable FusedAttention for the THD format with non-vanilla (sink) softmax types when cuDNN 9.18.0 or later is available. Previously, this combination unconditionally fell back from FusedAttention.

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Disable FusedAttention for the THD format with non-vanilla softmax types only when the cuDNN version is below 9.18.0; keep UnfusedDotProductAttention disabled for this combination on all versions.

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

cuichenx and others added 2 commits January 5, 2026 11:10
@greptile-apps bot commented Jan 5, 2026

Greptile Summary

This PR updates the attention backend selection logic to enable FusedAttention for THD (Token-Head-Dimension) format with non-vanilla softmax types when using cuDNN 9.18.0 or later. Previously, FusedAttention was unconditionally disabled for all THD formats with non-vanilla softmax. The change adds a version check that only disables FusedAttention for cuDNN versions below 9.18.0, allowing modern cuDNN versions to leverage the newly supported sink attention feature in THD format.

The change is minimal and focused: it wraps the FusedAttention disabling logic in a version check, while keeping UnfusedDotProductAttention disabled for all versions to maintain backward compatibility with older cuDNN versions that lack this feature.
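In spirit, the gated selection looks roughly like the sketch below. This is a paraphrase of the logic described above, not the literal diff in utils.py; names such as use_fused_attention, use_unfused_attention, qkv_format, softmax_type, cudnn_version, and logger are assumed from the surrounding backend-selection context.

    # A minimal sketch of the version-gated logic, assuming variable names from the
    # surrounding backend-selection code; not the exact code in utils.py.
    if softmax_type != "vanilla" and qkv_format == "thd":
        if cudnn_version < (9, 18, 0):
            # Sink (non-vanilla) softmax with the THD layout is only supported by
            # the fused backend starting with cuDNN 9.18.0.
            logger.debug(
                "Disabling FusedAttention: THD format with non-vanilla softmax "
                "requires cuDNN 9.18.0+"
            )
            use_fused_attention = False
        # The unfused backend still lacks this path, so it stays disabled on all versions.
        use_unfused_attention = False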

Confidence Score: 5/5

  • This PR is safe to merge with no concerns. The change is straightforward version-gated logic that enables an existing feature for newer cuDNN versions.
  • The change introduces a simple, version-gated condition (cudnn_version < (9, 18, 0)) that modifies behavior based on cuDNN capabilities. The logic is correct: it allows FusedAttention to be used for THD format with non-vanilla softmax when cuDNN 9.18.0+ is available, which is the stated feature addition. The version check pattern is consistent with other version checks in the same file (e.g., lines 491, 497). The change is backward compatible as it only enables new functionality for newer versions, while older versions continue to have FusedAttention disabled as before. No new bugs are introduced, and the scope is minimal (single conditional block).
  • No files require special attention

Important Files Changed

  • transformer_engine/pytorch/attention/dot_product_attention/utils.py: Updated the THD sink attention logic to disable FusedAttention only when the cuDNN version is below 9.18.0. This allows newer cuDNN versions to use FusedAttention with the THD format and non-vanilla softmax types, which is now supported natively.

Sequence Diagram

sequenceDiagram
    participant get_attention_backend as get_attention_backend()
    participant version_check as cudnn_version check
    participant fused_attn as FusedAttention

    get_attention_backend->>version_check: Check if softmax_type != "vanilla"
    version_check-->>get_attention_backend: True
    
    get_attention_backend->>version_check: Check if qkv_format == "thd"
    version_check-->>get_attention_backend: True
    
    rect rgb(200, 220, 255)
    Note over version_check: NEW: Version gate added
    get_attention_backend->>version_check: Check if cudnn_version < (9, 18, 0)
    end
    
    alt cuDNN < 9.18.0
        version_check-->>fused_attn: Disable FusedAttention (legacy behavior)
    else cuDNN >= 9.18.0
        version_check-->>fused_attn: Allow FusedAttention (new feature support)
    end

@greptile-apps bot commented Jan 5, 2026

Greptile found no issues!

From now on, if a review finishes and we haven't found any issues, we will not post anything, but you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

@cyanguwa (Collaborator) commented Jan 6, 2026

Could you please add a THD test here: https://github.com/cuichenx/TransformerEngine/blob/442699c714c3e25d1797712319e32f4d569a98e5/tests/pytorch/attention/test_attention.py#L418

Thanks!
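For illustration only, a THD + sink-softmax smoke test might look roughly like the sketch below. This is not the parametrization used in test_attention.py; the softmax_type value "learnable", the "padding_causal" mask type, and the THD calling convention shown here are assumptions about the feature under test, not details taken from the existing tests.

    # Hypothetical smoke test for THD format with a non-vanilla (sink) softmax;
    # argument names and values are assumptions, not the repo's test parametrization.
    import pytest
    import torch
    import transformer_engine.pytorch as te

    @pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA required")
    def test_thd_sink_attention_smoke():
        num_heads, head_dim = 16, 64
        seqlens = torch.tensor([128, 256, 64], dtype=torch.int32)
        cu_seqlens = torch.cat(
            [torch.zeros(1, dtype=torch.int32), torch.cumsum(seqlens, dim=0).to(torch.int32)]
        ).cuda()
        total_tokens = int(cu_seqlens[-1])

        # THD layout packs variable-length sequences as [total_tokens, num_heads, head_dim].
        q, k, v = (
            torch.randn(total_tokens, num_heads, head_dim, dtype=torch.bfloat16, device="cuda")
            for _ in range(3)
        )

        attn = te.DotProductAttention(
            num_attention_heads=num_heads,
            kv_channels=head_dim,
            attn_mask_type="padding_causal",
            softmax_type="learnable",  # assumed name for a non-vanilla (sink) softmax
        ).cuda()
        out = attn(
            q, k, v,
            qkv_format="thd",
            cu_seqlens_q=cu_seqlens,
            cu_seqlens_kv=cu_seqlens,
            max_seqlen_q=int(seqlens.max()),
            max_seqlen_kv=int(seqlens.max()),
        )
        assert out.shape[0] == total_tokens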

cuichenx closed this Jan 6, 2026
cuichenx deleted the patch-1 branch January 6, 2026 23:28
@cuichenx (Contributor, Author) commented Jan 7, 2026

Accidentally closed this after renaming the branch; opened a new PR here: #2568

