[BugFix] Fix mask_id bug in block_sparse_sage2_attn_cuda for sm90 by 1145284121 · Pull Request #54 · mit-han-lab/radial-attention

1145284121 · 2025-09-06T07:47:36Z

On the sm90 architecture, sparge_mask_convert() repeats the mask along the query_idx dimension. This fix corrects the mask_id indexing in block_sparse_sage2_attn_cuda() to match.
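A minimal sketch of how repeating a block mask along the query dimension changes which mask row a query token maps to. This is illustrative only and is not the code touched by this PR: the helper names (repeat_rows, mask_row_for_token), the repeat factor of 2, and the query block sizes of 128 and 64 are all assumptions made for the example.

```python
def repeat_rows(mask, factor):
    """Repeat every row of a 2-D block mask `factor` times (query dimension)."""
    return [list(row) for row in mask for _ in range(factor)]


def mask_row_for_token(token_idx, query_block_size):
    """Map a query token index to the mask row that covers it."""
    return token_idx // query_block_size


# Hypothetical coarse mask: 2 query blocks x 2 key blocks, 128 tokens per query block.
coarse = [[1, 0],
          [1, 1]]

# Assumed conversion: each query row is duplicated, so every row now covers 64 tokens.
fine = repeat_rows(coarse, 2)

token = 200
coarse_row = mask_row_for_token(token, 128)  # row index in the original mask
fine_row = mask_row_for_token(token, 64)     # row index in the repeated mask

# The same token must see the same key-block pattern in both layouts.
assert fine[fine_row] == coarse[coarse_row]
```

If the kernel indexes the repeated mask with a row index computed for the original layout, it reads the wrong row, or a row past the end of the buffer, which is consistent with an illegal memory access.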

Before the fix, the query's mask_id in block_sparse_sage2_attn_cuda() is wrong, and the following error occurs:

sp-radial-attention/radial_attn/attn_mask.py", line 363, in RadialAttention
    return SpargeSageAttnBackend(query, key, value, mask_map, video_mask, pre_defined_mask, block_size=block_size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "sp-radial-attention/radial_attn/attn_mask.py", line 276, in SpargeSageAttnBackend
    k=key[:pre_defined_mask[0].sum(), :, :],
      ~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
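Because CUDA errors are reported asynchronously, the frame shown in the traceback (a slice of key) is not necessarily where the fault happened. A minimal sketch of enabling synchronous launches from Python follows, as the error message suggests. Where the torch import sits in a real script is an assumption.

```python
import os

# Set the variable before CUDA is initialized, i.e. before importing torch,
# so kernel launches run synchronously and errors surface at the faulting call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # assumed to come after the line above in a real script
```

The same effect can be had by exporting CUDA_LAUNCH_BLOCKING=1 in the shell before launching the job. Expect slower execution while it is enabled.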

After the fix, HunyuanVideo + Radial + SageAttention works successfully.

hunyuan_radial_sage_sp4.mp4

But Wan2.1 (no text tokens in attention) produces higher-quality results:

wan_radial_sp4_sage.mp4

Limitations

Generated video quality is still low when using SageAttention with HunyuanVideo, compared with Wan2.1/Wan2.2. This is likely because the block mask granularity of 128 causes many text_padding key/value tokens to participate in the attention computation.
The optimal solution requires processing key_video and key_text separately (like FlashInferBackend), but this needs block_sparse_sage2_attn_cuda() to return the lse, as flashinfer.single_prefill_with_kv_cache() does.
Another approach would be for block_sparse_sage2_attn_cuda() to support a 'last_page' parameter, similar to FlashInfer's BSR sparse mask representation.
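To illustrate why the lse is needed for the separate key_video/key_text approach, here is a minimal pure-Python sketch of merging two partial attention results by their log-sum-exp values. It is a generic reference for the technique and not the FlashInferBackend code. The function names (attend, merge) and the toy inputs are hypothetical.

```python
import math


def attend(q, keys, values):
    """Single-query softmax attention; returns (output, lse)."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom for d in range(dim)]
    return out, m + math.log(denom)


def merge(out_a, lse_a, out_b, lse_b):
    """Combine two partial attention outputs, weighting each by exp(lse)."""
    m = max(lse_a, lse_b)
    wa, wb = math.exp(lse_a - m), math.exp(lse_b - m)
    return [(wa * a + wb * b) / (wa + wb) for a, b in zip(out_a, out_b)]


# Toy data: one query, three "video" keys and two "text" keys (all hypothetical).
q = [0.5, -1.0]
video_k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
video_v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
text_k = [[-1.0, 0.5], [0.2, 0.3]]
text_v = [[7.0, 8.0], [9.0, 10.0]]

# Attention over all keys at once versus two separate passes merged by lse.
full, _ = attend(q, video_k + text_k, video_v + text_v)
out_v, lse_v = attend(q, video_k, video_v)
out_t, lse_t = attend(q, text_k, text_v)
merged = merge(out_v, lse_v, out_t, lse_t)

assert all(math.isclose(x, y, rel_tol=1e-9) for x, y in zip(full, merged))
```

Without the lse from each pass, the two outputs cannot be reweighted correctly. This is why the video and text keys can only be split if the sparse kernel also returns it.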

[BugFix] Fix mask_id bug in block_sparse_sage2_attn_cuda for sm90

d1a3701

1145284121 wants to merge 1 commit into mit-han-lab:main from 1145284121:bugfix_sage
