Offloading support for multiple attention layouts #2024

Open
wants to merge 5 commits into base: main

Conversation

sanandaraj5597
Contributor

Description

CPU offloading currently only supports the sbhd_sbhd_sbhd layout, but we use several other layouts for pre-training and fine-tuning of LLMs.

This PR adds offloading support for all attention layouts.
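
For context, a rough sketch of what these layout strings describe (shapes and sizes are illustrative only; the names follow the qkv_layout strings used in the attention code):

```python
import torch

s, b, h, d = 2048, 2, 16, 64   # illustrative sizes

# "sbhd_sbhd_sbhd": three separate, contiguous Q/K/V tensors.
q = torch.empty(s, b, h, d)
k = torch.empty(s, b, h, d)
v = torch.empty(s, b, h, d)

# "sbh3d": Q/K/V interleaved in one tensor along a size-3 dimension;
# slicing Q back out gives a non-contiguous view.
qkv = torch.empty(s, b, h, 3, d)
q_view = qkv[..., 0, :]
assert not q_view.is_contiguous()

# "thd_thd_thd": variable-length sequences packed over a total-token
# dimension t, with no explicit batch dimension.
t = 4096
q_packed = torch.empty(t, h, d)
```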

Selvaraj Anandaraj and others added 4 commits August 2, 2025 19:30
Signed-off-by: Selvaraj Anandaraj <[email protected]>
Signed-off-by: Selvaraj Anandaraj <[email protected]>
Signed-off-by: Selvaraj Anandaraj <[email protected]>
@pggPL
Collaborator

pggPL commented Aug 19, 2025

I think it will not work for offloaded layers, because .to() by default preserves memory format. I think it needs to be changed to .to(device="cpu", memory_format=torch.contiguous_format).
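
A minimal sketch of the suggested change (illustrative only, not the PR's actual offload code): the default memory_format=torch.preserve_format can carry the source tensor's strides over to the copy, while torch.contiguous_format forces a densely packed CPU copy.

```python
import torch

def offload_to_cpu(t: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper, not the PR's code: force a densely packed CPU copy
    # instead of relying on the default memory_format=torch.preserve_format.
    return t.to(device="cpu", memory_format=torch.contiguous_format)
```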

@pggPL
Collaborator

pggPL commented Aug 19, 2025

Otherwise it looks good. Have you tested it somehow?

@sanandaraj5597
Contributor Author

sanandaraj5597 commented Aug 19, 2025

I think it will not work for offloaded layers, because .to() by default preserves memory format. I think it needs to be changed to .to(device="cpu", memory_format=torch.contiguous_format).

It will work, because the CPU copy is created in a contiguous fashion. When you call .to(), the CPU copy is moved back to the GPU and is also contiguous. That's why we break down all the interleaved formats (sbh3d/th3d/...) into the contiguous ones (sbhd_sbhd_sbhd/thd_thd_thd/...) in attention.
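
A minimal sketch of this argument (illustrative shapes, not the attention code): once the interleaved layout is split and made contiguous, the CPU copy is dense, and the round trip back to the GPU stays contiguous.

```python
import torch

qkv = torch.randn(128, 2, 16, 3, 64)                          # "sbh3d"
q, k, v = (x.contiguous() for x in torch.unbind(qkv, dim=3))  # -> "sbhd_sbhd_sbhd"

q_cpu = q.to("cpu")            # contiguous source -> contiguous copy
assert q_cpu.is_contiguous()

if torch.cuda.is_available():
    q_gpu = q_cpu.to("cuda")   # copying back also yields a contiguous tensor
    assert q_gpu.is_contiguous()
```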

have you tested it somehow?

Yes, I've run E2E tests on top of this for pre-training and fine-tuning.

@pggPL
Collaborator

pggPL commented Aug 20, 2025

https://docs.pytorch.org/docs/stable/generated/torch.Tensor.to.html It preserves the memory format. So it will split sbh3d into 3 separate tensors (non-interleaved), none of them contiguous; all of them will keep the same strides as before.
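
An illustration of this point (hypothetical shapes): slicing Q, K and V out of an interleaved sbh3d tensor without .contiguous() gives three views that all keep the strides of the packed buffer and are not contiguous.

```python
import torch

qkv = torch.randn(128, 2, 16, 3, 64)           # "sbh3d"
q, k, v = torch.unbind(qkv, dim=3)              # three non-interleaved views

print(q.is_contiguous())                        # False
print(q.stride() == k.stride() == v.stride())   # True: strides of the packed qkv buffer
```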
