@weifengpy @drisspg
Fix #1665
TLDR: Implement `aten.cat.default` so that `NF4Tensor` can be used when using `DDP`.

Overview
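The idea named in the TLDR, handling a concatenation by first unpacking quantized tensors into regular ones, can be sketched without PyTorch. The snippet below is a toy illustration only: `PackedTensor`, `OP_TABLE`, `implements`, `plain_cat`, and `cat` are hypothetical names, plain Python lists stand in for tensors, and none of this is the code added by this PR.

```python
from typing import Callable, Dict, List, Sequence, Union


class PackedTensor:
    """Toy stand-in for a quantized tensor (hypothetical, not NF4Tensor)."""

    def __init__(self, codes: List[int], scale: float) -> None:
        self.codes = codes  # small integer codes
        self.scale = scale  # shared scale used to recover float values

    def unpack(self) -> List[float]:
        # Recover regular float values from the integer codes.
        return [code * self.scale for code in self.codes]


TensorLike = Union[PackedTensor, List[float]]

# Hypothetical table mapping an op name to its override for PackedTensor.
OP_TABLE: Dict[str, Callable[[Sequence[TensorLike]], List[float]]] = {}


def implements(op_name: str):
    """Register the decorated function as the override for `op_name`."""
    def decorator(fn):
        OP_TABLE[op_name] = fn
        return fn
    return decorator


def plain_cat(tensors: Sequence[List[float]]) -> List[float]:
    # Baseline concatenation that only understands regular lists.
    out: List[float] = []
    for t in tensors:
        out.extend(t)
    return out


@implements("cat")
def packed_cat(tensors: Sequence[TensorLike]) -> List[float]:
    # Unpack every PackedTensor so the list holds one kind of value,
    # then fall back to the baseline concatenation.
    unpacked = [t.unpack() if isinstance(t, PackedTensor) else t for t in tensors]
    return plain_cat(unpacked)


def cat(tensors: Sequence[TensorLike]) -> List[float]:
    # Route to the override only when a PackedTensor is present.
    if any(isinstance(t, PackedTensor) for t in tensors):
        return OP_TABLE["cat"](tensors)
    return plain_cat(tensors)  # type: ignore[arg-type]


packed = PackedTensor(codes=[1, -2, 3], scale=0.5)
regular = [4.0, 5.0]
print(cat([packed, regular]))  # [0.5, -1.0, 1.5, 4.0, 5.0]
```

In this toy, the mixed list is made uniform before concatenation, so the baseline `plain_cat` never sees a packed value.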
`DDP` syncs params and buffers during `__init__`. This dispatches to a call to `aten.cat.default` with (potentially) a list of tensors with mixed dtypes if `nf4` tensors fall in the same bucket as regular tensors.

Implementing `aten.cat.default` fixes this issue by unpacking the `nf4` tensors to their original tensors. Other operations after the sync are already implemented, so the synced modules can be properly reconstructed.

Tests
Tests are located in `tests/dtypes/ddp` and can be run by executing the `run_ddp_nf4_test.sh` script.

This script does the following:
- Runs a `LoraLinear` model (`ddp_nf4.py`) with world size 1 to generate a reference checkpoint.
- Runs `ddp_nf4.py` with world size 2 to generate test checkpoints.

Example output:
`ddp_nf4.py` can be parametrized: