torchao FP8, initial Tensor Parallel support, and memory leak fixes
torchao FP8
This release introduces a new FP8 API and adds a new backend: torchao. To use it, pass AORecipeKwargs to the Accelerator while setting mixed_precision="fp8". This is initial support; as it matures, we will incorporate more into it (such as accelerate config/yaml) in future releases. See our benchmark examples here
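As a rough illustration of the setup described above, here is a minimal sketch. It assumes that torchao is installed, that the hardware supports FP8, and that the recipe is handed over through the Accelerator's kwargs_handlers argument with default settings. The helper name build_fp8_accelerator is hypothetical and only there to keep the imports lazy.

```python
def build_fp8_accelerator():
    """Return an Accelerator configured for FP8 training with the torchao backend.

    Hypothetical helper for illustration; not part of accelerate itself.
    """
    # Imported lazily so this module still loads where accelerate or torchao
    # are not installed.
    from accelerate import Accelerator
    from accelerate.utils import AORecipeKwargs

    # Assumption: the default AORecipeKwargs() recipe is sufficient here.
    return Accelerator(mixed_precision="fp8", kwargs_handlers=[AORecipeKwargs()])
```

The returned object would then be used like any other Accelerator, for example by calling its prepare method on the model and optimizer before the training loop.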
TensorParallel
We have initial support for an in-house solution to TP when working with accelerate dataloaders. Check out the PR here
Bug fixes
- fix triton version check by @faaany in #3345
- fix torch_dtype in estimate memory by @SunMarc in #3383
- works for fp8 with deepspeed by @XiaobingSuper in #3361
- [memory leak] Replace GradientState -> DataLoader reference with weakrefs by @tomaarsen in #3391
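To give a feel for why weak references help with this kind of leak, here is a toy sketch. ToyGradientState and ToyDataLoader are hypothetical stand-ins, not accelerate's actual classes, and the real change in #3391 may differ in its details. The idea shown is only that a registry holding weak references does not keep its entries alive.

```python
import gc
import weakref


class ToyDataLoader:
    """Hypothetical stand-in for a dataloader; not accelerate's class."""

    def __init__(self, name):
        self.name = name


class ToyGradientState:
    """Hypothetical registry that tracks dataloaders without owning them."""

    def __init__(self):
        self._dataloader_refs = []

    def add_dataloader(self, dataloader):
        # Store a weak reference so the registry does not keep the loader alive.
        self._dataloader_refs.append(weakref.ref(dataloader))

    def live_dataloaders(self):
        # Calling a weakref returns the object, or None once it was collected.
        candidates = (ref() for ref in self._dataloader_refs)
        return [dl for dl in candidates if dl is not None]


state = ToyGradientState()
loader = ToyDataLoader("train")
state.add_dataloader(loader)
names_before = [dl.name for dl in state.live_dataloaders()]

del loader    # drop the only strong reference to the loader
gc.collect()  # make collection explicit on interpreters without refcounting
names_after = [dl.name for dl in state.live_dataloaders()]
```

With a strong reference in the registry instead, the loader would stay reachable after `del loader` and its memory would not be released.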
What's Changed
- fix triton version check by @faaany in #3345
- [tests] enable BNB test cases in tests/test_quantization.py on XPU by @faaany in #3349
- [Dev] Update release directions by @muellerzr in #3352
- [tests] make cuda-only test work on other hardware accelerators by @faaany in #3302
- [tests] remove require_non_xpu test markers by @faaany in #3301
- Support more functionalities for MUSA backend by @fmo-mt in #3359
- [tests] enable more bnb tests on XPU by @faaany in #3350
- feat: support tensor parallel & Data loader by @kmehant in #3173
- DeepSpeed github repo move sync by @stas00 in #3376
- [tests] Fix bnb cpu error by @faaany in #3351
- fix torch_dtype in estimate memory by @SunMarc in #3383
- works for fp8 with deepspeed by @XiaobingSuper in #3361
- fix: typos in documentation files by @maximevtush in #3388
- [examples] upgrade code for seed setting by @faaany in #3387
- [memory leak] Replace GradientState -> DataLoader reference with weakrefs by @tomaarsen in #3391
- add xpu check in get_quantized_model_device_map by @faaany in #3397
- Torchao float8 training by @muellerzr in #3348
New Contributors
- @kmehant made their first contribution in #3173
- @XiaobingSuper made their first contribution in #3361
- @maximevtush made their first contribution in #3388
Full Changelog: v1.3.0...v1.4.0