Releases: huggingface/accelerate
v0.6.0: Checkpointing and bfloat16 support
This release adds support for bfloat16 mixed precision training (requires PyTorch >= 1.10) and a brand-new checkpoint utility to help with resuming interrupted trainings. The documentation also moves to a completely revamped frontend.
Checkpoints
Save the current state of all your objects (models, optimizers, RNG states) with `accelerator.save_state(path_to_checkpoint)` and reload everything by calling `accelerator.load_state(path_to_checkpoint)`.
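A minimal sketch of this workflow (the model, optimizer and dataloader are toy stand-ins, and the checkpoint folder name is arbitrary):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# Toy objects just to make the sketch self-contained.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataset = torch.utils.data.TensorDataset(torch.randn(16, 4))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=4)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

# Save the state of every prepared object (plus RNG states) to a folder...
accelerator.save_state("my_checkpoint")

# ...and restore it later to resume an interrupted training.
accelerator.load_state("my_checkpoint")
```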
- Add in checkpointing capability by @muellerzr in #255
- Implementation of saving and loading custom states by @muellerzr in #270
BFloat16 support
Accelerate now supports bfloat16 mixed precision training. As a result, the old `--fp16` argument has been deprecated and replaced by the more generic `--mixed_precision`.
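The same can be requested from code when creating the `Accelerator`; a minimal sketch, assuming the `mixed_precision` argument introduced alongside this change (bf16 needs PyTorch >= 1.10 and supporting hardware):

```python
import torch
from accelerate import Accelerator

# Ask for bfloat16 mixed precision; this mirrors `accelerate launch --mixed_precision bf16 ...`.
accelerator = Accelerator(mixed_precision="bf16")

model = torch.nn.Linear(8, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

batch = torch.randn(4, 8, device=accelerator.device)
loss = model(batch).sum()   # forward pass runs under bf16 autocast
accelerator.backward(loss)
optimizer.step()
optimizer.zero_grad()
```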
- Add bfloat16 support #243 by @ikergarcia1996 in #247
New env subcommand
You can now type `accelerate env` to get a copy-pastable summary of your environment and default configuration. Very convenient when opening a new issue!
New doc frontend
The documentation has been switched to the new Hugging Face frontend, like Transformers and Datasets.
What's Changed
- Fix send_to_device with non-tensor data by @sgugger in #177
- Handle UserDict in all utils by @sgugger in #179
- Use collections.abc.Mapping to handle both the dict and the UserDict types by @mariosasko in #180
- fix: use `store_true` on argparse in nlp example by @monologg in #183
- Update README.md by @TevenLeScao in #187
- Add signature check for `set_to_none` in Optimizer.zero_grad by @sgugger in #189
- fix typo in code snippet by @MrZilinXiao in #199
- Add high-level API reference to README by @Chris-hughes10 in #204
- fix rng_types in accelerator by @s-kumano in #206
- Pass along drop_last in DispatchDataLoader by @sgugger in #212
- Rename state to avoid name conflicts with pytorch's Optimizer class. by @yuxinyuan in #224
- Fix lr scheduler num samples by @sgugger in #227
- Add customization point for init_process_group kwargs by @sgugger in #228
- Fix typo in installation docs by @jaketae in #234
- make deepspeed optimizer match parameters of passed optimizer by @jmhessel in #246
- Upgrade black to version ~=22.0 by @LysandreJik in #250
- add support of gather_object by @ZhiyuanChen in #238
- Add launch flags --module and --no_python (#256) by @parameter-concern in #258
- Accelerate + Animus/Catalyst = 🚀 by @Scitator in #249
- Add `debug_launcher` by @sgugger in #259
- enhance compatibility of honor type by @ZhiyuanChen in #241
- Add a flag to use CPU only in the config by @sgugger in #263
- Basic fixes for DeepSpeed by @sgugger in #264
- Ability to set the seed with randomness from inside Accelerate by @muellerzr in #266
- Don't use dispatch_batches when torch is < 1.8.0 by @sgugger in #269
- Make accelerated model with AMP possible to pickle by @BenjaminBossan in #274
- Contributing guide by @LysandreJik in #254
- replace texts and link (master -> main) by @johnnv1 in #282
- Use workflow from doc-builder by @sgugger in #275
- Pass along execution info to the exit of autocast by @sgugger in #284
New Contributors
- @mariosasko made their first contribution in #180
- @monologg made their first contribution in #183
- @TevenLeScao made their first contribution in #187
- @MrZilinXiao made their first contribution in #199
- @Chris-hughes10 made their first contribution in #204
- @s-kumano made their first contribution in #206
- @yuxinyuan made their first contribution in #224
- @jaketae made their first contribution in #234
- @jmhessel made their first contribution in #246
- @ikergarcia1996 made their first contribution in #247
- @ZhiyuanChen made their first contribution in #238
- @parameter-concern made their first contribution in #258
- @Scitator made their first contribution in #249
- @muellerzr made their first contribution in #255
- @BenjaminBossan made their first contribution in #274
- @johnnv1 made their first contribution in #280
Full Changelog: v0.5.1...v0.6.0
v0.5.1: Patch release
v0.5.0 Dispatch batches from main DataLoader
This release introduces support for iterating through a DataLoader only on the main process, which then dispatches the batches to all other processes.
Dispatch batches from main DataLoader
The motivation behind this comes from dataset streaming, which introduces two difficulties:
- there might be some timeouts for some elements of the dataset, which might then differ between the launched processes, so it's impossible to make sure the data is iterated through the same way on each process
- when using an IterableDataset, each process goes through the dataset and thus applies the preprocessing to all elements, which can slow down the training
This new feature is activated by default for every `IterableDataset`.
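A minimal sketch of how this looks in practice, assuming the `dispatch_batches` flag on the `Accelerator` constructor (shown explicitly here, although it is the default for an `IterableDataset`):

```python
import torch
from accelerate import Accelerator

# With dispatch_batches=True, only the main process reads from the DataLoader
# and sends each process its slice of the data.
accelerator = Accelerator(dispatch_batches=True)

dataset = torch.utils.data.TensorDataset(torch.randn(64, 4))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)
dataloader = accelerator.prepare(dataloader)

for (batch,) in dataloader:
    pass  # each process only sees its own shard of the batch read on the main process
```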
Various fixes
- fix fp16 convert back to fp32 for issue: unsupported operand type(s) for /: 'dict' and 'int' #149 (@Doragd)
- [Docs] Machine config is yaml not json #151 (@patrickvonplaten)
- Fix gather for 0d tensor #152 (@sgugger)
- [DeepSpeed] allow untested optimizers deepspeed #150 (@patrickvonplaten)
- Raise errors instead of warnings with better tests #170 (@sgugger)
v0.4.0 Experimental DeepSpeed and multi-node CPU support
This release adds support for DeepSpeed. While the basics are there to support ZeRO-2, ZeRO-3, as well as CPU and NVMe offload, the API might evolve a little bit as we polish it in the near future.
It also adds support for multi-node CPU. In both cases, just fill in the questionnaire output by `accelerate config` and then launch your script with `accelerate launch`; there are no changes in the main API.
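For reference, a minimal sketch of such a script; the loop itself stays the same whether the saved config selects DeepSpeed, multi-node CPU, or any other setup (the model and data below are toy stand-ins):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = torch.utils.data.TensorDataset(torch.randn(128, 16), torch.randn(128, 1))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    accelerator.backward(loss)
    optimizer.step()
```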
DeepSpeed support
Multinode CPU support
Various fixes
- Fix batch_sampler error for IterableDataset #62 (@ddkalamk)
- Honor namedtuples in inputs/outputs #67 (@sgugger)
- Fix examples README #70 (@cccntu)
- TPU not available in kaggle #73 (@yuangan)
- Pass args in notebook_launcher for multi-GPU #78 (@sgugger)
- Fix `accelerate test` with no config file #79 (@cccntu)
- Use `optimizer` for consistency #81 (@kumapo)
- Update README.md #87 (@Separius)
- Add `unscale_gradients` method #88 (@sgugger)
- Add Accelerator.free_memory #89 (@sgugger)
- [Feature] Add context manager to allow main process first. #98 (@Guillem96)
- Pass along kwargs to backward #104 (@sgugger)
- Add course banner #107 (@sgugger)
- added closure argument to optimizer.step() #105 (@pmelchior)
- Fix import error for torch 1.4.0 #108 (@sgugger)
- Unwrap optimizer before unscaling #115 (@sgugger)
- Fix DataLoader length when split_batches=True #121 (@sgugger)
- Fix `OptimWrapper` init #127 (@sgugger)
- Fix fp16 by converting outputs back to FP32 #134 (@sgugger)
- Add caveat on weight-tying on TPUs #138 (@sgugger)
- Add optimizer not stepped property #139 (@sgugger)
v0.3.0 Notebook launcher and multi-node training
v0.3.0 Notebook launcher and multi-node training
Notebook launcher
After doing all the data preprocessing in your notebook, you can launch your training loop using the new `notebook_launcher` functionality. This is especially useful for Colab or Kaggle with TPUs! Here is an example on Colab (don't forget to select a TPU runtime).
This launcher also works if you have multiple GPUs on your machine. You just have to pass along `num_processes=your_number_of_gpus` in the call to `notebook_launcher`.
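A minimal sketch (the training function is a placeholder for your own loop):

```python
from accelerate import notebook_launcher

def training_function():
    # Your usual Accelerate training loop goes here; it runs once per process.
    ...

# On a Colab/Kaggle TPU runtime this spawns the TPU processes; on a multi-GPU
# machine, set num_processes to your number of GPUs instead.
notebook_launcher(training_function, num_processes=8)
```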
- Notebook launcher #44 (@sgugger)
- Add notebook/colab example #52 (@sgugger)
- Support for multi-GPU in notebook_launcher #56 (@sgugger)
Multi-node training
Our multi-node training test setup was flawed and the previous releases of 🤗 Accelerate were not working for multi-node distributed training. This is all fixed now, and we have added more robust tests!
- fix cluster.py indent error #35 (@JTT94)
- Set all defaults from config in launcher #38 (@sgugger)
- Fix port in config creation #50 (@sgugger)
Various bug fixes
- Fix typos in examples README #28 (@arjunchandra)
- Fix load from config #31 (@sgugger)
- docs: minor spelling tweaks #33 (@brettkoonce)
- Add `set_to_none` to AcceleratedOptimizer.zero_grad #43 (@sgugger)
- fix #53 #54 (@Guitaricet)
- update launch.py #58 (@Jesse1eung)
v0.2.1: Patch release
Fixes a bug preventing the load of a config with `accelerate launch`.
v0.2.0 SageMaker launcher
SageMaker launcher
It's now possible to launch your training script on AWS instances using SageMaker via `accelerate launch`.
- Launch script on SageMaker #26 (@philschmid)
- Add defaults for compute_environment #23 (@sgugger)
- Add Configuration setup for SageMaker #17 (@philschmid)
Kwargs handlers
To customize how the different objects used for mixed precision or distributed training are instantiated, a new API called `KwargsHandler` is added. It lets you pass along the kwargs that will be forwarded to those objects when they are used (and it is ignored if they are not used in the current setup, so the script can still run on any kind of setup).
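For example, a minimal sketch using the `DistributedDataParallelKwargs` handler to tweak how DDP wraps the model (the particular kwarg is just an illustration):

```python
from accelerate import Accelerator, DistributedDataParallelKwargs

# Extra kwargs forwarded to torch.nn.parallel.DistributedDataParallel when it is used;
# on a single-device setup the handler is simply ignored.
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])
```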
Pad across processes
Trying to gather tensors that are not of the same size across processes resulted in a process hang. A new method, `Accelerator.pad_across_processes`, has been added to help with that.
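A minimal sketch (the uneven tensor sizes are artificial, just to illustrate the padding):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# Each process produces a tensor of a different length (artificial example).
predictions = torch.arange(5 + accelerator.process_index)

# Pad to the largest length across processes so the gather does not hang.
predictions = accelerator.pad_across_processes(predictions, dim=0, pad_index=0)
all_predictions = accelerator.gather(predictions)
```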
Various bug fixes
- added thumbnail #25 (@philschmid)
- Cleaner diffs in README and index #22 (@sgugger)
- Use proper size #21 (@sgugger)
- Alternate diff #20 (@sgugger)
- Add YAML config support #16 (@sgugger)
- Don't error on non-Tensors objects in move to device #13 (@sgugger)
- Add CV example #10 (@sgugger)
- Readme clean-up #9 (@thomwolf)
- More flexible RNG synchronization #8 (@sgugger)
- Fix typos and tighten grammar in README #7 (@lewtun)
- Update README.md #6 (@voidful)
- Fix TPU training in example #4 (@thomwolf)
- Fix example name in README #3 (@LysandreJik)