v8.0.11: Improved GPU training time

@svlandeg released this 20 Oct 07:16
2f2de92

✨ New features and improvements

  • Speed up GPU training by up to ~25% by using cuBLAS to compute Frobenius norms in gradient clipping.
  • Give preference to AppleOps (if available) when calling get_ops("cpu").
  • Support missing values in CategoricalCrossEntropy when the labels are integers.
  • Provide the option to run model.walk with depth-first traversal.
  • Wrap forward/init callbacks of a Model in with_debug and with_nvtx_range to facilitate recursively instrumenting models.
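
For context on the first item: gradient clipping by norm needs the Frobenius norm of each gradient array, which is why a faster norm computation can affect overall training time. The sketch below illustrates the idea in plain Python on nested lists. It is not the library's implementation, and the names `frobenius_norm` and `clip_gradient` are hypothetical helpers chosen for this example.

```python
import math


def frobenius_norm(matrix):
    """Square root of the sum of squared entries of a 2D list."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def clip_gradient(gradient, threshold):
    """Rescale `gradient` so its Frobenius norm does not exceed `threshold`.

    Hypothetical helper for illustration only. The gradient is returned
    unchanged when its norm is already within the threshold.
    """
    norm = frobenius_norm(gradient)
    if norm <= threshold:
        return gradient
    scale = threshold / norm
    return [[x * scale for x in row] for row in gradient]


# A gradient with Frobenius norm sqrt(3^2 + 4^2) = 5, clipped to norm 1.
gradient = [[3.0, 0.0], [0.0, 4.0]]
clipped = clip_gradient(gradient, 1.0)
```

The norm is a reduction over every element of the array, so on large weight matrices it is likely a noticeable share of each update step. Presumably this is the part that the release hands to cuBLAS on GPU.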

🔴 Bug fixes

  • Fix issue #537: Fix replace_node on nodes with indirect node refs.

👥 Contributors

@adrianeboyd, @danieldk, @honnibal, @ines, @svlandeg