✨ New features and improvements
- Speed up GPU training by up to ~25% using cuBLAS to compute Frobenius norms during gradient clipping.
- Give preference to AppleOps (if available) when calling get_ops("cpu").
- Support missing values in CategoricalCrossEntropy when the labels are integers.
- Provide the option to run model.walk with depth-first traversal.
- Wrap forward/init callbacks of a Model in with_debug and with_nvtx_range to facilitate recursively instrumenting models.
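
For readers unfamiliar with the gradient clipping item above, here is a minimal pure-Python sketch of clipping by Frobenius norm. It is an illustration only and is not the library's implementation, which runs on the GPU through cuBLAS; the names `frobenius_norm` and `clip_gradient` are hypothetical and chosen for this example.

```python
import math
from typing import List

Matrix = List[List[float]]


def frobenius_norm(grad: Matrix) -> float:
    """Return the square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in grad for x in row))


def clip_gradient(grad: Matrix, threshold: float) -> Matrix:
    """Return a copy of grad rescaled so its Frobenius norm is at most threshold.

    Hypothetical helper for illustration; not a library function.
    """
    norm = frobenius_norm(grad)
    if norm <= threshold:
        # Already within bounds: return an unmodified copy.
        return [row[:] for row in grad]
    scale = threshold / norm
    return [[x * scale for x in row] for row in grad]


grad = [[3.0, 0.0], [0.0, 4.0]]  # Frobenius norm: sqrt(9 + 16) = 5
clipped = clip_gradient(grad, threshold=1.0)
```

The norm is the only reduction over the whole gradient in this procedure, which is presumably why moving that step to an optimized BLAS routine can have a visible effect on training time.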
🔴 Bug fixes
- Fix issue #537: Fix replace_node on nodes with indirect node refs.
👥 Contributors
@adrianeboyd, @danieldk, @honnibal, @ines, @svlandeg