Releases: Genesis-Embodied-AI/quadrants
v1.0.2
Release v1.0.2
This release reverts "[DataOriented] Fix ndarrays on data oriented" (#704), which caused a regression in Genesis.
What's Changed
- [Type] Add unpacked form for qd.vector for indexed register access by @hughperkins in #718
- [DataOriented] Revert "[DataOriented] Fix ndarrays on data oriented (#704)" by @hughperkins in #719
Full Changelog: v1.0.1...v1.0.2
v1.0.1
Release v1.0.1
This release adds axes= to ndrange, a bitonic sort to the subgroup ops, and 32x32 Cholesky tiles.
What's Changed
Perf
- [Perf] Move register-tile Cholesky optimizations from Genesis back into quadrants by @hughperkins in #714
- [Perf] Add bitonic sort to subgroup ops by @hughperkins in #713
DataOriented
- [DataOriented] Fix ndarrays on data oriented by @hughperkins in #704
Lang
- [Lang] Add axes= to ndrange by @hughperkins in #710
CI
- [CI] Add slow marker and remove unnecessary tests by @hughperkins in #711
- [CI] Upgrade PR change report from composer 2 to composer 2.5 by @hughperkins in #716
Test
- [Test] Drop taichi xdist fork, use stock pytest-xdist by @hughperkins in #556
Full Changelog: v1.0.0...v1.0.1
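The bitonic sort added to the subgroup ops in #713 belongs to a family of fixed compare-exchange networks, which suit SIMT hardware because every lane performs the same step at the same time. The plain-Python sketch below simulates such a network over a power-of-two number of lanes, pairing each lane i with lane i ^ j, the partner pattern an XOR shuffle across a subgroup provides. It assumes that mapping for illustration only: it is not the quadrants implementation, it does not use the quadrants API, and the function name bitonic_sort_lanes is hypothetical.

```python
def bitonic_sort_lanes(values):
    """Sort a power-of-two-length list ascending with a bitonic network.

    Hypothetical helper for illustration only (not part of quadrants).
    Each (k, j) step is one compare-exchange round in which lane i pairs
    with lane i ^ j, mimicking an XOR shuffle across a subgroup.
    """
    n = len(values)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("lane count must be a power of two")
    lanes = list(values)
    k = 2
    while k <= n:  # k: length of the bitonic sequences being merged
        j = k // 2
        while j > 0:  # j: distance between compare-exchange partners
            # Every lane reads the pre-step values, as if all lanes
            # executed the shuffle in lockstep.
            before = list(lanes)
            for i in range(n):
                mine = before[i]
                theirs = before[i ^ j]
                ascending = (i & k) == 0  # sort direction of lane i's block
                is_lower = (i & j) == 0   # lane i holds the lower index of its pair
                # Lower lane keeps the min when ascending, the max when descending.
                if is_lower == ascending:
                    lanes[i] = min(mine, theirs)
                else:
                    lanes[i] = max(mine, theirs)
            j //= 2
        k *= 2
    return lanes


print(bitonic_sort_lanes([7, 3, 6, 1, 8, 2, 5, 4]))
# → [1, 2, 3, 4, 5, 6, 7, 8]
```

The sequence of compare-exchange steps depends only on lane indices, never on the data, which is what makes the network map cleanly onto lockstep lanes; the trade-off is O(n log² n) comparisons instead of the O(n log n) of a comparison sort such as merge sort.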
v1.0.1b2
Pre-release v1.0.1b2
Changes:
Data-oriented
- [DataOriented] Fastcache, perf, pruning by @hughperkins in #705
Full Changelog: v1.0.0...v1.0.1b2
v1.0.1b1
Pre-release v1.0.1b1
Changes:
Data-oriented
- [DataOriented] Fastcache, perf, pruning by @hughperkins in #705
Full Changelog: v1.0.0...v1.0.1b1
v1.0.0
Release v1.0.0
This release adds new device-level ops for QIPC and a qd.volatile_load primitive.
What's Changed
GPU
- [GPU] New device-level ops for QIPC by @hughperkins in #693
Cleaning
- [Cleaning] PrefixSumExecutor: drop unused GRID_SZ local by @hughperkins in #701
- [Cleaning] sync(): fix unsupported-arch error message by @hughperkins in #700
Atomics
- [Atomics] add qd.volatile_load primitive (closes #648) by @hughperkins in #702
AutoDiff
- [AutoDiff] Reject recycled identity_key in AdStackCache::register_adstack_sizing_info by @hughperkins in #708
Vulkan
- [Vulkan] Declare GroupNonUniform SPIR-V caps and enable shaderSubgroupExtendedTypes by @hughperkins in #707
Full Changelog: v0.8.0...v1.0.0
v0.8.1b2
Pre-release v0.8.1b2
This pre-release is to test a new, faster, more streamlined data_oriented class on Genesis.
What's Changed
GPU
- [GPU] New device-level ops for QIPC by @hughperkins in #693
Cleaning
- [Cleaning] PrefixSumExecutor: drop unused GRID_SZ local by @hughperkins in #701
- [Cleaning] sync(): fix unsupported-arch error message by @hughperkins in #700
Atomics
- [Atomics] add qd.volatile_load primitive (closes #648) by @hughperkins in #702
AutoDiff
- [AutoDiff] Reject recycled identity_key in AdStackCache::register_adstack_sizing_info by @hughperkins in #708
DataOriented
- [DataOriented] Fastcache, perf, pruning by @hughperkins in #705
Full Changelog: v0.8.0...v0.8.1b2
v0.8.1b1
Pre-release v0.8.1b1
This pre-release is to test a new, faster, more streamlined data_oriented class on Genesis.
What's Changed
GPU
- [GPU] New device-level ops for QIPC by @hughperkins in #693
Cleaning
- [Cleaning] PrefixSumExecutor: drop unused GRID_SZ local by @hughperkins in #701
- [Cleaning] sync(): fix unsupported-arch error message by @hughperkins in #700
Atomics
- [Atomics] add qd.volatile_load primitive (closes #648) by @hughperkins in #702
AutoDiff
- [AutoDiff] Reject recycled identity_key in AdStackCache::register_adstack_sizing_info by @hughperkins in #708
DataOriented
- [DataOriented] Fastcache, perf, pruning by @hughperkins in #705
Full Changelog: v0.8.0...v0.8.1b1
v0.8.0
Release v0.8.0
This release brings many cross-GPU SIMT primitives at both subgroup and block level. Note that subgroup reductions no longer take a log2_size parameter; this is a breaking change, hence the minor version bump. In addition, AMD GPUs now always use wave64, to simplify testing.
What's Changed
GPU
- [GPU] Cross-GPU for grid ops by @hughperkins in #670
- [GPU] New bit ops for QIPC by @hughperkins in #679
- [GPU] Subgroup ops cross-gpu by @hughperkins in #665
- [BREAKING][GPU] New QIPC ops for subgroups by @hughperkins in #676
- [GPU] New QIPC ops for block by @hughperkins in #684
Math
- [Math] Make bitop operations portable cross-gpu by @hughperkins in #662
- [Math] New QIPC ops for single-threaded linalg by @hughperkins in #683
AMDGPU
- [AMDGPU] Always use wave64, on both RDNA and CDNA by @hughperkins in #687
- [AMDGPU] Use syncscope("agent") for atomic xor to avoid CAS livelock by @hughperkins in #672
Graph
- [Graph] Rename CUDA Graph to Graph in docs by @hughperkins in #691
- [Graph] HIP graph runtime support for @qd.kernel(graph=True) by @hughperkins in #692
Atomics
- [Atomics] New QIPC ops for atomics by @hughperkins in #690
Structs
- [Structs] Pass dataclass sub-structs into qd.func by @hughperkins in #698
CI
- [CI] Add per-file timing report to Mac Metal test job by @hughperkins in #695
- [CI] Enable kernel disk cache during tests by @hughperkins in #696
Full Changelog: v0.7.8...v0.8.0
v0.7.8
Release v0.7.8
This release contains further autodiff optimizations and generalizes block and atomic operations across all GPU architectures.
What's Changed
Perf
- [Perf] Adstack max-reducer: launch cache + zero-copy result map; content-stable registry_id by @duburcqa in #671
- [Perf] CPU LLVM adstack-cache: skip per-launch bump-writes + ndarray_shapes capture on forward-only handles by @duburcqa in #685
AutoDiff
- [AutoDiff] Debug-mode field/grad/dual: dtype, layout, and access-time invariants by @duburcqa in #677
Docs
- [Docs] Add user-guide page for qd.algorithms.* device-wide algorithms by @hughperkins in #642
- [Docs] Doc for existing atomics: switch support table to per-backend columns by @hughperkins in #657
GPU
- [GPU] Cross gpu atomics by @hughperkins in #666
- [GPU] Make block operations portable cross-gpu by @hughperkins in #664
Full Changelog: v0.7.6...v0.7.8
v0.7.7
v0.7.7
This release mainly targets autodiff. It fixes the SPIR-V backends (Metal, Vulkan), significantly improves runtime speed (by up to 30%), and adds full support for debug mode.
What's Changed
AutoDiff
- [Perf] Adstack max-reducer: launch cache + zero-copy result map; content-stable registry_id by @duburcqa in #671
- [SPIR-V] dispatch_max_reducers: register each task with the real kernel name by @duburcqa in #675
- [AutoDiff] Debug-mode field/grad/dual: dtype, layout, and access-time invariants by @duburcqa in #677
Full Changelog: v0.7.6...v0.7.7