Releases: Genesis-Embodied-AI/quadrants
v0.6.3b3
What's Changed
- [Perf] Tiles 1: _load, _store, eye by @hughperkins in #466
- [Misc] Remove dead InternalFuncStmt type_check override by @hughperkins in #471
- [Perf] Tiles 2a: add cholesky and ger by @hughperkins in #472
- [Perf] Tiles 2b: add triangular solve by @hughperkins in #474
- [Misc] Tiles 2c: Refactor: use _get_col/_set_col in tiles load/store/init by @hughperkins in #475
- [Build] Fix flaky test_clock_accuracy by @hughperkins in #436
- Fix AARCH64 emitting invalid asm in CUDA kernels. by @duburcqa in #473
- [AMDGPU] Enable HIP memory pool and surface pool-exhaustion errors. by @duburcqa in #485
- [AMDGPU] Scope hsaco tmp dir per-user to avoid collisions. by @duburcqa in #484
- [Perf] Tiles 3: Add slice syntax, qd.outer() and initial doc by @hughperkins in #477
- [AMDGPU] Fix gradient computation. by @duburcqa in #486
- Enable all backends that are supported in unit tests. by @duburcqa in #488
- Fix SPIRV ID overflow for large kernels due to autodiff. by @duburcqa in #489
- [Misc] Fix purity checker to allow accessing constants from quadrants modules by @hughperkins in #487
- [Misc] Increase tolerance for clock monotonic test by @hughperkins in #492
- [CI] Serialize api doc workflow by @hughperkins in #494
- [CI] Increase tolerance for clock test by @hughperkins in #506
- [CI] Increase clock test tolerance to 20% by @hughperkins in #509
- [Perf] Tiles 4a: Add tensor_type parametrization to tile16 tests by @hughperkins in #504
- [Perf] Tiles 4b: Migrate tiles16 tests to enable fastcache by @hughperkins in #505
- [Perf] Tiles 4c: add Tiles16x16 proxy by @hughperkins in #507
- [Perf] Tiles 4d: Consolidate slice error tests using parametrize by @hughperkins in #508
- [Perf] Tiles 4: add SharedArray slice support by @hughperkins in #482
- [Doc] Add user guide page for subgroup shuffle by @hughperkins in #512
- [Perf] Implement cross-platform shuffle_down by @hughperkins in #510
- [Perf] Add portable subgroup reduce_add and reduce_all_add by @hughperkins in #511
Full Changelog: v0.6.2...v0.6.3b3
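For readers unfamiliar with the subgroup primitives listed above, here is a rough illustration of what a reduce_add built on shuffle_down computes. It is a pure-Python simulation of the lanes of one subgroup on the CPU and does not use the Quadrants API. The function names and signatures below are hypothetical stand-ins, and the out-of-range behaviour (a lane keeps its own value) is an assumption, since real GPU backends differ on that point.

```python
def shuffle_down(values, offset):
    """Simulate a subgroup shuffle-down: lane i reads the value held by lane i + offset.

    Lanes whose source would fall outside the subgroup keep their own value
    (an assumption; hardware behaviour for out-of-range lanes varies).
    """
    n = len(values)
    return [values[i + offset] if i + offset < n else values[i] for i in range(n)]


def reduce_add(values):
    """Tree reduction over one simulated subgroup; lane 0 ends up with the total."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "subgroup size must be a power of two"
    lanes = list(values)
    offset = n // 2
    while offset > 0:
        # Every lane adds the value held by the lane `offset` positions above it.
        shifted = shuffle_down(lanes, offset)
        lanes = [a + b for a, b in zip(lanes, shifted)]
        offset //= 2
    # Only lane 0 is guaranteed to hold the full sum after the last step.
    return lanes[0]
```

The reduction takes log2(n) shuffle steps instead of n - 1 sequential additions, which is the usual motivation for building reductions on top of shuffles.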
v0.6.3b2
Pre-release v0.6.3b2
This pre-release adds 16x16 register-only tiles for Cholesky, portable across all GPUs supported by Quadrants (AMD, Vulkan, Metal, CUDA).
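As a point of reference for what a Cholesky factorization of one 16x16 tile computes, the sketch below is a plain-Python, CPU-only version built from column scaling and rank-1 (ger-style) updates. It is not the Quadrants implementation and does not model register-only storage; the function name `cholesky_tile` is hypothetical and chosen only for illustration.

```python
import math


def cholesky_tile(a):
    """Right-looking Cholesky of a small symmetric positive-definite tile.

    Returns a lower-triangular matrix L (list of lists) with L * L^T == a.
    Each step scales one column, then applies a rank-1 update to the
    trailing lower triangle.
    """
    n = len(a)
    w = [row[:] for row in a]  # working copy so the input is not modified
    for k in range(n):
        if w[k][k] <= 0.0:
            raise ValueError("matrix is not positive definite")
        pivot = math.sqrt(w[k][k])
        w[k][k] = pivot
        # Scale the column below the diagonal.
        for i in range(k + 1, n):
            w[i][k] /= pivot
        # Rank-1 update of the trailing lower triangle.
        for j in range(k + 1, n):
            for i in range(j, n):
                w[i][j] -= w[i][k] * w[j][k]
    # Zero the strict upper triangle, which was never updated.
    return [[w[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
```

A convenient check is the 16x16 matrix with entries min(i, j) + 1, whose Cholesky factor is the lower-triangular matrix of all ones.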
What's Changed
Perf
- [Perf] Tiles 1: _load, _store, eye by @hughperkins in #466
- [Perf] Tiles 2a: add cholesky and ger by @hughperkins in #472
- [Perf] Tiles 2b: add triangular solve by @hughperkins in #474
- [Perf] Tiles 2c: Refactor: use _get_col/_set_col in tiles load/store/init by @hughperkins in #475
- [Perf] Add 16x16 tiles for Cholesky factorization by @hughperkins in #449
Misc
- [Misc] Remove dead InternalFuncStmt type_check override by @hughperkins in #471
Tests
- [Tests] Fix flaky test_clock_accuracy by @hughperkins in #436
Full Changelog: v0.6.2...v0.6.3b2
v0.6.2
Release v0.6.2
This release adds support for atomics on float shared arrays in SPIR-V, and no longer requires the CUDA Toolkit on SM90+ GPUs when using graph_do_while.
What's Changed
SPIRV
Misc
- [Misc] Change clang format to 120 characters by @hughperkins in #463
- [Misc] CUDA graph 5 Add fatbin by @hughperkins in #464
Bug
- [Bug] Reuse VkInstance across init/reset cycles by @hughperkins in #465
Full Changelog: v0.6.1...v0.6.2
v0.6.3b1
Pre-release v0.6.3b1
This pre-release adds 16x16 register-only tiles for Cholesky, portable across all GPUs supported by Quadrants (AMD, Vulkan, Metal, CUDA).
What's Changed
- [Perf] Add 16x16 tiles for Cholesky factorization by @hughperkins in #449
Full Changelog: v0.6.2...v0.6.3b1
v0.6.1
Release v0.6.1
This release improves the search for the CUDA toolkit on Windows.
What's Changed
- [Bug] Also search default CUDA toolkit install location on Windows by @hughperkins in #461
Full Changelog: v0.6.0...v0.6.1
v0.6.0
Release v0.6.0
This release fixes a bug with CUDA graph on Windows. We also rename gpu_graph to just graph, which is backwards incompatible, hence the minor version bump.
What's Changed
Misc
- [Misc] Rename gpu_graph to graph by @hughperkins in #446
SIMT
- [SIMT] Add cross-platform shuffle by @hughperkins in #447
Bug
- [Bug] Fix graph_do_while on Windows: search for cudadevrt.lib by @hughperkins in #456
Full Changelog: v0.5.2...v0.6.0
v0.6.0b13
Pre-release v0.6.0b13
This pre-release adds 16x16 register-only tiles for Cholesky, portable across all GPUs supported by Quadrants (AMD, Vulkan, Metal, CUDA).
What's Changed
- [Misc] Rename gpu_graph to graph by @hughperkins in #446
- [Misc] Add cross-platform shuffle by @hughperkins in #447
- [Perf] Add 16x16 tiles for Cholesky factorization by @hughperkins in #449
Full Changelog: v0.5.2...v0.6.0b13
v0.6.0b12
Pre-release v0.6.0b12
This pre-release adds 16x16 register-only tiles for Cholesky, portable across all GPUs supported by Quadrants (AMD, Vulkan, Metal, CUDA).
What's Changed
- [Misc] Rename gpu_graph to graph by @hughperkins in #446
- [Misc] Add cross-platform shuffle by @hughperkins in #447
- [Perf] Add 16x16 tiles for Cholesky factorization by @hughperkins in #449
Full Changelog: v0.5.2...v0.6.0b12
v0.6.0b11
Pre-release v0.6.0b11
This pre-release adds 16x16 register-only tiles for Cholesky, portable across all GPUs supported by Quadrants (AMD, Vulkan, Metal, CUDA).
What's Changed
- [Misc] Rename gpu_graph to graph by @hughperkins in #446
- [Misc] Add cross-platform shuffle by @hughperkins in #447
- [Perf] Add 16x16 tiles for Cholesky factorization by @hughperkins in #449
Full Changelog: v0.5.2...v0.6.0b11
v0.6.0b10
Pre-release v0.6.0b10
This pre-release adds 16x16 register-only tiles for Cholesky, portable across all GPUs supported by Quadrants (AMD, Vulkan, Metal, CUDA).
What's Changed
- [Misc] Rename gpu_graph to graph by @hughperkins in #446
- [Misc] Add cross-platform shuffle by @hughperkins in #447
- [Perf] Add 16x16 tiles for Cholesky factorization by @hughperkins in #449
Full Changelog: v0.5.2...v0.6.0b10