## Release highlights

### iree-turbine core
- #434: iree-turbine has a new website: https://iree-turbine.readthedocs.io/. See the `docs/` folder for contributing instructions.
- #373: The deprecated `shark_turbine` namespace has been fully deleted; users should migrate to `iree.turbine` (see the import sketch after this list).
- #418: There are new utility APIs for preparing tensors as input arguments for IREE tools (see the usage sketch after this list):
  ```python
  # iree.turbine.support.conversions
  torch_dtyped_shape_to_iree_format(...)

  # iree.turbine.support.tools
  iree_tool_format_cli_input_arg(...)
  iree_tool_prepare_input_args(...)
  ```
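For #373, migration is typically a one-line import change; a minimal sketch, assuming your code used the `aot` export API (other submodules move the same way):

```python
# Before (removed in #373):
#   from shark_turbine import aot
# After:
from iree.turbine import aot
```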
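A usage sketch for the #418 helpers. The exact signature and behavior shown here are assumptions inferred from the names (tensors in, CLI-ready `--input=` strings out); consult the API reference on the new website for the authoritative interface:

```python
import torch

from iree.turbine.support.tools import iree_tool_prepare_input_args

# Example tensors destined for an IREE tool such as iree-run-module.
args = [torch.zeros(2, 3, dtype=torch.float16), torch.ones(4, dtype=torch.int32)]

# Assumed behavior: each tensor is written to "<prefix><index>.bin" and a
# CLI-ready description like "2x3xf16=@/tmp/arg0.bin" is returned per tensor.
cli_values = iree_tool_prepare_input_args(args, file_path_prefix="/tmp/arg")
print(" ".join(f"--input={v}" for v in cli_values))
```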
### TKW
Improved support and performance for attention kernel variants:
- #387: Added a new paged decode attention kernel.
- #412: Added a new implementation of prefill attention.
- #452: Added `self_index`, `predicate`, and `selectOp` to implement causal attention (see the masking sketch after this list).
- #424: Reordering shared-memory loads and writes to minimize LDS barriers improved performance for some attention shapes by up to 10%.
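To illustrate #452: the `self_index`/`predicate`/`select` combination expresses the standard causal mask, comparing query and key indices and then selecting between the score and negative infinity. A plain PyTorch sketch of the same pattern (not TKW kernel code):

```python
import torch

seq_len = 4
scores = torch.randn(seq_len, seq_len)  # raw attention logits

# self_index: materialize each element's position along a dimension.
q_idx = torch.arange(seq_len).unsqueeze(1)  # query positions, column vector
k_idx = torch.arange(seq_len).unsqueeze(0)  # key positions, row vector

# predicate: a key is visible only at or before its query position.
causal = k_idx <= q_idx

# select: keep the score where the predicate holds, -inf elsewhere.
masked = torch.where(causal, scores, torch.full_like(scores, float("-inf")))
weights = masked.softmax(dim=-1)  # rows sum to 1 over the visible keys
```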
Other optimizations:
- #394: A memory layout attribute on `MemoryType` now lets users specify a physical shape that differs from the logical shape. This is useful in scenarios like kv-caches, where some dimensions are physically quite large but map onto fixed logical dimensions (see the sketch after this list).
- #436: Masked loads/stores now use buffer ops.
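As an illustration of the physical-vs-logical distinction in #394, consider a paged kv-cache: the physical buffer is one large page pool, while each sequence is addressed through a smaller, fixed logical shape. A plain PyTorch sketch of the idea (not the TKW memory layout API itself):

```python
import torch

PAGE_SIZE, NUM_PAGES, HEAD_DIM = 16, 1024, 64

# Physical shape: one large pool of pages, shared across sequences.
kv_pool = torch.zeros(NUM_PAGES, PAGE_SIZE, HEAD_DIM)

# Logical shape for one sequence spanning 3 pages: [3 * PAGE_SIZE, HEAD_DIM].
page_table = torch.tensor([7, 42, 513])  # physical pages backing this sequence
logical_kv = kv_pool[page_table].reshape(-1, HEAD_DIM)
print(logical_kv.shape)  # torch.Size([48, 64])
```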
Development quality of life features:
- #406: Tests parameterized by shapes now have better names.
- #423: Wave pass pipelines now support printing options.
## Changelog
Full list of changes: v3.1.0...v3.2.0