## Release highlights

### iree-turbine core
- #434: iree-turbine has a new website: https://iree-turbine.readthedocs.io/. See the `docs/` folder for contributing instructions.
- #373: The deprecated `shark_turbine` namespace has been fully deleted; users should migrate to `iree.turbine` (see the import sketch after this list).
- #418: There are new utility APIs for preparing tensors as input arguments for IREE tools (see the usage sketch after this list):
  ```python
  # iree.turbine.support.conversions
  torch_dtyped_shape_to_iree_format(...)

  # iree.turbine.support.tools
  iree_tool_format_cli_input_arg(...)
  iree_tool_prepare_input_args(...)
  ```
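For #373, migration is typically a one-line import change; a minimal sketch, assuming your code used the `aot` export API (other submodules move the same way):

```python
# Before (removed in #373):
#   from shark_turbine import aot
# After:
from iree.turbine import aot
```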
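A usage sketch for the #418 helpers. The exact signature and behavior shown here are assumptions inferred from the names (tensors in, CLI-ready `--input=` strings out); consult the API reference on the new website for the authoritative interface:

```python
import torch

from iree.turbine.support.tools import iree_tool_prepare_input_args

# Example tensors destined for an IREE tool such as iree-run-module.
args = [torch.zeros(2, 3, dtype=torch.float16), torch.ones(4, dtype=torch.int32)]

# Assumed behavior: each tensor is written to "<prefix><index>.bin" and a
# CLI-ready description like "2x3xf16=@/tmp/arg0.bin" is returned per tensor.
cli_values = iree_tool_prepare_input_args(args, file_path_prefix="/tmp/arg")
print(" ".join(f"--input={v}" for v in cli_values))
```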
### TKW
Improved support and performance for attention kernel variants:
- #387: Added a new paged decode attention kernel.
- #412: Added a new implementation of prefill attention.
- #452: Added `self_index`, `predicate`, and `selectOp` to implement causal attention (see the masking sketch after this list).
- #424: Reordering shared-memory loads and writes to minimize LDS barriers improved performance for some attention shapes by up to 10%.
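To illustrate #452: the `self_index`/`predicate`/`select` combination expresses the standard causal mask, comparing query and key indices and then selecting between the score and negative infinity. A plain PyTorch sketch of the same pattern (not TKW kernel code):

```python
import torch

seq_len = 4
scores = torch.randn(seq_len, seq_len)  # raw attention logits

# self_index: materialize each element's position along a dimension.
q_idx = torch.arange(seq_len).unsqueeze(1)  # query positions, column vector
k_idx = torch.arange(seq_len).unsqueeze(0)  # key positions, row vector

# predicate: a key is visible only at or before its query position.
causal = k_idx <= q_idx

# select: keep the score where the predicate holds, -inf elsewhere.
masked = torch.where(causal, scores, torch.full_like(scores, float("-inf")))
weights = masked.softmax(dim=-1)  # rows sum to 1 over the visible keys
```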
Other optimizations:
- #394: A memory layout attribute on `MemoryType` now lets users specify a physical shape that differs from the logical shape. This is useful in scenarios like kv-caches, where some dimensions are physically quite large but map onto fixed logical dimensions (see the sketch after this list).
- #436: Masked loads/stores now use buffer ops.
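As an illustration of the physical-vs-logical distinction in #394, consider a paged kv-cache: the physical buffer is one large page pool, while each sequence is addressed through a smaller, fixed logical shape. A plain PyTorch sketch of the idea (not the TKW memory layout API itself):

```python
import torch

PAGE_SIZE, NUM_PAGES, HEAD_DIM = 16, 1024, 64

# Physical shape: one large pool of pages, shared across sequences.
kv_pool = torch.zeros(NUM_PAGES, PAGE_SIZE, HEAD_DIM)

# Logical shape for one sequence spanning 3 pages: [3 * PAGE_SIZE, HEAD_DIM].
page_table = torch.tensor([7, 42, 513])  # physical pages backing this sequence
logical_kv = kv_pool[page_table].reshape(-1, HEAD_DIM)
print(logical_kv.shape)  # torch.Size([48, 64])
```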
Development quality of life features:
- #406: Tests parameterized by shapes now have better names.
- #423: Wave pass pipelines now support printing options.
## Changelog
Full list of changes: v3.1.0...v3.2.0