Release v3.2.0

ScottTodd released this 10 Feb 19:56
Tag: v3.2.0 (commit 7038127)

Release highlights

iree-turbine core

  • #434: iree-turbine has a new website: https://iree-turbine.readthedocs.io/. See the docs/ folder for contributing instructions.

  • #373: The deprecated shark_turbine namespace has been fully deleted. Users should migrate to iree.turbine.

  • #418: There are new utility APIs for preparing tensors as input arguments for IREE tools:

    # iree.turbine.support.conversions
    torch_dtyped_shape_to_iree_format(...)
    
    # iree.turbine.support.tools
    iree_tool_format_cli_input_arg(...)
    iree_tool_prepare_input_args(...)
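
As a rough illustration of the kind of string these helpers target, here is a minimal standalone sketch. It assumes the usual IREE tool convention of describing a tensor as `<dims>x<element type>` (for example `2x3xf32`) and passing it as `--input=<shape>=@<file>`. The function names, the dtype table, and the arguments below are hypothetical stand-ins and are not the signatures of the iree.turbine APIs listed above, which are elided in these notes.

```python
# Hypothetical sketch: NOT the iree.turbine implementation or its signatures.
# Dtype names are plain strings here so the example runs without torch.
_DTYPE_TO_IREE = {
    "float16": "f16",
    "bfloat16": "bf16",
    "float32": "f32",
    "float64": "f64",
    "int8": "i8",
    "int32": "i32",
    "int64": "i64",
}


def shape_dtype_to_iree_format(shape, dtype_name):
    """Format a shape and dtype name as an IREE-style string, e.g. '2x3xf32'."""
    elem = _DTYPE_TO_IREE[dtype_name]
    # A rank-0 (scalar) shape yields just the element type, e.g. 'i64'.
    return "x".join([str(dim) for dim in shape] + [elem])


def format_cli_input_arg(shape, dtype_name, file_path):
    """Build a '--input=' flag that points at a raw binary file (assumed layout)."""
    return f"--input={shape_dtype_to_iree_format(shape, dtype_name)}=@{file_path}"


if __name__ == "__main__":
    print(shape_dtype_to_iree_format((2, 3), "float32"))
    print(format_cli_input_arg((4,), "float16", "query.bin"))
```

Consult the iree.turbine.support.conversions and iree.turbine.support.tools modules for the actual argument lists before relying on this shape.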

TKW

Improved support and performance for attention kernel variants:

  • #387: Added a new paged decode attention kernel.
  • #412: Added a new implementation of prefill attention.
  • #452: Added self_index, predicate, and selectOp to implement causal attention.
  • #424: Reordering of shared load-write to minimize LDS barriers improved performance for some attention shapes by up to 10%.
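
The index, predicate, and select combination mentioned for causal attention can be sketched in plain Python. This is only a conceptual illustration of causal masking, assuming the common formulation in which query position `row` may attend to key positions `col <= row`. It does not use the TKW API, and the function name is hypothetical.

```python
import math


def causal_softmax_row(scores, row):
    """Softmax over one row of attention scores with a causal mask applied.

    Conceptual steps (plain Python, not TKW ops):
      1. index:     each element knows its own column position `col`
      2. predicate: `col <= row` marks keys the query is allowed to see
      3. select:    keep the score where the predicate holds, else -inf
    """
    masked = []
    for col, score in enumerate(scores):
        allowed = col <= row
        masked.append(score if allowed else -math.inf)

    # Numerically stable softmax; exp(-inf) evaluates to 0.0, so masked
    # positions receive zero attention weight.
    peak = max(masked)
    exps = [math.exp(value - peak) for value in masked]
    total = sum(exps)
    return [value / total for value in exps]


if __name__ == "__main__":
    print(causal_softmax_row([0.0, 0.0, 5.0], row=1))
```

Column 0 always satisfies the predicate for any non-negative `row`, so every row keeps at least one finite score and the normalization never divides by zero.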

Other optimizations:

  • #394: A memory layout attribute for the MemoryType now allows users to specify a physical shape that differs from the logical shape. This is useful in scenarios like kv-caches, where certain dimensions are physically quite large but map to fixed logical dimensions.
  • #436: Masked loads/stores now use buffer ops.

Development quality of life features:

  • #406: Tests parameterized by shapes now have better names.
  • #423: Wave pass pipelines now feature printing options.

Changelog

Full list of changes: v3.1.0...v3.2.0