Conversation

@ldjurovicTT
Contributor

Ticket

Problem description

Speed up the exponential calculation in fast approximation mode.

What's changed

Moved both input sanitization and the calculation into a single LOADMACRO sequence.

@github-actions
Contributor

github-actions bot commented Dec 1, 2025

Thank you for your contribution! 🚀

You can run tt-metal integration tests by adding the blackhole-integration-tests and/or wormhole-integration-tests labels to this pull request.

If you want to run metal post-commit tests, you can add the metal-post-commit-tests label to this pull request.

📖 For more information, please refer to our CONTRIBUTING guide.

@github-actions github-actions bot added the blackhole, test-infra (This label is used for issues, pull requests, or tasks related to the LLK testing framework), and wormhole labels Dec 1, 2025
Contributor

Copilot AI left a comment

Pull request overview

This PR optimizes exponential calculation in fast approximation mode by consolidating the sanitization and calculation steps into a single LOADMACRO sequence that is recorded and replayed using the lltt::replay mechanism. This reduces instruction overhead and improves performance.
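
The record-once, replay-many idea described above can be pictured with a toy model. This is only a sketch: `ReplayBuffer`, `sanitize`, and `calculate` are hypothetical names, the clamp-to-threshold behaviour is an assumption, and nothing here reflects the actual `lltt::replay` API or the hardware instruction stream.

```python
import math
from typing import Callable, List

Instruction = Callable[[float], float]


class ReplayBuffer:
    """Toy stand-in for a hardware replay buffer (not the lltt API)."""

    def __init__(self) -> None:
        self._recorded: List[Instruction] = []

    def record(self, instructions: List[Instruction]) -> None:
        # Store the sequence once; later calls reuse it unchanged.
        self._recorded = list(instructions)

    def replay(self, value: float) -> float:
        # Run every recorded step, in order, on one input.
        for instruction in self._recorded:
            value = instruction(value)
        return value


def sanitize(x: float) -> float:
    # Assumed sanitization: clamp very negative inputs to the threshold.
    return max(x, -86.6)


def calculate(x: float) -> float:
    # Placeholder for the approximate exponential.
    return math.exp(x)


buffer = ReplayBuffer()
buffer.record([sanitize, calculate])  # recorded once
outputs = [buffer.replay(x) for x in (-1000.0, 0.0, 1.0)]  # replayed per input
```

Each input then costs a single `replay` call rather than one call per step, which is the kind of overhead reduction the summary refers to.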

Key Changes

  • Replaced ~100 lines of manual LOADMACRO invocations with ~25 lines using the replay buffer approach
  • Updated threshold value from -88.5 to -86.6 and adjusted B_MINUS_C constant
  • Added comprehensive test suite for fast exponential approximation
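
For readers unfamiliar with why a fast exponential needs both a lower threshold and a `B_MINUS_C`-style constant, here is a minimal sketch of a Schraudolph-style approximation for float32. The value of `C` is an assumed correction term chosen for illustration and is not the constant used in this PR, clamping is assumed to be what "sanitization" means, and inputs large enough to overflow are not handled.

```python
import math
import struct

A = 2**23 / math.log(2)  # scales x so that x / ln(2) lands in the exponent field
B = 127 * 2**23          # float32 exponent bias, shifted into the exponent field
C = 366393               # assumed correction term (illustrative, not the PR's value)
B_MINUS_C = B - C
THRESHOLD = -86.6        # lower clamp value quoted in the review summary


def fast_exp(x: float) -> float:
    """Approximate exp(x) by building the float32 bit pattern directly."""
    # Sanitize: keep the bit pattern a positive, normal float32.
    x = max(x, THRESHOLD)
    bits = int(A * x + B_MINUS_C)
    # Reinterpret the 32-bit integer as a float32.
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def relative_error(x: float) -> float:
    return abs(fast_exp(x) - math.exp(x)) / math.exp(x)
```

With this choice of `C` the relative error stays within a few percent over the tested range, which is the kind of accuracy trade-off that makes the test tolerance a point of discussion further down.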

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| tt_llk_wormhole_b0/common/inc/sfpu/ckernel_sfpu_exp.h | Refactored fast approximation mode to use replay buffer with 24 recorded instructions; updated constants and LOADMACRO setup |
| tt_llk_blackhole/common/inc/sfpu/ckernel_sfpu_exp.h | Similar refactoring for Blackhole architecture with 16 recorded instructions; includes variable rename from in to val |
| tests/sources/fast_exp_test.cpp | New C++ test implementation for fast exponential calculation across all TRISC kernels |
| tests/python_tests/test_fast_exp.py | New Python test suite with multiple input dimensions and format configurations |
| tests/python_tests/helpers/utils.py | Extended passed_test function to support custom tolerances and one-face checking |
Comments suppressed due to low confidence (1)

tests/python_tests/test_fast_exp.py:72

  • Variable generate_golden is not used.
    generate_golden = get_golden_generator(UnarySFPUGolden)

TTI_SFPCONFIG(0x0000, 0x4, 0x0); // Load it into macro sequence register 0 (destination = 4)

TTI_SFPCONFIG(
0x0010, 0x8 /*LOADMACRO control*/, 0x1); // Specifies that the store in LOAMACRO “Sequence 0” will inherit the instr_mod0 field from the LOADMACRO

Copilot AI Dec 1, 2025

Typo in comment: "LOAMACRO" should be "LOADMACRO".

Suggested change
- 0x0010, 0x8 /*LOADMACRO control*/, 0x1); // Specifies that the store in LOAMACRO “Sequence 0” will inherit the instr_mod0 field from the LOADMACRO
+ 0x0010, 0x8 /*LOADMACRO control*/, 0x1); // Specifies that the store in LOADMACRO “Sequence 0” will inherit the instr_mod0 field from the LOADMACRO

L1_to_L1_iterations: int = 1,
custom_rtol: float = None,
custom_atol: float = None,
one_face_check: bool = False,

Contributor

Better yet, add a num_faces parameter that defaults to 4, and in your case just set it to 1. That sounds more scalable.
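
A minimal sketch of what that suggestion could look like, using plain Python lists instead of tensors. `passed_test_sketch` and `FACE_ELEMENTS` are hypothetical names, and the sketch assumes a face holds 16 x 16 elements stored contiguously; the real `passed_test` helper may be organised differently.

```python
import math
from typing import Sequence

FACE_ELEMENTS = 16 * 16  # assumed number of elements in one face


def passed_test_sketch(
    golden: Sequence[float],
    result: Sequence[float],
    num_faces: int = 4,
    rtol: float = 0.05,
    atol: float = 0.05,
) -> bool:
    """Compare only the first `num_faces` faces of two flat sequences."""
    count = num_faces * FACE_ELEMENTS
    return all(
        math.isclose(g, r, rel_tol=rtol, abs_tol=atol)
        for g, r in zip(golden[:count], result[:count])
    )


golden = [1.0] * (4 * FACE_ELEMENTS)
# Only the first face holds valid data in this example.
result = [1.0] * FACE_ELEMENTS + [9.0] * (3 * FACE_ELEMENTS)

one_face_ok = passed_test_sketch(golden, result, num_faces=1)
all_faces_ok = passed_test_sketch(golden, result)
```

A single integer covers the one-face case, the default four-face case, and anything in between, which a boolean flag cannot do.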

input_dimensions=[[32, 32], [32, 64], [64, 32], [64, 64], [128, 32], [32, 128]],
approx_mode=[ApproximationMode.Yes],
mathop=[MathOperation.Exp],
dest_acc=[DestAccumulation.No], # , DestAccumulation.Yes],

golden_tensor,
res_tensor,
formats.output_format,
custom_atol=0.1,

Contributor

this is big, can we go lower?
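
To illustrate the concern, here is a small sketch of how an absolute tolerance of 0.1 interacts with small exponential outputs. `within_tolerance` is a hypothetical helper that uses the additive formula from `numpy.isclose`-style checks (`|actual - expected| <= atol + rtol * |expected|`); whether the test helper in this PR uses exactly that formula is an assumption.

```python
import math


def within_tolerance(expected: float, actual: float, rtol: float, atol: float) -> bool:
    """Additive tolerance check: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


expected = math.exp(-5.0)  # a small but clearly nonzero output
wrong_result = 0.0         # a result that lost the value entirely

# With atol=0.1 the wrong result is accepted; a tighter atol rejects it.
loose = within_tolerance(expected, wrong_result, rtol=0.0, atol=0.1)
tight = within_tolerance(expected, wrong_result, rtol=0.0, atol=1e-3)
```

Any expected value below the absolute tolerance passes regardless of what the kernel produced, so a large `atol` weakens the test most for negative inputs.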

using namespace ckernel;
using namespace ckernel::sfpu;

const int iterations = 32;

Contributor

Suggested change
- const int iterations = 32;

Seems unused.

@ldjurovicTT ldjurovicTT added the metal-post-commit-tests label (This label is used for running tt-metal post-commit tests: wormhole and blackhole) Dec 3, 2025
Comment on lines +222 to +223
TTI_SFPLOADI(0, 0xA, lo16(B_MINUS_C));
TTI_SFPLOADI(0, 0x8, hi16(B_MINUS_C));

Contributor

Suggested change
- TTI_SFPLOADI(0, 0xA, lo16(B_MINUS_C));
- TTI_SFPLOADI(0, 0x8, hi16(B_MINUS_C));
+ TTI_SFPLOADI(ckernel::p_sfpu::LREG0, sfpi::SFPLOADI_MOD0_LOWER, lo16(B_MINUS_C));
+ TTI_SFPLOADI(ckernel::p_sfpu::LREG0, sfpi::SFPLOADI_MOD0_UPPER, hi16(B_MINUS_C));

It'd be nice to replace the magic numbers with some constants. It's much easier to read the code later.
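
As a rough illustration of what the two loads accomplish and why the named modes read better, here is a toy Python model. The mode values 0xA and 0x8 are taken from the suggestion above; the register model (each load overwrites one 16-bit half and preserves the other) and the example constant are assumptions, not a description of the hardware.

```python
SFPLOADI_MOD0_UPPER = 0x8  # load into the upper 16 bits (value from the suggestion)
SFPLOADI_MOD0_LOWER = 0xA  # load into the lower 16 bits (value from the suggestion)


def lo16(value: int) -> int:
    return value & 0xFFFF


def hi16(value: int) -> int:
    return (value >> 16) & 0xFFFF


def load_immediate(register: int, mode: int, imm16: int) -> int:
    """Toy model: overwrite one half of a 32-bit register, keep the other."""
    if mode == SFPLOADI_MOD0_LOWER:
        return (register & 0xFFFF0000) | imm16
    if mode == SFPLOADI_MOD0_UPPER:
        return (register & 0x0000FFFF) | (imm16 << 16)
    raise ValueError(f"unsupported mode: {mode:#x}")


EXAMPLE_CONSTANT = 0x3F7A68C7  # arbitrary 32-bit value for illustration

lreg0 = 0
lreg0 = load_immediate(lreg0, SFPLOADI_MOD0_LOWER, lo16(EXAMPLE_CONSTANT))
lreg0 = load_immediate(lreg0, SFPLOADI_MOD0_UPPER, hi16(EXAMPLE_CONSTANT))
```

With named modes, the intent of each load (lower half, then upper half) is visible at the call site without looking up what 0xA and 0x8 mean.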

@ldjurovicTT ldjurovicTT force-pushed the ldjurovic/fast_exp_new branch from 37a5248 to 39aba02 Compare December 3, 2025 10:41
@github-actions
Contributor

github-actions bot commented Dec 3, 2025

🚀 tt-metal post-commit tests

Branch: ldjurovic/fast_exp_new
Test Configuration:

  • Wormhole tests: 🟢 Enabled
  • Blackhole tests: ⚪ Disabled

Test Results:

  • Wormhole tests: 🛑 Failed - View run
  • C++ post-commit tests: 🛑 Failed - View run
  • Blackhole tests: ⚪ Not run

🔗 Links

📊 Post-commit workflow: #19889451922

@ldjurovicTT ldjurovicTT force-pushed the ldjurovic/fast_exp_new branch from ba3c2e9 to a2f80dd Compare December 22, 2025 09:25
@github-actions
Contributor

🚀 tt-metal post-commit tests

Branch: ldjurovic/fast_exp_new
Test Configuration:

  • Wormhole tests: 🟢 Enabled
  • Blackhole tests: ⚪ Disabled

Test Results:

  • Wormhole tests: 🛑 Failed - View run
  • C++ post-commit tests: 🛑 Workflow Failed - Check logs for details
  • Blackhole tests: ⚪ Not run

🔗 Links

📊 Post-commit workflow: #20427636630

@github-actions
Contributor

🚀 tt-metal post-commit tests

Branch: ldjurovic/fast_exp_new
Test Configuration:

  • Wormhole tests: 🟢 Enabled
  • Blackhole tests: ⚪ Disabled

Test Results:

  • Wormhole tests: 🛑 Failed - View run
  • C++ post-commit tests: 🛑 Workflow Failed - Check logs for details
  • Blackhole tests: ⚪ Not run

🔗 Links

📊 Post-commit workflow: #20428364068

@github-actions
Contributor

🚀 tt-metal post-commit tests

Branch: ldjurovic/fast_exp_new
Test Configuration:

  • Wormhole tests: 🟢 Enabled
  • Blackhole tests: ⚪ Disabled

Test Results:

🔗 Links

📊 Post-commit workflow: #20460108882

@github-actions
Contributor

🚀 tt-metal post-commit tests

Branch: ldjurovic/fast_exp_new
Test Configuration:

  • Wormhole tests: 🟢 Enabled
  • Blackhole tests: ⚪ Disabled

Test Results:

🔗 Links

📊 Post-commit workflow: #20462076957

Labels

  • metal-post-commit-tests: This label is used for running tt-metal post-commit tests: wormhole and blackhole
  • test-infra: This label is used for issues, pull requests, or tasks related to the LLK testing framework
  • wormhole

4 participants