Ldjurovic/fast exp new #897

ldjurovicTT · 2025-12-01T08:51:15Z

Ticket

Problem description

Speed up the exponential calculation in fast approximation mode.

What's changed

Moved both sanitization and calculation to one LOADMACRO.

github-actions · 2025-12-01T08:51:25Z

Thank you for your contribution! 🚀

You can run tt-metal integration tests by adding the blackhole-integration-tests and/or wormhole-integration-tests labels to this pull request.

If you want to run metal post-commit tests, you can add the metal-post-commit-tests label to this pull request.

📖 For more information, please refer to our CONTRIBUTING guide.

Copilot

Pull request overview

This PR optimizes exponential calculation in fast approximation mode by consolidating the sanitization and calculation steps into a single LOADMACRO sequence that is recorded and replayed using the lltt::replay mechanism. This reduces instruction overhead and improves performance.

Key Changes

Replaced ~100 lines of manual LOADMACRO invocations with ~25 lines using the replay buffer approach
Updated threshold value from -88.5 to -86.6 and adjusted B_MINUS_C constant
Added comprehensive test suite for fast exponential approximation

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| `tt_llk_wormhole_b0/common/inc/sfpu/ckernel_sfpu_exp.h` | Refactored fast approximation mode to use replay buffer with 24 recorded instructions; updated constants and LOADMACRO setup |
| `tt_llk_blackhole/common/inc/sfpu/ckernel_sfpu_exp.h` | Similar refactoring for Blackhole architecture with 16 recorded instructions; includes variable rename from `in` to `val` |
| `tests/sources/fast_exp_test.cpp` | New C++ test implementation for fast exponential calculation across all TRISC kernels |
| `tests/python_tests/test_fast_exp.py` | New Python test suite with multiple input dimensions and format configurations |
| `tests/python_tests/helpers/utils.py` | Extended `passed_test` function to support custom tolerances and one-face checking |

Comments suppressed due to low confidence (1)

tests/python_tests/test_fast_exp.py:72

Variable generate_golden is not used.

    generate_golden = get_golden_generator(UnarySFPUGolden)

tests/python_tests/helpers/utils.py

tests/python_tests/test_fast_exp.py

tt_llk_blackhole/common/inc/sfpu/ckernel_sfpu_exp.h

Copilot · 2025-12-01T08:55:38Z

tt_llk_wormhole_b0/common/inc/sfpu/ckernel_sfpu_exp.h

+        TTI_SFPCONFIG(0x0000, 0x4, 0x0); // Load it into macro sequence register 0 (destination = 4)
+
+        TTI_SFPCONFIG(
+            0x0010, 0x8 /*LOADMACRO control*/, 0x1); // Specifies that the store in LOAMACRO “Sequence 0” will inherit the instr_mod0 field from the LOADMACRO


Typo in comment: "LOAMACRO" should be "LOADMACRO".

Suggested change

-            0x0010, 0x8 /*LOADMACRO control*/, 0x1); // Specifies that the store in LOAMACRO “Sequence 0” will inherit the instr_mod0 field from the LOADMACRO
+            0x0010, 0x8 /*LOADMACRO control*/, 0x1); // Specifies that the store in LOADMACRO “Sequence 0” will inherit the instr_mod0 field from the LOADMACRO

tt_llk_blackhole/common/inc/sfpu/ckernel_sfpu_exp.h

fvranicTT · 2025-12-03T09:41:15Z

tests/python_tests/helpers/utils.py

    L1_to_L1_iterations: int = 1,
+    custom_rtol: float = None,
+    custom_atol: float = None,
+    one_face_check: bool = False,


Better yet, add a num_faces parameter that defaults to 4, and in your case just set it to 1. That sounds more scalable.
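A minimal sketch of what that suggestion could look like, not the actual `passed_test` helper. It assumes a flat result buffer laid out face by face, with 16x16 elements per face and 4 faces per 32x32 tile; `passed_test_sketch` and `FACE_ELEMENTS` are hypothetical names, and `math.isclose` stands in for whatever comparison the real helper uses.

```python
import math

# Assumption: one face holds 16x16 elements and a 32x32 tile has 4 faces,
# stored face by face in a flat buffer.
FACE_ELEMENTS = 16 * 16


def passed_test_sketch(golden, result, rtol=0.05, atol=0.05, num_faces=4):
    """Hypothetical stand-in for passed_test: compare only the first num_faces faces."""
    if not 1 <= num_faces <= 4:
        raise ValueError("num_faces must be between 1 and 4")
    if len(golden) != len(result):
        raise ValueError("golden and result must have the same length")

    # Only the leading num_faces faces take part in the comparison.
    limit = min(num_faces * FACE_ELEMENTS, len(golden))
    return all(
        math.isclose(g, r, rel_tol=rtol, abs_tol=atol)
        for g, r in zip(golden[:limit], result[:limit])
    )


# One tile where only the second face contains a wrong value.
golden = [1.0] * (4 * FACE_ELEMENTS)
result = list(golden)
result[FACE_ELEMENTS + 44] = 5.0

one_face_ok = passed_test_sketch(golden, result, num_faces=1)
all_faces_ok = passed_test_sketch(golden, result, num_faces=4)
```

With the default of 4, existing callers keep the full-tile check, while the fast exp test could pass `num_faces=1` instead of a dedicated boolean flag.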

fvranicTT · 2025-12-03T09:42:27Z

tests/python_tests/test_fast_exp.py

+    input_dimensions=[[32, 32], [32, 64], [64, 32], [64, 64], [128, 32], [32, 128]],
+    approx_mode=[ApproximationMode.Yes],
+    mathop=[MathOperation.Exp],
+    dest_acc=[DestAccumulation.No],  # , DestAccumulation.Yes],


DestAccumulation.Yes?

fvranicTT · 2025-12-03T09:43:29Z

tests/python_tests/test_fast_exp.py

+        golden_tensor,
+        res_tensor,
+        formats.output_format,
+        custom_atol=0.1,


this is big, can we go lower?
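To make the concern concrete, the sketch below shows how an absolute tolerance of 0.1 can hide a completely wrong exponential result for negative inputs, since exp(x) is already below 0.1 for x < ln(0.1) ≈ -2.3. The check uses the `torch.isclose`-style formula `|result - golden| <= atol + rtol * |golden|`; the input value and the tighter tolerances are illustrative, not values from this PR.

```python
import math


def within_tolerance(golden, result, rtol, atol):
    # torch.isclose-style check: |result - golden| <= atol + rtol * |golden|
    return abs(result - golden) <= atol + rtol * abs(golden)


x = -3.0
golden = math.exp(x)   # roughly 0.0498
broken_result = 0.0    # a kernel that returned zero instead of exp(x)

# With atol=0.1 the broken result is accepted, because the whole golden
# value is smaller than the absolute tolerance.
loose = within_tolerance(golden, broken_result, rtol=0.0, atol=0.1)

# Illustrative tighter tolerances reject the same result.
tight = within_tolerance(golden, broken_result, rtol=0.05, atol=1e-3)
```

A relative tolerance, or a much smaller absolute one, keeps the test meaningful over the part of the input range where the expected values are small.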

fvranicTT · 2025-12-03T09:43:57Z

tests/sources/fast_exp_test.cpp

+using namespace ckernel;
+using namespace ckernel::sfpu;
+
+const int iterations = 32;


Suggested change

- const int iterations = 32;

Seems unused.

fvranicTT · 2025-12-03T09:51:14Z

tt_llk_wormhole_b0/common/inc/sfpu/ckernel_sfpu_exp.h

+        TTI_SFPLOADI(0, 0xA, lo16(B_MINUS_C));
+        TTI_SFPLOADI(0, 0x8, hi16(B_MINUS_C));


Suggested change

-        TTI_SFPLOADI(0, 0xA, lo16(B_MINUS_C));
-        TTI_SFPLOADI(0, 0x8, hi16(B_MINUS_C));
+        TTI_SFPLOADI(ckernel::p_sfpu::LREG0, sfpi::SFPLOADI_MOD0_LOWER, lo16(B_MINUS_C));
+        TTI_SFPLOADI(ckernel::p_sfpu::LREG0, sfpi::SFPLOADI_MOD0_UPPER, hi16(B_MINUS_C));

It'd be nice to replace the magic numbers with some constants. It's much easier to read the code later.
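As a rough illustration of what the two loads do and why names read better than `0xA` and `0x8`, the Python model below splits a 32-bit constant into 16-bit halves and reassembles it. The constant names and their pairing with `0xA` and `0x8` follow the suggested change above; `load_halves` is a hypothetical helper that assumes each load writes one half and preserves the other, and the constant value is a placeholder, not the `B_MINUS_C` from this PR.

```python
# Values paired with the names as in the suggested change above.
SFPLOADI_MOD0_UPPER = 0x8  # write bits 31..16
SFPLOADI_MOD0_LOWER = 0xA  # write bits 15..0


def lo16(value):
    """Lower 16 bits of a 32-bit value."""
    return value & 0xFFFF


def hi16(value):
    """Upper 16 bits of a 32-bit value."""
    return (value >> 16) & 0xFFFF


def load_halves(reg, mode, imm16):
    """Hypothetical model: write one 16-bit half, keep the other half unchanged."""
    if mode == SFPLOADI_MOD0_LOWER:
        return (reg & 0xFFFF0000) | imm16
    if mode == SFPLOADI_MOD0_UPPER:
        return (reg & 0x0000FFFF) | (imm16 << 16)
    raise ValueError(f"unsupported mode: {mode:#x}")


PLACEHOLDER_B_MINUS_C = 0x12345678  # placeholder, not the PR's constant

lreg0 = 0
lreg0 = load_halves(lreg0, SFPLOADI_MOD0_LOWER, lo16(PLACEHOLDER_B_MINUS_C))
lreg0 = load_halves(lreg0, SFPLOADI_MOD0_UPPER, hi16(PLACEHOLDER_B_MINUS_C))
```

Read this way, the named modes make it clear at the call site which half of the register each instruction fills.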

github-actions · 2025-12-03T12:58:01Z

🚀 tt-metal post-commit tests

Branch: ldjurovic/fast_exp_new
Test Configuration:

Wormhole tests: 🟢 Enabled
Blackhole tests: ⚪ Disabled

Test Results:

Wormhole tests: 🛑 Failed - View run
C++ post-commit tests: 🛑 Failed - View run
Blackhole tests: ⚪ Not run

🔗 Links

📊 Post-commit workflow: #19889451922

github-actions · 2025-12-22T11:14:03Z

🚀 tt-metal post-commit tests

Branch: ldjurovic/fast_exp_new
Test Configuration:

Wormhole tests: 🟢 Enabled
Blackhole tests: ⚪ Disabled

Test Results:

Wormhole tests: 🛑 Failed - View run
C++ post-commit tests: 🛑 Workflow Failed - Check logs for details
Blackhole tests: ⚪ Not run

🔗 Links

📊 Post-commit workflow: #20427636630

github-actions · 2025-12-22T12:42:10Z

🚀 tt-metal post-commit tests

Branch: ldjurovic/fast_exp_new
Test Configuration:

Wormhole tests: 🟢 Enabled
Blackhole tests: ⚪ Disabled

Test Results:

Wormhole tests: 🛑 Failed - View run
C++ post-commit tests: 🛑 Workflow Failed - Check logs for details
Blackhole tests: ⚪ Not run

🔗 Links

📊 Post-commit workflow: #20428364068

github-actions · 2025-12-23T17:59:54Z

🚀 tt-metal post-commit tests

Branch: ldjurovic/fast_exp_new
Test Configuration:

Wormhole tests: 🟢 Enabled
Blackhole tests: ⚪ Disabled

Test Results:

🔗 Links

📊 Post-commit workflow: #20460108882

github-actions · 2025-12-24T00:00:44Z

🚀 tt-metal post-commit tests

Branch: ldjurovic/fast_exp_new
Test Configuration:

Wormhole tests: 🟢 Enabled
Blackhole tests: ⚪ Disabled

Test Results:

🔗 Links

📊 Post-commit workflow: #20462076957

Copilot AI review requested due to automatic review settings December 1, 2025 08:51

ldjurovicTT requested review from amahmudTT, amokanTT, fvranicTT, nvelickovicTT, rdjogoTT, rtawfik01, skotaracTT, skrsmanovicTT, sstanisicTT and vkrsmanovicTT as code owners December 1, 2025 08:51

Copilot started reviewing on behalf of ldjurovicTT December 1, 2025 08:51

github-actions bot added the blackhole, test-infra, and wormhole labels Dec 1, 2025

Copilot finished reviewing on behalf of ldjurovicTT December 1, 2025 08:54

Copilot AI reviewed Dec 1, 2025

fvranicTT reviewed Dec 1, 2025

tt_llk_blackhole/common/inc/sfpu/ckernel_sfpu_exp.h (outdated, resolved)

ldjurovicTT force-pushed the ldjurovic/fast_exp_new branch from 181479b to a605af0 December 2, 2025 14:43

github-actions bot removed the blackhole label Dec 2, 2025

ldjurovicTT enabled auto-merge December 2, 2025 18:28

fvranicTT reviewed Dec 3, 2025

ldjurovicTT added the metal-post-commit-tests label Dec 3, 2025

fvranicTT reviewed Dec 3, 2025

ldjurovicTT force-pushed the ldjurovic/fast_exp_new branch from 37a5248 to 39aba02 December 3, 2025 10:41

ldjurovicTT added 21 commits December 22, 2025 09:25

New replay with 20 instructions; 16.6 us

1043916

Removed 2 more nops

ee72801

New replay with 20 instructions; 16.6 us

3186aae

Using enums

bf38a12

Code more readable

dfdef4c

Better overlapping; 15.6 us

b3b2534

More overlapping; 14.58 us

88a68bc

LREG names from p_sfpu

4cdd91e

Add comment

5585f11

Falling back to old implementation due to numerical errors; not working in metal

34b7e69

Fast approx standalone passing on BH

9f2e994

Single core perf on 15.85 us with optimized old algorithm and replay; VectorMode::RC

def8a1a

Fp32 model with Hifi3 passing on multiple cores in metal; 15.95us

c69224a

Multi core test passing in metal; 18.55 us

3d7719d

Corrected comment

84cc0a8

Clean test; multiple tiles; fix typo in llk

6f6fdaf

Rebase and uplift test again

4e15997

Some pr suggestions

c6b2378

Cleanup

9eae7d2

Skip for BH

8a4623f

BH fast exp from main

a2f80dd

ldjurovicTT force-pushed the ldjurovic/fast_exp_new branch from ba3c2e9 to a2f80dd December 22, 2025 09:25

Passing in llk

9238411

ldjurovicTT added 3 commits December 23, 2025 11:58

Fp32 initial work

d1f7ad1

Inits

10b650b

Use default modifier 0

46ae6eb

4 participants