Add SFPU eltwise max test #1008

ldjurovicTT · 2025-12-26T13:51:09Z

Ticket

None

Problem description

SFPU eltwise max is not explicitly tested anywhere. This test will also serve as a starting point for some optimizations needed for SDPA. The test is AI-generated, so I will need to go through it once to make sure everything is written correctly.

What's changed

Added a Python test and a C++ test.

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update

github-actions · 2025-12-26T13:51:21Z

Thank you for your contribution! 🚀
If you want to run metal post-commit tests, you can add the metal-post-commit-tests label to this pull request.
📖 For more information, please refer to our CONTRIBUTING guide.

Copilot

Pull request overview

This PR adds comprehensive test coverage for the SFPU eltwise max operation, which was previously untested. The implementation includes both C++ kernel tests and Python validation tests to verify correctness across multiple data formats and architectures.

Key changes:

Added C++ kernel test (sfpu_eltwise_max_test.cpp) with unpack, math, and pack sections for hardware validation
Added Python test (test_sfpu_eltwise_max.py) with parametrized test cases covering Float32, Float16, Float16_b, and Bfp8_b formats
Test validates element-wise max operation by comparing two tiles and storing the result

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
tests/sources/sfpu_eltwise_max_test.cpp	Implements low-level kernel test that unpacks 2 tiles, performs element-wise max comparison using `_calculate_max_`, and packs results back to L1 memory
tests/python_tests/test_sfpu_eltwise_max.py	Python test driver that generates stimuli, executes the kernel test, and validates results against PyTorch's `torch.maximum` golden reference
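
As a rough illustration of what the golden check computes: per the review comments on this PR, src_A carries both operand tiles, so the expected output is the element-wise max of tile 0 and tile 1. The following is a minimal pure-Python sketch, assuming a flat src_A buffer of two 32x32 tiles stored back to back. `golden_eltwise_max` is a hypothetical helper, not the framework's golden generator, and it matches `torch.maximum` only for finite inputs (Python's `max` does not propagate NaN the way `torch.maximum` does).

```python
# Assumption: src_A is a flat list holding two 32x32 tiles back to back
# (tile 0 followed by tile 1). The helper name is hypothetical.
TILE_ELEMENTS = 32 * 32


def golden_eltwise_max(src_a, tile_elements=TILE_ELEMENTS):
    """Return the element-wise max of tile 0 and tile 1 of a flat buffer."""
    if len(src_a) != 2 * tile_elements:
        raise ValueError(
            f"expected {2 * tile_elements} elements, got {len(src_a)}"
        )
    tile0 = src_a[:tile_elements]
    tile1 = src_a[tile_elements:]
    # Pure-Python stand-in for torch.maximum(tile0, tile1), valid for finite values.
    return [max(a, b) for a, b in zip(tile0, tile1)]
```

The result has one tile's worth of elements, which is also why only one output tile carries meaningful data.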

Copilot · 2025-12-26T13:54:56Z

tests/python_tests/test_sfpu_eltwise_max.py

+    res_address = write_stimuli_to_l1(
+        test_config,
+        src_A,
+        src_B,


The src_B variable is generated but never used in this test. The eltwise max operation only uses src_A, which contains 2 tiles. Consider either removing src_B from the generate_stimuli call or updating the function so it is not generated when it is not needed.

Copilot · 2025-12-26T13:54:56Z

tests/python_tests/test_sfpu_eltwise_max.py

+        "testname": test_name,
+        "dest_acc": dest_acc,
+        "input_A_dimensions": input_dimensions,
+        "input_B_dimensions": input_dimensions,


The input_B_dimensions parameter is set but not actually used since the test only operates on src_A data. Consider removing this parameter from the test_config or setting it to None/an empty list to clarify that only input A is being used.

Suggested change

-        "input_B_dimensions": input_dimensions,
+        "input_B_dimensions": None,

Copilot · 2025-12-26T13:54:56Z

tests/python_tests/test_sfpu_eltwise_max.py

+    # Force dest_acc for certain formats
+    if formats.input_format in [DataFormat.Float16, DataFormat.Float32]:
+        dest_acc = DestAccumulation.Yes


The dest_acc parameter is overridden for Float16 and Float32 formats, but this happens after it's already been parametrized in the test. This means the test will effectively ignore the DestAccumulation.No parameter for these formats. Consider either removing DestAccumulation.No from the parametrization for these formats, or documenting why it's overridden.

Suggested change

-    # Force dest_acc for certain formats
-    if formats.input_format in [DataFormat.Float16, DataFormat.Float32]:
-        dest_acc = DestAccumulation.Yes
+    # For Float16/Float32, destination accumulation must be enabled in hardware.
+    # If the test was parametrized with dest_acc=No for these formats, skip this
+    # combination instead of silently overriding the parameter value.
+    if (
+        formats.input_format in [DataFormat.Float16, DataFormat.Float32]
+        and dest_acc == DestAccumulation.No
+    ):
+        pytest.skip(
+            "dest_acc=No is not supported for Float16/Float32; destination accumulation must be enabled."
+        )
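
The review comment also proposes removing DestAccumulation.No from the parametrization for these formats, so the unsupported pairs are never collected at all instead of being skipped or overridden. A minimal sketch of that approach, assuming local stand-in enums (the real DataFormat and DestAccumulation live in the test framework and may differ) and a hypothetical `valid_combinations` helper:

```python
from enum import Enum
from itertools import product


# Hypothetical stand-ins for the framework's DataFormat / DestAccumulation enums.
class DataFormat(Enum):
    Float32 = "Float32"
    Float16 = "Float16"
    Float16_b = "Float16_b"
    Bfp8_b = "Bfp8_b"


class DestAccumulation(Enum):
    No = False
    Yes = True


# Formats that, per the review discussion, require dest_acc=Yes.
REQUIRES_DEST_ACC = {DataFormat.Float16, DataFormat.Float32}


def valid_combinations(formats, dest_accs):
    """Yield only supported (format, dest_acc) pairs, so invalid pairs are
    never generated rather than skipped or silently rewritten."""
    for fmt, acc in product(formats, dest_accs):
        if fmt in REQUIRES_DEST_ACC and acc is DestAccumulation.No:
            continue
        yield fmt, acc


COMBOS = list(valid_combinations(list(DataFormat), list(DestAccumulation)))
```

The resulting tuples could be passed to `@pytest.mark.parametrize("fmt,dest_acc", COMBOS)`. The trade-off is that the constraint no longer shows up as a skip in the test report, so a comment next to REQUIRES_DEST_ACC is still worth keeping.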

Copilot · 2025-12-26T13:54:57Z

tests/sources/sfpu_eltwise_max_test.cpp

+    for (int i = 0; i < TILE_CNT; i++)
+    {
+        _llk_pack_<DstSync::SyncHalf, is_fp32_dest_acc_en, false>(i, L1_ADDRESS(buffer_Res[i]));
+    }


The pack loop packs TILE_CNT tiles back to L1, but only the first tile contains the result of the max operation. The second tile remains unchanged from its input value. Consider packing only the first tile (i = 0) since that's the only one that contains meaningful output.

Suggested change

-    for (int i = 0; i < TILE_CNT; i++)
-    {
-        _llk_pack_<DstSync::SyncHalf, is_fp32_dest_acc_en, false>(i, L1_ADDRESS(buffer_Res[i]));
-    }
+    _llk_pack_<DstSync::SyncHalf, is_fp32_dest_acc_en, false>(0, L1_ADDRESS(buffer_Res[0]));
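
If only tile 0 is packed, the Python check should likewise compare only tile 0 of the result buffer. A minimal sketch, assuming the result is read back as a flat list of floats with tile 0 first; `check_first_tile` and its tolerance arguments are hypothetical. With the default zero tolerances, `math.isclose` reduces to exact equality, which is a reasonable default since max only selects one of its inputs. A block-float format such as Bfp8_b may still need a tolerance because the output is re-packed with a shared exponent (an assumption worth confirming against the framework's existing tolerance settings).

```python
import math

TILE_ELEMENTS = 32 * 32  # one 32x32 tile


def check_first_tile(result, golden, rel_tol=0.0, abs_tol=0.0,
                     tile_elements=TILE_ELEMENTS):
    """Compare only tile 0 of the packed result against the golden tile.

    Returns the indices of mismatching elements (empty list on success).
    """
    if len(golden) != tile_elements:
        raise ValueError("golden must contain exactly one tile")
    if len(result) < tile_elements:
        raise ValueError("result buffer is shorter than one tile")
    return [
        i
        for i, (got, want) in enumerate(zip(result[:tile_elements], golden))
        if not math.isclose(got, want, rel_tol=rel_tol, abs_tol=abs_tol)
    ]
```

Anything past the first tile in `result` is ignored, so stale data in a second output buffer cannot cause a false failure.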

tt_llk_wormhole_b0/common/inc/sfpu/ckernel_sfpu_max.h

tt_llk_blackhole/common/inc/sfpu/ckernel_sfpu_max.h

ldjurovicTT · 2025-12-30T15:42:14Z

@fvranicTT @nvelickovicTT I still need to measure the perf difference between the old SFPI code and the new TTI code to see whether this really gives us a perf gain.
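
For the SFPI vs TTI comparison, one simple way to summarize repeated cycle measurements is the ratio of medians, which is less sensitive to an occasional outlier run than a ratio of means. A minimal sketch, assuming each variant yields a list of per-run cycle counts from the profiler; `speedup` is a hypothetical helper, not part of the perf infrastructure:

```python
from statistics import median


def speedup(baseline_cycles, candidate_cycles):
    """Ratio of median baseline cycles to median candidate cycles.

    A value above 1.0 means the candidate (e.g. the TTI version) is faster
    than the baseline (e.g. the SFPI version).
    """
    if not baseline_cycles or not candidate_cycles:
        raise ValueError("need at least one measurement per variant")
    return median(baseline_cycles) / median(candidate_cycles)
```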

fvranicTT · 2026-01-12T09:57:53Z

tests/python_tests/perf_eltwise_max.py

+    ],  # Number of iterations to run the test in order to minimize profiler overhead in measurement
+    input_dimensions=[
+        [32, 32],
+        # [32, 64],


Are these here to stay?

I will decide once I fix the CI errors I am getting.
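
The inline comment above describes running the test for several iterations to minimize profiler overhead in the measurement. Assuming a simple model where one profiled region wraps N back-to-back iterations and pays a fixed overhead once, the per-iteration estimate is (total - overhead) / N, and leaving the overhead unsubtracted inflates each iteration by overhead / N. A minimal sketch of that arithmetic; both function names are hypothetical:

```python
def per_iteration_cycles(total_cycles, loop_factor, overhead_cycles=0):
    """Estimate one iteration's cost from a profiled region that wraps
    loop_factor back-to-back iterations plus a fixed, one-time overhead."""
    if loop_factor <= 0:
        raise ValueError("loop_factor must be positive")
    return (total_cycles - overhead_cycles) / loop_factor


def overhead_bias(overhead_cycles, loop_factor):
    """Per-iteration error if the fixed overhead is not subtracted;
    it shrinks as 1 / loop_factor."""
    if loop_factor <= 0:
        raise ValueError("loop_factor must be positive")
    return overhead_cycles / loop_factor
```

Under this model, raising the iteration count from 10 to 100 cuts the overhead's share of each iteration by 10x.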

tt_llk_blackhole/common/inc/sfpu/ckernel_sfpu_max.h

Copilot AI review requested due to automatic review settings December 26, 2025 13:51

ldjurovicTT requested review from amokanTT, fvranicTT, nvelickovicTT, skotaracTT, skrsmanovicTT, sstanisicTT and vkrsmanovicTT as code owners December 26, 2025 13:51

Copilot started reviewing on behalf of ldjurovicTT December 26, 2025 13:51

github-actions bot added the test-infra This label is used for issues, pull requests, or tasks related to the LLK testing framework label Dec 26, 2025

Copilot AI reviewed Dec 26, 2025

ldjurovicTT requested review from amahmudTT, lpremovicTT, ncvetkovicTT, rdjogoTT and rtawfik01 as code owners December 26, 2025 15:20

github-actions bot added the wormhole label Dec 26, 2025

ldjurovicTT force-pushed the ldjurovic/sfpu_elwmax_test branch from 30fe32d to 8d06a61 December 29, 2025 13:41

fvranicTT reviewed Dec 29, 2025

tt_llk_wormhole_b0/common/inc/sfpu/ckernel_sfpu_max.h Outdated

fvranicTT reviewed Dec 29, 2025

tt_llk_wormhole_b0/common/inc/sfpu/ckernel_sfpu_max.h Outdated

fvranicTT reviewed Dec 29, 2025

tt_llk_wormhole_b0/common/inc/sfpu/ckernel_sfpu_max.h Outdated

github-actions bot added the blackhole label Dec 30, 2025

nvelickovicTT reviewed Dec 30, 2025

tt_llk_blackhole/common/inc/sfpu/ckernel_sfpu_max.h Outdated

nvelickovicTT reviewed Dec 30, 2025

tt_llk_wormhole_b0/common/inc/sfpu/ckernel_sfpu_max.h Outdated

github-actions bot added the performance label Dec 30, 2025

fvranicTT reviewed Dec 30, 2025

tt_llk_blackhole/common/inc/sfpu/ckernel_sfpu_max.h Outdated

ldjurovicTT force-pushed the ldjurovic/sfpu_elwmax_test branch from 0a335b7 to 02f4ee6 December 30, 2025 16:08

fvranicTT mentioned this pull request Jan 8, 2026

Remove obsolete max/max_int32 LLKs. #1060

Open

6 tasks

ldjurovicTT added 11 commits January 12, 2026 09:52

Initial test

5cd2343

Rewrite LLK as TTI

e717322

Multiple tiles

849f853

Forgot to save

0f720da

Uplift to new infra

5a2a41f

Remove redundant header includes

5d08553

Add blackhole code

16b0213

Initial perf test

695d50f

cleanup and fixes

0b09e7d

Fix assertion on lengths

09b55c7

Rebase and fix typo

0c8b17f

ldjurovicTT force-pushed the ldjurovic/sfpu_elwmax_test branch from 91fbc8f to 0c8b17f January 12, 2026 09:55

fvranicTT reviewed Jan 12, 2026

ldjurovicTT added 4 commits January 12, 2026 12:23

Separating init and calculate. Plain TTI with replay

5ce27cb

BH addrmod update

c3e2bef

SDPA specific row 0 kernel

7357ce1

dummy reshiffle code

d75bc30

fvranicTT reviewed Jan 12, 2026

tt_llk_blackhole/common/inc/sfpu/ckernel_sfpu_max.h

Hiding needed nops with STORES

476e301

4 participants