Skip to content

Conversation

@ldjurovicTT
Copy link
Contributor

Ticket

no

Problem description

We are not testing sfpu eltwise max anywhere explicitly. Also this will serve as starting point for some optimizations needed to be done for SDPA. Test is AI generated so I guess I will have to pass through it once to make sure everything is written fine

What's changed

Added python and cpp test

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

@github-actions
Copy link
Contributor

Thank you for your contribution! 🚀
If you want to run metal post-commit tests, you can add the metal-post-commit-tests label to this pull request.
📖 For more information, please refer to our CONTRIBUTING guide.

@github-actions github-actions bot added the test-infra This label is used for issues, pull requests, or tasks related to the LLK testing framework label Dec 26, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds comprehensive test coverage for the SFPU eltwise max operation, which was previously untested. The implementation includes both C++ kernel tests and Python validation tests to verify correctness across multiple data formats and architectures.

Key changes:

  • Added C++ kernel test (sfpu_eltwise_max_test.cpp) with unpack, math, and pack sections for hardware validation
  • Added Python test (test_sfpu_eltwise_max.py) with parametrized test cases covering Float32, Float16, Float16_b, and Bfp8_b formats
  • Test validates element-wise max operation by comparing two tiles and storing the result

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
tests/sources/sfpu_eltwise_max_test.cpp Implements low-level kernel test that unpacks 2 tiles, performs element-wise max comparison using _calculate_max_, and packs results back to L1 memory
tests/python_tests/test_sfpu_eltwise_max.py Python test driver that generates stimuli, executes the kernel test, and validates results against PyTorch's torch.maximum golden reference

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

res_address = write_stimuli_to_l1(
test_config,
src_A,
src_B,
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The src_B variable is generated but never used in this test. The eltwise max operation only uses src_A which contains 2 tiles. Consider either removing src_B from the generate_stimuli call or updating the function to not generate it if it's not needed.

Copilot uses AI. Check for mistakes.
"testname": test_name,
"dest_acc": dest_acc,
"input_A_dimensions": input_dimensions,
"input_B_dimensions": input_dimensions,
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input_B_dimensions parameter is set but not actually used since the test only operates on src_A data. Consider removing this parameter from the test_config or setting it to None/an empty list to clarify that only input A is being used.

Suggested change
"input_B_dimensions": input_dimensions,
"input_B_dimensions": None,

Copilot uses AI. Check for mistakes.
Comment on lines +73 to +77
# Force dest_acc for certain formats
if formats.input_format in [DataFormat.Float16, DataFormat.Float32]:
dest_acc = DestAccumulation.Yes
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dest_acc parameter is overridden for Float16 and Float32 formats, but this happens after it's already been parametrized in the test. This means the test will effectively ignore the DestAccumulation.No parameter for these formats. Consider either removing DestAccumulation.No from the parametrization for these formats, or documenting why it's overridden.

Suggested change
# Force dest_acc for certain formats
if formats.input_format in [DataFormat.Float16, DataFormat.Float32]:
dest_acc = DestAccumulation.Yes
# For Float16/Float32, destination accumulation must be enabled in hardware.
# If the test was parametrized with dest_acc=No for these formats, skip this
# combination instead of silently overriding the parameter value.
if (
formats.input_format in [DataFormat.Float16, DataFormat.Float32]
and dest_acc == DestAccumulation.No
):
pytest.skip(
"dest_acc=No is not supported for Float16/Float32; destination accumulation must be enabled."
)

Copilot uses AI. Check for mistakes.
Comment on lines 104 to 107
for (int i = 0; i < TILE_CNT; i++)
{
_llk_pack_<DstSync::SyncHalf, is_fp32_dest_acc_en, false>(i, L1_ADDRESS(buffer_Res[i]));
}
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pack loop packs TILE_CNT tiles back to L1, but only the first tile contains the result of the max operation. The second tile remains unchanged from its input value. Consider packing only the first tile (i = 0) since that's the only one that contains meaningful output.

Suggested change
for (int i = 0; i < TILE_CNT; i++)
{
_llk_pack_<DstSync::SyncHalf, is_fp32_dest_acc_en, false>(i, L1_ADDRESS(buffer_Res[i]));
}
_llk_pack_<DstSync::SyncHalf, is_fp32_dest_acc_en, false>(0, L1_ADDRESS(buffer_Res[0]));

Copilot uses AI. Check for mistakes.
@ldjurovicTT
Copy link
Contributor Author

@fvranicTT @nvelickovicTT I still need to measure peft difference between old sfpi and new TTI code and see if this really gives us perf gain

@ldjurovicTT ldjurovicTT force-pushed the ldjurovic/sfpu_elwmax_test branch from 0a335b7 to 02f4ee6 Compare December 30, 2025 16:08
@ldjurovicTT ldjurovicTT force-pushed the ldjurovic/sfpu_elwmax_test branch from 91fbc8f to 0c8b17f Compare January 12, 2026 09:55
], # Number of iterations to run the test in order to minimize profiler overhead in measurement
input_dimensions=[
[32, 32],
# [32, 64],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these here to stay?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will see once I fix CI errors I am getting

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

blackhole performance test-infra This label is used for issues, pull requests, or tasks related to the LLK testing framework wormhole

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants