@vkrsmanovicTT (Contributor)

Ticket

N/A - Initial SFPU rsqrt implementation for Quasar architecture

Problem description

The Quasar architecture needed an implementation of the rsqrt (reciprocal square root: 1/sqrt(x)) SFPU operation.

What's changed

Core Implementation:
  • Added ckernel_sfpu_rsqrt.h for Quasar, which computes rsqrt using the hardware sqrt and reciprocal SFPU instructions
  • Implemented rsqrt by chaining the SQRT_MODE and RECIP_MODE SFPU nonlinear instructions

Test Infrastructure:
  • Added test_sfpu_rsqrt_quasar.py, a Python test with random input generation in the range [0.01, 1.0]
  • Added sfpu_rsqrt_quasar_test.cpp, a C++ kernel implementing the datacopy + SFPU pipeline
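
For readers who want a host-side picture of what the test compares against, the chained computation and the input range can be sketched in plain Python. This is only an illustration, not the code in test_sfpu_rsqrt_quasar.py: the function names `rsqrt_chained` and `generate_inputs`, the fixed seed, and the count of 1024 elements (one 32x32 tile) are assumptions made for the example.

```python
import math
import random


def rsqrt_chained(x: float) -> float:
    """Model rsqrt as a square root followed by a reciprocal."""
    root = math.sqrt(x)   # stands in for the SQRT_MODE step
    return 1.0 / root     # stands in for the RECIP_MODE step


def generate_inputs(count: int, low: float = 0.01, high: float = 1.0,
                    seed: int = 0) -> list[float]:
    """Draw `count` uniform random values from [low, high] (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rng.uniform(low, high) for _ in range(count)]


# Assumed size: 1024 elements, i.e. one 32x32 tile.
inputs = generate_inputs(1024)
golden = [rsqrt_chained(x) for x in inputs]
```

Keeping the inputs in [0.01, 1.0] avoids zero and negative values, so the reference stays finite and falls roughly between 1 and 10.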

Current Status:
  • Implementation is working for a limited set of format combinations (Float16 input/output)
  • Test sweep currently covers basic approximation mode and destination accumulation variants
  • Test sweep will be expanded in future updates to cover additional data formats, tile sizes, and edge cases

Type of change

[ ] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[ ] Documentation update

@github-actions (Contributor)

Thank you for your contribution! 🚀
If you want to run metal post-commit tests, you can add the metal-post-commit-tests label to this pull request.
📖 For more information, please refer to our CONTRIBUTING guide.

@github-actions bot added the quasar and test-infra labels on Dec 30, 2025 (test-infra: used for issues, pull requests, or tasks related to the LLK testing framework)

Copilot AI left a comment

Pull request overview

This PR implements the rsqrt (reciprocal square root: 1/sqrt(x)) SFPU operation for the Quasar architecture. The implementation chains hardware SQRT and RECIP operations to compute rsqrt, adds comprehensive test infrastructure with Python and C++ tests, and includes supporting changes to helper utilities.

  • Adds core rsqrt kernel implementation by chaining SQRT_MODE and RECIP_MODE SFPU instructions
  • Implements test infrastructure with random input generation in range [0.01, 1.0] and golden reference validation
  • Enhances PCC calculation to better handle edge cases with masked/invalid values
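
To illustrate what handling masked or invalid values in a PCC (Pearson correlation coefficient) calculation can look like, here is a minimal sketch in plain Python. It is not the code in tests/python_tests/helpers/utils.py; the function name `pcc`, the choice to drop pairs containing NaN or infinity, and the treatment of constant inputs are all assumptions made for this example.

```python
import math


def pcc(golden, result):
    """Pearson correlation over pairs where both values are finite."""
    # Mask out any pair containing NaN or +/-inf (assumed policy).
    pairs = [(g, r) for g, r in zip(golden, result)
             if math.isfinite(g) and math.isfinite(r)]
    if len(pairs) < 2:
        return float("nan")  # too few valid points to correlate

    n = len(pairs)
    mean_g = sum(g for g, _ in pairs) / n
    mean_r = sum(r for _, r in pairs) / n
    cov = sum((g - mean_g) * (r - mean_r) for g, r in pairs)
    var_g = sum((g - mean_g) ** 2 for g, _ in pairs)
    var_r = sum((r - mean_r) ** 2 for _, r in pairs)

    if var_g == 0.0 or var_r == 0.0:
        # Correlation is undefined for a constant series; as an assumed
        # convention, report 1.0 when the values match and 0.0 otherwise.
        return 1.0 if all(math.isclose(g, r) for g, r in pairs) else 0.0

    return cov / math.sqrt(var_g * var_r)
```

Without the mask, a single NaN in either tensor would propagate through the sums and make the whole comparison NaN, which is the kind of edge case such a refactor typically guards against.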

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tt_llk_quasar/llk_lib/llk_defs.h | Adds rsqrt enum value to SfpuType enumeration |
| tt_llk_quasar/common/inc/sfpu/ckernel_sfpu_rsqrt.h | Core rsqrt implementation chaining sqrt and reciprocal SFPU operations |
| tests/sources/quasar/sfpu_rsqrt_quasar_test.cpp | C++ kernel test implementing datacopy + SFPU rsqrt pipeline |
| tests/python_tests/quasar/test_sfpu_rsqrt_quasar.py | Python test with parametrized configurations and golden reference validation |
| tests/python_tests/helpers/utils.py | Refactors PCC calculation to handle edge cases more robustly |
| tests/python_tests/helpers/test_variant_parameters.py | Updates operation constant generation to handle Quasar SfpuType namespace |
| tests/python_tests/helpers/test_config.py | Adds null check for dest_acc before format inference |
| tests/python_tests/helpers/device.py | Skips assert handling for Quasar architecture |

fvranicTT and others added 7 commits December 31, 2025 13:08
- Implement rsqrt operation test for Quasar architecture
- Fix dvalid synchronization for 3-stage FPU->SFPU->PACK pipeline
- Call set_up_dest_dvalid_per_thread twice (FPU and SFPU) in MATH kernel
- Support Float16, Float16_b, and Float32 formats
- Support 32x32 and 64x64 tile dimensions
- Skip unsupported format combinations (non-Float32->Float32 with dest_acc=No, Float32->Float16 with dest_acc=No)
- Use _llk_math_eltwise_unary_sfpu_params_ wrapper for proper face iteration
- Correct SFPU iteration count per face (not total)
- Add wait_idle calls at end of MATH kernel
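
The skip rule for unsupported format combinations in the commit list above can be sketched as a small predicate. This is a hedged illustration, not the filter used in the test: the `DataFormat` enum and the `is_unsupported` function are hypothetical stand-ins, and the sketch assumes "non-Float32->Float32" means a non-Float32 input with a Float32 output.

```python
from enum import Enum


class DataFormat(Enum):
    """Hypothetical stand-in for the formats named in the commit list."""
    Float16 = "Float16"
    Float16_b = "Float16_b"
    Float32 = "Float32"


def is_unsupported(input_format: DataFormat, output_format: DataFormat,
                   dest_acc: bool) -> bool:
    """Return True for combinations the commit list says are skipped."""
    if dest_acc:
        return False  # both listed skips apply only when dest_acc is off

    # Assumed reading: non-Float32 input, Float32 output.
    non_fp32_to_fp32 = (input_format is not DataFormat.Float32
                        and output_format is DataFormat.Float32)
    # Float32 input, Float16 output.
    fp32_to_fp16 = (input_format is DataFormat.Float32
                    and output_format is DataFormat.Float16)
    return non_fp32_to_fp32 or fp32_to_fp16


# Enumerate the combinations that would still run under this reading.
supported = [
    (i, o, acc)
    for i in DataFormat
    for o in DataFormat
    for acc in (False, True)
    if not is_unsupported(i, o, acc)
]
```

A predicate like this could be used to filter a parametrized sweep before it reaches the device, so skipped cases are excluded up front rather than failing at runtime.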
@vkrsmanovicTT added this pull request to the merge queue on Jan 21, 2026
Merged via the queue into main with commit ab381b4 on Jan 21, 2026
32 checks passed
@vkrsmanovicTT deleted the sfpu_merge branch on January 21, 2026 at 23:55

4 participants