-
Notifications
You must be signed in to change notification settings - Fork 22
llk_san 🧼: sanitizer architecture #961
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Thank you for your contribution! 🚀 You can run tt-metal integration tests by adding the If you want to run metal post-commit tests, you can add the 📖 For more information, please refer to our CONTRIBUTING guide. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces the base architecture for a hardware state sanitizer (LLK Sanitizer) that tracks and validates hardware configuration state across unpacker, math, and pack operations. The implementation provides a type-safe state tracking system with three state types (determinate, indeterminate, dontcare) and includes comprehensive unit tests.
Key Changes
- Introduced
state_t<T>template class with three-state semantics for tracking hardware configuration state - Implemented configuration and validation functions for unpack, math, and pack operations
- Added unit tests verifying state transitions and validation logic for all three operation types
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 15 comments.
Show a summary per file
| File | Description |
|---|---|
| tt_llk_wormhole_b0/llk_lib/llk_assert.h | Added LLK_PANIC macro for inverted assertion conditions |
| tt_llk_common/llk_san_types.h | Core type system including state_t template and hardware state structures |
| tt_llk_common/llk_san_output.h | Assertion and panic macros for sanitizer validation |
| tt_llk_common/llk_san_impl.h | Implementation of configuration and validation functions |
| tt_llk_common/llk_san_extended.h | Placeholder for future extended state mask functionality |
| tt_llk_common/llk_san_backend.h | Backend configuration (thread index placeholder) |
| tt_llk_common/llk_san.h | Main API facade delegating to implementation functions |
| tests/sources/sanitizer/san_hw_config_test.cpp | Comprehensive unit tests for all three operation types |
| tests/python_tests/sanitizer/test_san_hw_config.py | Python test wrapper |
| tests/helpers/src/trisc.cpp | Global state variable definitions |
| tests/Makefile | Updated include paths and source directory search order |
| setup_clangd.sh | Added tt_llk_common to include paths for IDE support |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
tt_llk_common/llk_san_impl.h
Outdated
| LLK_PANIC(reconfig && !state.is_configured, "panic: llk_san: user tried to reconfigure packer before configuring it"); | ||
| LLK_PANIC(!reconfig && state.is_configured, "panic: llk_san: user tried to configure packer twice"); |
Copilot
AI
Dec 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The panic checks in pack_hw_configure_impl are in reverse order compared to unpack_hw_configure_impl and math_hw_configure_impl. For consistency and maintainability, the order should match: check reconfigure-before-configure first (line 111), then check double-configure second (line 112).
| LLK_PANIC(reconfig && !state.is_configured, "panic: llk_san: user tried to reconfigure packer before configuring it"); | |
| LLK_PANIC(!reconfig && state.is_configured, "panic: llk_san: user tried to configure packer twice"); | |
| LLK_PANIC(!reconfig && state.is_configured, "panic: llk_san: user tried to configure packer twice"); | |
| LLK_PANIC(reconfig && !state.is_configured, "panic: llk_san: user tried to reconfigure packer before configuring it"); |
tt_llk_common/llk_san_types.h
Outdated
| return this->template operator= <T>(rhs); | ||
| } | ||
|
|
||
| state_t& operator=(state_t&& rhs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should also be a noexcept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I think of it, to complete the rule of 5, you need to define a destructor. Sure, it will be defined implicitly, but since you already have the other 4, adding one more line won't hurt.
| return *this; | ||
| } | ||
|
|
||
| state_type = rhs.state_type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would this work if U != T? state_t<int> and state_t<float> are different classes, they can't access each other's private members without a friend declaration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was sure I added a friend declaration somewhere to avoid this exact problem. Might have accidentally removed it accidentally
| }; | ||
|
|
||
| template <typename T, typename U> | ||
| static inline bool _assert_condition(const state_t<T>& lhs, const state_t<U>& rhs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static inline is redundant for function templates in headers, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL template implies static inline on functions. Though it makes sense
tt_llk_common/llk_san_types.h
Outdated
| return *this; | ||
| } | ||
|
|
||
| bool is_determinate() const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be constexpr, right?
| return state_type == state_type_t::dontcare; | ||
| } | ||
|
|
||
| const T& get_underlying() const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also be constexpr.
tt_llk_common/llk_san_types.h
Outdated
| typename U, | ||
| typename = std::enable_if_t< | ||
| !is_state_t_v<std::decay_t<U>> && !std::is_same_v<std::decay_t<U>, dontcare_t> && !std::is_same_v<std::decay_t<U>, indeterminate_t>>> | ||
| constexpr state_t(U&& value) : underlying(std::forward<U>(value)), state_type(state_type_t::determinate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| constexpr state_t(U&& value) : underlying(std::forward<U>(value)), state_type(state_type_t::determinate) | |
| constexpr state_t(U&& value) noexcept(std::is_nothrow_constructible_v<T, U&&>) : underlying(std::forward<U>(value)), state_type(state_type_t::determinate) |
tt_llk_common/llk_san_types.h
Outdated
| template <typename U> | ||
| state_t& operator=(state_t<U>&& rhs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| template <typename U> | |
| state_t& operator=(state_t<U>&& rhs) | |
| template <typename U, typename = std::enable_if_t<std::is_same_v<T, U> || std::is_assignable_v<T&, U&&>>> | |
| state_t& operator=(state_t<U>&& rhs) noexcept(std::is_nothrow_assignable_v<T&, U&&>) |
Ditto. I hope I haven't made a typo somewhere...
|
|
||
| #define LLK_SAN_ASSERT(lhs, rhs, message) LLK_ASSERT(llk_san::_assert_condition(lhs, rhs), message) | ||
|
|
||
| #define LLK_SAN_PANIC(lhs, rhs, message) LLK_ASSERT(llk_san::_panic_condition(lhs, rhs), message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that this is not used much in the code (one other mention in commented out section). So should we be using LLK_PANIC instead? It's easy to confuse the macros at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This define will be modified in the future to have different behavior in tt-metal so I wanted to prepare for it. I will consider possible better names to remove the confusion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that you meant should we be using LLK_PANIC instead of LLK_ASSERT for LLK_SAN_PANIC. The answer is that this is currently correct code due to how _panic_condition is implemented. I will figure out a way to refactor this so that it makes more sense logically.
c17f3e3 to
3cb0b49
Compare
Ticket
LLK Sanitizer base architecture PR
Problem description
Provides the architecture for HW state sanitizer + unit test. The bindings in
llk_libwill be part of a separate PR that will come after HW configure functions are unified.What's changed
Type of change
Checklist