Skip to content

Conversation

@sstanisicTT
Copy link
Contributor

Ticket

LLK Sanitizer base architecture PR

Problem description

Provides the architecture for HW state sanitizer + unit test. The bindings in llk_lib will be part of a separate PR that will come after HW configure functions are unified.

What's changed

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Checklist

@github-actions
Copy link
Contributor

Thank you for your contribution! 🚀

You can run tt-metal integration tests by adding the blackhole-integration-tests and/or wormhole-integration-tests labels to this pull request.

If you want to run metal post-commit tests, you can add the metal-post-commit-tests label to this pull request.

📖 For more information, please refer to our CONTRIBUTING guide.

@sstanisicTT sstanisicTT changed the title Sstanisic/san arch pr llk_san 🧼: sanitizer hw state tracking architecture Dec 17, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces the base architecture for a hardware state sanitizer (LLK Sanitizer) that tracks and validates hardware configuration state across unpacker, math, and pack operations. The implementation provides a type-safe state tracking system with three state types (determinate, indeterminate, dontcare) and includes comprehensive unit tests.

Key Changes

  • Introduced state_t<T> template class with three-state semantics for tracking hardware configuration state
  • Implemented configuration and validation functions for unpack, math, and pack operations
  • Added unit tests verifying state transitions and validation logic for all three operation types

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 15 comments.

Show a summary per file
File Description
tt_llk_wormhole_b0/llk_lib/llk_assert.h Added LLK_PANIC macro for inverted assertion conditions
tt_llk_common/llk_san_types.h Core type system including state_t template and hardware state structures
tt_llk_common/llk_san_output.h Assertion and panic macros for sanitizer validation
tt_llk_common/llk_san_impl.h Implementation of configuration and validation functions
tt_llk_common/llk_san_extended.h Placeholder for future extended state mask functionality
tt_llk_common/llk_san_backend.h Backend configuration (thread index placeholder)
tt_llk_common/llk_san.h Main API facade delegating to implementation functions
tests/sources/sanitizer/san_hw_config_test.cpp Comprehensive unit tests for all three operation types
tests/python_tests/sanitizer/test_san_hw_config.py Python test wrapper
tests/helpers/src/trisc.cpp Global state variable definitions
tests/Makefile Updated include paths and source directory search order
setup_clangd.sh Added tt_llk_common to include paths for IDE support

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 111 to 112
LLK_PANIC(reconfig && !state.is_configured, "panic: llk_san: user tried to reconfigure packer before configuring it");
LLK_PANIC(!reconfig && state.is_configured, "panic: llk_san: user tried to configure packer twice");
Copy link

Copilot AI Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The panic checks in pack_hw_configure_impl are in reverse order compared to unpack_hw_configure_impl and math_hw_configure_impl. For consistency and maintainability, the order should match: check reconfigure-before-configure first (line 111), then check double-configure second (line 112).

Suggested change
LLK_PANIC(reconfig && !state.is_configured, "panic: llk_san: user tried to reconfigure packer before configuring it");
LLK_PANIC(!reconfig && state.is_configured, "panic: llk_san: user tried to configure packer twice");
LLK_PANIC(!reconfig && state.is_configured, "panic: llk_san: user tried to configure packer twice");
LLK_PANIC(reconfig && !state.is_configured, "panic: llk_san: user tried to reconfigure packer before configuring it");

Copilot uses AI. Check for mistakes.
return this->template operator= <T>(rhs);
}

state_t& operator=(state_t&& rhs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should also be a noexcept.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think of it, to complete the rule of 5, you need to define a destructor. Sure, it will be defined implicitly, but since you already have the other 4, adding one more line won't hurt.

return *this;
}

state_type = rhs.state_type;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would this work if U != T? state_t<int> and state_t<float> are different classes, they can't access each other's private members without a friend declaration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was sure I added a friend declaration somewhere to avoid this exact problem. Might have accidentally removed it accidentally

};

template <typename T, typename U>
static inline bool _assert_condition(const state_t<T>& lhs, const state_t<U>& rhs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static inline is redundant for function templates in headers, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL template implies static inline on functions. Though it makes sense

return *this;
}

bool is_determinate() const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be constexpr, right?

return state_type == state_type_t::dontcare;
}

const T& get_underlying() const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be constexpr.

typename U,
typename = std::enable_if_t<
!is_state_t_v<std::decay_t<U>> && !std::is_same_v<std::decay_t<U>, dontcare_t> && !std::is_same_v<std::decay_t<U>, indeterminate_t>>>
constexpr state_t(U&& value) : underlying(std::forward<U>(value)), state_type(state_type_t::determinate)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
constexpr state_t(U&& value) : underlying(std::forward<U>(value)), state_type(state_type_t::determinate)
constexpr state_t(U&& value) noexcept(std::is_nothrow_constructible_v<T, U&&>) : underlying(std::forward<U>(value)), state_type(state_type_t::determinate)

Comment on lines 113 to 114
template <typename U>
state_t& operator=(state_t<U>&& rhs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
template <typename U>
state_t& operator=(state_t<U>&& rhs)
template <typename U, typename = std::enable_if_t<std::is_same_v<T, U> || std::is_assignable_v<T&, U&&>>>
state_t& operator=(state_t<U>&& rhs) noexcept(std::is_nothrow_assignable_v<T&, U&&>)

Ditto. I hope I haven't made a typo somewhere...


#define LLK_SAN_ASSERT(lhs, rhs, message) LLK_ASSERT(llk_san::_assert_condition(lhs, rhs), message)

#define LLK_SAN_PANIC(lhs, rhs, message) LLK_ASSERT(llk_san::_panic_condition(lhs, rhs), message)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this is not used much in the code (one other mention in commented out section). So should we be using LLK_PANIC instead? It's easy to confuse the macros at the moment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This define will be modified in the future to have different behavior in tt-metal so I wanted to prepare for it. I will consider possible better names to remove the confusion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that you meant should we be using LLK_PANIC instead of LLK_ASSERT for LLK_SAN_PANIC. The answer is that this is currently correct code due to how _panic_condition is implemented. I will figure out a way to refactor this so that it makes more sense logically.

@github-actions github-actions bot added test-infra This label is used for issues, pull requests, or tasks related to the LLK testing framework wormhole labels Dec 25, 2025
@sstanisicTT sstanisicTT changed the title llk_san 🧼: sanitizer hw state tracking architecture llk_san 🧼: sanitizer architecture Dec 25, 2025
@sstanisicTT sstanisicTT force-pushed the sstanisic/san-arch-pr branch from c17f3e3 to 3cb0b49 Compare December 25, 2025 13:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

blackhole test-infra This label is used for issues, pull requests, or tasks related to the LLK testing framework wormhole

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants