infra 🚧: poll device assertions #903
base: main
Thank you for your contribution! 🚀 You can run tt-metal integration tests by adding the … label. If you want to run metal post-commit tests, you can add the … label. 📖 For more information, please refer to our CONTRIBUTING guide.
Pull request overview
This PR introduces device assertion polling functionality that enables immediate detection of device assertions during test execution instead of waiting for timeouts. The implementation adds a new DeviceAssertionError exception class and modifies the polling loop to check for assertions periodically.
Key Changes:
- Added polling for device assertions in `wait_for_tensix_operations_finished` to detect failures early
- Created a new `DeviceAssertionError` exception with a `cores` attribute containing the affected `RiscCore` enums
- Added a test (`test_device_assert_hit.py`) to verify that assertion detection works correctly
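The polling idea can be illustrated with a minimal sketch. It assumes the device state is exposed through two callables, one reporting whether the kernel has finished and one returning the cores that hit an assertion; these parameters, the `timeout`/`poll_interval` defaults, and the `RiscCore` stand-in enum are hypothetical and not the actual signatures in `device.py`.

```python
import enum
import time
from typing import Callable, Iterable, List


class RiscCore(enum.Enum):
    # Hypothetical stand-in for the project's RiscCore enum.
    BRISC = 0
    TRISC0 = 1
    TRISC1 = 2
    TRISC2 = 3


class DeviceAssertionError(Exception):
    """Raised when one or more RISC-V cores report a hit assertion."""

    def __init__(self, cores: Iterable[RiscCore]):
        self.cores = list(cores)
        names = ", ".join(core.name for core in self.cores)
        super().__init__(f"Device assertion hit on: {names}")


def wait_for_tensix_operations_finished(
    is_finished: Callable[[], bool],
    asserted_cores: Callable[[], List[RiscCore]],
    timeout: float = 2.0,
    poll_interval: float = 0.01,
) -> None:
    """Poll until the kernel finishes, failing fast on a device assertion.

    Hypothetical signature: both callables stand in for device reads.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Check assertions on every iteration so a failed core is reported
        # immediately instead of surfacing later as a timeout.
        cores = asserted_cores()
        if cores:
            raise DeviceAssertionError(cores)
        if is_finished():
            return
        time.sleep(poll_interval)
    raise TimeoutError("Tensix operations did not finish in time")


# Usage: a kernel that never finishes but asserts on TRISC1 fails fast.
try:
    wait_for_tensix_operations_finished(
        is_finished=lambda: False,
        asserted_cores=lambda: [RiscCore.TRISC1],
    )
except DeviceAssertionError as err:
    print(err)  # Device assertion hit on: TRISC1
```

Checking for assertions before checking for completion means a core that asserted is never mistaken for a slow kernel, which is the behavior the PR is after.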
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `tests/sources/device_assert_hit_test.cpp` | New C++ test kernel that intentionally triggers an assertion on TRISC1 (math core) to validate the polling mechanism |
| `tests/python_tests/test_device_assert_hit.py` | New Python test that verifies `DeviceAssertionError` is raised and captures the correct core information |
| `tests/python_tests/helpers/device.py` | Introduces the `DeviceAssertionError` class, modifies `handle_if_assert_hit` to raise it instead of a generic exception, adds assertion polling to the wait loop, and removes the unused `HardwareController` import |
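The interaction between `handle_if_assert_hit` and the new test can be sketched as follows. This assumes the per-core assertion state has already been read into a dict mapping each core to a boolean; that input format, the `RiscCore` stand-in enum, and the simulated flags are hypothetical, since the real helper reads this state from the device.

```python
import enum
from typing import Dict


class RiscCore(enum.Enum):
    # Hypothetical stand-in for the project's RiscCore enum.
    BRISC = 0
    TRISC0 = 1
    TRISC1 = 2
    TRISC2 = 3


class DeviceAssertionError(Exception):
    def __init__(self, cores):
        self.cores = list(cores)
        super().__init__(
            "Device assertion hit on: " + ", ".join(c.name for c in self.cores)
        )


def handle_if_assert_hit(assert_flags: Dict[RiscCore, bool]) -> None:
    """Raise DeviceAssertionError listing every core whose assert flag is set.

    `assert_flags` is a hypothetical pre-read snapshot of device state.
    """
    hit = [core for core, flag in assert_flags.items() if flag]
    if hit:
        raise DeviceAssertionError(hit)


def test_device_assert_hit() -> None:
    # Simulates device_assert_hit_test.cpp, which asserts on TRISC1 (math).
    flags = {core: core is RiscCore.TRISC1 for core in RiscCore}
    try:
        handle_if_assert_hit(flags)
    except DeviceAssertionError as err:
        assert err.cores == [RiscCore.TRISC1]
    else:
        raise AssertionError("DeviceAssertionError was not raised")


test_device_assert_hit()
```

Raising a dedicated exception type with a `cores` attribute lets the test assert on exactly which core failed, rather than matching on the text of a generic exception.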
Co-authored-by: Copilot <[email protected]>
I gave up trying to implement resetting a single Tensix core after a runtime assertion. For now, once a runtime assertion happens, the whole board is reset so that the following tests can pass. Once I get a response from the Tensix team, I will fix this.
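The reset-on-assert workaround could be wrapped in a context manager along these lines. The PR does not show how the board reset is invoked, so `reset_board` here is a hypothetical callable supplied by the caller (for example from a test fixture), and `DeviceAssertionError` is a minimal stand-in for the class in `device.py`.

```python
import contextlib
from typing import Callable, Iterator


class DeviceAssertionError(Exception):
    # Minimal stand-in for the DeviceAssertionError added in device.py.
    def __init__(self, cores):
        self.cores = list(cores)
        super().__init__(f"Device assertion hit on: {self.cores}")


@contextlib.contextmanager
def reset_board_on_assert(reset_board: Callable[[], None]) -> Iterator[None]:
    """Reset the whole board if the wrapped code hits a device assertion.

    Resetting a single Tensix core is not supported yet, so the entire
    board is reset to keep later tests from running on a core left in an
    asserted state. The original error is re-raised for the caller.
    """
    try:
        yield
    except DeviceAssertionError:
        reset_board()  # hypothetical reset hook
        raise


# Usage: the reset runs, and the assertion still propagates to the test.
resets = []
try:
    with reset_board_on_assert(lambda: resets.append("reset")):
        raise DeviceAssertionError(["TRISC1"])
except DeviceAssertionError:
    pass
# resets == ["reset"]
```

Re-raising after the reset keeps the failure visible to the test that triggered it, while the reset itself protects the tests that run afterwards.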
Ticket
Problem description
Add device assertion polling so that assertion testing can be implemented without waiting for the test to time out.
Also add a simple test to check that an assertion raised on the device is caught.
What's changed
Type of change
Checklist