@sstanisicTT
Contributor

Ticket

Problem description

Add device assertion polling so that we can implement assertion testing without waiting for the test to time out.

Also add a simple test to check that we can catch an assert from the device.

What's changed

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Checklist

@github-actions
Contributor

github-actions bot commented Dec 2, 2025

Thank you for your contribution! 🚀

You can run tt-metal integration tests by adding the blackhole-integration-tests and/or wormhole-integration-tests labels to this pull request.

If you want to run metal post-commit tests, you can add the metal-post-commit-tests label to this pull request.

📖 For more information, please refer to our CONTRIBUTING guide.

@github-actions github-actions bot added the test-infra label Dec 2, 2025
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces device assertion polling functionality that enables immediate detection of device assertions during test execution instead of waiting for timeouts. The implementation adds a new DeviceAssertionError exception class and modifies the polling loop to check for assertions periodically.

Key Changes:

  • Added polling for device assertions in wait_for_tensix_operations_finished to detect failures early
  • Created a new DeviceAssertionError exception with a cores attribute containing the affected RiscCore enums
  • Added a test (test_device_assert_hit.py) to verify assertion detection works correctly
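The first two changes above can be illustrated with a minimal sketch. This is not the code from tests/python_tests/helpers/device.py: the RiscCore enum here is a stand-in with assumed members, and wait_until_finished, is_done, find_asserted_cores, timeout_s and poll_interval_s are hypothetical names chosen for the illustration. It only shows the general idea of checking for assertions on every poll iteration so that a failure surfaces before the timeout expires.

```python
import time
from enum import Enum


class RiscCore(Enum):
    """Hypothetical stand-in for the repository's RiscCore enum."""
    TRISC0 = 0
    TRISC1 = 1
    TRISC2 = 2


class DeviceAssertionError(Exception):
    """Raised when at least one core reports that an assertion was hit."""

    def __init__(self, cores):
        # Keep the affected cores on the exception so callers can inspect them.
        self.cores = list(cores)
        names = ", ".join(core.name for core in self.cores)
        super().__init__(f"Device assertion hit on: {names}")


def wait_until_finished(is_done, find_asserted_cores,
                        timeout_s=2.0, poll_interval_s=0.01):
    """Poll until is_done() is true, failing fast if any core has asserted.

    Both callables are assumptions for this sketch: is_done() returns a bool,
    find_asserted_cores() returns a (possibly empty) list of RiscCore values.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        asserted = find_asserted_cores()
        if asserted:
            # Surface the assertion immediately instead of waiting for timeout.
            raise DeviceAssertionError(asserted)
        if is_done():
            return
        time.sleep(poll_interval_s)
    raise TimeoutError(f"Operations did not finish within {timeout_s} s")
```

In this sketch the assertion check runs before the completion check, so an assertion is reported even if the device also signals completion in the same iteration; whether the actual helper orders the checks this way is not stated in the review.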

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File | Description
tests/sources/device_assert_hit_test.cpp | New C++ test kernel that intentionally triggers an assertion on TRISC1 (math core) to validate the polling mechanism
tests/python_tests/test_device_assert_hit.py | New Python test that verifies DeviceAssertionError is raised and captures the correct core information
tests/python_tests/helpers/device.py | Introduces DeviceAssertionError class, modifies handle_if_assert_hit to raise it instead of a generic exception, adds assertion polling to the wait loop, and removes unused HardwareController import
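The general shape of a test like the one described for test_device_assert_hit.py might look as follows. This is a sketch under stated assumptions, not the test from the PR: RiscCore and DeviceAssertionError are redefined here as stand-ins, and run_asserting_kernel is a hypothetical placeholder for building and running the kernel that asserts on TRISC1.

```python
from enum import Enum


class RiscCore(Enum):
    """Hypothetical stand-in for the repository's RiscCore enum."""
    TRISC0 = 0
    TRISC1 = 1
    TRISC2 = 2


class DeviceAssertionError(Exception):
    """Stand-in exception carrying the cores that hit an assertion."""

    def __init__(self, cores):
        self.cores = list(cores)
        super().__init__("Device assertion hit")


def run_asserting_kernel():
    """Hypothetical placeholder for running the intentionally failing kernel.

    The real test would run the kernel on hardware; here the outcome is
    simulated by raising the exception for TRISC1 directly.
    """
    raise DeviceAssertionError([RiscCore.TRISC1])


def test_device_assert_hit():
    try:
        run_asserting_kernel()
    except DeviceAssertionError as error:
        # The exception should name the core that triggered the assertion.
        assert RiscCore.TRISC1 in error.cores
    else:
        raise AssertionError("expected DeviceAssertionError to be raised")
```

Under pytest, the try/except/else block would typically be written with pytest.raises(DeviceAssertionError) and the cores attribute checked on the captured exception.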

@sstanisicTT
Contributor Author

I gave up trying to implement resetting a single Tensix core after a runtime assertion. For now, once a runtime assertion happens, the board gets reset so that the following tests can pass correctly. Once I get a response from the Tensix team, I will fix this.
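One way to picture the interim behavior described above is a wrapper that resets the whole board whenever a runtime assertion is caught, then re-raises so the asserting test still sees the error. This is only a sketch: run_with_reset_on_assert, run_test and reset_board are hypothetical names, the exception is a stand-in, and where the reset actually happens in the PR is not stated here.

```python
class DeviceAssertionError(Exception):
    """Minimal stand-in for the exception introduced in this PR."""


def run_with_reset_on_assert(run_test, reset_board):
    """Run a test callable and reset the board if a device assertion is hit.

    run_test and reset_board are hypothetical callables supplied by the caller.
    """
    try:
        run_test()
    except DeviceAssertionError:
        # A full board reset leaves the device in a clean state for later tests.
        reset_board()
        raise
```

A per-core reset would replace the reset_board call once a way to reset a single Tensix core is available.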

Labels

test-infra This label is used for issues, pull requests, or tasks related to the LLK testing framework
