Skip to content

Conversation

@brnorris03
Copy link
Contributor

@brnorris03 brnorris03 commented Jan 5, 2026

  • Problem: TTNN package instability and configure-time availability check - ttnn package in tt-mlir/third_party/tt-metal may disappear or change, and ttlang_check_ttnn_available() checked at configure time when ttnn only available after build.
    Fix: Created cmake/modules/CopyTTNNPythonPackage.cmake to copy ttnn to build/python_packages/ttnn/ during build. Removed ttlang_check_ttnn_available() from cmake/modules/TTLangUtils.cmake. Tests use runtime checks (REQUIRES: ttnn directive, pytest.importorskip()).

  • Problem: CAPI library resolution issues - python/CMakeLists.txt could find wrong libTTMLIRPythonCAPI.so from different tt-mlir install at configure time, and Python extension could resolve wrong library at runtime, causing MLIRContext registry mismatches.
    Fix: Made configure-time search paths mutually exclusive (only search TTMLIR_BUILD_DIR when defined, else search TTMLIR_PATH). Added BUILD_RPATH and INSTALL_RPATH properties to constrain runtime library search to configured tt-mlir directories.

  • Problem: Missing pytest infrastructure and broken test files - no pytest check target or configuration, lit tests incorrectly collected by pytest, test_elementwise_ops.py imported from non-existent examples/utils (deleted in metal/tt-lang single and multi-core matmul #67), test_block_allocation.py used wrong function name new_split_work_to_cores and unsafe ttnn import.
    Fix: Added check-ttlang-pytest target to test/CMakeLists.txt. Created test/python/conftest.py with feature detection, markers, fixtures, and collect_ignore list. Recovered the deleted examples/utils.py that was required in pytests -- now test/python/utils.py. Fixed corresponding imports and function names in test files, added # REQUIRES: ttnn directive, changed to pytest.importorskip("ttnn").

  • Problem: test/python/simple_add_multitile.py fails on qb because of ordering differences.
    Fix: Use CHECK-DAG: appropriately.

@brnorris03 brnorris03 requested a review from a team as a code owner January 5, 2026 02:43
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not new, recovered and moved after examples/utils.py deleted in #67.

@brnorris03 brnorris03 force-pushed the bnorris/fix-broken-build-and-tests branch from d05598e to ce2c183 Compare January 5, 2026 07:21
Comment on lines 84 to 87
# CB operations - capture based on cb_index attribute value
# CHECK-DAG: %[[CB0:.+]] = ttl.bind_cb{cb_index = 0
# CHECK-DAG: %[[CB1:.+]] = ttl.bind_cb{cb_index = 1
# CHECK-DAG: %[[CB2:.+]] = ttl.bind_cb{cb_index = 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry if this is pedantic, but for similar tests in #195 I maintained check order (no dag) and just re-ordered. I think maybe that's better than losing the wait order mapping? What if pop comes before wait? Anyway we have other tests to cover this so I don't feel too strongly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the order of independent CB binding matter? I don't think it should.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dag might mean it's in another function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there are CHECK-LABEL lines (if they are missing, then they should be added, but for consts and things like ttl.bind, order within a function should not matter, that's what CHECK-DAG is for).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could still pop before wait or some other failure mode, but as I said earlier, this is non-blocking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The verifier would not allow a pop or other use before a bind.

from pathlib import Path

# Add examples to path for utils
sys.path.insert(0, str(Path(__file__).parent.parent.parent / "examples"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this!


import os

os.environ["TTLANG_COMPILE_ONLY"] = "1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to lose this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think individual tests should need to set an environment variable. Why is it necessary (as opposed to just having a test-specific variable)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by test-specific variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean if a specific test needs to disable something it should have its own variable, I think relying on environment variables is messy and fragile in general.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would that variable look like though? You mean a lit config? Or a parameter on the kernel decorator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know, I am not really sure what's best for python lit tests.


# Set compile-only mode if no hardware
if not _hardware_available:
os.environ["TTLANG_COMPILE_ONLY"] = "1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we let individual tests control compile only?

Copy link
Contributor Author

@brnorris03 brnorris03 Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdym? Should be possible to set in individual tests regardless of this.

Copy link
Contributor Author

@brnorris03 brnorris03 Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which tests (in addition to the invalid ones) should be always compile only? And if they are, why keep the dead code for execution?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is a default that can be overridden, OK. I still think it might lead to some config error causing runtime checking to be disabled, but that's not a huge concern.

My opinion is that it would be better for tests to explicitly say wether they want to be compile only tests (check codegen) or runtime tests (assert runtime values), I think it makes the testing intent clearer.

Given that we can override this, I don't feel too strongly, so feel free to leave the default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also seems clearer to have "unsupported" on local machine + "fail" on CI with hardware than "pass" on local machine and "fail" on CI with hardware.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really know how you can have them be no-device-required compile-only tests (the python lit ones) given that a device is required to create the inputs.

Am I missing something obvious that ttnn allows (how do I open_device successfully without a device)?

 device = ttnn.open_device(device_id=0)
...
        lhs = ttnn.from_torch(
            lhs_torch,
            dtype=ttnn.bfloat16,
            layout=ttnn.TILE_LAYOUT,
            device=device,
            memory_config=ttnn.DRAM_MEMORY_CONFIG,
        )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from ttlang import make_circular_buffer_like

# CHECK: buffer_factor must be in range [1, 32]
# Validation happens in CircularBuffer.__init__, no ttnn needed
make_circular_buffer_like(None, shape=(1, 1), buffer_factor=0)

For example. But fair enough, it's a narrow case where we'd have a python test that doesn't require ttnn and I'm fine making that a blanket requirement.

By the same logic, why make compile only the default if we aren't going to run them anyway?

Regardless, I don't know if it's productive to continue down this rabbit hole. We can override the env var, so whatever you decide here is fine with me.

@brnorris03 brnorris03 force-pushed the bnorris/fix-broken-build-and-tests branch from afb5ea5 to 1bb6948 Compare January 5, 2026 22:49
@brnorris03 brnorris03 requested a review from zoecarver January 6, 2026 00:39
from utils import require_hardware

print("=== Add Kernel Test ===")
require_hardware()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this matches existing behavior, but in future, do you think this should be a lit requirement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I'd much rather not require it so would suggest rewriting the tests to be hw-independent, e.g., pickle inputs instead of initializing a device for each test just to create inputs if that's the main reason it's done (the python lit tests now take >20 minutes on qb).

Copy link
Contributor

@zoecarver zoecarver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!!

Comment on lines +25 to +27
# CHECK-NEXT: --> {{.*}}invalid_3d_grid.py:[[LINE:[0-9]+]]:1
# CHECK-NEXT: |
# CHECK-NEXT: 34 | @ttl.kernel(grid=(1, 1, 1))
# CHECK-NEXT: [[LINE]] | @ttl.kernel(grid=(1, 1, 1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@brnorris03 brnorris03 enabled auto-merge (squash) January 6, 2026 16:26
@brnorris03 brnorris03 merged commit e1a639c into main Jan 6, 2026
5 checks passed
@brnorris03 brnorris03 deleted the bnorris/fix-broken-build-and-tests branch January 6, 2026 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants