-
Notifications
You must be signed in to change notification settings - Fork 1
[fix] Bugfixes for missing ttnn package, tt-mlir CAPI library resolution, and broken pytests #208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not new, recovered and moved after examples/utils.py deleted in #67.
d05598e to
ce2c183
Compare
test/python/simple_add_multitile.py
Outdated
| # CB operations - capture based on cb_index attribute value | ||
| # CHECK-DAG: %[[CB0:.+]] = ttl.bind_cb{cb_index = 0 | ||
| # CHECK-DAG: %[[CB1:.+]] = ttl.bind_cb{cb_index = 1 | ||
| # CHECK-DAG: %[[CB2:.+]] = ttl.bind_cb{cb_index = 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry if this is pedantic, but for similar tests in #195 I maintained check order (no dag) and just re-ordered. I think maybe that's better than losing the wait order mapping? What if pop comes before wait? Anyway we have other tests to cover this so I don't feel too strongly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the order of independent CB binding matter? I don't think it should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dag might mean it's in another function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, there are CHECK-LABEL lines (if they are missing, then they should be added, but for consts and things like ttl.bind, order within a function should not matter, that's what CHECK-DAG is for).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could still pop before wait or some other failure mode, but as I said earlier, this is non-blocking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The verifier would not allow a pop or other use before a bind.
| from pathlib import Path | ||
|
|
||
| # Add examples to path for utils | ||
| sys.path.insert(0, str(Path(__file__).parent.parent.parent / "examples")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this!
…: ttnn (should all be there now)
|
|
||
| import os | ||
|
|
||
| os.environ["TTLANG_COMPILE_ONLY"] = "1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to lose this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think individual tests should need to set an environment variable. Why is it necessary (as opposed to just having a test-specific variable)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by test-specific variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean if a specific test needs to disable something it should have its own variable, I think relying on environment variables is messy and fragile in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would that variable look like though? You mean a lit config? Or a parameter on the kernel decorator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't know, I am not really sure what's best for python lit tests.
|
|
||
| # Set compile-only mode if no hardware | ||
| if not _hardware_available: | ||
| os.environ["TTLANG_COMPILE_ONLY"] = "1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we let individual tests control compile only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wdym? Should be possible to set in individual tests regardless of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which tests (in addition to the invalid ones) should be always compile only? And if they are, why keep the dead code for execution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is a default that can be overridden, OK. I still think it might lead to some config error causing runtime checking to be disabled, but that's not a huge concern.
My opinion is that it would be better for tests to explicitly say wether they want to be compile only tests (check codegen) or runtime tests (assert runtime values), I think it makes the testing intent clearer.
Given that we can override this, I don't feel too strongly, so feel free to leave the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also seems clearer to have "unsupported" on local machine + "fail" on CI with hardware than "pass" on local machine and "fail" on CI with hardware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really know how you can have them be no-device-required compile-only tests (the python lit ones) given that a device is required to create the inputs.
Am I missing something obvious that ttnn allows (how do I open_device successfully without a device)?
device = ttnn.open_device(device_id=0)
...
lhs = ttnn.from_torch(
lhs_torch,
dtype=ttnn.bfloat16,
layout=ttnn.TILE_LAYOUT,
device=device,
memory_config=ttnn.DRAM_MEMORY_CONFIG,
)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from ttlang import make_circular_buffer_like
# CHECK: buffer_factor must be in range [1, 32]
# Validation happens in CircularBuffer.__init__, no ttnn needed
make_circular_buffer_like(None, shape=(1, 1), buffer_factor=0)
For example. But fair enough, it's a narrow case where we'd have a python test that doesn't require ttnn and I'm fine making that a blanket requirement.
By the same logic, why make compile only the default if we aren't going to run them anyway?
Regardless, I don't know if it's productive to continue down this rabbit hole. We can override the env var, so whatever you decide here is fine with me.
afb5ea5 to
1bb6948
Compare
| from utils import require_hardware | ||
|
|
||
| print("=== Add Kernel Test ===") | ||
| require_hardware() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this matches existing behavior, but in future, do you think this should be a lit requirement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I'd much rather not require it so would suggest rewriting the tests to be hw-independent, e.g., pickle inputs instead of initializing a device for each test just to create inputs if that's the main reason it's done (the python lit tests now take >20 minutes on qb).
zoecarver
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!!
| # CHECK-NEXT: --> {{.*}}invalid_3d_grid.py:[[LINE:[0-9]+]]:1 | ||
| # CHECK-NEXT: | | ||
| # CHECK-NEXT: 34 | @ttl.kernel(grid=(1, 1, 1)) | ||
| # CHECK-NEXT: [[LINE]] | @ttl.kernel(grid=(1, 1, 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
Problem: TTNN package instability and configure-time availability check - ttnn package in
tt-mlir/third_party/tt-metalmay disappear or change, andttlang_check_ttnn_available()checked at configure time when ttnn only available after build.Fix: Created
cmake/modules/CopyTTNNPythonPackage.cmaketo copy ttnn tobuild/python_packages/ttnn/during build. Removedttlang_check_ttnn_available()fromcmake/modules/TTLangUtils.cmake. Tests use runtime checks (REQUIRES: ttnndirective,pytest.importorskip()).Problem: CAPI library resolution issues -
python/CMakeLists.txtcould find wronglibTTMLIRPythonCAPI.sofrom different tt-mlir install at configure time, and Python extension could resolve wrong library at runtime, causing MLIRContext registry mismatches.Fix: Made configure-time search paths mutually exclusive (only search
TTMLIR_BUILD_DIRwhen defined, else searchTTMLIR_PATH). AddedBUILD_RPATHandINSTALL_RPATHproperties to constrain runtime library search to configured tt-mlir directories.Problem: Missing pytest infrastructure and broken test files - no pytest check target or configuration, lit tests incorrectly collected by pytest,
test_elementwise_ops.pyimported from non-existentexamples/utils(deleted in metal/tt-lang single and multi-core matmul #67),test_block_allocation.pyused wrong function namenew_split_work_to_coresand unsafe ttnn import.Fix: Added
check-ttlang-pytesttarget totest/CMakeLists.txt. Createdtest/python/conftest.pywith feature detection, markers, fixtures, andcollect_ignorelist. Recovered the deletedexamples/utils.pythat was required in pytests -- nowtest/python/utils.py. Fixed corresponding imports and function names in test files, added# REQUIRES: ttnndirective, changed topytest.importorskip("ttnn").Problem:
test/python/simple_add_multitile.pyfails on qb because of ordering differences.Fix: Use
CHECK-DAG:appropriately.