| 1 | +# Debugging guide and checklists |
| 2 | + |
| 3 | +This guide is a checklist of the things you may have forgotten to put in your test files. Each checklist is sorted by bug frequency. |
| 4 | + |
| 5 | +## Table of contents |
| 6 | +| | | |
| 7 | +|:----|:----| |
| 8 | +| 1 | Quality of life tips & tricks | |
| 9 | +| 2 | Compilation errors | |
| 10 | +| 3 | Runtime errors | |
| 11 | +| 4 | Assertion errors | |
| 12 | +--- |
| 13 | + |
| 14 | +# Quality of life tips & tricks |
| 15 | + |
| 16 | +## Enhanced view of error matrices |
| 17 | +If your terminal is too narrow or too short to display a complete dump of all tiles your test variant processed, redirect `pytest`'s `stderr` to a file like this: |
| 18 | + |
| 19 | +`pytest --compile-consumer -x ./my_test_name.py 2>./my_file_path.txt` |
| 20 | + |
| 21 | +To view an error matrix with the same colors as in your terminal, install the VSCode/Cursor extension [ANSI Colors](https://marketplace.visualstudio.com/items?itemName=iliazeus.vscode-ansi). Afterwards, open the file and select ANSI preview. |
| 22 | + |
| 23 | +Redirecting `stderr` to a file also speeds up execution when your variants produce many errors. In most such cases, `pytest` is bound by the terminal's throughput when printing the relatively large error messages. |
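
If you need to `grep` or diff the captured file, or the extension is not available, the color codes can be stripped instead. The following is a minimal sketch: `strip_ansi` and `strip_ansi_file` are hypothetical helper names, not part of the test infrastructure, and the regular expression assumes the dump only uses standard CSI escape sequences such as color codes.

```python
import re

# Matches CSI escape sequences such as "\x1b[31m" (set color) and "\x1b[0m" (reset).
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Return `text` with ANSI CSI escape sequences removed."""
    return ANSI_CSI.sub("", text)


def strip_ansi_file(src_path: str, dst_path: str) -> None:
    """Write a plain-text copy of a captured stderr file (hypothetical helper)."""
    with open(src_path, encoding="utf-8", errors="replace") as src:
        cleaned = strip_ansi(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(cleaned)
```

For example, `strip_ansi_file("./my_file_path.txt", "./my_file_path_plain.txt")` would produce a copy that plain-text tools can read.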
| 24 | + |
| 25 | +# Compilation errors |
| 26 | + |
| 27 | +- Did you include all default headers provided in the example for your test type? |
| 28 | + - `#include "params.h"` is mandatory because it's the source of your entire `cpp` test configuration; |
| 29 | +- Does your `run_kernel` signature look like this? |
| 30 | + - ```void run_kernel(const volatile struct RuntimeParams *params)```; |
| 31 | + - Did you include all the keywords (`const`, `volatile`, `struct`)? |
| 32 | +- How are the template and runtime parameters passed from Python accessed in my C++ kernel code? |
| 33 | + - TODO |
| 34 | +- I'm getting a compilation error when I compile with coverage enabled. |
| 35 | + - This can be the consequence of a bad LLK API call, written in a way that the compiler cannot handle when coverage is enabled; |
| 36 | + - If the error looks like `Can't fit 32-bit value in 16-bit TTI buffer`, it's probably an LLK API error that is only caught when compiling for coverage; |
| 37 | + |
| 38 | +# Runtime errors |
| 39 | + |
| 40 | +- TTException - can't find an object file: |
| 41 | + - TODO |
| 42 | +- My kernel hangs the core when I add my new runtime parameter to the runtimes list of `TestConfig`. |
| 43 | + - TODO |
| 44 | + |
| 45 | +# Assertion errors |
| 46 | + |
| 47 | +- Do you know exactly which assert failed? |
| 48 | + - If not, **please** add a short message to each of your asserts, like this, so you can tell which one failed: |
| 49 | + ``` |
| 50 | + assert len(res_from_L1) == len(golden_tensor), "Result tensor and golden tensor are not of the same length" |
| 51 | + ``` |
| 52 | +- Did you hardcode your stimuli addresses? |
| 53 | + - First of all, you're not supposed to do this. Stimuli are accessed from the kernel code through the `buffer_A`, `buffer_B`, and `buffer_Res` variables. |
| 54 | + - If you are 110% sure you must hardcode your addresses, **please** consult the `L1 memory layouts` section of `infra_architecture.md` to make sure your stimuli are in L1 where your kernel expects them. |
| 55 | + - To make Python actually write stimuli to your specific address, you need to reassign the `StimuliConfig.STIMULI_L1_ADDRESS` static field to your new address. Keep in mind that this will make other tests that use the default addresses fail, because you changed where their stimuli are loaded in L1. |
| 56 | +- Did you access any hardcoded addresses of your choosing? |
| 57 | + - If you are 110% sure you must do this, **[please see L1 layout](infra_architecture.md#l1-memory-layouts)** to make sure you don't accidentally overwrite some other important piece of data used by the kernel, or read garbage. |
| 58 | +- Is your error matrix the same every time you run your failing variant? |
| 59 | + - To be extra sure, run `tt-smi -r` between `pytest` invocations; |
| 60 | + - If this is indeed the case, your kernel really does process the data you supplied, but it's configured in an invalid way. Please check all arguments to the `TestConfig` object to make sure everything is as you expect. If it is, inspect your variant's `build.h` to verify that the C++ side is parameterized correctly; |
| 61 | +- Is your error matrix different every time you run your failing variant? |
| 62 | + - This means that your kernel is not processing any of the stimuli you supplied to it, so your kernel is misconfigured. |
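
Whether the error matrix is "the same every time" can be checked mechanically rather than by eye. The sketch below assumes you redirected the `stderr` of two runs of the same failing variant to two files, as described in the "Enhanced view of error matrices" section. The function names are hypothetical, and the comparison is only meaningful if the dumps contain nothing that legitimately varies between runs (timestamps, temporary paths); if they do, a difference does not necessarily mean the matrices differ.

```python
import re

# Matches CSI escape sequences (for example color codes) so they do not affect the comparison.
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def normalize_dump(text: str) -> str:
    """Strip color codes and trailing whitespace from a captured dump."""
    cleaned = ANSI_CSI.sub("", text)
    return "\n".join(line.rstrip() for line in cleaned.splitlines())


def same_error_dump(first: str, second: str) -> bool:
    """Return True if two captured dumps are identical after normalization."""
    return normalize_dump(first) == normalize_dump(second)


def same_error_dump_files(first_path: str, second_path: str) -> bool:
    """Compare two captured stderr files (hypothetical helper)."""
    with open(first_path, encoding="utf-8", errors="replace") as f:
        first = f.read()
    with open(second_path, encoding="utf-8", errors="replace") as f:
        second = f.read()
    return same_error_dump(first, second)
```

A `True` result points at the "same every time" branch of the checklist, and a `False` result points at the "different every time" branch.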