Commit 983baed

AndrejaajankovicTT
authored and committed
Initial docs
1 parent f8a9c72 commit 983baed

4 files changed

+593
-0
lines changed

docs/tests/debugging_guide.md

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
1+
# Debugging guide and checklists
2+
3+
This guide is a checklist of things you may have forgotten to put in your test files. Each checklist is sorted by how frequently the bug occurs.
4+
5+
## Table of contents
6+
| | |
7+
|:----|:----|
8+
| 1 | Quality of life tips & tricks |
9+
| 2 | Compilation errors |
10+
| 3 | Runtime errors |
11+
| 4 | Assertion errors |
12+
---
13+
14+
# Quality of life tips & tricks
15+
16+
## Enhanced view of error matrices
17+
If your terminal is too narrow or too short to display a complete dump of all the tiles your test variant processed, redirect `pytest`'s `stderr` to a file instead:
18+
19+
`pytest --compile-consumer -x ./my_test_name.py 2>./my_file_path.txt`
20+
21+
To view an error matrix with the same colors as in your terminal, install the VSCode/Cursor extension [ANSI Colors](https://marketplace.visualstudio.com/items?itemName=iliazeus.vscode-ansi). Then open the file and select the ANSI preview.
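If installing an extension is not an option, the dump can also be read as plain text by stripping the color codes. The following is only a sketch under assumptions: `strip_ansi` and `write_plain_copy` are hypothetical helper names that are not part of the test infrastructure, and the pattern covers only CSI escape sequences (the kind used for terminal colors). The colors themselves are lost in the plain copy.

```python
import re

# Matches ANSI CSI escape sequences such as "\x1b[31m" (red) or "\x1b[0m" (reset):
# ESC, "[", optional parameter bytes, optional intermediate bytes, one final byte.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Return `text` with ANSI CSI escape sequences removed."""
    return ANSI_CSI.sub("", text)


def write_plain_copy(src_path: str, dst_path: str) -> None:
    """Write a copy of the redirected stderr dump without color codes."""
    with open(src_path, encoding="utf-8", errors="replace") as src:
        plain = strip_ansi(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(plain)
```

For example, `write_plain_copy("./my_file_path.txt", "./my_file_path_plain.txt")` would produce a file that any editor can display.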
22+
23+
Redirecting `stderr` to a file also speeds up runs in which many variants fail. In most cases, `pytest` is actually bound by how fast the terminal can print the relatively large error messages.
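The same redirection can be scripted, for instance when the run is launched from another Python tool. This is a minimal sketch, assuming `pytest` is on your `PATH`; `run_with_stderr_to_file` is a hypothetical helper name, and the command in the comment is the one shown earlier in this section.

```python
import subprocess


def run_with_stderr_to_file(cmd: list[str], log_path: str) -> int:
    """Run `cmd`, write everything it prints to stderr into `log_path`,
    and return the exit code of the process."""
    with open(log_path, "wb") as log:
        completed = subprocess.run(cmd, stderr=log, check=False)
    return completed.returncode


# Possible usage, equivalent to the shell redirection with `2>`:
# run_with_stderr_to_file(
#     ["pytest", "--compile-consumer", "-x", "./my_test_name.py"],
#     "./my_file_path.txt",
# )
```

`check=False` is deliberate here: a failing test run is the expected case, so the exit code is returned to the caller instead of raising an exception.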
24+
25+
# Compilation errors
26+
27+
- Did you include all default headers provided in the example for your test type?
28+
- `#include "params.h"` is mandatory because it's the source of your entire `cpp` test configuration;
29+
- Does your `run_kernel` look like this:
30+
- ```void run_kernel(const volatile struct RuntimeParams *params)```;
31+
- Did you include all the keywords (`const`, `volatile`, `struct`)?
32+
- How are the template and runtime parameters I pass from Python accessed in my C++ kernel code?
33+
- TODO
34+
- I'm getting a compilation error when I compile with coverage enabled.
35+
- This can be a consequence of a bad LLK API call that is written in such a way that the compiler fails to handle it when coverage is enabled;
36+
- If errors are of type: `Can't fit 32-bit value in 16-bit TTI buffer`, it's probably an LLK API error that is only caught when compiling for coverage;
37+
38+
# Runtime errors
39+
40+
- TTException - can't find an object file:
41+
- TODO
42+
- My kernel hangs the core when I add my new runtime parameter to the runtimes list of TestConfig.
43+
- TODO
44+
45+
# Assertion errors
46+
47+
- Do you know exactly which assert failed?
48+
- If not, **please** add a short message to your asserts like this, so that the failing assert is easy to identify:
49+
```
50+
assert len(res_from_L1) == len(golden_tensor), "Result tensor and golden tensor are not of the same length"
51+
```
52+
- Did you hardcode your stimuli addresses?
53+
- First, you're not supposed to do this. Stimuli are accessed from the kernel code through the `buffer_A`, `buffer_B`, and `buffer_Res` variables.
54+
- If you are 110% sure you must hardcode your addresses, **please** consult the `L1 memory layouts` section of `infra_architecture.md` to make sure your stimuli are in L1 where your kernel expects them.
55+
- To make Python actually write stimuli to your specific address, reassign the static field `StimuliConfig.STIMULI_L1_ADDRESS` to your new address. Keep in mind that this will make other tests that use the default addresses fail, because you changed where their stimuli are loaded in L1.
56+
- Did you access any hardcoded addresses of your choosing?
57+
- If you are 110% sure you must do this, **[please see L1 layout](infra_architecture.md#l1-memory-layouts)** to be sure you didn't accidentally overwrite some other important piece of data used by the kernel or read some garbage.
58+
- Is your error matrix the same every time you run your failing variant?
59+
- To be extra sure, run `tt-smi -r` between `pytest` invocations;
60+
- If this is indeed the case, your kernel really does process the data you supplied, but it's configured incorrectly. Check all arguments to the `TestConfig` object to make sure everything is as you expect. If they look right, check the `build.h` of your variant to verify that the C++ side is parameterized correctly;
61+
- Is your error matrix different every time you run your failing variant?
62+
- This means that your kernel is not processing the stimuli you supplied, so it is misconfigured.
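
The last two checks can be partly automated by comparing the results of two runs of the same failing variant. The sketch below is an illustration under assumptions, not part of the test infrastructure: `mismatch_indices` and `classify_failure` are hypothetical names, and it assumes you have already captured each run's result tensor and the golden tensor as flat sequences of numbers.

```python
from typing import Sequence


def mismatch_indices(
    result: Sequence[float], golden: Sequence[float], tol: float = 0.0
) -> list[int]:
    """Return the indices at which `result` differs from `golden` by more than `tol`."""
    assert len(result) == len(golden), (
        f"Result length {len(result)} != golden length {len(golden)}"
    )
    # `not (... <= tol)` also counts NaN as a mismatch.
    return [
        i for i, (r, g) in enumerate(zip(result, golden)) if not (abs(r - g) <= tol)
    ]


def _same(a: float, b: float) -> bool:
    """Equality that treats two NaN values as the same."""
    return a == b or (a != a and b != b)


def classify_failure(
    run_a: Sequence[float],
    run_b: Sequence[float],
    golden: Sequence[float],
    tol: float = 0.0,
) -> str:
    """Classify two runs of one variant as 'pass', 'deterministic' or 'non-deterministic'."""
    errors_a = mismatch_indices(run_a, golden, tol)
    errors_b = mismatch_indices(run_b, golden, tol)
    if not errors_a and not errors_b:
        return "pass"
    if all(_same(a, b) for a, b in zip(run_a, run_b)):
        # Same wrong output every time: check the TestConfig arguments and build.h.
        return "deterministic"
    # Output changes between runs: the kernel is likely not processing the stimuli.
    return "non-deterministic"
```

A result of `"deterministic"` corresponds to the "same every time" item above, and `"non-deterministic"` to the "different every time" item.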
