feat!: Support for compilation unit, subprogram and location debug metadata #1554

Open
tatiana-s wants to merge 22 commits into main from ts/debug-info

@tatiana-s (Contributor) commented Mar 9, 2026

Closes #1507

Main changes with some points for discussion (+ see comments for further issues in specific files)

  • The specification for debug records using the Hugr metadata protocol was initially in metadata/debug_info.py, with the open question of whether it belongs in hugr-py given that the key is core.debug_info (or whether at least the key should be defined there). It has now moved to a hugr PR: "feat: Add debug info metadata specification in hugr-py" (hugr#2971).
  • turn_on_debug_mode() / turn_off_debug_mode() toggle a global flag that controls whether debug metadata is added during compilation (off by default).
  • Debug records are attached to the module root, function definitions, calls and most extension ops. You should now use the add_op methods defined in the Guppy compiler rather than the ones from Hugr directly, because the Guppy ones include a check for adding debug metadata.
    • However, metadata is currently missing for some extension ops used in various custom compilation code (I still need to figure out which are really necessary and how to fix those), mainly where it is harder to propagate the needed AST node to the location (for example in modifiers, comptime unpacking, bool conversion, option unwrapping). What would happen under the current plan for using this data further downstream if it is missing?
    • On the other hand, some ops that aren't extension ops, such as MakeTuple, do get annotations, for example when used in struct constructors.
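
A minimal sketch of the flag-plus-wrapper pattern described in the list above. `Node`, `Builder`, the `line` parameter and the record layout are hypothetical stand-ins, not the Guppy or Hugr API; only the function names and the `core.debug_info` key are taken from this description.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

DEBUG_MODE_ENABLED = False  # off by default, as in the description above


def turn_on_debug_mode() -> None:
    global DEBUG_MODE_ENABLED
    DEBUG_MODE_ENABLED = True


def turn_off_debug_mode() -> None:
    global DEBUG_MODE_ENABLED
    DEBUG_MODE_ENABLED = False


def debug_mode_enabled() -> bool:
    return DEBUG_MODE_ENABLED


@dataclass
class Node:
    """Hypothetical stand-in for a Hugr node carrying a metadata dict."""

    op: str
    metadata: dict[str, Any] = field(default_factory=dict)


class Builder:
    """Hypothetical stand-in for a Hugr dataflow builder."""

    def add_op(self, op: str) -> Node:
        return Node(op)


def add_op(builder: Builder, op: str, line: Optional[int] = None) -> Node:
    """Compiler-side wrapper: add the op, then attach a location record
    only when debug mode is on and a source line is known."""
    node = builder.add_op(op)
    if debug_mode_enabled() and line is not None:
        node.metadata["core.debug_info"] = {"line": line}  # assumed record layout
    return node
```

With the flag off, the wrapper behaves like the underlying builder, which is why calling the builder's own `add_op` directly would silently skip the metadata.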

BREAKING CHANGE: (guppylang-internals) Refactored metadata-related code structure and renamed GuppyMetadata to FunctionMetadata

@tatiana-s tatiana-s requested a review from cgh-qtnm March 9, 2026 14:04
@tatiana-s tatiana-s requested a review from a team as a code owner March 9, 2026 14:04
@tatiana-s tatiana-s requested a review from hsemenenko March 9, 2026 14:04
@hugrbot (Collaborator) commented Mar 9, 2026

This PR contains breaking changes to the public Python API.

Breaking changes summary
guppylang-internals/src/guppylang_internals/definition/metadata.py:0: <module>:
Public object was removed

guppylang-internals/src/guppylang_internals/definition/function.py:317: compile_call(call_ast):
Parameter was added as required

guppylang-internals/src/guppylang_internals/definition/pytket_circuits.py:0: ParsedPytketDef.__init__(source_span):
Parameter was added as required

guppylang-internals/src/guppylang_internals/definition/pytket_circuits.py:0: CompiledPytketDef.__init__(func_def):
Positional parameter was moved
Details: position: from 7 to 8 (+1)

guppylang-internals/src/guppylang_internals/definition/pytket_circuits.py:0: CompiledPytketDef.__init__(source_span):
Parameter was added as required


@github-actions bot (Contributor) commented Mar 9, 2026

🐰 Bencher Report

Branch: ts/debug-info
Testbed: Linux

🚨 3 Alerts

| Benchmark | Measure (Units) | Benchmark Result (Result Δ%) | Baseline | Upper Boundary (Limit %) |
| --- | --- | --- | --- | --- |
| tests/benchmarks/test_big_array.py::test_big_array_compile | Latency, seconds (s) | 🚨 2.01 s (+10.89%) | 1.81 s | 1.90 s (105.61%) |
| tests/benchmarks/test_big_array.py::test_big_array_executable | Latency, seconds (s) | 🚨 8.57 s (+8.57%) | 7.89 s | 8.29 s (103.40%) |
| tests/benchmarks/test_ctrl_flow.py::test_many_ctrl_flow_compile | Latency, milliseconds (ms) | 🚨 110.63 ms (+5.12%) | 105.24 ms | 110.50 ms (100.11%) |

All benchmark results:

| Benchmark | Latency Result, µs (Result Δ%) | Baseline, µs | Upper Boundary, µs (Limit %) |
| --- | --- | --- | --- |
| tests/benchmarks/test_big_array.py::test_big_array_check | 711,502.20 (+3.28%) | 688,929.63 | 723,376.11 (98.36%) |
| tests/benchmarks/test_big_array.py::test_big_array_compile | 🚨 2,007,385.47 (+10.89%) | 1,810,196.20 | 1,900,706.01 (105.61%) |
| tests/benchmarks/test_big_array.py::test_big_array_executable | 🚨 8,570,552.94 (+8.57%) | 7,894,214.45 | 8,288,925.18 (103.40%) |
| tests/benchmarks/test_ctrl_flow.py::test_many_ctrl_flow_check | 48,719.88 (+0.22%) | 48,611.31 | 51,041.88 (95.45%) |
| tests/benchmarks/test_ctrl_flow.py::test_many_ctrl_flow_compile | 🚨 110,628.18 (+5.12%) | 105,239.74 | 110,501.73 (100.11%) |
| tests/benchmarks/test_ctrl_flow.py::test_many_ctrl_flow_executable | 616,675.58 (+2.49%) | 601,670.15 | 631,753.66 (97.61%) |
| tests/benchmarks/test_prelude.py::test_import_guppy | 52.35 (-1.25%) | 53.01 | 55.66 (94.05%) |
🐰 View full continuous benchmarking report in Bencher

@github-actions bot (Contributor) commented Mar 9, 2026

🐰 Bencher Report

Branch: ts/debug-info
Testbed: Linux

🚨 2 Alerts

| Benchmark | Measure (Units) | Benchmark Result (Result Δ%) | Baseline | Upper Boundary (Limit %) |
| --- | --- | --- | --- | --- |
| tests/benchmarks/test_big_array.py::test_big_array_compile | hugr_bytes, bytes x 1e3 | 🚨 154.51 x 1e3 (+9.02%) | 141.73 x 1e3 | 143.15 x 1e3 (107.94%) |
| tests/benchmarks/test_ctrl_flow.py::test_many_ctrl_flow_compile | hugr_bytes, bytes x 1e3 | 🚨 18.77 x 1e3 (+7.17%) | 17.52 x 1e3 | 17.69 x 1e3 (106.11%) |

All benchmark results:

| Benchmark | hugr_bytes Result, bytes x 1e3 (Result Δ%) | hugr_bytes Baseline | hugr_bytes Upper Boundary, bytes x 1e3 (Limit %) | hugr_nodes Result, nodes (Result Δ%) | hugr_nodes Baseline | hugr_nodes Upper Boundary, nodes (Limit %) |
| --- | --- | --- | --- | --- | --- | --- |
| tests/benchmarks/test_big_array.py::test_big_array_compile | 🚨 154.51 x 1e3 (+9.02%) | 141.73 x 1e3 | 143.15 x 1e3 (107.94%) | 6,620.00 (0.00%) | 6,620.00 | 6,686.20 (99.01%) |
| tests/benchmarks/test_ctrl_flow.py::test_many_ctrl_flow_compile | 🚨 18.77 x 1e3 (+7.17%) | 17.52 x 1e3 | 17.69 x 1e3 (106.11%) | 581.00 (0.00%) | 581.00 | 586.81 (99.01%) |
🐰 View full continuous benchmarking report in Bencher

@codecov-commenter commented Mar 9, 2026

Codecov Report

❌ Patch coverage is 94.31818% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.56%. Comparing base (08e3397) to head (c772396).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...als/src/guppylang_internals/metadata/max_qubits.py | 66.66% | 5 Missing ⚠️ |
| ...ternals/src/guppylang_internals/metadata/common.py | 94.54% | 3 Missing ⚠️ |
| ...guppylang_internals/std/_internal/compiler/list.py | 82.35% | 3 Missing ⚠️ |
| .../src/guppylang_internals/compiler/expr_compiler.py | 95.00% | 2 Missing ⚠️ |
| ...als/src/guppylang_internals/metadata/debug_info.py | 94.11% | 1 Missing ⚠️ |
| ...pylang_internals/std/_internal/compiler/prelude.py | 50.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #1554    +/-   ##
========================================
  Coverage   93.56%   93.56%            
========================================
  Files         130      133     +3     
  Lines       12286    12410   +124     
========================================
+ Hits        11495    11612   +117     
- Misses        791      798     +7     




@dataclass
class FunctionMetadata:

Contributor Author:

This likely isn't the best way to store this data, but after trying a couple of solutions that were all less than ideal, I don't think it makes sense to overthink the design at the moment: #1393, modifiers, and future debug metadata could all influence it, so for now something that works is all that's needed.

filename = get_file(node)
# If we can't find a file for a node, we default to 0 which corresponds to the
# entrypoint file.
file_idx = ctx.metadata_file_table.get_index(filename) if filename else 0

Contributor Author:

If we can't find a file for the node (this shouldn't happen in theory, but in practice custom nodes could cause issues), I don't think we should error during compilation; the error should instead come from whatever consumes the debug data. Just setting it to the root file 0 probably isn't the best solution in that case. Should it be something like -1 to indicate the issue?

@cgh-qtnm (Contributor), Mar 9, 2026:

Yup, it's best to have a specific null value so we can tell the user "unknown filename" rather than give them an incorrect file name. -1 is fine, could also just not include the filename member in the record if it's not found.

Contributor:

More generally I think this is also the answer to your question about what to do when it's harder to acquire the correct annotation for a node. I would suggest just marking it with some kind of null placeholder, which the backend can report as "debug info missing", and then we can revisit it when the missing information becomes an issue.
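
A minimal sketch of the null-placeholder idea from this thread, assuming -1 as the sentinel. `UNKNOWN_FILE`, `file_index` and `describe` are hypothetical names, and the file table is modelled as a plain list rather than the PR's `metadata_file_table`.

```python
from typing import Optional

UNKNOWN_FILE = -1  # assumed sentinel; omitting the field is the other option raised above


def file_index(files: list[str], filename: Optional[str]) -> int:
    """Index of `filename` in the file table, or the sentinel if no file is known."""
    if not filename:
        return UNKNOWN_FILE
    return files.index(filename)


def describe(files: list[str], idx: int) -> str:
    """How a consumer of the debug data might render the index for the user."""
    return "<unknown filename>" if idx == UNKNOWN_FILE else files[idx]


files = ["main.py", "lib.py"]
print(describe(files, file_index(files, "lib.py")))
print(describe(files, file_index(files, None)))
```

The point of the sentinel is that compilation never fails on a missing file, while the consumer can still tell "unknown" apart from the entrypoint at index 0.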

@ss2165 ss2165 requested review from ss2165 and removed request for hsemenenko March 9, 2026 15:13
@acl-cqc (Contributor) left a comment:

Nice! Thanks @tatiana-s. Just some very broad points.

  • I like the proposal to move some of the debug definition into hugr-py. Not sure if that should include HugrDebugInfo or just the keys.
  • Should turn_on_debug_mode just be an extra argument to .compile (and friends)?

Of course as per your description...the big thing about debug info is remembering to set it everywhere 😒.

  • In time you might make some of your ast_node: AstNode | None = None non-optional, or at least remove the default (maybe you could do the latter now).

  • I wonder about changing that builder: DFBase[P] and top-level def add_op to a new class wrapping builder and exposing its own add_op. Then one cannot accidentally call the underlying builder.add_op directly (skipping the metadata).

  • Finally there is the question your description hints at: if a caller of add_op doesn't specify an ast_node (or passes None), we end up with no source location on the node at all. Is there anything we can do? Can we "inherit" from the nearest enclosing AST node, say? E.g. if the builder wrapper from the previous point stores the current AST node as a field and has a context manager to enter a new AstNode (overriding the field until you exit the context manager), then a call to add_op that doesn't specify ast_node explicitly could use the one from the nearest enclosing context.
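
The last two points could be sketched roughly as follows. This is only an illustration of the wrapper-plus-context-manager idea: `FakeBuilder`, `TrackedBuilder`, the `at` method and the use of strings in place of AST nodes are all hypothetical and not part of Guppy or Hugr.

```python
from contextlib import contextmanager
from typing import Iterator, Optional


class FakeBuilder:
    """Hypothetical stand-in for the underlying Hugr builder."""

    def __init__(self) -> None:
        self.ops: list[str] = []

    def add_op(self, op: str) -> int:
        self.ops.append(op)
        return len(self.ops) - 1  # index doubles as a node id


class TrackedBuilder:
    """Wraps a builder so that every add_op goes through location tracking."""

    def __init__(self, inner: FakeBuilder) -> None:
        self._inner = inner  # kept private so callers cannot bypass add_op
        self._current: Optional[str] = None  # nearest enclosing AST node
        self.locations: dict[int, Optional[str]] = {}

    @contextmanager
    def at(self, ast_node: str) -> Iterator[None]:
        """Make `ast_node` the default location until the block exits."""
        previous, self._current = self._current, ast_node
        try:
            yield
        finally:
            self._current = previous

    def add_op(self, op: str, ast_node: Optional[str] = None) -> int:
        node = self._inner.add_op(op)
        # An explicit node wins; otherwise inherit from the enclosing context.
        self.locations[node] = ast_node if ast_node is not None else self._current
        return node


builder = TrackedBuilder(FakeBuilder())
with builder.at("func_def"):
    builder.add_op("H")                       # inherits "func_def"
    builder.add_op("CX", ast_node="call")     # explicit location wins
```

Restoring the previous node in `finally` keeps nested contexts correct even if compilation of an inner expression raises.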

"test_debug_info.py",
"metadata_example.py",
]
funcs = hugr.children(hugr.module_root)

Contributor:

nit: I prefer patterns where you also check that funcs has the elements you expect. (E.g. if funcs was empty, you'd trivially pass the for loop + match test.)

You could assert names of funcs first, but that duplicates the names between the assert and all the individual cases; better still would be to build a dict from f_name to func, and then pull the functions out one-by-one to make the checks on them. (So, no match.) At the end you can assert there are no functions left in the dict.
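
A rough sketch of the dict-based pattern suggested here, under the assumption that each function exposes its name as `f_name`. The `Func` class, the `line` field and the names "foo" and "bar" are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Func:
    """Hypothetical stand-in for a compiled function node."""

    f_name: str
    line: int


def check_funcs(funcs: list[Func]) -> None:
    by_name = {f.f_name: f for f in funcs}

    # pop raises KeyError if an expected function is missing,
    # so an empty `funcs` can no longer pass trivially.
    assert by_name.pop("foo").line == 10
    assert by_name.pop("bar").line == 20

    # Nothing unexpected was compiled.
    assert not by_name, f"unexpected functions: {sorted(by_name)}"
```

Each expected name appears exactly once, so there is no duplication between an up-front name assertion and the per-function checks.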

def add_op(
    self, op: ops.DataflowOp, /, *args: Wire, ast_node: AstNode | None = None
) -> Node:
    """Adds an op to the builder, with optional debug info."""

Member:

is the ast_node the debug info? seems more correct to say "with optional AST node related to the op"

    *args: Wire,
    ast_node: AstNode | None = None,
) -> Node:
    """Adds an op to the builder, with optional debug info."""

Member:

same comment as above about "debug info"

    hugr_func, *(input_list + bool_wires + param_wires)
)
if debug_mode_enabled():
    if self.defined_at is not None:

Member:

good to comment the two cases

    return self._node_metadata

def set_debug_info(self, debug_info: DebugRecord) -> None:
    self._node_metadata[HugrDebugInfo.KEY] = debug_info.to_json()

Member:

do we need to call to_json() at this stage? Why not keep the record structured and call this at the serialization stage?
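
One way to read this suggestion, sketched under assumptions: the record stays a dataclass inside the metadata container and is converted only when the container is serialized. `DILocation`, `NodeMetadata`, `serialize` and the field names are hypothetical; only `DebugRecord`, `set_debug_info`, `to_json` and the `core.debug_info` key come from this PR.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class DebugRecord:
    """Simplified base record; the real fields are defined in the PR."""

    def to_json(self) -> dict[str, Any]:
        return asdict(self)


@dataclass
class DILocation(DebugRecord):
    """Hypothetical location record used only for this sketch."""

    line: int
    column: int


class NodeMetadata:
    KEY = "core.debug_info"

    def __init__(self) -> None:
        self._records: dict[str, DebugRecord] = {}

    def set_debug_info(self, record: DebugRecord) -> None:
        self._records[self.KEY] = record  # stays structured

    def serialize(self) -> str:
        # Conversion happens once, at the serialization boundary.
        return json.dumps({k: r.to_json() for k, r in self._records.items()})


meta = NodeMetadata()
meta.set_debug_info(DILocation(line=3, column=7))
print(meta.serialize())
```

Keeping the record structured means later passes can read typed fields without calling `from_json` again, at the cost of the container needing to know how to serialize its values.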


@dataclass
class DICompileUnit(DebugRecord):
    """Debug information for a compilation unit, corresponds to a module node."""

Member:

Suggested change
- """Debug information for a compilation unit, corresponds to a module node."""
+ """Debug information for a compilation unit, corresponds to a HUGR module node."""

"""Global state for determining whether to attach debug information to Hugr nodes
during compilation."""

DEBUG_MODE_ENABLED = False

Member:

should this be private?

Suggested change
- DEBUG_MODE_ENABLED = False
+ _DEBUG_MODE_ENABLED = False

# Add debug info about the module to the root node
if debug_mode_enabled():
    module_info = DICompileUnit(
        directory=Path.cwd().as_uri(),

Member:

should this be the cwd or the directory where the file is?
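
For reference, the two options in this question differ as follows. This is a small sketch using only `pathlib`; the helper names `cwd_uri` and `source_dir_uri` are hypothetical.

```python
from pathlib import Path


def cwd_uri() -> str:
    """Directory the compiler process was started from."""
    return Path.cwd().as_uri()


def source_dir_uri(filename: str) -> str:
    """Directory that contains the source file itself."""
    # resolve() makes the path absolute, which as_uri() requires.
    return Path(filename).resolve().parent.as_uri()
```

The first value changes with where the compiler is invoked; the second depends only on where the source file lives.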

module_info = DICompileUnit(
    directory=Path.cwd().as_uri(),
    # We know this file is always the first entry in the file table.
    filename=ctx.metadata_file_table.get_index(filename),

Member:

get appends if not present - do you need to initialise with the file up top?
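
A small sketch of the append-on-miss behaviour this comment refers to, assuming the table is a simple ordered list. The `FileTable` class here is a hypothetical model, not the PR's implementation.

```python
class FileTable:
    """Hypothetical table whose get_index appends unseen files."""

    def __init__(self) -> None:
        self._files: list[str] = []

    def get_index(self, filename: str) -> int:
        if filename not in self._files:
            self._files.append(filename)
        return self._files.index(filename)


table = FileTable()
entry_idx = table.get_index("main.py")  # first lookup registers the entrypoint
other_idx = table.get_index("lib.py")
```

Under this model the entrypoint gets index 0 only if it is the first file ever looked up, so explicit initialisation would matter only if another file could be registered earlier.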

meta = hugr.module_root.metadata
assert HugrDebugInfo in meta
debug_info = DICompileUnit.from_json(meta[HugrDebugInfo.KEY])
assert get_last_uri_part(debug_info.directory) == "guppylang"

Member:

this wouldn't be true if the tests were run from somewhere else?
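
One possible way to make the assertion independent of where the tests run, assuming `directory` continues to be recorded from `Path.cwd()`. The body of `get_last_uri_part` below is a hypothetical reimplementation (the PR's helper is not shown here), and `check_directory` is an illustrative name.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def get_last_uri_part(uri: str) -> str:
    """Hypothetical reimplementation: last path component of a file URI."""
    return Path(unquote(urlparse(uri).path)).name


def check_directory(directory_uri: str) -> None:
    # Compare against wherever the tests actually run, not a fixed name.
    assert get_last_uri_part(directory_uri) == Path.cwd().name
```

If the recorded directory becomes the source file's directory instead, the expected value would be derived from the test file's location rather than from the working directory.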

assert lines == [113, 114, 28, 116]


# TODO: Improve this test.

Member:

expand on what needs to be improved/raise an issue and link it

Development

Successfully merging this pull request may close these issues.

[Feature]: Support for compilation unit and source locations

6 participants