Summary
Running device tests with INTERACT=1 hangs in any signing flow whose ButtonRequest reports more than one page. After the user confirms on the device, the device cleanly moves past the BR and is left waiting for the next protocol message; the host stays stuck inside _callback_button and never sends the next TxAck. Reproduces on T2T1 — not THP- or Eckhart-specific.
Reproduction
INTERACT=1 pytest tests/device_tests/bitcoin/test_signtx.py -k test_one_one_fee
Confirm each prompt on the device. The test hangs after the last user confirmation. Last device-side log:
trezor.ui DBG UI task exited by itself: <generator object 'br_task' ...>
trezor.loop DBG close spawned task: <generator object '_waiting_screen' ...>
trezor.wire.codec.codec_context DBG [USBIF] write: TxRequest
trezor.wire.codec.codec_context DBG [USBIF] expect: TxAckInput
The device has acknowledged the user's confirmation and is waiting on TxAckInput. The host is the side that's stuck.
Root cause
Under INTERACT=1, conftest.py sets auto_interact=False → DebugLink.allow_interactions = False.
_callback_button (client.py:555) writes ButtonAck, then runs the UI callback before reading the next message:
session.write(messages.ButtonAck())
self.app._callback_button(msg) # ← blocks here
return session.read()
The UI callback flows into DebugUI.default_input_flow → _paginate_and_confirm(br.pages) (debuglink.py:1070, :1184). That issues pages debug decisions (swipe/click + press_yes). Each _decision() under allow_interactions=False does (debuglink.py:810-812):
if not self.allow_interactions:
self.wait_layout()
return
wait_layout() blocks on NEXT_LAYOUT. After the user confirms on-device:
- The first
wait_layout() returns when the device transitions through the BR.
- The remaining
wait_layout() calls block — the device is now on a non-interactive screen (_waiting_screen) and will produce no further layout changes until the host sends the next TxAck.
- The host can't send
TxAck because it's still inside _callback_button.
Deadlock: host waits for a layout change, device waits for a protocol message.
When this broke
dd496c1a5d ("feat(core): unify RustLayout, implement single global layout") changed _decision for allow_interactions=False from return None (no-op) to return self.wait_layout(). Before that change, the pages-many decisions issued by the input flow were all no-ops under INTERACT and the host did not block on layout transitions at all. The synchronization point was implicitly the device's next protocol message, which is exactly the right point.
Local hotfix
To unblock myself I short-circuited default_input_flow when allow_interactions=False:
def default_input_flow(self, on_page=None):
while True:
br = yield
if not self.debuglink.allow_interactions:
continue
...
This restores pre-dd496c1a5d behavior on this path, but:
- It throws away the host/device synchronization point that
dd496c1a5d introduced (the single wait_layout() per decision).
- It disables on-host PIN entry on T1 in INTERACT mode (since the
PinEntry branch is also skipped).
So it's a workaround, not a fix.
Suggested clean fix
The fundamental issue is that under INTERACT the host shouldn't try to anticipate how many layout transitions the user will produce — br.pages is the host's idea of how many auto-clicks would be needed, not how many transitions a human will cause. Two plausible directions, in increasing order of niceness:
-
Collapse the input flow to a single sync point under allow_interactions=False. Instead of running _paginate_and_confirm, do exactly one wait_layout(wait_for_external_change=True) per BR (and keep the PinEntry branch). The single wait_layout call preserves the sync intent of dd496c1a5d without overcommitting on br.pages.
-
Remove wait_layout() from _decision under allow_interactions=False and rely on the protocol-level read in _callback_button (session.read() after the UI callback) as the natural sync point. This is the cleanest in terms of layering — debug-link stops trying to gate the protocol — but it requires confirming that no other caller of _decision depends on the layout having advanced before returning.
(1) is the smaller, safer change. (2) is what the pre-dd496c1a5d design effectively did.
cc @matejcik
Summary
Running device tests with
INTERACT=1hangs in any signing flow whoseButtonRequestreports more than one page. After the user confirms on the device, the device cleanly moves past the BR and is left waiting for the next protocol message; the host stays stuck inside_callback_buttonand never sends the nextTxAck. Reproduces on T2T1 — not THP- or Eckhart-specific.Reproduction
Confirm each prompt on the device. The test hangs after the last user confirmation. Last device-side log:
The device has acknowledged the user's confirmation and is waiting on
TxAckInput. The host is the side that's stuck.Root cause
Under
INTERACT=1,conftest.pysetsauto_interact=False→DebugLink.allow_interactions = False._callback_button(client.py:555) writesButtonAck, then runs the UI callback before reading the next message:The UI callback flows into
DebugUI.default_input_flow→_paginate_and_confirm(br.pages)(debuglink.py:1070,:1184). That issuespagesdebug decisions (swipe/click + press_yes). Each_decision()underallow_interactions=Falsedoes (debuglink.py:810-812):wait_layout()blocks onNEXT_LAYOUT. After the user confirms on-device:wait_layout()returns when the device transitions through the BR.wait_layout()calls block — the device is now on a non-interactive screen (_waiting_screen) and will produce no further layout changes until the host sends the nextTxAck.TxAckbecause it's still inside_callback_button.Deadlock: host waits for a layout change, device waits for a protocol message.
When this broke
dd496c1a5d("feat(core): unify RustLayout, implement single global layout") changed_decisionforallow_interactions=Falsefromreturn None(no-op) toreturn self.wait_layout(). Before that change, thepages-many decisions issued by the input flow were all no-ops under INTERACT and the host did not block on layout transitions at all. The synchronization point was implicitly the device's next protocol message, which is exactly the right point.Local hotfix
To unblock myself I short-circuited
default_input_flowwhenallow_interactions=False:This restores pre-
dd496c1a5dbehavior on this path, but:dd496c1a5dintroduced (the singlewait_layout()per decision).PinEntrybranch is also skipped).So it's a workaround, not a fix.
Suggested clean fix
The fundamental issue is that under INTERACT the host shouldn't try to anticipate how many layout transitions the user will produce —
br.pagesis the host's idea of how many auto-clicks would be needed, not how many transitions a human will cause. Two plausible directions, in increasing order of niceness:Collapse the input flow to a single sync point under
allow_interactions=False. Instead of running_paginate_and_confirm, do exactly onewait_layout(wait_for_external_change=True)per BR (and keep thePinEntrybranch). The singlewait_layoutcall preserves the sync intent ofdd496c1a5dwithout overcommitting onbr.pages.Remove
wait_layout()from_decisionunderallow_interactions=Falseand rely on the protocol-level read in_callback_button(session.read()after the UI callback) as the natural sync point. This is the cleanest in terms of layering — debug-link stops trying to gate the protocol — but it requires confirming that no other caller of_decisiondepends on the layout having advanced before returning.(1) is the smaller, safer change. (2) is what the pre-
dd496c1a5ddesign effectively did.cc @matejcik