Context
Discovered during investigation of #2981 (pending_op_results memory leak). The fix adds cleanup on TransactionCompleted/TransactionTimedOut events plus a periodic safety-net sweep for entries where the sender is closed.
The reviewer asked: why do some transactions complete without emitting these events?
Root Cause Analysis
Several code paths in handle_op_result() (crates/core/src/operations/mod.rs) reach terminal states without emitting TransactionCompleted:
1. Local state queueing (UpdateBroadcasting)
When OperationResult has return_msg: None and state: Some(updated_state), the state is pushed to OpManager but no completion event is emitted (line ~258-266).
2. StatePushed error path
Err(OpError::StatePushed) returns Ok(None) — the operation continues locally without any event (line ~113-117).
3. Duplicate message for completed op
Err(OpError::OpNotPresent(tx)) returns Ok(None) — the original transaction's callback was already inserted but if the original completion path didn't emit an event, the entry leaks (line ~118-127).
4. Root operation waiting for sub-operations
When a finalized operation has sub-operations that haven't completed, it's stored in root_ops_awaiting_sub_ops and returns Ok(None) — no event emitted until all children complete (line ~180-213).
5. Callback invocation guard
In process_message_decoupled() (crates/core/src/node/mod.rs), the callback sender is only invoked when is_operation_completed(&op_result) returns true. The pending_op_results entry is inserted regardless, but the callback may never fire for the paths above.
Impact
The periodic safety-net cleanup in #2981 handles all these cases by checking sender.is_closed(). The event-driven cleanup handles the common case. Together they prevent the leak.
Recommendation
Consider auditing whether TransactionCompleted should be emitted in more of these terminal paths, particularly:
- When
StatePushed operations actually finish their local processing
- When root operations' sub-operations all complete (verify the parent event fires)
This would make the system more consistent, though the safety-net cleanup makes it non-critical.
Related
Context
Discovered during investigation of #2981 (pending_op_results memory leak). The fix adds cleanup on
TransactionCompleted/TransactionTimedOutevents plus a periodic safety-net sweep for entries where the sender is closed.The reviewer asked: why do some transactions complete without emitting these events?
Root Cause Analysis
Several code paths in
handle_op_result()(crates/core/src/operations/mod.rs) reach terminal states without emittingTransactionCompleted:1. Local state queueing (UpdateBroadcasting)
When
OperationResulthasreturn_msg: Noneandstate: Some(updated_state), the state is pushed to OpManager but no completion event is emitted (line ~258-266).2. StatePushed error path
Err(OpError::StatePushed)returnsOk(None)— the operation continues locally without any event (line ~113-117).3. Duplicate message for completed op
Err(OpError::OpNotPresent(tx))returnsOk(None)— the original transaction's callback was already inserted but if the original completion path didn't emit an event, the entry leaks (line ~118-127).4. Root operation waiting for sub-operations
When a finalized operation has sub-operations that haven't completed, it's stored in
root_ops_awaiting_sub_opsand returnsOk(None)— no event emitted until all children complete (line ~180-213).5. Callback invocation guard
In
process_message_decoupled()(crates/core/src/node/mod.rs), the callback sender is only invoked whenis_operation_completed(&op_result)returns true. Thepending_op_resultsentry is inserted regardless, but the callback may never fire for the paths above.Impact
The periodic safety-net cleanup in #2981 handles all these cases by checking
sender.is_closed(). The event-driven cleanup handles the common case. Together they prevent the leak.Recommendation
Consider auditing whether
TransactionCompletedshould be emitted in more of these terminal paths, particularly:StatePushedoperations actually finish their local processingThis would make the system more consistent, though the safety-net cleanup makes it non-critical.
Related