Skip to content

Fix a crash in BackgroundTaskManager when background task initialization races with early call termination#1360

Open
zscgeek wants to merge 1 commit intojambonz:mainfrom
consigai:consig/cancel-crash
Open

Fix a crash in BackgroundTaskManager when background task initialization races with early call termination#1360
zscgeek wants to merge 1 commit intojambonz:mainfrom
consigai:consig/cancel-crash

Conversation

@zscgeek
Copy link
Copy Markdown
Contributor

@zscgeek zscgeek commented Sep 12, 2025

  • Summary
    • Fix a crash in BackgroundTaskManager when background task initialization races with early call termination (e.g., quick CANCEL). Add defensive guards and ensure partial tasks are not tracked.
  • Repro (observed in production)
    • Inbound INVITE followed by quick CANCEL
    • Feature server attempts to initialize background transcribe; logs:
      • “BackgroundTaskManager:_initTranscribe - Error creating transcribe task”
      • “Response#send: final response already sent”
    • Teardown triggers BackgroundTaskManager.stopAll()
    • Crash:
      • TypeError: Cannot read properties of undefined (reading 'end') at background-task-manager.js:67:17
  • Root cause
    • stop/_taskCompleted/_taskError unconditionally call task.span.end()
    • Init methods return a task even if initialization fails before task.span assignment
    • newTask stores tasks regardless of initialization completeness
  • Changes
    • Guard task.span.end() in stop, _taskCompleted, _taskError
    • Only track tasks in newTask when task.span exists (fully initialized)
    • In _initListen, _initBargeIn, _initTranscribe, _initTtsStream, return undefined in catch blocks
  • Why this approach
    • Keeps existing behavior intact for successful flows while making teardown idempotent and safe
    • Eliminates class of crashes from partial initialization and teardown races
  • Compatibility
    • No API changes
    • Logging preserved; spans end when present

…ion races with early call termination (e.g., quick CANCEL). Add defensive guards and ensure partial tasks are not tracked.

Repro (observed in production)
Inbound INVITE followed by quick CANCEL
Feature server attempts to initialize background transcribe; logs:
“BackgroundTaskManager:initTranscribe - Error creating transcribe task”
“Response#send: final response already sent”
Teardown triggers BackgroundTaskManager.stopAll()
Crash:
TypeError: Cannot read properties of undefined (reading 'end') at background-task-manager.js:67:17
Root cause
stop/_taskCompleted/_taskError unconditionally call task.span.end()
Init methods return a task even if initialization fails before task.span assignment
newTask stores tasks regardless of initialization completeness
Changes
Guard task.span.end() in stop, _taskCompleted, _taskError
Only track tasks in newTask when task.span exists (fully initialized)
In _initListen, _initBargeIn, _initTranscribe, _initTtsStream, return undefined in catch blocks
Why this approach
Keeps existing behavior intact for successful flows while making teardown idempotent and safe
Eliminates class of crashes from partial initialization and teardown races
Compatibility
No API changes
Logging preserved; spans end when present
Tests/Follow-ups
Suggest adding a test where an INVITE is canceled before answer and verify no crash, and that background tasks are not tracked if init fails.
Consider deferring some background tasks until after upstream 200 OK/ACK to reduce early-race exposure.

(cherry picked from commit 115dd70)
@davehorton davehorton requested a review from xquanluu September 12, 2025 17:30
@zscgeek zscgeek closed this Sep 12, 2025
@davehorton
Copy link
Copy Markdown
Contributor

why closed @zscgeek ? Is another PR coming?

@zscgeek
Copy link
Copy Markdown
Contributor Author

zscgeek commented Sep 12, 2025

closing while i complete more testing - i think it's good but had meant to open as draft while I validate 100% for sure

@zscgeek
Copy link
Copy Markdown
Contributor Author

zscgeek commented Sep 12, 2025

btw i'm fine to reopen if you guys want to review in parallel

@zscgeek zscgeek reopened this Sep 17, 2025
@zscgeek
Copy link
Copy Markdown
Contributor Author

zscgeek commented Sep 17, 2025

@davehorton @sammachin - We've tested this more in our environment, and it seems to be working well, fixing the crashes we were seeing on inbound call race conditions.

@davehorton
Copy link
Copy Markdown
Contributor

@xquanluu can you review?

@xquanluu
Copy link
Copy Markdown
Contributor

Thanks @zscgeek

all of background tasks we currently have required to have media endpoint from freeswitch and should be enabled after call is answered.

even in Config verb, if background transcribe is used, TaskPreconditions.Endpoint should be checked.

We cannot have a case that the call initiated quickly got cancel, and transcribe task should be running.

Do you have log for this call? could you send it to support@jambonz.org please.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants