
docs(rfc): propose persistent flow queue and completion webhooks#306

Open
mason5052 wants to merge 2 commits into vxcontrol:feature/next-release from mason5052:codex/issue-298-flow-concurrency-rfc

mason5052 commented May 7, 2026

Summary

Adds a design RFC at examples/proposals/flow_concurrency.md describing a possible direction for native flow concurrency control and completion webhooks, as raised in issue #298. The RFC is documentation only; no runtime, schema, GraphQL, REST, or UI behavior changes here.

Problem

Issue #298 covers two real pain points for external orchestrators that drive PentAGI from a pipeline:

  • createFlow accepts unlimited concurrent calls, so calling it N times immediately spins up N Kali containers and saturates the host.
  • There is no native completion notification, so external schedulers maintain a polling loop on GET /api/v1/flows/{id}.

A naive fix can easily slip back into the anti-pattern that got PR #268 rejected: a hidden in-memory queue with no persistence, no visibility in the UI, API, or database, and no way to cancel. In the PR #268 review and in the assistant flow management direction shipped in PR #292, PentAGI maintainers have asked for flow control that is explicit, inspectable, and manageable rather than implicit background lifecycle.

Solution

The RFC follows the proposal location the maintainers settled on, examples/proposals/<topic>.md (commit 47de4e4 moved backend/docs/evidence_chain_rfc.md to examples/proposals/evidence_chain.md). It deliberately mirrors the structure of examples/proposals/evidence_chain.md (PR #277) and includes:

  • Goals limited to capping concurrent flows, making queued flows first-class, replacing external polling with at-least-once webhooks, and preserving the existing createFlow contract.
  • Non-Goals that forbid hidden in-memory queues, multi-tenant scheduling/SLA, a generic event bus, and any change to what finished means for tasks/subtasks/toolcalls.
  • Design Principles centered on persistence, visibility, manageability, explicit promotion, clear finished semantics, and at-least-once delivery.
  • Proposed concurrency model: a new persisted queued status, a single MAX_CONCURRENT_FLOWS env var, an explicit promoter, full UI/API visibility (listings, filters, detail, assistant view), and a user-driven cancel path with a documented terminal status.
  • Proposed completion webhook model: per-flow webhookUrl plus a global FLOW_WEBHOOK_URL fallback, HMAC-SHA256 signed payloads via X-PentAGI-Signature, stable delivery_id for receiver dedup, persisted flow_webhook_deliveries rows, bounded exponential backoff, and SSRF mitigations (config-time and delivery-time DNS validation, redacted secrets in logs, no retry on most 4xx).
  • Storage and API surface sketches that intentionally do not commit to a final migration shape.
  • Open Questions covering per-user limits, createFlow blocking semantics, alignment of the signature scheme with the receipt direction from issue #235 (RFC: Cryptographic evidence chain for PentAGI pentest operations), and queued-flow handling for resources/uploads/messages.
  • Suggested First Milestone that lands the queue end-to-end before webhooks, so each lifecycle change can be reviewed in isolation and PR sizes stay small.

The RFC explicitly rejects in-memory queues, hidden lifecycle state, and implicit promotion semantics, and references the PR #268 lesson directly.

User Impact

  • Maintainers and contributors get a written design surface to push back on before any code lands. The RFC is opinionated about what is out of scope so a future implementation PR cannot quietly grow into hidden lifecycle work.
  • Operators evaluating PentAGI for batch pipelines see the intended direction (a single MAX_CONCURRENT_FLOWS knob plus signed completion webhooks) and the staged milestones, so they can plan integration work even before any feature ships.
  • External orchestrators learn what to expect long-term: a queueable flow status, signed webhooks with a documented payload, and idempotent delivery semantics, instead of building yet another polling loop.
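For orchestrator authors, a receiver under this design would verify the X-PentAGI-Signature header against the raw body and drop repeats by delivery_id. The Go sketch below assumes a header value of the form `sha256=<hex HMAC-SHA256 of the raw body>` and JSON fields `event` and `flow_id`; only the header name, HMAC-SHA256, and delivery_id come from the RFC summary, and `sign`, `verifySignature`, `deduper`, and `handle` are hypothetical helpers.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"strings"
	"sync"
)

// Assumed header format: "sha256=<hex digest>".
const sigPrefix = "sha256="

// sign computes the header value a sender would attach (assumed format).
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return sigPrefix + hex.EncodeToString(mac.Sum(nil))
}

// verifySignature recomputes the HMAC over the raw body and compares in constant time.
func verifySignature(secret, body []byte, header string) bool {
	if !strings.HasPrefix(header, sigPrefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, sigPrefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

// event holds the fields this sketch reads; names other than delivery_id are assumed.
type event struct {
	DeliveryID string `json:"delivery_id"`
	Event      string `json:"event"` // e.g. "flow.finished" or "flow.failed"
	FlowID     int64  `json:"flow_id"`
}

// deduper remembers processed delivery IDs. A real receiver would persist them
// and record an ID only after its side effects succeed.
type deduper struct {
	mu   sync.Mutex
	seen map[string]bool
}

// firstTime reports whether id is new and records it.
func (d *deduper) firstTime(id string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.seen[id] {
		return false
	}
	d.seen[id] = true
	return true
}

// handle returns "rejected", "duplicate", or "processed" for one delivery.
func handle(secret, body []byte, sigHeader string, d *deduper) string {
	if !verifySignature(secret, body, sigHeader) {
		return "rejected"
	}
	var ev event
	if err := json.Unmarshal(body, &ev); err != nil || ev.DeliveryID == "" {
		return "rejected"
	}
	if !d.firstTime(ev.DeliveryID) {
		return "duplicate"
	}
	return "processed"
}

func main() {
	secret := []byte("example-secret")
	body := []byte(`{"delivery_id":"d-1","event":"flow.finished","flow_id":42}`)
	d := &deduper{seen: map[string]bool{}}
	sig := sign(secret, body)
	fmt.Println(handle(secret, body, sig, d))         // processed
	fmt.Println(handle(secret, body, sig, d))         // duplicate (redelivery)
	fmt.Println(handle(secret, body, "sha256=00", d)) // rejected
}
```

Since delivery is at-least-once, the duplicate path is expected behavior, not an error; a receiver should answer it with 2xx so the sender stops retrying.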

Test Plan

  • git diff --stat shows only examples/proposals/flow_concurrency.md changed.
  • Markdown renders cleanly (no broken anchor, list, table, or code-fence syntax).
  • Section structure mirrors examples/proposals/evidence_chain.md (Goals / Non-Goals / Proposed model / Open Questions / Suggested First Milestone) and adds an explicit Safety and Security section.
  • No code, schema, GraphQL, REST, or UI changes.

Refs #298

Proposes a design direction for native flow concurrency control and
completion notifications. The RFC follows the maintainer's relocated
proposal pattern at examples/proposals/<topic>.md and explicitly
builds on the lessons from PR vxcontrol#268 (rejected because the in-memory
queue was hidden lifecycle state).

The RFC covers:

- Goals limited to capping concurrent flows, persisting queued flows
  as first-class lifecycle, replacing external polling with
  at-least-once webhooks, and preserving the existing createFlow
  contract.
- Non-Goals that explicitly forbid hidden in-memory queues,
  multi-tenant scheduling, generic event bus features, and changing
  the meaning of 'finished' for tasks/subtasks/toolcalls.
- Design Principles for persistence, visibility, manageability,
  explicit promotion, clear finished semantics, and at-least-once
  delivery.
- A proposed concurrency model with a new persisted 'queued' status,
  a single MAX_CONCURRENT_FLOWS knob, an explicit promoter, and
  full UI/API visibility plus user cancellation.
- A proposed completion webhook model with per-flow and global URLs,
  HMAC-SHA256 signatures, persisted deliveries, bounded retries,
  and SSRF mitigations.
- Storage and API surface sketches that do not commit to a final
  schema.
- Open Questions covering per-user limits, blocking semantics on
  createFlow, signature alignment with the issue vxcontrol#235 receipt
  direction, and behavior of resources/uploads against queued flows.
- A Suggested First Milestone that lands the queue end-to-end before
  webhooks, to keep PR sizes reviewable.

This is documentation only. No runtime code, schema, GraphQL, REST,
or UI behavior changes here.

Refs vxcontrol#298

Signed-off-by: mason5052 <ehehwnwjs5052@gmail.com>
Copilot AI review requested due to automatic review settings May 7, 2026 16:28

Copilot AI left a comment

Pull request overview

Adds an RFC proposal documenting a persisted flow queue (with a global concurrency cap) and at-least-once completion webhooks, providing a design direction for addressing orchestration pain points raised in #298 without introducing runtime changes.

Changes:

  • Introduces a new design RFC covering a persisted queued flow status and explicit promotion model.
  • Proposes a completion webhook design (configuration, payload, signing, retries, SSRF mitigations).
  • Captures open questions and a staged milestone plan for implementation.

…ames

Address Copilot review feedback on PR vxcontrol#306:

- Lifecycle diagram: show running <-> waiting (waiting is a paused state that resumes to running when user input arrives) and document that both running and waiting can reach the terminal statuses finished or failed.

- Replace overloaded 'finished' wording with explicit 'terminal' semantics throughout. finished and failed remain distinct terminal statuses; the queue and webhook layers treat both as terminal.

- Align webhook event names with status terminology: flow.finished for success, flow.failed for failure. Update payload example accordingly and note the failed-flow shape.

Signed-off-by: mason5052 <ehehwnwjs5052@gmail.com>
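The lifecycle rules in the follow-up commit (queued promoted to running, running and waiting switching back and forth, both reaching finished or failed, and one webhook event per terminal status) fit in a small transition table. The Go sketch below is illustrative; `allowed`, `canTransition`, `isTerminal`, and `webhookEvent` are hypothetical names. Cancellation of a queued flow is left out because the RFC summary does not name its terminal status.

```go
package main

import "fmt"

type Status string

const (
	Queued   Status = "queued"
	Running  Status = "running"
	Waiting  Status = "waiting"
	Finished Status = "finished"
	Failed   Status = "failed"
)

// allowed lists the transitions described in the commit message.
// Terminal statuses have no outgoing edges.
var allowed = map[Status][]Status{
	Queued:  {Running},
	Running: {Waiting, Finished, Failed},
	Waiting: {Running, Finished, Failed},
}

// canTransition reports whether from -> to is a permitted edge.
func canTransition(from, to Status) bool {
	for _, s := range allowed[from] {
		if s == to {
			return true
		}
	}
	return false
}

// isTerminal treats both finished and failed as terminal.
func isTerminal(s Status) bool { return s == Finished || s == Failed }

// webhookEvent maps a terminal status to its event name; other statuses emit nothing.
func webhookEvent(s Status) (string, bool) {
	switch s {
	case Finished:
		return "flow.finished", true
	case Failed:
		return "flow.failed", true
	}
	return "", false
}

func main() {
	fmt.Println(canTransition(Waiting, Running), canTransition(Finished, Running))
	ev, ok := webhookEvent(Failed)
	fmt.Println(ev, ok)
}
```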