Support lazy-starting the browser stack to reduce AIO sandbox idle memory #198

@LittleChenLiya

The Chinese version is below.

Background

I ran into a sandbox memory issue while investigating bytedance/deer-flow#3213. DeerFlow uses the AIO sandbox image (enterprise-public-cn-beijing.cr.volces.com/vefaas-public/all-in-one-sandbox:latest) as its containerized sandbox runtime.

The current all-in-one model is very convenient, but in high-concurrency deployments the idle memory cost per sandbox becomes a bottleneck. In my local DeerFlow/kind test environment, the memory profile looked like this:

  • Default AIO sandbox idle memory: about 769-895Mi
  • With DISABLE_JUPYTER=true and DISABLE_CODE_SERVER=true: about 413Mi
  • After additionally stopping the browser/VNC/browser MCP/Openbox stack: about 143-178Mi

The browser-related services are therefore the largest remaining idle-memory component after Jupyter and code-server are disabled.
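The arithmetic behind that conclusion can be sketched as follows. This is only an illustration built from the figures listed above: the ranges are treated as independent measurement spreads, and the pool size of 100 idle sandboxes is a hypothetical number, not something measured in this issue.

```python
# Idle-memory figures in MiB, copied from the measurements above as (low, high).
DEFAULT = (769, 895)
JUPYTER_AND_CODE_SERVER_OFF = (413, 413)
BROWSER_STACK_ALSO_OFF = (143, 178)


def saving(before, after):
    """Smallest and largest possible saving in MiB between two (low, high) ranges."""
    return (before[0] - after[1], before[1] - after[0])


# Idle cost attributable to the browser/VNC/browser MCP/Openbox stack.
browser_stack_cost = saving(JUPYTER_AND_CODE_SERVER_OFF, BROWSER_STACK_ALSO_OFF)

# Hypothetical pool size, chosen only to make the scale concrete.
IDLE_SANDBOXES = 100
fleet_saving_gib = tuple(
    round(mib * IDLE_SANDBOXES / 1024, 1) for mib in browser_stack_cost
)

print(f"browser stack idle cost per sandbox: "
      f"{browser_stack_cost[0]}-{browser_stack_cost[1]} MiB")
print(f"saving across {IDLE_SANDBOXES} idle sandboxes: "
      f"{fleet_saving_gib[0]}-{fleet_saving_gib[1]} GiB")
```

With these inputs the browser stack accounts for roughly 235-270 MiB of idle memory per sandbox.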

Local finding

Inside the running AIO sandbox container, I found these supervisor programs:

  • browser
  • mcp-server-browser
  • tigervnc
  • openbox
  • websocat

Manually stopping these services significantly reduced idle memory. Manually starting them again worked in my local test:

supervisorctl start tigervnc openbox browser mcp-server-browser websocat

After startup:

  • The CDP (Chrome DevTools Protocol) endpoint at http://127.0.0.1:9222/json/version became ready in about 1 second.
  • mcp-server-browser listened on port 8100.
  • Memory increased back to the expected browser-enabled level.
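A minimal sketch of how the time-to-ready figures above could be measured, using only the Python standard library. The URL and port 8100 come from the local test; the helper names (`cdp_ready`, `port_open`, `wait_until`) and the timeout values are assumptions made for this illustration and are not part of the AIO sandbox.

```python
import http.client
import json
import socket
import time
import urllib.request


def cdp_ready(url="http://127.0.0.1:9222/json/version", timeout=1.0):
    """Return True if the CDP version endpoint answers with a JSON object."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return isinstance(json.load(resp), dict)
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, bad HTTP response, or invalid JSON.
        return False


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until(check, deadline_s=10.0, interval_s=0.1):
    """Poll check() until it returns True.

    Returns the elapsed seconds on success, or None if the deadline passes.
    """
    start = time.monotonic()
    while time.monotonic() - start < deadline_s:
        if check():
            return time.monotonic() - start
        time.sleep(interval_s)
    return None
```

Calling `wait_until(cdp_ready)` and `wait_until(lambda: port_open("127.0.0.1", 8100))` right after the `supervisorctl start` command would give the startup latency for CDP and for the browser MCP port respectively.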

This suggests that a lazy-start browser stack could reduce idle memory while preserving browser functionality for workloads that actually need it.

Proposed improvement

I would like to contribute a PR to support opt-in lazy startup for the browser stack, if this is an acceptable direction.

A possible design:

  1. Add an opt-in environment variable, for example:
DISABLE_BROWSER_STACK=true

or:

AUTOSTART_BROWSER_STACK=false

  2. Keep the current default behavior unchanged. The browser stack should still start by default unless users explicitly opt in to lazy startup.

  3. When lazy startup is enabled, do not autostart these supervisor programs:

  • tigervnc
  • openbox
  • browser
  • mcp-server-browser
  • websocat

  4. Add an internal ensure_browser_stack_started() mechanism that starts only this fixed allowlist of services and waits until they are ready.

Readiness checks could include:

  • CDP /json/version is reachable.
  • browser MCP port is reachable.
  • VNC/websocket proxy is running when VNC is enabled.

  5. Call this ensure step from browser-dependent entry points, such as browser screenshot/actions/page APIs and MCP browser tooling.
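One way the ensure step could look is sketched below. It is an illustration under stated assumptions, not the sandbox's actual implementation: the `AUTOSTART_BROWSER_STACK` switch is one of the two candidate names above, readiness is reduced to plain TCP checks on ports 9222 and 8100 from the local test, the VNC check is left out, and every helper name other than `ensure_browser_stack_started` is hypothetical.

```python
import os
import socket
import subprocess
import threading
import time

# Fixed allowlist from the proposal; nothing outside it is ever started.
BROWSER_STACK = ("tigervnc", "openbox", "browser", "mcp-server-browser", "websocat")
CDP_PORT = 9222  # observed in the local test
MCP_PORT = 8100  # observed in the local test

_lock = threading.Lock()
_started = False


def lazy_start_enabled(env=os.environ):
    """Assumed switch: lazy startup only when AUTOSTART_BROWSER_STACK=false."""
    return env.get("AUTOSTART_BROWSER_STACK", "true").strip().lower() == "false"


def _port_open(port, timeout=0.5):
    """Return True if a local TCP connection to the port succeeds."""
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=timeout):
            return True
    except OSError:
        return False


def _run_supervisorctl(programs):
    # Exit status is ignored on purpose; the readiness probes decide success.
    subprocess.run(["supervisorctl", "start", *programs], check=False)


def ensure_browser_stack_started(start=_run_supervisorctl, probe=_port_open,
                                 timeout_s=30.0, interval_s=0.2):
    """Start the allowlisted services once, then block until both ports answer."""
    global _started
    with _lock:  # concurrent callers wait here instead of starting twice
        if _started:
            return
        start(BROWSER_STACK)
        deadline = time.monotonic() + timeout_s
        while not (probe(CDP_PORT) and probe(MCP_PORT)):
            if time.monotonic() >= deadline:
                raise TimeoutError("browser stack did not become ready in time")
            time.sleep(interval_s)
        _started = True
```

A browser-dependent entry point would call `ensure_browser_stack_started()` before doing any work, which costs one lock acquisition once the stack is up. The cached `_started` flag is a simplification: it would go stale if a service exited later, so a real implementation would probably re-probe or query supervisor state.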

Why this helps DeerFlow

For DeerFlow users, this would allow high-concurrency deployments to keep many idle sandboxes around with much lower memory usage, while still allowing browser workloads to start the browser stack on demand.

After this is supported in AIO sandbox, DeerFlow can add a small follow-up PR to pass through a SANDBOX_DISABLE_BROWSER_STACK-style setting from its provisioner/compose config.

Questions

  • Is lazy-starting the browser stack an acceptable direction for this project?
  • Is the image-side source for the supervisor/browser service configuration available for external contributions?
  • If yes, which files or branch should I base a PR on?
  • Would maintainers prefer one combined DISABLE_BROWSER_STACK switch, or separate switches for browser, browser MCP, and VNC?

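Background (translated from the Chinese version)

I ran into a memory usage problem with the AIO sandbox while troubleshooting bytedance/deer-flow#3213. DeerFlow currently uses the AIO sandbox image (enterprise-public-cn-beijing.cr.volces.com/vefaas-public/all-in-one-sandbox:latest) as its containerized sandbox runtime.

The all-in-one model is convenient, but in high-concurrency deployments the idle memory of each sandbox becomes a bottleneck. The results I measured in my local DeerFlow/kind environment were roughly:

  • Default AIO sandbox idle memory: about 769-895Mi
  • With DISABLE_JUPYTER=true and DISABLE_CODE_SERVER=true set: about 413Mi
  • After additionally stopping the browser/VNC/browser MCP/Openbox services: about 143-178Mi

In other words, once Jupyter and code-server are turned off, the browser-related services are the largest remaining source of idle memory.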

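Local finding

In the running AIO sandbox container, I saw these supervisor programs:

  • browser
  • mcp-server-browser
  • tigervnc
  • openbox
  • websocat

After manually stopping these services, idle memory dropped noticeably. Manually restarting them afterwards in my local environment also restored everything normally:

supervisorctl start tigervnc openbox browser mcp-server-browser websocat

After startup:

  • CDP at http://127.0.0.1:9222/json/version was ready in about 1 second.
  • mcp-server-browser was available on port 8100.
  • Memory rose back to the level seen with the browser stack enabled.

This shows that starting the browser stack on demand is feasible: memory stays lower while idle, and the stack is started only when browser capability is needed.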

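Proposed direction for improvement

If the maintainers agree with this direction, I would like to write a PR that supports explicit opt-in lazy startup for the browser stack.

One possible design:

  1. Add an explicit environment variable, for example:
DISABLE_BROWSER_STACK=true

or:

AUTOSTART_BROWSER_STACK=false

  2. Keep the default behavior unchanged. That is, when users do not set this variable, the browser stack still starts together with the sandbox as it does today.

  3. When lazy startup is enabled, do not autostart these supervisor programs:

  • tigervnc
  • openbox
  • browser
  • mcp-server-browser
  • websocat

  4. Add an internal ensure_browser_stack_started() mechanism that is only allowed to start the fixed allowlist of services above and that waits until they are ready.

Readiness checks could include:

  • CDP /json/version is reachable.
  • The browser MCP port is reachable.
  • If VNC is enabled, the VNC/websocket proxy is running.

  5. Call ensure from browser-dependent entry points, such as the browser screenshot/actions/page APIs and the MCP browser tool entry points.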

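How this helps DeerFlow

With this, DeerFlow could keep more low-memory idle sandboxes around in high-concurrency deployments; only tasks that actually need browser tools would start the browser stack.

Once AIO sandbox supports this capability, DeerFlow could follow up with a very small PR that passes a setting such as SANDBOX_DISABLE_BROWSER_STACK from the provisioner/compose config into the sandbox Pod.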

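Questions

  • Is starting the browser stack on demand an acceptable direction?
  • Is the source for the supervisor/browser service configuration in the image open to external contributions?
  • If so, which files or branch should a PR be based on?
  • Would maintainers prefer a single unified DISABLE_BROWSER_STACK switch, or separate switches for browser, browser MCP, and VNC?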
