feat: filesystem grep, read, write, edit file and workspace support #7402

Draft
Soulter wants to merge 9 commits into master from feat/fs-grep-read-edit

@Soulter (Member) commented Apr 6, 2026

Modifications

  • This is NOT a breaking change.

Screenshots or Test Results


Checklist

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.

  • 😮 My changes do not introduce malicious code.

Summary by Sourcery

Add filesystem tools for searching, reading, and editing files across local, sandbox, and Shipyard runtimes, with user-aware access restrictions and pagination support.

New Features:

  • Introduce a read-file tool that supports offsets and limits for partial file reads.
  • Introduce a file-edit tool that performs string replacements in files with optional replace-all behavior.
  • Introduce a grep-style search tool for querying file contents using ripgrep or grep with context and result limiting.
  • Add workspace path handling to scope filesystem operations per user/session.
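
As a rough illustration of how a grep-style search with context lines and a result limit can behave, here is a minimal pure-Python sketch. It is not the PR's implementation (which delegates to ripgrep or grep); `search_files` and its parameters are hypothetical, and each file is read fully for simplicity.

```python
import re
from pathlib import Path


def search_files(root, pattern, context=0, result_limit=100):
    """Return (matches, limit_reached) for a regex search under root.

    Hypothetical helper: each match carries its path, 1-based line
    number, the matching line, and `context` lines on either side.
    """
    regex = re.compile(pattern)
    results = []
    for file_path in sorted(Path(root).rglob("*")):
        if not file_path.is_file():
            continue
        try:
            lines = file_path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for index, line in enumerate(lines):
            if regex.search(line):
                start = max(0, index - context)
                end = min(len(lines), index + context + 1)
                results.append({
                    "path": str(file_path),
                    "line_number": index + 1,
                    "line": line,
                    "context": lines[start:end],
                })
                if len(results) >= result_limit:
                    return results, True  # stop early once the limit is hit
    return results, False
```

Stopping as soon as the limit is reached keeps both the work done and the payload returned to the agent bounded.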

Enhancements:

  • Extend filesystem abstraction and booters (local, shipyard, shipyard_neo, boxlite) to support search and edit operations in addition to basic CRUD.
  • Wire the new filesystem tools into local and sandbox toolsets so agents can use them alongside existing shell and Python tools.
  • Relax local booter path restrictions while enforcing read-only directory restrictions at the tool level for non-admin users.
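
One way a tool-level directory restriction for non-admin users might look, as a sketch only: `is_path_allowed`, `allowed_dirs`, and `is_admin` are hypothetical names, and the PR's actual check may differ. It requires Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def is_path_allowed(path, allowed_dirs, is_admin=False):
    """Return True if `path` may be accessed (hypothetical policy).

    Admins are unrestricted in this sketch; everyone else must stay
    inside one of `allowed_dirs`.
    """
    if is_admin:
        return True
    # resolve() collapses ".." segments and follows symlinks, so a
    # path cannot escape the allowed directory by traversal.
    target = Path(path).resolve()
    return any(target.is_relative_to(Path(base).resolve()) for base in allowed_dirs)
```

Resolving both sides before comparing is the important design choice here; a plain string-prefix check would accept paths such as `workspace/../secret.txt`.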

Build:

  • Add python-ripgrep as a project dependency for content searching.

@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request introduces new filesystem tools—ReadFileTool, FileEditTool, and GrepTool—and updates the local and sandbox booters to support these operations. It also implements a security layer to restrict file access for non-admin users in local environments. The review feedback highlights potential memory issues when reading or editing large files in their entirety and suggests applying the documented default limit for file reads to prevent excessive memory consumption.

    limit: int | None = None,
) -> dict[str, Any]:
    _ = encoding
    content = await self._sandbox.filesystem.read_file(path)

Priority: high

This implementation reads the full content of the file from the sandbox into the bot's memory before slicing. This is inefficient for large files. If the Shipyard Neo SDK supports range-based reads, they should be used here to fetch only the requested slice.

@Soulter (Member, Author) replied:

cc @RC-CHN

@Soulter changed the title from "feat: filesystem grep, read, edit file" to "feat: filesystem grep, read, write, edit file and workspace support" on Apr 7, 2026
@whatevertogo (Contributor) left a comment

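Code review feedback (generated by Claude Code, so please go easy on me)

I reviewed this PR's implementation carefully and found several issues that need attention. Detailed comments are attached to the relevant lines of code.

Main findings

✅ What works well:

  • read_file in the local booter reads by iterating line by line, which is memory-safe
  • GrepTool has a solid result_limit mechanism (default 100) that prevents oversized result sets
  • Test coverage is thorough and includes large files, images, binary files, and similar cases
  • The workspace isolation and permission-control design is sound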
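
For reference, a line-windowed read of the kind praised above can be sketched as follows. This is an illustration, not the PR's code: `read_lines_window` is a hypothetical name, `offset` is assumed to be a zero-based line index, and the default of 4000 is taken from the documented `limit` default mentioned in this review.

```python
from itertools import islice


def read_lines_window(path, offset=0, limit=4000, encoding="utf-8"):
    """Return at most `limit` lines starting at zero-based line `offset`."""
    if offset < 0:
        raise ValueError("offset must be greater than or equal to 0")
    if limit < 1:
        raise ValueError("limit must be greater than or equal to 1")
    with open(path, encoding=encoding) as f:
        # islice consumes the file lazily: it skips `offset` lines and
        # stops after `limit` more, so the rest of the file is never loaded.
        return "".join(islice(f, offset, offset + limit))
```

Memory use is bounded by the size of the window rather than the size of the file, provided individual lines are not themselves huge.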

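⚠️ Issues to fix:

  1. edit_file memory usage: it reads the whole file with f.read(), so a large file can cause an OOM
  2. Default limit not applied: _validate_read_window does not apply the documented default of 4000
  3. Shipyard Neo performance: it reads the full file and then slices it, which is inefficient

Each of these has a concrete suggested fix; see the inline code comments.

Overall assessment

This is a feature-complete, well-architected PR that can be merged once the issues above are fixed. I suggest handling the first two first, since they directly affect memory safety.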

self,
path: str,
encoding: str = "utf-8",
offset: int | None = None,

Isn't a default value missing here?

def _run() -> dict[str, Any]:
    abs_path = os.path.abspath(path)
    with open(abs_path, encoding=encoding) as f:
        content = f.read()

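Memory-safety issue: f.read() loads the entire file into memory in one call, which can cause an OOM on large files.

Suggest processing line by line or using a temporary file instead: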

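# Option 1: process line by line (suits simple replacements; an
# old_string that spans a line break will not match)
temp_path = abs_path + ".tmp"  # assumed scratch path next to the original
replacements = 0
with open(abs_path, "r", encoding=encoding) as f_in:
    with open(temp_path, "w", encoding=encoding) as f_out:
        for line in f_in:
            if replace_all:
                replacements += line.count(old_string)
                line = line.replace(old_string, new_string)
            elif replacements < 1 and old_string in line:
                line = line.replace(old_string, new_string, 1)
                replacements += 1
            f_out.write(line)
os.replace(temp_path, abs_path)  # swap the edited copy into place

# Option 2: cap the size of files that can be edited
file_size = os.path.getsize(abs_path)
if file_size > 10 * 1024 * 1024:  # 10 MB
    return {"success": False, "error": "File too large for editing"}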
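
A self-contained version of the line-by-line option might look like the sketch below. `replace_in_file_streaming` is a hypothetical name, not a function from this PR. It assumes `old_string` never spans a line break, and the temp file created by `mkstemp` does not inherit the original file's permissions.

```python
import os
import tempfile


def replace_in_file_streaming(path, old_string, new_string,
                              replace_all=False, encoding="utf-8"):
    """Replace old_string line by line; return the number of replacements."""
    if not old_string:
        raise ValueError("old_string must not be empty")
    abs_path = os.path.abspath(path)
    replacements = 0
    # Create the temp file in the same directory so os.replace stays on
    # one filesystem and the final swap is atomic.
    fd, temp_path = tempfile.mkstemp(dir=os.path.dirname(abs_path), suffix=".tmp")
    try:
        # newline="" keeps the original line endings untouched.
        with os.fdopen(fd, "w", encoding=encoding, newline="") as f_out, \
                open(abs_path, "r", encoding=encoding, newline="") as f_in:
            for line in f_in:
                if replace_all:
                    count = line.count(old_string)
                    if count:
                        line = line.replace(old_string, new_string)
                        replacements += count
                elif replacements == 0 and old_string in line:
                    line = line.replace(old_string, new_string, 1)
                    replacements = 1
                f_out.write(line)
        os.replace(temp_path, abs_path)
    except BaseException:
        # Leave the original file untouched if anything goes wrong.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
    return replacements
```

Only one line is held in memory at a time, and the original file is replaced only after the edited copy has been written completely.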

path: str,
offset: int | None = None,
limit: int | None = None,
) -> ToolExecResult:

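Default not applied: the documentation for the limit parameter says it defaults to 4000, but the validation logic here never applies that default. When the LLM omits limit, None is passed to the booter, which may then read the entire file.

Suggest applying the default:

def _validate_read_window(
    self,
    offset: int | None,
    limit: int | None,
) -> tuple[int | None, int]:
    if offset is not None and offset < 0:
        raise ValueError("`offset` must be greater than or equal to 0.")
    if limit is not None and limit < 1:
        raise ValueError("`limit` must be greater than or equal to 1.")
    # Apply the documented default of 4000
    return offset, (limit if limit is not None else 4000)

This ensures that even if the LLM forgets to provide limit, the tool will not try to read a very large file in full.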

    limit: int | None = None,
) -> dict[str, Any]:
    _ = encoding
    content = await self._sandbox.filesystem.read_file(path)

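Performance issue: this first reads the full file into memory via self._sandbox.filesystem.read_file(path) and only then slices it with _slice_content_by_lines. For large files this is inefficient and may cause an OOM.

If the Shipyard Neo SDK supports range-based reads, prefer the SDK's native capability and fetch only the part that is needed.

If the SDK does not support range-based reads, I suggest one of the following:

  1. Add a file-size check at the tool level and refuse to read files that are too large
  2. Or state the file-size limit for the Shipyard Neo environment explicitly in the documentation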
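
The tool-level size check in suggestion 1 could be as small as the sketch below. `check_read_size` and `MAX_READ_BYTES` are hypothetical, and the 10 MB value is borrowed from the edit-size suggestion earlier in this review rather than from any documented limit. How the size is obtained in a sandbox depends on the SDK and is left to the caller.

```python
MAX_READ_BYTES = 10 * 1024 * 1024  # hypothetical cap, not a documented limit


def check_read_size(size_bytes, max_bytes=MAX_READ_BYTES):
    """Return an error result if the file is too large to read, else None."""
    if size_bytes > max_bytes:
        return {
            "success": False,
            "error": (
                f"File too large to read: {size_bytes} bytes "
                f"exceeds the {max_bytes}-byte limit."
            ),
        }
    return None
```

A caller would run this before fetching the content and return the error result directly when it is not None, so the oversized file never has to be transferred.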
