Changes from 13 commits
21 commits
ec83e50
feat: Add `run_single_task.py` example, `mobile_action_helper.py`, an…
flyfox666 Dec 19, 2025
da089d0
feat: Introduce a Gradio web UI for AutoGLM, enabling task execution,…
flyfox666 Dec 19, 2025
633ad6f
feat: Add Gradio Web UI for AutoGLM with command execution, real-time…
flyfox666 Dec 19, 2025
70ef9ad
Add Gradio web UI for AutoGLM, including trajectory visualization and…
flyfox666 Dec 20, 2025
9937777
feat: add Gradio web UI for AutoGLM with trajectory visualization and…
flyfox666 Dec 20, 2025
c476733
feat: implement Gradio web UI for stepui with command runner and trac…
flyfox666 Dec 21, 2025
6a72f0d
build: Update project dependencies
flyfox666 Dec 21, 2025
b490da9
chore: add venv/ to .gitignore
flyfox666 Dec 21, 2025
59e396d
feat: introduce Gradio web UI for AutoGLM, supporting command executi…
flyfox666 Dec 21, 2025
cd74274
feat: add detailed Web UI instructions and feature descriptions to RE…
flyfox666 Dec 21, 2025
2b45d37
Revise README_CN.md with updated information
flyfox666 Dec 21, 2025
db507fe
Update README.md by removing old news and contact info
flyfox666 Dec 21, 2025
e289875
chore: Clear stepfun API key in model config.
flyfox666 Dec 21, 2025
e956172
Enhance README_CN with wireless debugging instructions
flyfox666 Dec 21, 2025
9e965bc
Enhance README with wireless debugging instructions
flyfox666 Dec 21, 2025
408fc8e
function update
flyfox666 Dec 23, 2025
674e80a
Merge branch 'main' of https://github.com/flyfox666/gelab-zero-webui
flyfox666 Dec 23, 2025
ae44d91
feat: Add Gradio web UI for AutoGLM Android automation with trajector…
flyfox666 Dec 24, 2025
3629d12
feat: Add generated Python bytecode for `app.py`.
flyfox666 Dec 25, 2025
ae16bec
feat: Implement GUI agent loop with pause, auto-reply, and image capt…
flyfox666 Dec 29, 2025
24f9afa
feat: Update the Gradio UI, add a model name input box, and adjust the related logic
flyfox666 Dec 30, 2025
3 changes: 2 additions & 1 deletion .gitignore
@@ -5,4 +5,5 @@ output/
running_log
gelab-zero-4b-preview/

model_config.yaml
model_config.yaml
venv/
59 changes: 36 additions & 23 deletions README.md
@@ -1,4 +1,5 @@
![GELab-Zero Main Image](./images/main_en.png)

<img width="1920" height="1606" alt="Stepfun-ai-gelab-zero-12-21-2025_09_05_PM" src="https://github.com/user-attachments/assets/b8f2227b-56c6-4f60-ba86-4dcbe7b49835" />

> 👋 Hi, everyone! We are proud to present the first fully open-source GUI Agent with both model and infrastructure. Our solution features plug-and-play engineering with no cloud dependencies, giving you complete privacy control.

@@ -16,18 +17,6 @@
<a href="./README_CN.md">简体中文</a>
</p>

## 📰 News

* 🎁 **[2025-12-18]** We release **Step-GUI Technical Report** on [**arXiv**](https://arxiv.org/abs/2512.15431)!
* 🎁 **[2025-12-18]** We release a more powerful **API** for GUI automation tasks. [Apply for API access here](https://wvixbzgc0u7.feishu.cn/share/base/form/shrcnNStxEmuE7aY6jTW07CZHMf)!
* 🎁 **[2025-12-12]** We release **MCP-Server** support for multi-device management and task distribution. See [Installation & Quick Start](#-installation-quick-start) and [MCP-Server Setup](#optional-mcp-server-setup) for setup instructions.
* 🎁 **[2025-12-1]** We thank the following projects and authors for providing quantization tools & tutorials: [GGUF_v1](https://huggingface.co/bartowski/stepfun-ai_GELab-Zero-4B-preview-GGUF), [GGUF_v2](https://huggingface.co/noctrex/GELab-Zero-4B-preview-GGUF), [EXL3](https://huggingface.co/ArtusDev/stepfun-ai_GELab-Zero-4B-preview-EXL3), [Tutorials_CN](http://xhslink.com/o/1WrmgHGWFYh), [Tutorials_EN](https://www.youtube.com/watch?v=4BMiDyQOpos)
* 🎁 **[2025-11-31]** We release a lightweight **4B** model GELab-Zero-4B-preview on [**Hugging Face**](https://huggingface.co/stepfun-ai/GELab-Zero-4B-preview) and [**Model Scope**](https://modelscope.cn/models/stepfun-ai/GELab-Zero-4B-preview).
* 🎁 **[2025-11-31]** We release the tasks from the [**AndroidDaily**](https://huggingface.co/datasets/stepfun-ai/AndroidDaily) benchmark.
* 🎁 **[2025-11-30]** We release the current **GELab-Zero** engineering infrastructure.
* 🎁 **[2025-10]** Our [**research**](https://github.com/summoneryhl/gelab-engine) paper on GELab-Engine is accepted by **NeurIPS 2025**.



## 📑 Table of Contents

@@ -38,15 +27,6 @@
- [📝 Citation](#-citation)


## 📧 Contact

You can contact us and communicate with us by joining our WeChat group:

| WeChat Group |
|:-------------------------:|
| <img src="images/wechat_group2.jpeg" width="200"> |



## 📖 Background

@@ -396,10 +376,43 @@ cd gelab-zero
# Install dependencies
pip install -r requirements.txt

# To inference a single task
# To inference a single task (Command Line)
python examples/run_single_task.py

# Or use the Web UI (Recommended)
python start_web_ui.py
```

#### Web UI Features

The Web UI provides a more user-friendly way to interact with GELab-Zero, featuring a two-column layout:

**Left Panel - Control**

| Module | Features |
|--------|----------|
| **📱 Device Management** | Check device status, view device list, restart ADB service |
| **📶 Wireless Debugging** | Connect device via IP address, enable TCP/IP mode, disconnect |
| **📊 Task Monitoring** | View task status (Ready/Running/Waiting for Input), select historical Sessions |
| **💬 Command/Reply** | Enter task instructions or reply to Agent queries, supports `Ctrl+Enter` shortcut |
| **⚙️ Model Configuration** | Select model provider (auto-loaded from `model_config.yaml`), set Base URL and API Key |
| **🛠 Utilities** | Launch scrcpy screen mirroring, get installed app list |

**Right Panel - Display**

| Module | Features |
|--------|----------|
| **📱 Task Trajectory** | Visual replay of each execution step, including screenshots, thought process, and action details |
| **📋 Real-time Logs** | Real-time display of task execution terminal output, with clear and copy buttons |

**Interaction Enhancements**

- **🔄 Smart Auto-scroll**: Auto-scrolls during task execution; stops when task completes, allowing free navigation through history
- **🖼️ Image Lightbox**: Click screenshots in trajectory to view full-size, with download support
- **⌨️ Keyboard Shortcut**: `Ctrl+Enter` to quickly submit commands/replies

After starting the Web UI, open your browser and go to `http://localhost:8866` to access the interface.

### (Optional) Step 4: Trajectory Visualization Environment Setup

By default, the trajectory is saved in the `running_log/server_log/os-copilot-local-eval-logs/` directory. You can visualize the trajectory using streamlit:
58 changes: 36 additions & 22 deletions README_CN.md
@@ -1,5 +1,6 @@

![GELab-Zero 主图](./images/main_cn.png)
<img width="1920" height="1606" alt="Stepfun-ai-gelab-zero-12-21-2025_09_05_PM" src="https://github.com/user-attachments/assets/d1ab326c-5927-4f81-95a4-c6c72ba521ea" />



> 👋 hi大家好!我们很荣幸推出首个同时包含模型和基础设施的全开源 GUI Agent。我们的解决方案主打即插即用的工程化体验,无需依赖云端,赋予您完全的隐私控制权。
@@ -18,17 +19,6 @@
<a href="./README_CN.md">简体中文</a>
</p>

## 📰 新闻

* 🎁 **[2025-12-18]** 我们在 **[arXiv](https://arxiv.org/abs/2512.15431)** 上发布了 **Step-GUI 技术报告**!
* 🎁 **[2025-12-18]** 我们发布了更强大的 GUI 自动化任务 **API**。[点击此处申请 API 访问权限](https://wvixbzgc0u7.feishu.cn/share/base/form/shrcnNStxEmuE7aY6jTW07CZHMf)!
* 🎁 **[2025-12-12]** 我们发布了支持多设备管理和任务分发的 **MCP-Server**。请参阅 [安装-快速开始](#-安装-快速开始) 和 [MCP-Server 配置](#可选-mcp-server-配置) 了解配置说明。
* 🎁 **[2025-12-01]** 感谢以下项目和作者提供量化工具及教程:[GGUF_v1](https://huggingface.co/bartowski/stepfun-ai_GELab-Zero-4B-preview-GGUF)、[GGUF_v2](https://huggingface.co/noctrex/GELab-Zero-4B-preview-GGUF)、[EXL3](https://huggingface.co/ArtusDev/stepfun-ai_GELab-Zero-4B-preview-EXL3)、[中文教程](http://xhslink.com/o/1WrmgHGWFYh)、[英文教程](https://www.youtube.com/watch?v=4BMiDyQOpos)。
* 🎁 **[2025-11-31]** 我们在 **[Hugging Face](https://huggingface.co/stepfun-ai/GELab-Zero-4B-preview)** 和 **[Model Scope](https://modelscope.cn/models/stepfun-ai/GELab-Zero-4B-preview)** 上发布了轻量级 **4B** 模型 GELab-Zero-4B-preview。
* 🎁 **[2025-11-31]** 我们发布了 **[AndroidDaily](https://huggingface.co/datasets/stepfun-ai/AndroidDaily)** 基准测试中的任务数据。
* 🎁 **[2025-11-30]** 我们发布了当前的 **GELab-Zero** 工程基础设施。
* 🎁 **[2025-10]** 我们关于 GELab-Engine 的 **[研究论文](https://github.com/summoneryhl/gelab-engine)** 被 **NeurIPS 2025** 录用。


## 📑 目录

@@ -39,15 +29,6 @@
- [📝 引用](#-引用)


## 📧 联系我们

欢迎加入我们的微信群与我们联系和交流:

| WeChat Group |
|:-------------------------:|
| <img src="images/wechat_group2.jpeg" width="200"> |


## 📖 背景

随着 AI 体验日益深入消费级终端设备,移动 Agent 研究正处于从 **“可行性验证”** 向 **“大规模应用”** 转型的关键节点。虽然基于 GUI 的方案具有通用兼容性,但移动生态的碎片化带来了沉重的工程负担,阻碍了创新。GELab-Zero 旨在打破这些壁垒。
@@ -361,10 +342,43 @@ cd gelab-zero
# 安装依赖
pip install -r requirements.txt

# 运行单个任务推理示例
# 运行单个任务推理示例(命令行方式)
python examples/run_single_task.py

# 或使用 Web UI(推荐)
python start_web_ui.py
```

#### Web UI 功能特性

Web UI 提供了更友好的交互方式,界面分为左右两栏布局:

**左栏 - 控制面板**

| 模块 | 功能 |
|------|------|
| **📱 设备管理** | 检查设备状态、查看设备列表、重启 ADB 服务 |
| **📶 无线调试** | 通过 IP 地址无线连接设备、启用 TCP/IP 模式、断开连接 |
| **📊 任务监控** | 查看任务状态(就绪/运行中/等待输入)、选择历史 Session |
| **💬 命令/回复** | 输入任务指令或回复 Agent 询问,支持 `Ctrl+Enter` 快捷提交 |
| **⚙️ 参数配置** | 选择模型提供商(从 `model_config.yaml` 自动加载)、设置 Base URL 和 API Key |
| **🛠 实用工具** | 启动 scrcpy 屏幕镜像、获取手机应用列表 |

**右栏 - 任务展示**

| 模块 | 功能 |
|------|------|
| **📱 任务轨迹** | 可视化回放每个执行步骤,包含截图、思考过程、动作详情 |
| **📋 实时日志** | 实时显示任务执行的终端输出,支持清空和复制 |

**交互优化**

- **🔄 智能滚动**:任务运行时自动滚动到最新内容;任务结束后停止滚动,可自由翻阅历史日志
- **🖼️ 图片放大**:点击轨迹中的截图可放大查看,支持下载
- **⌨️ 快捷键**:`Ctrl+Enter` 快速提交命令/回复

启动 Web UI 后,在浏览器中访问 `http://localhost:8866` 即可使用。

### (可选)Step 4: 轨迹可视化环境搭建

任务轨迹会默认保存在 `running_log/server_log/os-copilot-local-eval-logs/` 目录下。你可以使用 streamlit 对轨迹进行可视化:
Binary file not shown.
24 changes: 21 additions & 3 deletions copilot_front_end/mobile_action_helper.py
@@ -134,9 +134,27 @@ def dectect_screen_on(device_id, print_command = False):
         command = f"{adb_command} shell dumpsys display"
         if print_command:
             print(f"Executing command: {command}")
-        result = subprocess.run(command, shell=True, capture_output=True, text=True)
-        result.stdout = result.stdout.encode('utf-8').decode('utf-8')
-        screen_state = local_str_grep(result.stdout, "mScreenState").strip()
+
+        # Use text=False (or capture_output=True default) to get bytes, avoiding implicit decoding errors
+        result = subprocess.run(command, shell=True, capture_output=True, text=False)
+
+        if result.stdout:
+            # Decode carefully, ignoring errors if necessary
+            # Try utf-8 first, then gbk, or just replace errors
+            try:
+                # ADB output is usually UTF-8, but on Windows shell it might get mixed.
+                # using errors='ignore' or 'replace' is safest for logging/grepping
+                output_str = result.stdout.decode('utf-8', errors='replace')
+            except Exception:
+                output_str = result.stdout.decode('gbk', errors='replace')
+        else:
+            output_str = ""
+
+        screen_state = local_str_grep(output_str, "mScreenState")
+        if screen_state:
+            screen_state = screen_state.strip()
+        else:
+            screen_state = ""
     else:
         command = f"{adb_command} shell dumpsys display | grep mScreenState"
         if print_command:
             print(f"Executing command: {command}")
Binary file not shown.
93 changes: 70 additions & 23 deletions examples/run_single_task.py
@@ -63,49 +63,96 @@ def timed_automate_step(payload):
server_instance.automate_step = timed_automate_step

if __name__ == "__main__":
import argparse

parser = argparse.ArgumentParser(description="Run a single task solely.")
parser.add_argument("task", type=str, nargs='?', help="The task description.")
parser.add_argument("--device-id", type=str, help="The device ID to use.")
parser.add_argument("--model", type=str, default="gelab-zero-4b-preview", help="Model name.")
parser.add_argument("--base-url", type=str, help="Base URL for the model API.")
parser.add_argument("--api-key", type=str, help="API Key for the model.")

args = parser.parse_args()

# task = "打开微信,给柏茗,发helloworld"
# task = "打开 给到 app,在主页,下滑寻找,员工权益-奋斗食代,帮我领劵。如果不能领取就退出。"
# task = "open wechat to send a message 'helloworld' to 'TKJ'"
#task = "去淘宝帮我买本书"
if len(sys.argv) < 2:
if not args.task:
print("❌ 错误:未传入任务参数!")
print("📝 使用方法:")
print(f" python {sys.argv[0]} \"你的任务描述\"")
print(f" python {sys.argv[0]} \"你的任务描述\" [options]")
print(" 示例1:python script.py \"去淘宝帮我买本书\"")
print(" 示例2:python script.py \"打开微信,给柏茗发helloworld\"")
sys.exit(1)

task = ' '.join(sys.argv[1:])
print(" 示例2:python script.py \"打开微信,给柏茗发helloworld\" --device-id 123456")
sys.exit(1)

task = args.task

# Use provided device_id or find the first available one
if args.device_id:
device_id = args.device_id
# Verify device is connected
available_devices = list_devices()
if device_id not in available_devices:
print(f"Warning: Device {device_id} not found in connected devices: {available_devices}")
else:
devices = list_devices()
if not devices:
print("❌ Error: No devices connected.")
sys.exit(1)
device_id = devices[0]
print(f"Auto-selected device: {device_id}")

# The device ID you want to use
device_id = list_devices()[0]
device_wm_size = get_device_wm_size(device_id)
device_info = {
"device_id": device_id,
"device_wm_size": device_wm_size
}



tmp_rollout_config = local_model_config
# Update model configuration based on arguments
tmp_rollout_config = local_model_config.copy()
if args.model:
tmp_rollout_config["model_config"]["model_name"] = args.model

if args.base_url or args.api_key:
# Switch provider to openai if URL/Key provided, or keep local if just overriding local params?
# Assuming if URL is provided, we might want to treat it as an OpenAI-compatible endpoint
# BUT for now, let's just inject these into args or model_config if the backend supports it.
# Looking at local_server.py might be needed to see how it handles base_url/api_key.
# For 'local' provider, it might not use them. Let's assume user knows what they are doing.
# If it is 'custom' or 'openai', provider might need to change.
# FOR NOW: We just update the 'args' or specific keys if the server class supports it.

# NOTE: The current LocalServer implementation details are not fully visible here.
# But commonly these are passed in model_config.
if args.base_url:
tmp_rollout_config["model_config"]["base_url"] = args.base_url
if args.api_key:
tmp_rollout_config["model_config"]["api_key"] = args.api_key

# If external URL is used, we might need to change provider from 'local' to 'openai' or similar if logic dictates
if args.base_url and "local" in tmp_rollout_config["model_config"]["model_provider"]:
# Heuristic: if base_url is set, it's likely not just 'local' weights but an inference server
pass

# Ensure log directories exist
if "log_dir" in tmp_server_config and not os.path.exists(tmp_server_config["log_dir"]):
os.makedirs(tmp_server_config["log_dir"], exist_ok=True)
if "image_dir" in tmp_server_config and not os.path.exists(tmp_server_config["image_dir"]):
os.makedirs(tmp_server_config["image_dir"], exist_ok=True)

# Use tmp_server_config for LocalServer initialization as it expects log_dir etc.
l2_server = LocalServer(tmp_server_config)

# Inject the timing logic
wrap_automate_step_with_timing(l2_server)

# Run the task and measure the total time
total_start = time.time()

print(f"Starting task: {task}")
print(f"Device: {device_id}")
print(f"Model: {tmp_rollout_config['model_config']['model_name']}")

# Disable auto reply
evaluate_task_on_device(l2_server, device_info, task, tmp_rollout_config, reflush_app=True)
total_time = time.time() - total_start

# Print the total time as a final line
print(f"总计执行时间为 {total_time} 秒")

pass
# Enable auto reply
# evaluate_task_on_device(l2_server, device_info, task, tmp_rollout_config, reflush_app=True, auto_reply=True)



pass
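Two details in this hunk are worth a sketch. First, `local_model_config.copy()` is a shallow copy, so writing to the nested `model_config` dict also changes the original config object. Second, because `--model` has a default, `if args.model:` is always true and the configured model name is always replaced. The sketch below uses `copy.deepcopy` and leaves `--model` without a default so the override applies only when the flag is passed. That default is a deliberate difference from the diff, not the author's choice. The function names `build_parser`, `pick_device`, and `apply_overrides` are hypothetical, and whether the backend reads `base_url` and `api_key` from `model_config` is an open question in the PR itself.

```python
import argparse
import copy


def build_parser() -> argparse.ArgumentParser:
    """CLI with an optional positional task plus the option names used in the diff."""
    parser = argparse.ArgumentParser(description="Run a single task.")
    parser.add_argument("task", nargs="?", help="The task description.")
    parser.add_argument("--device-id", help="The device ID to use.")
    # No default here (unlike the diff), so the config value survives
    # unless the flag is passed explicitly.
    parser.add_argument("--model", help="Model name.")
    parser.add_argument("--base-url", help="Base URL for the model API.")
    parser.add_argument("--api-key", help="API key for the model.")
    return parser


def pick_device(requested, available):
    """Return the requested device, else the first available one, else None."""
    if requested:
        return requested
    return available[0] if available else None


def apply_overrides(base_config: dict, args: argparse.Namespace) -> dict:
    """Return a deep copy of base_config with CLI overrides applied."""
    config = copy.deepcopy(base_config)  # nested dicts are copied too
    model_config = config.setdefault("model_config", {})
    if args.model:
        model_config["model_name"] = args.model
    if args.base_url:
        model_config["base_url"] = args.base_url
    if args.api_key:
        model_config["api_key"] = args.api_key
    return config
```

With this shape the script body reduces to parsing the arguments, calling `pick_device(args.device_id, list_devices())`, and passing `apply_overrides(local_model_config, args)` to `evaluate_task_on_device`.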
2 changes: 1 addition & 1 deletion model_config.yaml

Nice work, man!

Author:

Haha, having a front end makes it easier to use, though I'm still waiting for a proper way to inject a pause.

@@ -4,4 +4,4 @@ local:

stepfun:
api_base: "https://api.stepfun.com/v1"
api_key: "EMPTY"
api_key: ""
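With the committed key now blank, one way to supply it at runtime is through an environment variable. This is a minimal sketch, not something the PR implements. The variable name `STEPFUN_API_KEY` and the helper `resolve_api_key` are hypothetical, and treating the old `"EMPTY"` placeholder as unset is an assumption.

```python
import os


def resolve_api_key(provider_config: dict, env_var: str = "STEPFUN_API_KEY") -> str:
    """Prefer a real key from the config, else fall back to the environment."""
    key = (provider_config.get("api_key") or "").strip()
    # Assumption: "EMPTY" is only a placeholder and never a valid key.
    if key and key != "EMPTY":
        return key
    return os.environ.get(env_var, "")
```

This keeps `model_config.yaml` safe to commit while still allowing a locally configured key to take precedence.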
1 change: 1 addition & 0 deletions requirements.txt
@@ -17,3 +17,4 @@ tqdm
requests

fastmcp
gradio
Binary file added scrcpy-win64-v3.3.3/AdbWinApi.dll
Binary file not shown.
Binary file added scrcpy-win64-v3.3.3/AdbWinUsbApi.dll
Binary file not shown.
Binary file added scrcpy-win64-v3.3.3/SDL2.dll
Binary file not shown.
Binary file added scrcpy-win64-v3.3.3/adb.exe
Binary file not shown.
Binary file added scrcpy-win64-v3.3.3/avcodec-61.dll
Binary file not shown.
Binary file added scrcpy-win64-v3.3.3/avformat-61.dll
Binary file not shown.
Binary file added scrcpy-win64-v3.3.3/avutil-59.dll
Binary file not shown.
Binary file added scrcpy-win64-v3.3.3/icon.png
Binary file added scrcpy-win64-v3.3.3/libusb-1.0.dll
Binary file not shown.
1 change: 1 addition & 0 deletions scrcpy-win64-v3.3.3/open_a_terminal_here.bat
@@ -0,0 +1 @@
@cmd
2 changes: 2 additions & 0 deletions scrcpy-win64-v3.3.3/scrcpy-console.bat
@@ -0,0 +1,2 @@
@echo off
scrcpy.exe --pause-on-exit=if-error %*
7 changes: 7 additions & 0 deletions scrcpy-win64-v3.3.3/scrcpy-noconsole.vbs
@@ -0,0 +1,7 @@
strCommand = "cmd /c scrcpy.exe"

For Each Arg In WScript.Arguments
strCommand = strCommand & " """ & replace(Arg, """", """""""""") & """"
Next

CreateObject("Wscript.Shell").Run strCommand, 0, false
Binary file added scrcpy-win64-v3.3.3/scrcpy-server
Binary file not shown.
Binary file added scrcpy-win64-v3.3.3/scrcpy.exe
Binary file not shown.
Binary file added scrcpy-win64-v3.3.3/swresample-5.dll
Binary file not shown.