stepfun-ai · flyfox666 · Dec 19, 2025 · Dec 20, 2025
diff --git a/.gitignore b/.gitignore
@@ -5,4 +5,5 @@ output/
 running_log
 gelab-zero-4b-preview/
 
-model_config.yaml
+model_config.yaml
+venv/
diff --git a/README.md b/README.md
@@ -1,4 +1,5 @@
-![GELab-Zero Main Image](./images/main_en.png)
+
+<img width="1920" height="1606" alt="Stepfun-ai-gelab-zero-12-21-2025_09_05_PM" src="https://github.com/user-attachments/assets/b8f2227b-56c6-4f60-ba86-4dcbe7b49835" />
 
 > 👋 Hi, everyone! We are proud to present the first fully open-source GUI Agent with both model and infrastructure. Our solution features plug-and-play engineering with no cloud dependencies, giving you complete privacy control.
 
@@ -16,18 +17,6 @@
   <a href="./README_CN.md">简体中文</a>
 </p>
 
-## 📰 News
-
-* 🎁 **[2025-12-18]** We release **Step-GUI Technical Report** on [**arXiv**](https://arxiv.org/abs/2512.15431)!
-* 🎁 **[2025-12-18]** We release a more powerful **API** for GUI automation tasks. [Apply for API access here](https://wvixbzgc0u7.feishu.cn/share/base/form/shrcnNStxEmuE7aY6jTW07CZHMf)!
-* 🎁 **[2025-12-12]** We release **MCP-Server** support for multi-device management and task distribution. See [Installation & Quick Start](#-installation-quick-start) and [MCP-Server Setup](#optional-mcp-server-setup) for setup instructions.
-* 🎁 **[2025-12-1]** We thank the following projects and authors for providing quantization tools & tutorials: [GGUF_v1](https://huggingface.co/bartowski/stepfun-ai_GELab-Zero-4B-preview-GGUF), [GGUF_v2](https://huggingface.co/noctrex/GELab-Zero-4B-preview-GGUF), [EXL3](https://huggingface.co/ArtusDev/stepfun-ai_GELab-Zero-4B-preview-EXL3), [Tutorials_CN](http://xhslink.com/o/1WrmgHGWFYh), [Tutorials_EN](https://www.youtube.com/watch?v=4BMiDyQOpos)
-* 🎁 **[2025-11-31]** We release a lightweight **4B** model GELab-Zero-4B-preview on [**Hugging Face**](https://huggingface.co/stepfun-ai/GELab-Zero-4B-preview) and [**Model Scope**](https://modelscope.cn/models/stepfun-ai/GELab-Zero-4B-preview).
-* 🎁 **[2025-11-31]** We release the tasks from the [**AndroidDaily**](https://huggingface.co/datasets/stepfun-ai/AndroidDaily) benchmark.
-* 🎁 **[2025-11-30]** We release the current **GELab-Zero** engineering infrastructure.
-* 🎁 **[2025-10]** Our [**research**](https://github.com/summoneryhl/gelab-engine) paper on GELab-Engine is accepted by **NeurIPS 2025**.
-
-
 
 ## 📑 Table of Contents
 
@@ -38,15 +27,6 @@
 - [📝 Citation](#-citation)
 
 
-## 📧 Contact
-
-You can contact us and communicate with us by joining our WeChat group:
-
-| WeChat Group |
-|:-------------------------:|
-| <img src="images/wechat_group2.jpeg" width="200"> |
-
-
 
 ## 📖 Background
 
@@ -396,10 +376,43 @@ cd gelab-zero
 # Install dependencies
 pip install -r requirements.txt
 
-# To inference a single task
+# Run inference on a single task (command line)
 python examples/run_single_task.py
+
+# Or use the Web UI (Recommended)
+python start_web_ui.py
 ```
 
+#### Web UI Features
+
+The Web UI provides a more user-friendly way to interact with GELab-Zero, featuring a two-column layout:
+
+**Left Panel - Control**
+
+| Module | Features |
+|--------|----------|
+| **📱 Device Management** | Check device status, view device list, restart ADB service |
+| **📶 Wireless Debugging** | Connect device via IP address, enable TCP/IP mode, disconnect |
+| **📊 Task Monitoring** | View task status (Ready/Running/Waiting for Input), select historical Sessions |
+| **💬 Command/Reply** | Enter task instructions or reply to Agent queries, supports `Ctrl+Enter` shortcut |
+| **⚙️ Model Configuration** | Select model provider (auto-loaded from `model_config.yaml`), set Base URL and API Key |
+| **🛠 Utilities** | Launch scrcpy screen mirroring, get installed app list |
+
+**Right Panel - Display**
+
+| Module | Features |
+|--------|----------|
+| **📱 Task Trajectory** | Visual replay of each execution step, including screenshots, thought process, and action details |
+| **📋 Real-time Logs** | Real-time display of task execution terminal output, with clear and copy buttons |
+
+**Interaction Enhancements**
+
+- **🔄 Smart Auto-scroll**: Auto-scrolls during task execution; stops when task completes, allowing free navigation through history
+- **🖼️ Image Lightbox**: Click screenshots in trajectory to view full-size, with download support
+- **⌨️ Keyboard Shortcut**: `Ctrl+Enter` to quickly submit commands/replies
+
+After starting the Web UI, open your browser and go to `http://localhost:8866` to access the interface.
+
 ### (Optional) Step 4: Trajectory Visualization Environment Setup
 
 The trajectory is saved by default in the `running_log/server_log/os-copilot-local-eval-logs/` directory. You can visualize the trajectory using streamlit:

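Review note: the README addition says the Web UI is served at `http://localhost:8866` once `start_web_ui.py` is running. A quick way to confirm that something is actually listening there is a plain TCP probe. This is a minimal sketch using only the standard library; the helper name `is_port_open` is hypothetical and not part of the PR, and the port is taken from the README text rather than from the Web UI source.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8866 is the port named in the README; adjust it if the Web UI is configured differently.
    if is_port_open("localhost", 8866):
        print("Web UI is reachable at http://localhost:8866")
    else:
        print("Nothing is listening on port 8866; is start_web_ui.py running?")
```

A successful probe only shows that the port accepts connections, not that the page renders, so it is best treated as a first diagnostic step.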
diff --git a/README_CN.md b/README_CN.md
@@ -1,5 +1,6 @@
 
-![GELab-Zero 主图](./images/main_cn.png)
+<img width="1920" height="1606" alt="Stepfun-ai-gelab-zero-12-21-2025_09_05_PM" src="https://github.com/user-attachments/assets/d1ab326c-5927-4f81-95a4-c6c72ba521ea" />
+
 
 
 > 👋 hi大家好！我们很荣幸推出首个同时包含模型和基础设施的全开源 GUI Agent。我们的解决方案主打即插即用的工程化体验，无需依赖云端，赋予您完全的隐私控制权。
@@ -18,17 +19,6 @@
   <a href="./README_CN.md">简体中文</a>
 </p>
 
-## 📰 新闻
-
-* 🎁 **[2025-12-18]** 我们在 **[arXiv](https://arxiv.org/abs/2512.15431)** 上发布了 **Step-GUI 技术报告**！
-* 🎁 **[2025-12-18]** 我们发布了更强大的 GUI 自动化任务 **API**。[点击此处申请 API 访问权限](https://wvixbzgc0u7.feishu.cn/share/base/form/shrcnNStxEmuE7aY6jTW07CZHMf)！
-* 🎁 **[2025-12-12]** 我们发布了支持多设备管理和任务分发的 **MCP-Server**。请参阅 [安装-快速开始](#-安装-快速开始) 和 [MCP-Server 配置](#可选-mcp-server-配置) 了解配置说明。
-* 🎁 **[2025-12-01]** 感谢以下项目和作者提供量化工具及教程：[GGUF_v1](https://huggingface.co/bartowski/stepfun-ai_GELab-Zero-4B-preview-GGUF)、[GGUF_v2](https://huggingface.co/noctrex/GELab-Zero-4B-preview-GGUF)、[EXL3](https://huggingface.co/ArtusDev/stepfun-ai_GELab-Zero-4B-preview-EXL3)、[中文教程](http://xhslink.com/o/1WrmgHGWFYh)、[英文教程](https://www.youtube.com/watch?v=4BMiDyQOpos)。
-* 🎁 **[2025-11-31]** 我们在 **[Hugging Face](https://huggingface.co/stepfun-ai/GELab-Zero-4B-preview)** 和 **[Model Scope](https://modelscope.cn/models/stepfun-ai/GELab-Zero-4B-preview)** 上发布了轻量级 **4B** 模型 GELab-Zero-4B-preview。
-* 🎁 **[2025-11-31]** 我们发布了 **[AndroidDaily](https://huggingface.co/datasets/stepfun-ai/AndroidDaily)** 基准测试中的任务数据。
-* 🎁 **[2025-11-30]** 我们发布了当前的 **GELab-Zero** 工程基础设施。
-* 🎁 **[2025-10]** 我们关于 GELab-Engine 的 **[研究论文](https://github.com/summoneryhl/gelab-engine)** 被 **NeurIPS 2025** 录用。
-
 
 ## 📑 目录
 
@@ -39,15 +29,6 @@
 - [📝 引用](#-引用)
 
 
-## 📧 联系我们
-
-欢迎加入我们的微信群与我们联系和交流：
-
-| WeChat Group |
-|:-------------------------:|
-| <img src="images/wechat_group2.jpeg" width="200"> |
-
-
 ## 📖 背景
 
 随着 AI 体验日益深入消费级终端设备，移动 Agent 研究正处于从 **“可行性验证”** 向 **“大规模应用”** 转型的关键节点。虽然基于 GUI 的方案具有通用兼容性，但移动生态的碎片化带来了沉重的工程负担，阻碍了创新。GELab-Zero 旨在打破这些壁垒。
@@ -361,10 +342,43 @@ cd gelab-zero
 # 安装依赖
 pip install -r requirements.txt
 
-# 运行单个任务推理示例
+# 运行单个任务推理示例（命令行方式）
 python examples/run_single_task.py
+
+# 或使用 Web UI（推荐）
+python start_web_ui.py
 ```
 
+#### Web UI 功能特性
+
+Web UI 提供了更友好的交互方式，界面分为左右两栏布局：
+
+**左栏 - 控制面板**
+
+| 模块 | 功能 |
+|------|------|
+| **📱 设备管理** | 检查设备状态、查看设备列表、重启 ADB 服务 |
+| **📶 无线调试** | 通过 IP 地址无线连接设备、启用 TCP/IP 模式、断开连接 |
+| **📊 任务监控** | 查看任务状态（就绪/运行中/等待输入）、选择历史 Session |
+| **💬 命令/回复** | 输入任务指令或回复 Agent 询问，支持 `Ctrl+Enter` 快捷提交 |
+| **⚙️ 参数配置** | 选择模型提供商（从 `model_config.yaml` 自动加载）、设置 Base URL 和 API Key |
+| **🛠 实用工具** | 启动 scrcpy 屏幕镜像、获取手机应用列表 |
+
+**右栏 - 任务展示**
+
+| 模块 | 功能 |
+|------|------|
+| **📱 任务轨迹** | 可视化回放每个执行步骤，包含截图、思考过程、动作详情 |
+| **📋 实时日志** | 实时显示任务执行的终端输出，支持清空和复制 |
+
+**交互优化**
+
+- **🔄 智能滚动**：任务运行时自动滚动到最新内容；任务结束后停止滚动，可自由翻阅历史日志
+- **🖼️ 图片放大**：点击轨迹中的截图可放大查看，支持下载
+- **⌨️ 快捷键**：`Ctrl+Enter` 快速提交命令/回复
+
+启动 Web UI 后，在浏览器中访问 `http://localhost:8866` 即可使用。
+
 ### （可选）Step 4: 轨迹可视化环境搭建
 
 任务轨迹会默认保存在 `running_log/server_log/os-copilot-local-eval-logs/` 目录下。你可以使用 streamlit 对轨迹进行可视化：

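Review note: the Wireless Debugging row in both README tables lists three operations: enable TCP/IP mode, connect via IP address, and disconnect. These map naturally onto the standard `adb tcpip`, `adb connect`, and `adb disconnect` subcommands. The sketch below only builds the argument lists; how the Web UI actually invokes adb is not shown in this diff, the helper names are hypothetical, and port 5555 is the conventional default rather than a value taken from the PR.

```python
from typing import List, Optional

DEFAULT_ADB_PORT = 5555  # conventional adb TCP/IP port; an assumption, not from the PR


def tcpip_command(device_id: Optional[str] = None, port: int = DEFAULT_ADB_PORT) -> List[str]:
    """Build the command that switches a USB-connected device to TCP/IP mode."""
    cmd = ["adb"]
    if device_id:
        # -s selects a specific device when several are attached.
        cmd += ["-s", device_id]
    return cmd + ["tcpip", str(port)]


def connect_command(ip: str, port: int = DEFAULT_ADB_PORT) -> List[str]:
    """Build the command that connects to a device over the network."""
    return ["adb", "connect", f"{ip}:{port}"]


def disconnect_command(ip: str, port: int = DEFAULT_ADB_PORT) -> List[str]:
    """Build the command that drops a network connection to a device."""
    return ["adb", "disconnect", f"{ip}:{port}"]
```

Each list can be passed to `subprocess.run(cmd, capture_output=True)` without `shell=True`, which avoids quoting problems with device IDs and addresses.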
diff --git a/copilot_agent_client/__pycache__/pu_client.cpython-312.pyc b/copilot_agent_client/__pycache__/pu_client.cpython-312.pyc
diff --git a/copilot_agent_server/__pycache__/__init__.cpython-312.pyc b/copilot_agent_server/__pycache__/__init__.cpython-312.pyc
diff --git a/copilot_agent_server/__pycache__/base_logger.cpython-312.pyc b/copilot_agent_server/__pycache__/base_logger.cpython-312.pyc
diff --git a/copilot_agent_server/__pycache__/base_server.cpython-312.pyc b/copilot_agent_server/__pycache__/base_server.cpython-312.pyc
diff --git a/copilot_agent_server/__pycache__/local_server.cpython-312.pyc b/copilot_agent_server/__pycache__/local_server.cpython-312.pyc
diff --git a/copilot_agent_server/__pycache__/local_server_logger.cpython-312.pyc b/copilot_agent_server/__pycache__/local_server_logger.cpython-312.pyc
diff --git a/copilot_agent_server/__pycache__/parser_factory.cpython-312.pyc b/copilot_agent_server/__pycache__/parser_factory.cpython-312.pyc
diff --git a/copilot_front_end/__pycache__/mobile_action_helper.cpython-312.pyc b/copilot_front_end/__pycache__/mobile_action_helper.cpython-312.pyc
diff --git a/copilot_front_end/__pycache__/package_map.cpython-312.pyc b/copilot_front_end/__pycache__/package_map.cpython-312.pyc
diff --git a/copilot_front_end/__pycache__/pu_frontend_executor.cpython-312.pyc b/copilot_front_end/__pycache__/pu_frontend_executor.cpython-312.pyc
diff --git a/copilot_front_end/mobile_action_helper.py b/copilot_front_end/mobile_action_helper.py
@@ -134,9 +134,27 @@ def dectect_screen_on(device_id, print_command = False):
         command = f"{adb_command} shell dumpsys display"
         if print_command:
             print(f"Executing command: {command}")
-        result = subprocess.run(command, shell=True, capture_output=True, text=True)
-        result.stdout = result.stdout.encode('utf-8').decode('utf-8')
-        screen_state = local_str_grep(result.stdout, "mScreenState").strip()
+
+        # Capture raw bytes (text=False) so subprocess does not decode implicitly and raise on mixed encodings
+        result = subprocess.run(command, shell=True, capture_output=True, text=False)
+
+        if result.stdout:
+            # ADB output is usually UTF-8, but a Windows shell may emit GBK bytes.
+            # Try strict UTF-8 first; errors='replace' would never raise, which
+            # would leave the GBK fallback below unreachable.
+            try:
+                output_str = result.stdout.decode('utf-8')
+            except UnicodeDecodeError:
+                # Fall back to GBK, replacing undecodable bytes so grepping still works
+                output_str = result.stdout.decode('gbk', errors='replace')
+        else:
+            output_str = ""
+
+        screen_state = local_str_grep(output_str, "mScreenState")
+        if screen_state:
+            screen_state = screen_state.strip()
+        else:
+            screen_state = ""
     else:
         command = f"{adb_command} shell dumpsys display | grep mScreenState"
         if print_command:

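Review note: the decode-then-grep step in this hunk can be isolated into two small helpers, which makes the fallback order easy to test without a device. One detail matters here: `bytes.decode(..., errors='replace')` never raises, so a second encoding is only ever tried when the first attempt is strict. The sketch below assumes that order (strict UTF-8, then GBK with replacement). `decode_adb_output` and `grep_first` are hypothetical names, and `grep_first` is a stand-in for the project's `local_str_grep`, whose exact behavior is not visible in this diff.

```python
def decode_adb_output(raw: bytes) -> str:
    """Decode adb output: strict UTF-8 first, then GBK with replacement characters."""
    if not raw:
        return ""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Reached only because the UTF-8 attempt above is strict.
        return raw.decode("gbk", errors="replace")


def grep_first(text: str, needle: str) -> str:
    """Return the first line containing `needle`, stripped, or '' if none matches."""
    for line in text.splitlines():
        if needle in line:
            return line.strip()
    return ""


if __name__ == "__main__":
    sample = b"DisplayState:\n  mScreenState=ON\n"
    print(grep_first(decode_adb_output(sample), "mScreenState"))
```

Returning an empty string instead of `None` keeps the caller free of the extra `if screen_state:` guard.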
diff --git a/copilot_tools/__pycache__/parser_0920_summary.cpython-312.pyc b/copilot_tools/__pycache__/parser_0920_summary.cpython-312.pyc
diff --git a/examples/run_single_task.py b/examples/run_single_task.py
@@ -63,49 +63,96 @@ def timed_automate_step(payload):
     server_instance.automate_step = timed_automate_step
 
 if __name__ == "__main__":
+    import argparse
+
+    parser = argparse.ArgumentParser(description="Run a single task on a connected device.")
+    parser.add_argument("task", type=str, nargs='?', help="The task description.")
+    parser.add_argument("--device-id", type=str, help="The device ID to use.")
+    parser.add_argument("--model", type=str, default="gelab-zero-4b-preview", help="Model name.")
+    parser.add_argument("--base-url", type=str, help="Base URL for the model API.")
+    parser.add_argument("--api-key", type=str, help="API Key for the model.")
+
+    args = parser.parse_args()
 
-     # task = "打开微信，给柏茗，发helloworld"
-    # task = "打开 给到 app，在主页，下滑寻找，员工权益-奋斗食代，帮我领劵。如果不能领取就退出。"
-    # task = "open wechat to send a message 'helloworld' to 'TKJ'"
-    #task = "去淘宝帮我买本书"
-    if len(sys.argv) < 2:
+    if not args.task:
         print("❌ 错误：未传入任务参数！")
         print("📝 使用方法：")
-        print(f"   python {sys.argv[0]} \"你的任务描述\"")
+        print(f"   python {sys.argv[0]} \"你的任务描述\" [options]")
         print("   示例1：python script.py \"去淘宝帮我买本书\"")
-        print("   示例2：python script.py \"打开微信，给柏茗发helloworld\"")
-        sys.exit(1)  
-
-    task = ' '.join(sys.argv[1:])
+        print("   示例2：python script.py \"打开微信，给柏茗发helloworld\" --device-id 123456")
+        sys.exit(1)
+
+    task = args.task
+
+    # Use provided device_id or find the first available one
+    if args.device_id:
+        device_id = args.device_id
+        # Verify device is connected
+        available_devices = list_devices()
+        if device_id not in available_devices:
+            print(f"Warning: Device {device_id} not found in connected devices: {available_devices}")
+    else:
+        devices = list_devices()
+        if not devices:
+            print("❌ Error: No devices connected.")
+            sys.exit(1)
+        device_id = devices[0]
+        print(f"Auto-selected device: {device_id}")
 
-    # The device ID you want to use
-    device_id = list_devices()[0]
     device_wm_size = get_device_wm_size(device_id)
     device_info = {
         "device_id": device_id,
         "device_wm_size": device_wm_size
     }
 
-
-
-    tmp_rollout_config = local_model_config
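+    # Update model configuration based on arguments; copy the nested dict so the shared config is not mutated
+    tmp_rollout_config = {**local_model_config, "model_config": dict(local_model_config["model_config"])}
+    if args.model:
+        tmp_rollout_config["model_config"]["model_name"] = args.model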
+
+    if args.base_url or args.api_key:
+        # Open question: should supplying a URL/key switch the provider to 'openai',
+        # or keep 'local' and only override its parameters? A base URL suggests an
+        # OpenAI-compatible endpoint, but for now the values are simply injected
+        # into model_config and used only if the backend supports them.
+        # The 'local' provider may ignore them; a 'custom' or 'openai' provider
+        # may need model_provider changed as well. This script assumes the user
+        # passes values that match the configured provider.
+
+        # NOTE: LocalServer's handling of base_url/api_key is not visible from this
+        # script; such settings are commonly passed through model_config.
+        if args.base_url:
+            tmp_rollout_config["model_config"]["base_url"] = args.base_url
+        if args.api_key:
+            tmp_rollout_config["model_config"]["api_key"] = args.api_key
+
+        # With an external URL, the provider may need to change from 'local' to 'openai' or similar.
+        if args.base_url and "local" in tmp_rollout_config["model_config"]["model_provider"]:
+            # Heuristic: a base_url implies an inference server rather than local weights; no change is made yet.
+            pass
+
+    # Ensure log directories exist
+    if "log_dir" in tmp_server_config and not os.path.exists(tmp_server_config["log_dir"]):
+        os.makedirs(tmp_server_config["log_dir"], exist_ok=True)
+    if "image_dir" in tmp_server_config and not os.path.exists(tmp_server_config["image_dir"]):
+        os.makedirs(tmp_server_config["image_dir"], exist_ok=True)
+
+    # Use tmp_server_config for LocalServer initialization as it expects log_dir etc.
     l2_server = LocalServer(tmp_server_config)
 
     # 注入计时逻辑
     wrap_automate_step_with_timing(l2_server)
+
     # 执行任务并计总时间
     total_start = time.time()
+
+    print(f"Starting task: {task}")
+    print(f"Device: {device_id}")
+    print(f"Model: {tmp_rollout_config['model_config']['model_name']}")
+
     # Disable auto reply
     evaluate_task_on_device(l2_server, device_info, task, tmp_rollout_config, reflush_app=True)
     total_time = time.time() - total_start
 
     # 在最后加一行总时间
     print(f"总计执行时间为 {total_time} 秒")
-
-    pass
-    # Enable auto reply
-    # evaluate_task_on_device(l2_server, device_info, task, tmp_rollout_config, reflush_app=True, auto_reply=True)
-
-
-
-    pass
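Review note: the overrides in this hunk write into the nested `model_config` dictionary, so the working copy of the base configuration has to be independent below the top level. A shallow `dict.copy()` shares nested dictionaries with the original, while `copy.deepcopy` does not. The sketch below illustrates the difference; `build_rollout_config` is a hypothetical helper, and the key names mirror the ones used in the hunk.

```python
import copy
from typing import Any, Dict, Optional


def build_rollout_config(
    base_config: Dict[str, Any],
    model: Optional[str] = None,
    base_url: Optional[str] = None,
    api_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a new config with CLI overrides applied; `base_config` is left untouched."""
    config = copy.deepcopy(base_config)
    model_config = config.setdefault("model_config", {})
    if model:
        model_config["model_name"] = model
    if base_url:
        model_config["base_url"] = base_url
    if api_key:
        model_config["api_key"] = api_key
    return config


if __name__ == "__main__":
    base = {"model_config": {"model_name": "gelab-zero-4b-preview", "model_provider": "local"}}
    shallow = base.copy()
    # A shallow copy shares the nested dict with the original.
    print(shallow["model_config"] is base["model_config"])
    overridden = build_rollout_config(base, model="another-model")
    # The deep copy keeps the base configuration unchanged.
    print(base["model_config"]["model_name"], overridden["model_config"]["model_name"])
```

Keeping the override logic in one function also makes it straightforward to unit-test without a connected device.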
diff --git a/model_config.yaml b/model_config.yaml
@@ -4,4 +4,4 @@ local:
 
 stepfun:
     api_base: "https://api.stepfun.com/v1"
-    api_key: "EMPTY"
+    api_key: ""
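Review note: with the `"EMPTY"` placeholder replaced by an empty string, callers need a clear rule for what counts as a missing key. One option is to treat both the empty string and the old placeholder as unset and fall back to an environment variable. This is only a sketch: how the project resolves keys is not shown in this diff, and both the helper name `resolve_api_key` and the variable name `STEPFUN_API_KEY` are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    provider_config: Mapping[str, str],
    env_var: str = "STEPFUN_API_KEY",  # hypothetical variable name
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the configured API key, falling back to an environment variable."""
    environ = os.environ if environ is None else environ
    key = (provider_config.get("api_key") or "").strip()
    # Treat the old "EMPTY" placeholder the same as an empty string.
    if key and key != "EMPTY":
        return key
    key = environ.get(env_var, "").strip()
    if not key:
        raise ValueError(f"No API key configured; set api_key or the {env_var} environment variable")
    return key
```

Failing early with a clear message tends to be easier to debug than sending an empty key to the API and interpreting the resulting authentication error.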
diff --git a/requirements.txt b/requirements.txt
@@ -17,3 +17,4 @@ tqdm
 requests
 
 fastmcp
+gradio
diff --git a/scrcpy-win64-v3.3.3/AdbWinApi.dll b/scrcpy-win64-v3.3.3/AdbWinApi.dll
diff --git a/scrcpy-win64-v3.3.3/AdbWinUsbApi.dll b/scrcpy-win64-v3.3.3/AdbWinUsbApi.dll
diff --git a/scrcpy-win64-v3.3.3/SDL2.dll b/scrcpy-win64-v3.3.3/SDL2.dll
diff --git a/scrcpy-win64-v3.3.3/adb.exe b/scrcpy-win64-v3.3.3/adb.exe
diff --git a/scrcpy-win64-v3.3.3/avcodec-61.dll b/scrcpy-win64-v3.3.3/avcodec-61.dll
diff --git a/scrcpy-win64-v3.3.3/avformat-61.dll b/scrcpy-win64-v3.3.3/avformat-61.dll
diff --git a/scrcpy-win64-v3.3.3/avutil-59.dll b/scrcpy-win64-v3.3.3/avutil-59.dll
diff --git a/scrcpy-win64-v3.3.3/icon.png b/scrcpy-win64-v3.3.3/icon.png
diff --git a/scrcpy-win64-v3.3.3/libusb-1.0.dll b/scrcpy-win64-v3.3.3/libusb-1.0.dll
diff --git a/scrcpy-win64-v3.3.3/open_a_terminal_here.bat b/scrcpy-win64-v3.3.3/open_a_terminal_here.bat
@@ -0,0 +1 @@
+@cmd
diff --git a/scrcpy-win64-v3.3.3/scrcpy-console.bat b/scrcpy-win64-v3.3.3/scrcpy-console.bat
@@ -0,0 +1,2 @@
+@echo off
+scrcpy.exe --pause-on-exit=if-error %*
diff --git a/scrcpy-win64-v3.3.3/scrcpy-noconsole.vbs b/scrcpy-win64-v3.3.3/scrcpy-noconsole.vbs
@@ -0,0 +1,7 @@
+strCommand = "cmd /c scrcpy.exe"
+
+For Each Arg In WScript.Arguments
+    strCommand = strCommand & " """ & replace(Arg, """", """""""""") & """"
+Next
+
+CreateObject("Wscript.Shell").Run strCommand, 0, false
diff --git a/scrcpy-win64-v3.3.3/scrcpy-server b/scrcpy-win64-v3.3.3/scrcpy-server
diff --git a/scrcpy-win64-v3.3.3/scrcpy.exe b/scrcpy-win64-v3.3.3/scrcpy.exe
diff --git a/scrcpy-win64-v3.3.3/swresample-5.dll b/scrcpy-win64-v3.3.3/swresample-5.dll