
Commit ab0f5a5

Nikita A and Claude committed
feat: Claude vision game identification — @path prompt, no timeout, end-to-end PASS
## What changed

### knowledge_base.py
- `_research_with_claude`: uses `@{screenshot_path}` prefix so `claude -p` visually analyzes the screenshot (was: text path reference — Claude couldn't see the image)
- Removed `timeout=120` entirely — research completes however long it takes
- Returns a 4-tuple `(tactic, best_practices, identified_game, identified_state)`
- `get_game_context`: uses Claude's returned game_name/current_state for the Qdrant ingest and the `current_game_name` attribute (not the caller's "unknown")
- `get_game_context`: skips the Qdrant search when `game_name == "unknown"` to avoid false hits on stale "unknown" entries

### vision_loop.py
- `_analyze_screen`: reuses `kb.current_game_name` after the first identification so subsequent captures hit the Qdrant cache (was: always passed "unknown")

### .gitignore
- Added `sessions/` (runtime screenshots) and `.claude/` to the ignore list

## Live test (2026-02-10)
- elden1.jpg → Claude identified "Elden Ring / Boss fight: Malenia Phase 2 (Scarlet Aeonia attack)" — PASS
- Qdrant ingest: 2 points (tactic + best_practices) — PASS
- Supabase user_history row saved for game='Elden Ring' — PASS
- docs/PHASE_5.md: full e2e pipeline test report added
- README.md: rewritten to describe the gameplay companion architecture

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
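The `@path` fix in `_research_with_claude` can be sketched as follows. This is a minimal illustration, not the actual implementation: `build_research_prompt` is a hypothetical helper, and the exact instruction text is assumed.

```python
def build_research_prompt(screenshot_path: str) -> str:
    """Build a claude -p prompt whose first token is an @-prefixed file path.

    The leading `@/abs/path.png` tells the claude CLI to attach the file,
    so the model can visually analyze the screenshot instead of merely
    reading the path as text (the bug this commit fixes).
    """
    return (
        f"@{screenshot_path}\n\n"
        "Look at this gameplay screenshot. Identify the game and the "
        "current state, then return a tactical JSON with keys "
        '"game", "state", "tactic", "best_practices".'
    )


# The subprocess call deliberately passes no timeout, matching the
# removal of timeout=120 in this commit (research runs to completion):
#
#   result = subprocess.run(
#       ["claude", "-p", build_research_prompt(path)],
#       capture_output=True, text=True,  # note: no timeout kwarg
#   )
print(build_research_prompt("/tmp/shot.png").splitlines()[0])  # → @/tmp/shot.png
```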
1 parent 6013052 commit ab0f5a5

18 files changed: 402 additions & 204 deletions


.gitignore

Lines changed: 6 additions & 0 deletions
```diff
@@ -61,6 +61,12 @@ models/piper_voice/*
 legacy/
 chat_history/
 knowledge_base/
+
+# Runtime session data (screenshots, etc.)
+sessions/
+
+# Claude Code project directory
+.claude/
 submodules/MeloTTS
 openapi_assistants.json
 openapi_memgpt.json
```

README.md

Lines changed: 91 additions & 124 deletions
````diff
@@ -1,158 +1,125 @@
-![](./assets/banner.jpg)
+# Gameplay Companion — Self-Learning AI Coach
 
-<h1 align="center">Open-LLM-VTuber</h1>
-<h3 align="center">
+A real-time gameplay coaching companion built on top of Open-LLM-VTuber.
+Watches your screen, identifies the game, gives specific tactical advice via a Live2D avatar,
+and remembers what it has told you across sessions.
 
-[![GitHub release](https://img.shields.io/github/v/release/Open-LLM-VTuber/Open-LLM-VTuber)](https://github.com/Open-LLM-VTuber/Open-LLM-VTuber/releases)
-[![license](https://img.shields.io/github/license/Open-LLM-VTuber/Open-LLM-VTuber)](https://github.com/Open-LLM-VTuber/Open-LLM-VTuber/blob/master/LICENSE)
-[![CodeQL](https://github.com/Open-LLM-VTuber/Open-LLM-VTuber/actions/workflows/codeql.yml/badge.svg)](https://github.com/Open-LLM-VTuber/Open-LLM-VTuber/actions/workflows/codeql.yml)
-[![Ruff](https://github.com/Open-LLM-VTuber/Open-LLM-VTuber/actions/workflows/ruff.yml/badge.svg)](https://github.com/Open-LLM-VTuber/Open-LLM-VTuber/actions/workflows/ruff.yml)
-[![Docker](https://img.shields.io/badge/Open-LLM-VTuber%2FOpen--LLM--VTuber-%25230db7ed.svg?logo=docker&logoColor=blue&labelColor=white&color=blue)](https://hub.docker.com/r/Open-LLM-VTuber/open-llm-vtuber)
-[![QQ User Group](https://img.shields.io/badge/QQ_User_Group-792615362-white?style=flat&logo=qq&logoColor=white)](https://qm.qq.com/q/ngvNUQpuKI)
-[![Static Badge](https://img.shields.io/badge/Join%20Chat-Zulip?style=flat&logo=zulip&label=Zulip(dev-community)&color=blue&link=https%3A%2F%2Folv.zulipchat.com)](https://olv.zulipchat.com)
+---
 
-> **📢 v2.0 Development**: We are focusing on Open-LLM-VTuber v2.0 — a complete rewrite of the codebase. v2.0 is currently in its early discussion and planning phase. We kindly ask you to refrain from opening new issues or pull requests for feature requests on v1. To participate in the v2 discussions or contribute, join our developer community on [Zulip](https://olv.zulipchat.com). Weekly meeting schedules will be announced on Zulip. We will continue fixing bugs for v1 and work through existing pull requests.
+## What it does
 
-[![BuyMeACoffee](https://img.shields.io/badge/Buy%20Me%20a%20Coffee-ffdd00?style=for-the-badge&logo=buy-me-a-coffee&logoColor=black)](https://www.buymeacoffee.com/yi.ting)
-[![](https://dcbadge.limes.pink/api/server/3UDA8YFDXx)](https://discord.gg/3UDA8YFDXx)
+1. **Watches your screen** — takes a screenshot every 15–25 s via PowerShell (WSL2 compatible)
+2. **Identifies the game** — sends the screenshot to `claude -p` with `@/path/to/screenshot.png`; Claude visually identifies the game, boss, current state, and returns a tactical JSON
+3. **Caches knowledge in Qdrant** — ingested tactic + best-practices vectors are reused on the next capture (no redundant research calls)
+4. **Delivers advice through a Live2D avatar** — proactive `ai-speak-signal` fires if screen changed; also responds to direct voice/text questions
+5. **Remembers the user** — Supabase stores every piece of advice given (`user_history`), long-term preferences (`long_term_profile`), and full session chat logs (`chat_logs`). Profile is injected into the system prompt at session start.
 
-[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/Open-LLM-VTuber/Open-LLM-VTuber)
+---
 
-ENGLISH README | [中文 README](./README.CN.md) | [한국어 README](./README.KR.md) | [日本語 README](./README.JP.md)
+## Architecture
 
-[Documentation](https://open-llm-vtuber.github.io/docs/quick-start) | [![Roadmap](https://img.shields.io/badge/Roadmap-GitHub_Project-yellow)](https://github.com/orgs/Open-LLM-VTuber/projects/2)
+```
+Screen (Windows display)
+   ↓ PowerShell CopyFromScreen every 15–25 s
+ScreenWatcher (daemon thread) ← pHash Hamming < 5 → skip AFK
+   ↓ screenshot path
+KnowledgeBase.get_game_context(@path)
+   ↓ game_name == "unknown"           ↓ known game
+claude -p @screenshot                 Qdrant vector search (score ≥ 0.8)
+  → JSON: game, state, tactic           → HIT: return cached tactic
+  → Qdrant ingest (tactic + tips)       → MISS: claude research + ingest
+  → current_game_name updated
+   ↓ tactic text
+ai-speak-signal → WebSocketHandler → ConversationHandler
+  → BasicMemoryAgent → WSLClaudeLLM (claude -p subprocess)
+  → TTS → Live2D avatar speaks
+
+MemoryManager.save_advice(game_name, advice) → Supabase user_history
+```
````
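The AFK gate in the diagram above hinges on a Hamming distance between consecutive pHashes. A minimal sketch of that comparison (hypothetical helpers; the project uses `cv2.img_hash.PHash_create()`, which yields 8-byte digests, so plain `bytes` stand in for them here):

```python
def hamming(h1: bytes, h2: bytes) -> int:
    """Count differing bits between two equal-length pHash digests."""
    return sum(bin(a ^ b).count("1") for a, b in zip(h1, h2))

def is_afk(prev_hash: bytes, cur_hash: bytes, threshold: int = 5) -> bool:
    """Screens whose hashes differ by fewer than `threshold` bits are
    treated as unchanged (player is AFK), so the capture is skipped."""
    return hamming(prev_hash, cur_hash) < threshold

same = bytes.fromhex("e9b5ead6cbfef9dd")
near = bytes.fromhex("e9b5ead6cbfef9dc")  # 1 bit flipped: still AFK
far  = bytes.fromhex("16b5ead6cbfef9dd")  # first byte inverted: screen changed
print(is_afk(same, near), is_afk(same, far))  # → True False
```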

````diff
-<a href="https://trendshift.io/repositories/12358" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12358" alt="Open-LLM-VTuber%2FOpen-LLM-VTuber | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
+### LLM
 
-</h3>
+All inference uses `claude -p` (WSL subprocess, no API keys, no remote LLM calls).
+Sole provider: `WSLClaudeLLM` in `src/open_llm_vtuber/agent/stateless_llm/wsl_claude.py`.
 
+### RAG
 
-> 常见问题 Common Issues doc (Written in Chinese): https://docs.qq.com/pdf/DTFZGQXdTUXhIYWRq
->
-> User Survey: https://forms.gle/w6Y6PiHTZr1nzbtWA
->
-> 调查问卷(中文): https://wj.qq.com/s2/16150415/f50a/
+Qdrant cloud collection `game_knowledge` (384-dim Cosine, `sentence-transformers/all-minilm-l6-v2`).
+Game-specific tactics and best practices are ingested on first encounter and reused thereafter.
 
+### Memory
 
+Supabase (PostgreSQL) tables:
+- `user_history` — every piece of advice given, with game name + timestamp
+- `chat_logs` — full session history saved on disconnect
+- `long_term_profile` — persistent user preferences injected into the system prompt
````
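The HIT/MISS decision behind the RAG section can be sketched without a live Qdrant client. `best_cached_tactic` is a hypothetical helper operating on already-fetched search hits, using the 0.8 cosine-score cutoff named in the diff:

```python
from typing import Optional

def best_cached_tactic(hits: list[dict], threshold: float = 0.8) -> Optional[str]:
    """Return the highest-scoring cached tactic at or above `threshold`,
    or None to signal a cache MISS (which triggers fresh claude research)."""
    good = [h for h in hits if h["score"] >= threshold]
    if not good:
        return None
    return max(good, key=lambda h: h["score"])["payload"]["tactic"]

hits = [
    {"score": 0.91, "payload": {"tactic": "Dodge Scarlet Aeonia, punish recovery"}},
    {"score": 0.55, "payload": {"tactic": "Stock rot cures"}},
]
print(best_cached_tactic(hits))        # → Dodge Scarlet Aeonia, punish recovery
print(best_cached_tactic(hits, 0.95))  # → None
```

With a real `qdrant_client`, the same cutoff could instead be pushed server-side via a score threshold on the search call; the pure-Python filter above just makes the decision rule explicit.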

````diff
-> :warning: This project is in its early stages and is currently under **active development**.
+---
 
-> :warning: If you want to run the server remotely and access it on a different machine, such as running the server on your computer and access it on your phone, you will need to configure `https`, because the microphone on the front end will only launch in a secure context (a.k.a. https or localhost). See [MDN Web Doc](https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getUserMedia). Therefore, you should configure https with a reverse proxy to access the page on a remote machine (non-localhost).
+## Setup
 
+### Prerequisites
+- WSL2 with Ubuntu
+- `claude` CLI installed and authenticated (`~/.local/bin/claude`)
+- Qdrant cloud account
+- Supabase project
 
+### Install
 
-## ⭐️ What is this project?
+```bash
+uv sync
+```
 
+### Configure
 
-**Open-LLM-VTuber** is a unique **voice-interactive AI companion** that not only supports **real-time voice conversations** and **visual perception** but also features a lively **Live2D avatar**. All functionalities can run completely offline on your computer!
+Copy `.env.example` to `.env` and fill in:
 
-You can treat it as your personal AI companion — whether you want a `virtual girlfriend`, `boyfriend`, `cute pet`, or any other character, it can meet your expectations. The project fully supports `Windows`, `macOS`, and `Linux`, and offers two usage modes: web version and desktop client (with special support for **transparent background desktop pet mode**, allowing the AI companion to accompany you anywhere on your screen).
+```
+QDRANT_API_KEY=...
+QDRANT_CLUSTER_ENDPOINT=https://<id>.qdrant.io
+SUPABASE_URL=https://<id>.supabase.co
+SUPABASE_SECRET_KEY=sb_secret_...
+```
 
-Although the long-term memory feature is temporarily removed (coming back soon), thanks to the persistent storage of chat logs, you can always continue your previous unfinished conversations without losing any precious interactive moments.
+Create the Supabase tables by running `migrations/001_create_tables.sql` in the Supabase SQL editor.
 
-In terms of backend support, we have integrated a rich variety of LLM inference, text-to-speech, and speech recognition solutions. If you want to customize your AI companion, you can refer to the [Character Customization Guide](https://open-llm-vtuber.github.io/docs/user-guide/live2d) to customize your AI companion's appearance and persona.
+### Run
 
-The reason it's called `Open-LLM-Vtuber` instead of `Open-LLM-Companion` or `Open-LLM-Waifu` is because the project's initial development goal was to use open-source solutions that can run offline on platforms other than Windows to recreate the closed-source AI Vtuber `neuro-sama`.
+```bash
+uv run run_server.py
+```
 
-### 👀 Demo
-| ![](assets/i1.jpg) | ![](assets/i2.jpg) |
-|:---:|:---:|
-| ![](assets/i3.jpg) | ![](assets/i4.jpg) |
+Open `http://localhost:12393` in a browser. Start playing a game — the companion will begin commenting within 15–25 seconds.
 
+---
````
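A fail-fast check for the four variables listed in the Configure section could look like this. `require_env` is a hypothetical helper, not part of the repository; the project itself may load `.env` differently:

```python
import os

REQUIRED = (
    "QDRANT_API_KEY",
    "QDRANT_CLUSTER_ENDPOINT",
    "SUPABASE_URL",
    "SUPABASE_SECRET_KEY",
)

def require_env(keys=REQUIRED) -> dict:
    """Return the configured values, raising early if any key is unset,
    so misconfiguration surfaces at startup rather than mid-session."""
    missing = [k for k in keys if not os.environ.get(k)]
    if missing:
        raise RuntimeError(f"Missing .env settings: {', '.join(missing)}")
    return {k: os.environ[k] for k in keys}
```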

````diff
-## ✨ Features & Highlights
+## Key files
 
-- 🖥️ **Cross-platform support**: Perfect compatibility with macOS, Linux, and Windows. We support NVIDIA and non-NVIDIA GPUs, with options to run on CPU or use cloud APIs for resource-intensive tasks. Some components support GPU acceleration on macOS.
+| File | Role |
+|------|------|
+| `src/.../agent/stateless_llm/wsl_claude.py` | Claude subprocess LLM (only provider) |
+| `src/.../modules/knowledge_base.py` | Qdrant RAG + Claude visual research |
+| `src/.../modules/vision_loop.py` | Screen watcher + proactive triggers |
+| `src/.../integrations/memory_manager.py` | Supabase CRUD |
+| `src/.../service_context.py` | DI container + system prompt assembly |
+| `src/.../conversations/single_conversation.py` | Pipeline + advice saving |
+| `src/.../websocket_handler.py` | WS routing, session init, vision loop |
+| `migrations/001_create_tables.sql` | Supabase DDL |
+| `docs/PHASE_*.md` | Per-phase development logs with live test results |
 
-- 🔒 **Offline mode support**: Run completely offline using local models - no internet required. Your conversations stay on your device, ensuring privacy and security.
-
-- 💻 **Attractive and powerful web and desktop clients**: Offers both web version and desktop client usage modes, supporting rich interactive features and personalization settings. The desktop client can switch freely between window mode and desktop pet mode, allowing the AI companion to be by your side at all times.
-
-- 🎯 **Advanced interaction features**:
-  - 👁️ Visual perception, supporting camera, screen recording and screenshots, allowing your AI companion to see you and your screen
-  - 🎤 Voice interruption without headphones (AI won't hear its own voice)
-  - 🫱 Touch feedback, interact with your AI companion through clicks or drags
-  - 😊 Live2D expressions, set emotion mapping to control model expressions from the backend
-  - 🐱 Pet mode, supporting transparent background, global top-most, and mouse click-through - drag your AI companion anywhere on the screen
-  - 💭 Display AI's inner thoughts, allowing you to see AI's expressions, thoughts and actions without them being spoken
-  - 🗣️ AI proactive speaking feature
-  - 💾 Chat log persistence, switch to previous conversations anytime
-  - 🌍 TTS translation support (e.g., chat in Chinese while AI uses Japanese voice)
-
-- 🧠 **Extensive model support**:
-  - 🤖 Large Language Models (LLM): Ollama, OpenAI (and any OpenAI-compatible API), Gemini, Claude, Mistral, DeepSeek, Zhipu AI, GGUF, LM Studio, vLLM, etc.
-  - 🎙️ Automatic Speech Recognition (ASR): sherpa-onnx, FunASR, Faster-Whisper, Whisper.cpp, Whisper, Groq Whisper, Azure ASR, etc.
-  - 🔊 Text-to-Speech (TTS): sherpa-onnx, pyttsx3, MeloTTS, Coqui-TTS, GPTSoVITS, Bark, CosyVoice, Edge TTS, Fish Audio, Azure TTS, etc.
-
-- 🔧 **Highly customizable**:
-  - ⚙️ **Simple module configuration**: Switch various functional modules through simple configuration file modifications, without delving into the code
-  - 🎨 **Character customization**: Import custom Live2D models to give your AI companion a unique appearance. Shape your AI companion's persona by modifying the Prompt. Perform voice cloning to give your AI companion the voice you desire
-  - 🧩 **Flexible Agent implementation**: Inherit and implement the Agent interface to integrate any Agent architecture, such as HumeAI EVI, OpenAI Her, Mem0, etc.
-  - 🔌 **Good extensibility**: Modular design allows you to easily add your own LLM, ASR, TTS, and other module implementations, extending new features at any time
-
-## 👥 User Reviews
-> Thanks to the developer for open-sourcing and sharing the girlfriend for everyone to use
->
-> This girlfriend has been used over 100,000 times
-
-## 🚀 Quick Start
-
-Please refer to the [Quick Start](https://open-llm-vtuber.github.io/docs/quick-start) section in our documentation for installation.
-
-## ☝ Update
-> :warning: `v1.0.0` has breaking changes and requires re-deployment. You *may* still update via the method below, but the `conf.yaml` file is incompatible and most of the dependencies needs to be reinstalled with `uv`. For those who came from versions before `v1.0.0`, I recommend deploy this project again with the [latest deployment guide](https://open-llm-vtuber.github.io/docs/quick-start).
-
-Please use `uv run update.py` to update if you installed any versions later than `v1.0.0`.
-
-## 😢 Uninstall
-Most files, including Python dependencies and models, are stored in the project folder.
-
-However, models downloaded via ModelScope or Hugging Face may also be in `MODELSCOPE_CACHE` or `HF_HOME`. While we aim to keep them in the project's `models` directory, it's good to double-check.
-
-Review the installation guide for any extra tools you no longer need, such as `uv`, `ffmpeg`, or `deeplx`.
-
-## 🤗 Want to contribute?
-Checkout the [development guide](https://docs.llmvtuber.com/docs/development-guide/overview).
-
-# 🎉🎉🎉 Related Projects
-
-[ylxmf2005/LLM-Live2D-Desktop-Assitant](https://github.com/ylxmf2005/LLM-Live2D-Desktop-Assitant)
-- Your Live2D desktop assistant powered by LLM! Available for both Windows and MacOS, it senses your screen, retrieves clipboard content, and responds to voice commands with a unique voice. Featuring voice wake-up, singing capabilities, and full computer control for seamless interaction with your favorite character.
-
-## 📜 Third-Party Licenses
-
-### Live2D Sample Models Notice
-
-This project includes Live2D sample models provided by Live2D Inc. These assets are licensed separately under the Live2D Free Material License Agreement and the Terms of Use for Live2D Cubism Sample Data. They are not covered by the MIT license of this project.
-
-This content uses sample data owned and copyrighted by Live2D Inc. The sample data are utilized in accordance with the terms and conditions set by Live2D Inc. (See [Live2D Free Material License Agreement](https://www.live2d.jp/en/terms/live2d-free-material-license-agreement/) and [Terms of Use](https://www.live2d.com/eula/live2d-sample-model-terms_en.html)).
-
-Note: For commercial use, especially by medium or large-scale enterprises, the use of these Live2D sample models may be subject to additional licensing requirements. If you plan to use this project commercially, please ensure that you have the appropriate permissions from Live2D Inc., or use versions of the project without these models.
-
-## Contributors
-Thanks our contributors and maintainers for making this project possible.
-
-<a href="https://github.com/Open-LLM-VTuber/Open-LLM-VTuber/graphs/contributors">
-  <img src="https://contrib.rocks/image?repo=Open-LLM-VTuber/Open-LLM-VTuber" />
-</a>
-
-## Star History
-
-[![Star History Chart](https://api.star-history.com/svg?repos=Open-LLM-VTuber/open-llm-vtuber&type=Date)](https://star-history.com/#Open-LLM-VTuber/open-llm-vtuber&Date)
+---
 
+## Development
 
+```bash
+# Lint
+ruff check .
 
+# Format
+ruff format .
 
+# Run simulation test
+uv run tests/simulation.py
+```
 
+See `CLAUDE.md` for full architecture notes and `PLAN.md` for the transformation roadmap.
````

docs/PHASE_3.md

Lines changed: 43 additions & 9 deletions
```diff
@@ -7,15 +7,49 @@
 - `src/open_llm_vtuber/websocket_handler.py` — start/stop vision loop on connect/disconnect
 
 ## Implementation
-1. `ScreenWatcher` daemon thread: every 15–25s takes screenshot via `pyautogui`
+1. `ScreenWatcher` daemon thread: every 15–25s takes screenshot via PowerShell WSL2 interop
 2. pHash comparison with `cv2.img_hash.PHash_create()` — Hamming < 5 → AFK, skip
-3. On screen change: calls `knowledge_base.get_game_context()` → fires `ai-speak-signal` through WebSocket
-4. Screenshots saved to `sessions/{client_uid}/screenshots/{timestamp}.png`
+3. 60s cooldown between proactive triggers (prevents spam)
+4. On screen change: calls `knowledge_base.get_game_context()` → fires `ai-speak-signal` through WebSocket
+5. Screenshots saved to `sessions/{client_uid}/screenshots/{timestamp}.png`
 
```
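The 15–25 s capture jitter and the 60 s proactive-trigger cooldown from the list above can be sketched like this (hypothetical names; `time.monotonic` stands in for whatever clock the real loop uses):

```python
import random
import time

def next_capture_delay(lo: float = 15.0, hi: float = 25.0) -> float:
    """Randomized sleep between screenshots, so captures don't sync
    with in-game animation cycles."""
    return random.uniform(lo, hi)

class TriggerCooldown:
    """Suppress proactive ai-speak-signal fires within `period` seconds."""

    def __init__(self, period: float = 60.0, clock=time.monotonic):
        self.period, self.clock = period, clock
        self._last = float("-inf")

    def ready(self) -> bool:
        # Only fire when the cooldown has fully elapsed; firing resets it.
        now = self.clock()
        if now - self._last >= self.period:
            self._last = now
            return True
        return False
```

The injectable `clock` keeps the cooldown testable without real sleeps; that detail is this sketch's choice, not necessarily the project's.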

````diff
-## Tests
-- [ ] ScreenWatcher starts and stops cleanly
-- [ ] pHash AFK detection works (static screen skipped)
-- [ ] Screen change triggers ai-speak-signal through websocket
-- [ ] Screenshots saved to correct session dir
+## Conversation Quality Fixes (same commit)
+- `ai-speak-signal` blocked if a conversation task is already running (no interruption)
+- Response length capped to 1–2 sentences via system prompt rule
 
-## End: TBD
+## Live Test Results (2026-02-10)
+
+**Step 1: Display Elden Ring screenshot fullscreen via PowerShell**
+- Opened `tests/data/elden1.jpg` fullscreen on Windows display → PASS
+
+**Step 2: Capture screen via PowerShell (`_take_screenshot`)**
+- Screenshot file: `sessions/phase3-test/screenshots/1770726007.png`
+- File size: 3,169,232 bytes (full 1920×1080 capture)
+- Result: **PASS**
+
+**Step 3: pHash computation**
+- Hash computed: `e9b5ead6cbfef9dd...`
+- Result: **PASS**
+
+**Step 4: `KnowledgeBase.get_game_context()` on captured screenshot**
+- Pipeline ran end-to-end, context returned
+- Result: **PASS**
+
+## Vision Fix (post Phase-3 patch)
+
+The previous implementation passed the screenshot as a text path string — Claude could not
+see the image. Fixed by prefixing the path with `@` in the prompt:
+
+```
+@/path/to/screenshot.png
+
+Look at this gameplay screenshot. Identify the game...
+```
+
+The `claude -p` CLI reads and analyzes the image directly. The 120s timeout was also
+removed — research completes however long it takes, then saves to Qdrant + Supabase.
+
+After the first identification, `KnowledgeBase.current_game_name` is set and the vision
+loop reuses it for subsequent Qdrant lookups (avoids redundant Claude calls for the same game).
+
+## End: 2026-02-10T04:23:00Z
````
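The `current_game_name` reuse described in the patch notes boils down to a small dispatch inside `get_game_context`. A behavioral sketch with stubbed research and cache lookups (all names hypothetical, not the repository's actual code):

```python
class KnowledgeBaseSketch:
    """Caches the identified game so later captures skip visual research."""

    def __init__(self, research, cache_lookup):
        self.research = research          # screenshot path -> (tactic, game)
        self.cache_lookup = cache_lookup  # game name -> cached tactic or None
        self.current_game_name = "unknown"

    def get_game_context(self, screenshot_path, game_name="unknown"):
        # Never search the vector cache under "unknown": stale "unknown"
        # entries would otherwise produce false hits.
        if game_name != "unknown":
            cached = self.cache_lookup(game_name)
            if cached is not None:
                return cached
        tactic, identified_game = self.research(screenshot_path)
        self.current_game_name = identified_game  # vision loop reuses this
        return tactic

calls = []
kb = KnowledgeBaseSketch(
    research=lambda p: (calls.append(p) or "Dodge Scarlet Aeonia", "Elden Ring"),
    cache_lookup=lambda g: "Dodge Scarlet Aeonia" if g == "Elden Ring" else None,
)
kb.get_game_context("shot1.png")                        # visual research runs once
kb.get_game_context("shot2.png", kb.current_game_name)  # second capture hits cache
print(len(calls), kb.current_game_name)  # → 1 Elden Ring
```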
