Commit ee0cbda

Lemonade Server integrates with new app:MineContext

1 parent 44b3a68

1 file changed: docs/server/apps/minecontext.md (+187 −0)
# MineContext

[MineContext](https://github.com/volcengine/MineContext) is an open-source, proactive context-aware AI assistant. This document explains how to configure MineContext to use local AI models powered by the AMD NPU (via Lemonade Server), enabling AI chat, screen monitoring, and work summarization. All data is processed locally, ensuring complete privacy.

A few things to note about this integration:

* **Privacy-First Architecture**: MineContext stores all data locally on your device. Combined with Lemonade Server's local inference, your data never leaves your machine.

* **NPU + GPU Acceleration**: This integration supports both the AMD Ryzen AI NPU for VLM inference (via the FLM server with `Qwen3-VL-4B-Instruct-FLM`) and GPU acceleration (via llama-server with `Qwen3-VL-4B-Instruct-GGUF`).

* **Multi-Model Support**: MineContext requires a Vision-Language Model for screen understanding and an embedding model for context retrieval. Lemonade Server supports both NPU and GPU backends, allowing flexible deployment based on your hardware.

* **Universal Compatibility**: While this guide focuses on NPU-optimized configuration, Lemonade Server also supports GPU-only deployment using the Vulkan or ROCm backends.

* **Hardware Requirements and Current Status**: This integration is still in its early stages. We encourage you to test it and share any issues you encounter. For the best NPU experience, the minimum configuration is a Ryzen AI 300 Series system with 32GB of RAM. We recommend a Strix Halo PC with 64GB or more of RAM.

## Prerequisites

- **Lemonade Server**: Install Lemonade Server using the [Getting Started Guide](https://lemonade-server.ai/docs/server/).
- **Server running**: Ensure Lemonade Server is running on `http://localhost:8000`.
- **Models installed**: MineContext requires two types of models:
  - **Vision-Language Model (VLM)**: For screen capture understanding and AI chat. You have two options:
    - **NPU Option**: `Qwen3-VL-4B-Instruct-FLM` - Uses the FLM server backend, optimized for the AMD NPU (recommended for Ryzen AI systems).
    - **GPU Option**: `Qwen3-VL-4B-Instruct-GGUF` - Uses the llama-server backend, runs on GPU via Vulkan/ROCm.
  - **Embedding Model**: For context retrieval and similarity search. We recommend `Qwen3-Embedding-0.6B-GGUF`, which uses the llama-server backend and runs on GPU.
- **MineContext**: Download the appropriate version from [MineContext Releases](https://github.com/volcengine/MineContext/releases) (v0.1.7 recommended).
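If you prefer to download the models ahead of time rather than on first use, they can be fetched from the command line. A minimal sketch, assuming your Lemonade Server installation provides the `pull` subcommand:

```shell
# Pre-download the two models MineContext needs.
# Assumes the `pull` subcommand is available in your Lemonade Server build.
lemonade-server pull Qwen3-VL-4B-Instruct-FLM    # VLM for NPU (use the -GGUF variant for GPU-only)
lemonade-server pull Qwen3-Embedding-0.6B-GGUF   # embedding model for context retrieval
```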
## Installation

### Launch Lemonade Server with Optimized Settings

For optimal MineContext performance, launch Lemonade Server with an extended context size.

**NPU + GPU Configuration (Recommended for Ryzen AI systems):**

```bash
lemonade-server serve --ctx-size 32768 --flm-args "-s 32 -q 32"
```

**GPU-Only Configuration:**

If you prefer to run all models purely on GPU (using GGUF models via the llama-server backend), simply omit the `--flm-args` parameter:

```bash
lemonade-server serve --ctx-size 32768
```

**Parameter explanation:**

- `--ctx-size`: Context window size in tokens. You can adjust this based on your available memory. Larger context windows can process more screen history but require more memory.

- `--flm-args`: Custom arguments to pass to the FLM (FastFlowLM) server for the NPU backend. This parameter is **only needed when using NPU-optimized FLM models** (e.g., `Qwen3-VL-4B-Instruct-FLM`). If you're using GPU-only GGUF models, you don't need to specify it.
  - `-s 32`: Sets the number of socket connections for NPU inference. Higher values allow more concurrent connections to the NPU backend, improving throughput for parallel requests.
  - `-q 32`: Sets the queue length for NPU request handling. Higher values allow more requests to be queued, reducing request rejection under high load.

> **Note**: The `--flm-args` parameter must not conflict with arguments already managed by Lemonade (e.g., `--host`, `--port`, `--ctx-len`). These values can also be overridden per-model via the `/api/v1/load` endpoint.
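As a rough sketch of such a per-model override via the `/api/v1/load` endpoint, a request might look like the following. The payload field names here (`model_name`, `ctx_size`) are assumptions for illustration; check the Lemonade Server API documentation for the actual schema:

```shell
# Hypothetical sketch: load a model with a per-model context-size override.
# Field names are assumed, not confirmed -- consult the Lemonade API docs.
curl -X POST http://localhost:8000/api/v1/load \
  -H "Content-Type: application/json" \
  -d '{"model_name": "Qwen3-VL-4B-Instruct-FLM", "ctx_size": 32768}'
```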
### Installing MineContext

1. Refer to the [MineContext README](https://github.com/volcengine/MineContext/blob/main/README.md) for local installation instructions:
   - **Windows**: `MineContext-x.x.x-setup.exe`

2. Run the installer to complete installation. On first launch, the application will set up its backend environment (approximately 2 minutes).

## Configuring MineContext

When first launching MineContext, you'll need to configure it to connect to Lemonade Server.

1. Open MineContext and navigate to **Settings** > **Model platform**, then select `Custom`.

2. Configure the **VLM Model** settings:
   - **URL**: `http://localhost:8000/api/v1`
   - **Model**: `Qwen3-VL-4B-Instruct-FLM`
   - **API Key**: Enter any character (e.g., `-`); Lemonade Server doesn't require authentication in local mode.

3. Configure the **Embedding Model** settings:
   - **URL**: `http://localhost:8000/api/v1`
   - **Model**: `Qwen3-Embedding-0.6B-GGUF`
   - **API Key**: Enter any character (e.g., `-`).

4. Click **Save** to apply the configuration.

<div align="center">
<em>MineContext Model Configuration Interface</em><br>
<img src="https://github.com/user-attachments/assets/08dadbf4-f235-4a5b-949a-dfe6c3f3e708" alt="MineContext Model Configuration Interface" width="700"/>
</div>
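To confirm that Lemonade Server actually exposes both configured models, you can query its model listing. Since the server speaks an OpenAI-compatible API, a plain `GET` is enough (a sketch; adjust the host and port if you changed them):

```shell
# List the models Lemonade Server serves on its OpenAI-compatible API.
# Both Qwen3-VL-4B-Instruct-FLM and Qwen3-Embedding-0.6B-GGUF should
# appear in the returned JSON once installed.
curl http://localhost:8000/api/v1/models
```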
## Using MineContext

### AI Chat

Chat with the AI using your captured screen context:

1. Navigate to **Chat with AI**.

2. Enter your question in the chat box. MineContext will answer using the context it has captured:

<div align="center">
<em>AI Chat Interface</em><br>
<img src="https://github.com/user-attachments/assets/18ba5d37-a304-478f-8910-d4f7f01bd76f" alt="AI Chat Interface" width="700"/>
</div>
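Under the hood, MineContext sends requests like the following to Lemonade Server. This is a hedged sketch using the standard OpenAI `chat/completions` schema, not a capture of MineContext's actual traffic; it can also serve as a quick manual test of the VLM:

```shell
# Sketch of a chat request against the OpenAI-compatible endpoint.
# The request body follows the standard chat/completions schema.
curl http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen3-VL-4B-Instruct-FLM",
        "messages": [{"role": "user", "content": "Summarize my recent screen activity."}]
      }'
```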
### Enable Screen Monitor

Screen Monitor is MineContext's core feature: it captures and analyzes your screen content.

1. Navigate to the **Screen Monitor** section.

2. On first use, grant screen recording permissions when prompted.

3. After granting permissions, restart the application for the changes to take effect.

4. After the restart, configure your screen capture area in **Settings**, then click **Start Recording**.

5. Once recording starts, MineContext will analyze your screen content in the background using the local VLM. This context is used for AI chat and work summaries.

<div align="center">
<em>Screen Monitor Feature</em><br>
<img src="https://github.com/user-attachments/assets/f3386a5e-e2a0-42a5-82c9-f258cde72a8c" alt="Screen Monitor Feature" width="700"/>
</div>
### Work Summary

MineContext automatically generates insights based on your screen activity:

1. From the main page, view the auto-generated content:
   - **Daily Summary**: Overview of your daily activities
   - **Todo Items**: Automatically extracted action items
   - **Activity Report**: Detailed activity records

<div align="center">
<em>Work Summary and Todo Items Interface</em><br>
<img src="https://github.com/user-attachments/assets/f6f3c764-9af6-43d7-a4c1-562f2f6f4182" alt="Work Summary and Todo Items Interface" width="700"/>
</div>

2. These summaries update automatically based on your screen monitoring data; no manual input is required.
### Backend Debugging

MineContext provides a web-based debugging interface at `http://localhost:1733`:

1. **Token Usage**: Monitor model consumption and API calls
2. **Task Intervals**: Configure screenshot and summary generation frequency
3. **System Prompts**: Customize AI behavior with custom prompts

<div align="center">
<em>Backend Debugging Interface</em><br>
<img src="https://github.com/user-attachments/assets/36f847bc-8e3a-4c33-9578-4dd189349e73" alt="Backend Debugging Interface" width="700"/>
</div>
## Common Issues

* **Connection refused error**: Ensure Lemonade Server is running. Check with `lemonade-server status` or verify the server is accessible at `http://localhost:8000`.

* **Model loading slow on first use**: The first request loads model weights into memory (VLM onto the NPU, embedding model onto the GPU). Subsequent requests are faster because the models remain cached.

* **Context window exceeded**: If conversations are being truncated, increase the context size:

  ```bash
  lemonade-server serve --ctx-size 65536 --flm-args "-s 32 -q 32"
  ```

* **Out of memory errors**: Running both the VLM and embedding models requires sufficient RAM. Try reducing the `--ctx-size` value:

  ```bash
  lemonade-server serve --ctx-size 16384 --flm-args "-s 32 -q 32"
  ```

* **Screen recording not working**: Ensure you've granted screen recording permissions and restarted the application after granting them.
## Known Issues

* **Embedding validation error** (v0.1.8): `Embedding validation failed: 'OpenAI' object has no attribute 'multimodal_embeddings'`

* **Validation error**: `Data validation failed because a list was passed instead of the required string`
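When the embedding validation error above appears, it can help to rule out the server side by querying the embedding model directly. A hedged sketch against the OpenAI-compatible embeddings route (assuming Lemonade Server exposes it at this path):

```shell
# Sanity-check the embedding model without MineContext in the loop.
# A valid JSON embedding response suggests the issue is in the client.
curl http://localhost:8000/api/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen3-Embedding-0.6B-GGUF", "input": "hello world"}'
```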
## Resources

* [MineContext GitHub](https://github.com/volcengine/MineContext)
* [Lemonade Server](https://lemonade-server.ai)
* [FastFlowLM](https://fastflowlm.com/)
