-
Notifications
You must be signed in to change notification settings - Fork 82
Description
Describe the bug
Bug Report: Images are not passed to Aider in "Code" mode
Summary:
In Code mode (Aider integration), images added to the task context are not transmitted to the LLM. While multimodal content works correctly in Agent mode, the visual context is lost during the data transfer from the TypeScript main process to the Python connector when using Aider.
Steps to Reproduce:
- Open any project in AiderDesk.
- Switch to Code mode.
- Add an image file (e.g.,
screenshot.png) to the context via the sidebar or by pasting it from the clipboard. - Enter a prompt related to the image (e.g., "Implement the layout based on this image").
- Observe the model's response.
Expected Result:
Aider receives the image as part of the multimodal context (via abs_image_fnames or a multimodal message structure), and the model responds with awareness of the visual data.
Actual Result:
The model either states it cannot see any images or fails to process the file because it attempts to read it as a text/binary blob instead of a visual input.
Technical Analysis:
1. Content Filtering in TypeScript:
The toConnectorMessages method in src/main/task/context-manager.ts prepares history for the Python connector. It relies on the extractTextContent utility, which explicitly strips out everything that isn't plain text:
// src/common/utils.ts
export const extractTextContent = (content: unknown): string => {
// ...
if (Array.isArray(content)) {
return content.filter(isTextContent) // <--- This removes image objects (image_url/base64)
.map((c) => (typeof c === 'string' ? c : c.text))
.join('\n\n');
}
}2. Missing File Classification in the Connector:
In resources/connector/connector.py, the clone_coder function only categorizes files into two sets: abs_fnames (editable) and abs_read_only_fnames (read-only).
Aider specifically requires images to be placed in a separate abs_image_fnames set to trigger multimodal processing. Currently, this set is not even initialized or populated in the connector.
3. Simplified Message Protocol:
The connector currently assumes that the content of a message is always a simple string:
# resources/connector/connector.py
coder.done_messages = [dict(role=msg['role'], content=msg['content']) for msg in messages]If a multimodal array (text + image) is passed, it might cause schema validation errors in LiteLLM or be ignored by Aider's internal logic.
Proposed Fix:
- TypeScript: Update
toConnectorMessagesto preserve image parts in the content array when sending data to the connector. - Python: Modify
clone_coderinconnector.pyto useis_image_file(file_path)and populatecoder.abs_image_fnames. - Python: Ensure the connector can handle multimodal content objects in the
messageslist instead of assuming strings.
Steps to reproduce
Open AiderDesk and load any project.
Ensure you are in "Code" mode (using the mode selector below the prompt field).
Paste an image from your clipboard directly into the message editor field (Ctrl+V or Cmd+V).
Observe the UI:
A new image file (e.g., image-001.png) appears in the "Context Files" list in the right sidebar.
The file is physically created in the .aider-desk/tmp/images/ directory.
Ask the model about the file: Type a prompt like "Is there a file named image-001.png in our chat context? Can you see it?".
Observe the response: Read the model's reply.
Expected behavior
No response
Operating System
Windows
Version
v0.52.0
Screenshots
No response
Additional context
No response