
Interactive Model Selection, Custom Prompts, and Expanded Model Support #247


Open
wants to merge 49 commits into main

Conversation


@malah-code commented Jul 7, 2025

Version v2.0.15 (Latest) Release Summary

New Features:

  • Centralized Model Management: All model configurations are now managed in a single file (operate/models/model_configs.py), making it easier to add, remove, and manage models; a sketch of what such a registry might look like follows this list.
  • Expanded Ollama Model Support: Added support for qwen2.5vl:3b and gemma3:4b.
  • Enhanced Debugging: Added a -d flag (alias for --verbose) that provides detailed debugging information, including the full prompt sent to the AI and the raw response received.
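
For illustration, a centralized registry of this kind might look like the minimal sketch below. The file path matches the PR (operate/models/model_configs.py), but the dictionary layout, field names, and helper function are assumptions rather than the PR's actual code.

```python
# Hypothetical layout for operate/models/model_configs.py.
# Field names ("provider", "supports_vision", "default_host") are illustrative
# assumptions, not the schema this PR actually ships.

MODEL_CONFIGS = {
    "gpt-4.1": {"provider": "openai", "supports_vision": True},
    "qwen2.5vl:3b": {
        "provider": "ollama",
        "supports_vision": True,
        "default_host": "http://localhost:11434",  # assumed Ollama default
    },
    "gemma3:4b": {
        "provider": "ollama",
        "supports_vision": True,
        "default_host": "http://localhost:11434",
    },
}


def get_model_config(name: str) -> dict:
    """Look up a model's configuration, failing with a clear message for unknown names."""
    try:
        return MODEL_CONFIGS[name]
    except KeyError:
        known = ", ".join(sorted(MODEL_CONFIGS))
        raise ValueError(f"Unknown model '{name}'. Available models: {known}") from None
```

Keeping every model behind one lookup like this is what lets the selection screen and provider dispatch stay consistent whenever a model is added or removed.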

Improvements:

  • Improved System Prompt: The system prompt now uses a more structured format, explicit JSON schema definitions, and clear examples to improve model accuracy and reliability; an illustrative schema is sketched below.
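
To make the "explicit JSON schema" point concrete, the kind of action schema such a prompt spells out might resemble the sketch below; the operation names and fields are illustrative assumptions, not the PR's exact prompt text.

```python
# Illustrative (assumed) excerpt of a structured system prompt. The real prompt
# in this PR may use different operation names, fields, or examples.
ACTION_SCHEMA_EXAMPLE = """
Respond ONLY with a JSON array of actions. Each action must be one of:

  {"operation": "click", "x": "0.50", "y": "0.25"}      # coordinates as fractions of screen size
  {"operation": "write", "content": "hello world"}
  {"operation": "press", "keys": ["command", "space"]}
  {"operation": "done", "summary": "what was accomplished"}

Do not include any text outside the JSON array.
"""
```

Spelling out the schema alongside worked examples leaves smaller local models less room to return malformed or free-form output.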

Bug Fixes:

  • Fixed an issue where the model selection screen was not correctly displaying all available models.
  • Resolved an IndentationError in the model configuration file.

Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective. Released in November 2023, the Self-Operating Computer Framework was one of the first examples of using a multimodal model to view the screen and operate a computer.
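
As a rough illustration of that loop, the control flow looks something like the sketch below; capture_screen, ask_model, and perform_action are stand-in stubs, not the project's actual API.

```python
# Minimal sketch of the screenshot -> model -> action loop the framework implements.
# The three helpers are stubs that only illustrate the control flow.
import time
from typing import Any


def capture_screen() -> bytes:
    """Stub: the real framework returns a screenshot image here."""
    return b""


def ask_model(model: str, objective: str, screenshot: bytes) -> list[dict[str, Any]]:
    """Stub: the real framework sends the prompt and screenshot to the selected model."""
    return [{"operation": "done", "summary": "stub"}]


def perform_action(action: dict[str, Any]) -> None:
    """Stub: the real framework executes a mouse or keyboard action."""
    print("would perform:", action)


def operate(objective: str, model: str, max_steps: int = 10) -> None:
    """Repeatedly show the screen to the model and execute the actions it returns."""
    for _ in range(max_steps):
        actions = ask_model(model, objective, capture_screen())
        for action in actions:
            if action["operation"] == "done":
                print(action.get("summary", "Objective reached."))
                return
            perform_action(action)
        time.sleep(1)  # let the UI settle before taking the next screenshot
```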

Key Features

  • Compatibility: Designed for various multimodal models.
  • Expanded Model Support: Now integrated with the latest OpenAI o3, o4-mini, GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano models, Gemini 2.5 Pro and Gemini 2.5 Flash, and the Gemma 3n models (including the e2b and e4b variants) and Gemma 3:12b, alongside existing support for GPT-4o, Claude 3, Qwen-VL, and LLaVa.
  • Enhanced Ollama Integration: Improved handling for Ollama models, including default host configuration and more informative error messages; a sketch of the host resolution appears after this list.
  • Future Plans: Support for additional models.
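
For the Ollama default-host handling mentioned above, one hedged way to resolve the host is sketched below; the environment variable and fallback value follow common Ollama conventions, but the PR's actual implementation may differ.

```python
# Assumed approach to resolving the Ollama host; not necessarily this PR's exact code.
import os

DEFAULT_OLLAMA_HOST = "http://localhost:11434"  # Ollama's standard local endpoint


def resolve_ollama_host() -> str:
    """Use an explicit OLLAMA_HOST environment variable if set, else the local default."""
    return os.environ.get("OLLAMA_HOST", DEFAULT_OLLAMA_HOST)
```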

telmalah and others added 30 commits July 3, 2025 15:43
@malah-code closed this Jul 7, 2025
@malah-code reopened this Jul 7, 2025