
Conversation

@BrewTestBot
Contributor

Created by `brew bump`

Created with `brew bump-formula-pr`.

Release notes
# 🎉 LocalAI 3.10.0 Release! 🚀

LocalAI 3.10.0 is big on agent capabilities, multi-modal support, and cross-platform reliability.

We've added native Anthropic API support, launched a new Video Generation UI, introduced Open Responses API compatibility, and enhanced performance with a unified GPU backend system.

For a full tour, see below!


📌 TL;DR

| Feature | Summary |
| --- | --- |
| Anthropic API Support | Fully compatible /v1/messages endpoint for seamless drop-in replacement of Claude. |
| Open Responses API | Native support for stateful agents with tool calling, streaming, background mode, and multi-turn conversations; passes all official acceptance tests. |
| Video & Image Generation Suite | New video generation UI + LTX-2 support for text-to-video and image-to-video. |
| Unified GPU Backends | GPU libraries (CUDA, ROCm, Vulkan) are packaged inside backend containers and work out of the box on Nvidia, AMD, and ARM64 (experimental). |
| Tool Streaming & XML Parsing | Full support for streaming tool calls and XML-formatted tool outputs. |
| System-Aware Backend Gallery | Only see backends your system can run (e.g., MLX is hidden on Linux). |
| Crash Fixes | Prevents crashes on AVX-only CPUs (Intel Sandy/Ivy Bridge) and fixes VRAM reporting on AMD GPUs. |
| Request Tracing | Debug agents and fine-tuning with in-memory request/response logging. |
| Moonshine Backend | Ultra-fast transcription engine for low-end devices. |
| Pocket-TTS | Lightweight, high-fidelity text-to-speech with voice cloning. |
| Vulkan arm64 Builds | Backends and images are now built for Vulkan on arm64 as well. |

🚀 New Features & Major Enhancements

🤖 Open Responses API: Build Smarter, Autonomous Agents

LocalAI now supports the OpenAI Responses API, enabling powerful agentic workflows locally.

  • Stateful conversations via response_id — resume and manage long-running agent sessions.
  • Background mode: Run agents asynchronously and fetch results later.
  • Streaming support for tools, images, and audio.
  • Built-in tools: Web search, file search, and computer use (via MCP integrations).
  • Multi-turn interaction with dynamic context and tool use.

✅ Ideal for developers building agents that can browse, analyze files, or interact with systems — all on your local machine.

🔧 How to Use:

  • Set response_id in your request to maintain session state across calls.
  • Use background: true to run agents asynchronously.
  • Retrieve results via GET /api/v1/responses/{response_id}.
  • Enable streaming with stream: true to receive partial responses and tool calls in real time.

📌 Tip: Use response_id to build agent orchestration systems that persist context and avoid redundant computation.
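
For a concrete picture, here is a minimal Python sketch of that flow. It assumes a LocalAI instance at http://localhost:8080, that creation mirrors OpenAI's /v1/responses route, and a placeholder model name; the retrieval endpoint and status values are assumptions, apart from the path documented above.

```python
import time

import requests

BASE = "http://localhost:8080"  # assumption: local LocalAI instance

# Kick off a background run; the "background" field is per the notes above.
created = requests.post(
    f"{BASE}/v1/responses",  # assumption: creation mirrors OpenAI's /v1/responses
    json={
        "model": "your-model",  # placeholder model name
        "input": "Research the three most recent LocalAI releases.",
        "background": True,
    },
)
created.raise_for_status()
response_id = created.json()["id"]

# Poll the documented retrieval endpoint until the run leaves the queue.
while True:
    result = requests.get(f"{BASE}/api/v1/responses/{response_id}").json()
    if result.get("status") not in ("queued", "in_progress"):  # assumed status values
        break
    time.sleep(1)

print(result)
```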

Our support passes all the official acceptance tests:

[Screenshot: Open Responses API acceptance tests]

🧠 Anthropic Messages API: Clone Claude Locally

LocalAI now fully supports the Anthropic Messages API.

  • Use https://api.localai.host/v1/messages as a drop-in replacement for Claude.
  • Full tool/function calling support, just like OpenAI.
  • Streaming and non-streaming responses.
  • Compatible with anthropic-sdk-go, LangChain, and other tooling.

🔥 Perfect for teams migrating from Anthropic to local inference with full feature parity.
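
As an illustration, pointing the official anthropic Python SDK at a local instance is all it takes; the base URL and model name below are placeholders for your own setup.

```python
from anthropic import Anthropic

# Point the official SDK at LocalAI instead of api.anthropic.com.
client = Anthropic(
    base_url="http://localhost:8080",  # assumption: local LocalAI instance
    api_key="not-needed",              # assumption: no API key configured locally
)

message = client.messages.create(
    model="your-model",  # placeholder model name
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello from a local Claude-compatible API!"}],
)
print(message.content[0].text)
```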


🎥 Video Generation: From Text to Video in the Web UI

  • New dedicated video generation page with intuitive controls.
  • LTX-2 is supported.
  • Supports text-to-video and image-to-video workflows.
  • Built on top of diffusers for full compatibility.

📌 How to Use:

  • Go to /video in the web UI.
  • Enter a prompt (e.g., "A cat walking on a moonlit rooftop").
  • Optionally upload an image for image-to-video generation.
  • Adjust parameters like fps, num_frames, and guidance_scale.
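
If you prefer scripting over the UI, something along these lines should translate. These notes don't document an API route for video generation, so the endpoint and field names below are hypothetical placeholders; check the LocalAI docs for the real ones.

```python
import requests

# Hypothetical endpoint and fields: illustration only; consult the LocalAI
# docs for the actual video-generation API route.
resp = requests.post(
    "http://localhost:8080/video",  # assumption: mirrors the web UI's /video page
    json={
        "model": "ltx-2",  # placeholder model name
        "prompt": "A cat walking on a moonlit rooftop",
        "fps": 24,              # parameters named in the notes above
        "num_frames": 121,
        "guidance_scale": 3.0,
    },
)
resp.raise_for_status()
with open("output.mp4", "wb") as f:
    f.write(resp.content)
```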

⚙️ Unified GPU Backends: Acceleration Works Out of the Box

A major architectural upgrade: GPU libraries (CUDA, ROCm, Vulkan) are now packaged inside backend containers.

  • Single image: you no longer need to pull a GPU-specific image; the same image works whether or not you have a GPU.
  • No more manual GPU driver setup: just run the image and get acceleration.
  • Works on Nvidia (CUDA), AMD (ROCm), and ARM64 (Vulkan).
  • Vulkan arm64 builds enabled.
  • Reduced image complexity, faster builds, and consistent performance.

🚀 This means latest/master images now support GPU acceleration on all platforms — no extra config!

Note: this is experimental; please help us by filing an issue if something doesn't work!


🧩 Tool Streaming & Advanced Parsing

Enhance your agent workflows with richer tool interaction.

  • Streaming tool calls: Receive partial tool arguments in real time (e.g., input_json_delta).
  • XML-style tool call parsing: Models that return tools in XML format (<function>...</function>) are now properly parsed alongside text.
  • Works across all backends (llama.cpp, vLLM, diffusers, etc.).

💡 Enables more natural, real-time interaction with agents that use structured tool outputs.
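
Here is a minimal sketch of consuming streamed tool-call deltas with the openai Python SDK against a local instance; the model name and tool definition are placeholders.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="your-model",  # placeholder model name
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",  # placeholder tool definition
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    stream=True,
)

# Partial tool arguments stream in as JSON fragments; accumulate client-side.
tool_name, tool_args = None, ""
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    for call in delta.tool_calls or []:
        if call.function.name:
            tool_name = call.function.name
        if call.function.arguments:
            tool_args += call.function.arguments

print(tool_name, tool_args)
```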


🌐 System-Aware Backend Gallery: Only Compatible Backends Show

The backend gallery now shows only backends your system can run.

  • Auto-detects system capabilities (CPU, GPU, MLX, etc.).
  • Hides unsupported backends (e.g., MLX on Linux, CUDA on AMD).
  • Shows detected capabilities in the hero section.

🎤 New TTS Backends: Pocket-TTS

Add expressive voice generation to your apps with Pocket-TTS.

  • Real-time text-to-speech with voice cloning support (requires HF login).
  • Lightweight, fast, and open-source.
  • Available in the model gallery.

🗣️ Perfect for voice agents, narrators, or interactive assistants.
Note: Voice cloning requires HF authentication and a registered voice model.
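
A minimal sketch, assuming Pocket-TTS is served through the OpenAI-compatible speech endpoint; the model name is a placeholder, so check the gallery entry for the exact one.

```python
import requests

# Assumption: LocalAI exposes the OpenAI-compatible speech endpoint.
resp = requests.post(
    "http://localhost:8080/v1/audio/speech",
    json={
        "model": "pocket-tts",  # placeholder: use the gallery's actual model name
        "input": "Hello from a fully local text-to-speech pipeline!",
    },
)
resp.raise_for_status()
with open("speech.wav", "wb") as f:
    f.write(resp.content)
```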


🔍 Request Tracing: Debug Your Agents

Trace requests and responses in memory — great for fine-tuning and agent debugging.

  • Enable it via a runtime setting or the API.
  • Logs are stored in memory and dropped once they exceed a maximum size.
  • Fetch logs via GET /api/v1/trace.
  • Export to JSON for analysis.
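
For example, dumping the in-memory trace to a JSON file for offline analysis, using the endpoint documented above:

```python
import json

import requests

# Fetch the in-memory request/response trace documented above.
resp = requests.get("http://localhost:8080/api/v1/trace")
resp.raise_for_status()

with open("trace.json", "w") as f:
    json.dump(resp.json(), f, indent=2)
```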

🪄 New 'Reasoning' Field: Extract Thinking Steps

LocalAI now automatically detects and extracts thinking tags from model output.

  • Supports both SSE and non-SSE modes.
  • Displays reasoning steps in the chat UI (under "Thinking" tab).
  • Fixes an issue where thinking content appeared as part of the final answer.
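
On the API side, the extracted reasoning arrives separately from the answer. A quick sketch; the exact field name is an assumption based on the feature name, and the model name is a placeholder.

```python
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "your-model",  # placeholder model name
        "messages": [{"role": "user", "content": "Think step by step: what is 17 * 23?"}],
    },
)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]

# Assumption: extracted thinking lands in a "reasoning" field on the message.
print("Reasoning:", message.get("reasoning"))
print("Answer:", message["content"])
```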

🚀 Moonshine Backend: Faster Transcription for Low-End Devices

Adds Moonshine, an ONNX-based transcription engine, for fast, lightweight speech-to-text.

  • Optimized for low-end devices (Raspberry Pi, older laptops).
  • One of the fastest transcription engines available.
  • Supports live transcription.
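
A quick sketch, assuming Moonshine is exposed through LocalAI's OpenAI-compatible transcription endpoint once installed from the gallery; the model name is a placeholder.

```python
import requests

# Assumption: Moonshine is served through the OpenAI-compatible
# transcription endpoint after installation from the backend gallery.
with open("sample.wav", "rb") as audio:
    resp = requests.post(
        "http://localhost:8080/v1/audio/transcriptions",
        files={"file": audio},
        data={"model": "moonshine"},  # placeholder model name
    )
resp.raise_for_status()
print(resp.json()["text"])
```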

🛠️ Fixes & Stability Improvements

🔧 Prevent BMI2 Crashes on AVX-Only CPUs

Fixed crashes on older Intel CPUs (Ivy Bridge, Sandy Bridge) that lack BMI2 instructions.

  • Now safely falls back to llama-cpp-fallback (SSE2 only).
  • No more EOF errors during model warmup.

✅ Ensures LocalAI runs smoothly on older hardware.


📊 Fix Swapped VRAM Usage on AMD GPUs

The rocm-smi output is now parsed correctly: used and total VRAM are no longer swapped.

  • Fixes misreported memory usage on dual-Radeon setups.
  • Handles HIP_VISIBLE_DEVICES properly (e.g., when using only discrete GPU).

🚀 The Complete Local Stack for Privacy-First AI


LocalAI

The free, Open Source OpenAI alternative. Drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required.

Link: https://github.com/mudler/LocalAI


LocalAGI

Local AI agent management platform. Drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI.

Link: https://github.com/mudler/LocalAGI


LocalRecall

RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Works alongside LocalAI and LocalAGI.

Link: https://github.com/mudler/LocalRecall


❤️ Thank You

LocalAI is a true FOSS movement — built by contributors, powered by community.

If you believe in privacy-first AI:

  • Star the repo
  • 💬 Contribute code, docs, or feedback
  • 📣 Share with others

Your support keeps this stack alive.


✅ Full Changelog


Full Changelog: mudler/LocalAI@v3.9.0...v3.10.0

View the full release notes at https://github.com/mudler/LocalAI/releases/tag/v3.10.0.


@github-actions github-actions bot added the `go` (Go use is a significant feature of the PR or issue) and `bump-formula-pr` (PR was created using `brew bump-formula-pr`) labels on Jan 18, 2026
@github-actions
Contributor

🤖 An automated task has requested bottles to be published to this PR.

Caution

Please do not push to this PR branch before the bottle commits have been pushed, as this results in a state that is difficult to recover from. If you need to resolve a merge conflict, please use a merge commit. Do not force-push to this PR branch.

@github-actions github-actions bot added the `CI-published-bottle-commits` (The commits for the built bottles have been pushed to the PR branch) label on Jan 18, 2026
@BrewTestBot BrewTestBot enabled auto-merge January 18, 2026 23:33
@BrewTestBot BrewTestBot added this pull request to the merge queue Jan 18, 2026
Merged via the queue into main with commit 44abfe9 Jan 18, 2026
22 checks passed
@BrewTestBot BrewTestBot deleted the bump-localai-3.10.0 branch January 18, 2026 23:42