Releases: letta-ai/letta
v0.16.7
Letta Server 0.16.7 Release Notes
173 commits since 0.16.6 | Released March 31, 2026
Highlights
Self-hosted users: this is a big upgrade. The default global context window is raised from 32k to 128k, the context window reset bug (LET-7991) is fixed, and compaction has been overhauled. If you've been running curl commands to patch your config after every ADE load, most of that pain should be gone.
Breaking Changes
- Block limits are no longer enforced -- block limit validation has been deprecated and removed from the git memory sync path (#9977, #9983). Blocks can now grow freely. If you were relying on limits to cap per-turn cost, you'll need to manage block size via other means.
Context Window & Compaction (21 fixes)
The biggest category of fixes. Self-hosted users were hit hardest by these.
- Global context window default raised from 32k to 128k (#9993) -- self-hosted servers no longer default to 32k for unknown models
- Context window preserved on conversation model override (LET-7991, #9986) -- the bug where non-default conversations fell back to 32k is fixed
- Compaction overflow fixes (#9897) -- addresses the double-compaction and runaway compaction loops
- Compaction model resets on agent model change (#10031) -- switching your agent's model no longer leaves the old summarizer model behind
- Summarizer prompt improved (#10314) -- now remembers plan files, GitHub PRs, and other structured content during summarization
- BYOK summarization fixed (#10152) -- summarizer provider fallback no longer fires for BYOK requests
- Better error surfacing -- context window exceeded errors now carry descriptive messages (#10135, #10171), and system prompt size warnings are emitted during compaction (#10058)
Gemini (2 fixes)
- thought_signature preserved on function calls without reasoning (LET-8166, #10237) -- the bug blocking all Gemini 2.5+/3.x multi-turn tool calling is fixed
- Streaming interface crash fixed (#10306) -- `self.model` is now initialized in the `SimpleGeminiStreamingInterface` constructor (LET-8129)
Memory & memfs (10 fixes, 4 features)
- `available_skills` block no longer duplicates in system prompt (#10006, #10011, #10021) -- three separate fixes for the skills block multiplying and inflating context (LET-8013)
- Git memory sync deferred until stream close (#9951) -- reduces mid-stream sync failures
- System prompt recompiles on agent creation with git memory (#9950) -- new git-enabled agents no longer start with empty compiled context
- Projection-style git memory rendering (#10211) -- new rendering approach for memfs content in system prompts
- Manual block edits via API trigger recompile (#9775) -- no more stale context after API block updates
- Conversation recompile endpoint (#9848) -- `POST /v1/conversations/{id}/recompile` is now available
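A minimal sketch of calling the new recompile endpoint from a client. The path comes from the notes above; the base URL and default port are assumptions for a local self-hosted server, and the helper name is hypothetical.

```python
# Hypothetical helper for the new recompile endpoint; only the path
# shape (POST /v1/conversations/{id}/recompile) is from the release notes.
BASE_URL = "http://localhost:8283"  # assumption: local self-hosted server


def recompile_request(conversation_id: str) -> tuple[str, str]:
    """Return the (method, url) pair for forcing a system-prompt recompile."""
    return "POST", f"{BASE_URL}/v1/conversations/{conversation_id}/recompile"
```

You would then send this with any HTTP client (e.g. `requests.post(url)`) after a manual block edit, though #9775 means API block edits should now trigger a recompile on their own.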
Conversations (7 features)
- Conversation forking (#10234, #10263) -- fork conversations with shared message history, including the default conversation
- Sort conversations by `last_message_at` (#10190)
- Idempotent conversation streaming (#10147) -- OTID-based retry safety
- Request-scoped system overrides (#10227) -- per-request system prompt modifications
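The idempotent-streaming fix (#10147) is OTID-based: the client attaches one operation ID to a message and reuses it on every retry so the server can deduplicate. A sketch of the client-side pattern, assuming the OTID is a client-supplied UUID carried in the payload (the field name here is an assumption):

```python
import uuid


def make_otid() -> str:
    """Generate a client-side operation ID; reused across retries so the
    server can deduplicate a resent message (idempotent streaming)."""
    return str(uuid.uuid4())


def send_with_retry(send, payload: dict, retries: int = 3):
    """Attach one OTID to the payload and reuse it on every retry attempt."""
    payload = {**payload, "otid": make_otid()}
    last_err = None
    for _ in range(retries):
        try:
            return send(payload)
        except ConnectionError as err:  # e.g. a dropped stream
            last_err = err
    raise last_err
```

The key property is that a retried send is indistinguishable from the original on the server side, so a dropped stream cannot produce a duplicated message.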
Streaming & Reliability (13 fixes)
- OTID retry hardening (#10229, #10209) -- stream resume with backoff, race condition fixes
- Conversation lock released earlier (#10203) -- reduces contention on concurrent requests
- Better error messages (#10207) -- known LLM errors now surface descriptive messages instead of generic failures
- BYOK error tagging (#10204, #10311) -- errors now include an `is_byok` flag for debugging
- stream_incomplete diagnosis (#10033) -- `BaseException` catching to identify root causes
Model Support (14 features, 20 fixes)
- GPT-5.4 -- full support including mini, nano, and fast variants (#9798, #10043)
- GLM-5 -- GLM-5, GLM-5.1, GLM-5 Turbo, GLM-4.7 (#10317, #10285, #9994)
- MiniMax M2.7 (#10093)
- Baseten -- added as provider with full frontend integration, serverless auto mode, reasoning support (#10250, #9846, #9998)
- Fireworks (#9780) and zAI coding provider (#10064)
- Opus 4.6 / Sonnet 4.6 -- adaptive thinking tokens no longer incorrectly capped (#9795)
- OpenAI proxy cleanup -- extra fields removed (#9949), parallel tool calling supported (#9879)
Security
- Local filesystem access blocked via ImageContent bypass (#3256, #10329) -- `file:///` URLs in images are now rejected
- Internal MCP server targets blocked (#10009)
- SECURITY.md added (#3228)
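The ImageContent fix boils down to scheme validation: local `file://` URLs must never reach the image loader. A minimal sketch, assuming a typical allowlist of remote and data-URI schemes (the exact allowlist used by Letta is an assumption):

```python
from urllib.parse import urlparse

# Assumption: a typical allowlist for image sources; the real list may differ.
ALLOWED_IMAGE_SCHEMES = {"http", "https", "data"}


def is_safe_image_url(url: str) -> bool:
    """Reject file:// (local filesystem) and any other non-allowlisted scheme."""
    return urlparse(url).scheme.lower() in ALLOWED_IMAGE_SCHEMES
```

An allowlist is the safer design choice here: a denylist of known-bad schemes (`file`, `ftp`, ...) leaves any forgotten scheme exploitable by default.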
Infrastructure
- Readiness enforcement scaffold (M1-M3 metrics pipeline) -- request pressure, DB pool, SSE lifecycle, event loop lag monitoring
- Multi-agent tools moved to less privileged execution environment (#9779)
- Subagent agents auto-hidden on create (#10096)
- WebSocket transport for OpenAI Responses API (#9841)
For self-hosted users upgrading from 0.16.6: This release addresses the majority of issues reported in the community over the past month. The context window default change alone (#9993) eliminates the most common source of "everything breaks when I open ADE" complaints.
Full Changelog: 0.16.6...0.16.7
v0.16.6
Highlights
- Expanded Conversations API support for default conversation / agent-direct mode.
- New conversations now initialize with a compiled system message at creation time.
- Fixed `model_settings.max_output_tokens` default behavior so it does not silently override an existing `max_tokens` unless explicitly set.
Conversations API updates
- Added support for `conversation_id="default"` + `agent_id` across conversation endpoints (send/list/cancel/compact/stream retrieve).
- Kept backwards compatibility for `conversation_id=agent-*` (deprecated path).
- Added lock-key handling in agent-direct flows to avoid concurrent execution conflicts.
Conversation/system-message behavior
- Conversation creation now compiles and persists a system message immediately.
- This captures current memory state at conversation start and removes first-message timing edge cases.
Model/config updates
- Added model support for `gpt-5.3-codex` and `gpt-5.3-chat-latest`.
- Updated defaults:
- context window default: 32k → 128k
- `CORE_MEMORY_BLOCK_CHAR_LIMIT`: 20k → 100k
- Anthropic model settings now allow `effort="max"` where supported.
- Gemini request timeout default increased to 600s.
Memory / memfs updates
- Git-backed memory frontmatter no longer emits `limit` (legacy `limit` keys are removed on merge).
- Skills sync now maps only `skills/{name}/SKILL.md` to `skills/{name}` block labels.
- Other markdown under `skills/` is intentionally ignored for block sync.
- Memory filesystem rendering now includes descriptions for non-`system/` files and condenses skill display.
Reliability and compatibility fixes
- Added explicit `LLMEmptyResponseError` handling for empty Anthropic streaming responses.
- Improved Fireworks compatibility by stripping unsupported reasoning fields.
- Improved Z.ai compatibility by mapping `max_completion_tokens` to `max_tokens`.
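The Z.ai mapping is a simple parameter-name translation. A sketch of what such an adapter looks like; the function name is hypothetical, and leaving an explicit `max_tokens` untouched when both keys are present is an assumption about the precedence Letta applies:

```python
def adapt_zai_params(params: dict) -> dict:
    """Map the OpenAI-style max_completion_tokens key to Z.ai's max_tokens.

    Hypothetical adapter: an explicitly set max_tokens wins over a
    translated max_completion_tokens (an assumption about precedence).
    """
    out = dict(params)  # do not mutate the caller's dict
    mct = out.pop("max_completion_tokens", None)
    if mct is not None and "max_tokens" not in out:
        out["max_tokens"] = mct
    return out
```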
Full Changelog: 0.16.5...0.16.6
v0.16.5
v0.16.4
What's Changed
- fix: update gh templates by @cpacker in #3155
- chore: release 0.16.3 by @sarahwooders in #3158
- chore: bump v0.16.4 by @carenthomas in #3168
Full Changelog: 0.16.2...0.16.4
v0.16.2
What's Changed
- docs: update README.md by @cpacker in #3110
- Update contributing.md with corrected local setup steps by @neversettle17-101 in #3123
- chore: bump version 0.16.2 by @carenthomas in #3140
New Contributors
- @neversettle17-101 made their first contribution in #3123
Full Changelog: 0.16.1...0.16.2
v0.16.1
What's Changed
- Correct provider name for openai-proxy in LLMConfig by @SootyOwl in #3097
- chore: bump v0.16.1 by @carenthomas in #3107
Full Changelog: 0.16.0...0.16.1
v0.16.0
What's Changed
- Updated readme with actual argument by @Godofnothing in #3083
- fix: Implement architecture-specific OTEL installation logic by @SootyOwl in #3061
- chore: bump v0.16.0 by @carenthomas in #3095
New Contributors
- @Godofnothing made their first contribution in #3083
- @SootyOwl made their first contribution in #3061
Full Changelog: 0.15.1...0.16.0
v0.15.1
v0.15.0
What's Changed
- Add context windows for grok-4 models by @runtimeBob in #3043
- chore: bump version 0.15.0 by @carenthomas in #3077
New Contributors
- @runtimeBob made their first contribution in #3043
Full Changelog: 0.14.0...0.15.0