Releases: letta-ai/letta

v0.16.7

31 Mar 19:28
f333247

Letta Server 0.16.7 Release Notes

173 commits since 0.16.6 | Released March 31, 2026

Highlights

Self-hosted users: this is a big upgrade. The default global context window is raised from 32k to 128k, the context window reset bug (LET-7991) is fixed, and compaction has been overhauled. If you've been running curl commands to patch your config after every ADE load, most of that pain should be gone.

Breaking Changes

  • Block limits are no longer enforced -- block limit validation has been deprecated and removed from the git memory sync path (#9977, #9983). Blocks can now grow freely. If you were relying on limits to cap per-turn cost, you'll need to manage block size via other means.
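Since the server no longer enforces block limits, any cap has to live client-side. A minimal sketch of one approach, trimming block values before writing them; the `MAX_BLOCK_CHARS` threshold and `truncate_block` helper are illustrative, not part of the Letta API:

```python
# Client-side guard for core memory block size, now that server-side
# block limit validation has been removed. MAX_BLOCK_CHARS is an
# arbitrary illustrative threshold, not a Letta constant.
MAX_BLOCK_CHARS = 20_000

def truncate_block(value: str, limit: int = MAX_BLOCK_CHARS) -> str:
    """Trim a block value to `limit` characters, keeping the newest text."""
    if len(value) <= limit:
        return value
    # Keep the tail: recent content is usually more relevant for memory.
    return value[-limit:]
```

Run a check like this before each block update if per-turn cost still matters to you.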

Context Window & Compaction (21 fixes)

The biggest category of fixes. Self-hosted users were hit hardest by these.

  • Global context window default raised from 32k to 128k (#9993) -- self-hosted servers no longer default to 32k for unknown models
  • Context window preserved on conversation model override (LET-7991, #9986) -- the bug where non-default conversations fell back to 32k is fixed
  • Compaction overflow fixes (#9897) -- addresses the double-compaction and runaway compaction loops
  • Compaction model resets on agent model change (#10031) -- switching your agent's model no longer leaves the old summarizer model behind
  • Summarizer prompt improved (#10314) -- now remembers plan files, GitHub PRs, and other structured content during summarization
  • BYOK summarization fixed (#10152) -- summarizer provider fallback no longer fires for BYOK requests
  • Better error surfacing -- context window exceeded errors now surface descriptive messages (#10135, #10171), and oversized system prompts now trigger warnings during compaction (#10058)

Gemini (2 fixes)

  • thought_signature preserved on function calls without reasoning (LET-8166, #10237) -- the bug blocking all Gemini 2.5+/3.x multi-turn tool calling is fixed
  • Streaming interface crash fixed (#10306) -- self.model now initialized in SimpleGeminiStreamingInterface constructor (LET-8129)

Memory & memfs (10 fixes, 4 features)

  • available_skills block no longer duplicates in system prompt (#10006, #10011, #10021) -- three separate fixes for the skills block multiplying and inflating context (LET-8013)
  • Git memory sync deferred until stream close (#9951) -- reduces mid-stream sync failures
  • System prompt recompiles on agent creation with git memory (#9950) -- new git-enabled agents no longer start with empty compiled context
  • Projection-style git memory rendering (#10211) -- new rendering approach for memfs content in system prompts
  • Manual block edits via API trigger recompile (#9775) -- no more stale context after API block updates
  • Conversation recompile endpoint (#9848) -- POST /v1/conversations/{id}/recompile is now available
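The new recompile endpoint can be called directly when you want to force a fresh compiled context. A minimal sketch using the stdlib; the base URL is an assumption for a typical self-hosted setup:

```python
import urllib.request

BASE_URL = "http://localhost:8283"  # assumed default self-hosted address

def recompile_url(conversation_id: str) -> str:
    """Path for the conversation recompile endpoint added in #9848."""
    return f"{BASE_URL}/v1/conversations/{conversation_id}/recompile"

def recompile_conversation(conversation_id: str) -> int:
    """POST with an empty body and return the HTTP status code."""
    req = urllib.request.Request(
        recompile_url(conversation_id), data=b"", method="POST"
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status
```

Combined with #9775, this means stale compiled context after out-of-band block edits should now be recoverable without restarting the agent.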

Conversations (7 features)

  • Conversation forking (#10234, #10263) -- fork conversations with shared message history, including the default conversation
  • Sort conversations by last_message_at (#10190)
  • Idempotent conversation streaming (#10147) -- OTID-based retry safety
  • Request-scoped system overrides (#10227) -- per-request system prompt modifications

Streaming & Reliability (13 fixes)

  • OTID retry hardening (#10229, #10209) -- stream resume with backoff, race condition fixes
  • Conversation lock released earlier (#10203) -- reduces contention on concurrent requests
  • Better error messages (#10207) -- known LLM errors now surface descriptive messages instead of generic failures
  • BYOK error tagging (#10204, #10311) -- errors now include is_byok flag for debugging
  • stream_incomplete diagnosis (#10033) -- now catches BaseException so the root causes of incomplete streams can be identified

Model Support (14 features, 20 fixes)

  • GPT-5.4 -- full support including mini, nano, and fast variants (#9798, #10043)
  • GLM-5 -- GLM-5, GLM-5.1, GLM-5 Turbo, GLM-4.7 (#10317, #10285, #9994)
  • MiniMax M2.7 (#10093)
  • Baseten -- added as provider with full frontend integration, serverless auto mode, reasoning support (#10250, #9846, #9998)
  • Fireworks (#9780) and zAI coding provider (#10064)
  • Opus 4.6 / Sonnet 4.6 -- adaptive thinking tokens no longer incorrectly capped (#9795)
  • OpenAI proxy cleanup -- extra fields removed (#9949), parallel tool calling supported (#9879)

Security

  • Local filesystem access blocked via ImageContent bypass (#3256, #10329) -- file:/// URLs in images are now rejected
  • Internal MCP server targets blocked (#10009)
  • SECURITY.md added (#3228)

Infrastructure

  • Readiness enforcement scaffold (M1-M3 metrics pipeline) -- monitoring of request pressure, DB pool usage, SSE lifecycle, and event loop lag
  • Multi-agent tools moved to less privileged execution environment (#9779)
  • Subagent agents auto-hidden on create (#10096)
  • WebSocket transport for OpenAI Responses API (#9841)

For self-hosted users upgrading from 0.16.6: This release addresses the majority of issues reported in the community over the past month. The context window default change alone (#9993) eliminates the most common source of "everything breaks when I open ADE" complaints.

Full Changelog: 0.16.6...0.16.7

v0.16.6

04 Mar 03:14
4cb2f21

Highlights

  • Expanded Conversations API support for default conversation / agent-direct mode.
  • New conversations now initialize with a compiled system message at creation time.
  • Fixed model_settings.max_output_tokens default behavior so it does not silently override existing max_tokens unless explicitly set.

Conversations API updates

  • Added support for conversation_id="default" + agent_id across conversation endpoints (send/list/cancel/compact/stream retrieve).
  • Kept backwards compatibility for conversation_id=agent-* (deprecated path).
  • Added lock-key handling in agent-direct flows to avoid concurrent execution conflicts.

Conversation/system-message behavior

  • Conversation creation now compiles and persists a system message immediately.
  • This captures current memory state at conversation start and removes first-message timing edge cases.

Model/config updates

  • Added model support for:
    • gpt-5.3-codex
    • gpt-5.3-chat-latest
  • Updated defaults:
    • context window default: 32k → 128k
    • CORE_MEMORY_BLOCK_CHAR_LIMIT: 20k → 100k
  • Anthropic model settings now allow effort="max" where supported.
  • Gemini request timeout default increased to 600s.

Memory / memfs updates

  • Git-backed memory frontmatter no longer emits limit (legacy limit keys are removed on merge).
  • Skills sync now maps only skills/{name}/SKILL.md to skills/{name} block labels.
  • Other markdown under skills/ is intentionally ignored for block sync.
  • Memory filesystem rendering now includes descriptions for non-system/ files and condenses skill display.

Reliability and compatibility fixes

  • Added explicit LLMEmptyResponseError handling for empty Anthropic streaming responses.
  • Improved Fireworks compatibility by stripping unsupported reasoning fields.
  • Improved Z.ai compatibility by mapping max_completion_tokens to max_tokens.

Full Changelog: 0.16.5...0.16.6

v0.16.5

24 Feb 19:02
1b2aa98

What's Changed

Full Changelog: 0.16.4...0.16.5

v0.16.4

29 Jan 20:50
65dbd7f

What's Changed

Full Changelog: 0.16.2...0.16.4

v0.16.2

12 Jan 19:04
67013ef

What's Changed

New Contributors

Full Changelog: 0.16.1...0.16.2

v0.16.1

18 Dec 01:37
58ab2bc

What's Changed

Full Changelog: 0.16.0...0.16.1

v0.16.0

15 Dec 20:12
be53f15

What's Changed

New Contributors

Full Changelog: 0.15.1...0.16.0

v0.15.1

26 Nov 22:46
0893bbf

What's Changed

Full Changelog: 0.15.0...0.15.1

v0.15.0

25 Nov 03:16
7216d35

What's Changed

New Contributors

Full Changelog: 0.14.0...0.15.0

v0.14.0

14 Nov 00:02
693a352

What's Changed

Full Changelog: 0.13.0...0.14.0