2.0.0


@asdek asdek released this 11 Apr 22:10
· 3 commits to main since this release
87ac00f

PentAGI 2.0 — Broader Model Support, Analytics, Runtime Flexibility, and Agent Guardrails

This release expands the LLM provider ecosystem with four new providers, introduces a full analytics dashboard, enables runtime provider switching mid-flow, adds Docker host network mode for OOB attack scenarios, and ships a comprehensive set of stability and reliability improvements. It also includes significant test coverage expansion across the backend codebase.



Major Features

Four New LLM Providers: DeepSeek, GLM, Kimi, Qwen

Native support for four providers from the Chinese LLM ecosystem — DeepSeek, GLM (Zhipu AI), Kimi (Moonshot AI), and Qwen (Alibaba Cloud). Each is available through the standard provider configuration interface with API key and server URL environment variables (DEEPSEEK_API_KEY, GLM_API_KEY, KIMI_API_KEY, QWEN_API_KEY). All four providers are registered in the GraphQL schema and settings UI, and come with pre-configured model lists and pricing information. Unit test coverage for all four providers is included (~71% coverage per provider).

Ollama Cloud Support

In addition to local Ollama deployments, PentAGI now ships a pre-built ollama-cloud.provider.yml configuration with cloud-hosted models assigned per agent type — including nemotron-3-super, qwen3-coder-next, glm-5, minimax-m2.7, qwen3.5:397b, and devstral-2:123b. Both Free Tier and Paid Tier setup options are documented. The configuration is bundled in the Docker image at /opt/pentagi/conf/ollama-cloud.provider.yml.

Analytics Dashboards

A new analytics dashboard surfaces usage statistics and cost data collected from the REST API analytics endpoints introduced in v1.2.0. The dashboard shows:

  • Token usage and cost breakdown per flow and per agent type (primary, pentester, coder, installer, searcher, adviser, etc.)
  • Cache hit rates and cache read/write cost separation for Anthropic and Gemini providers
  • Tool call frequency and execution time metrics per flow and subtask
  • Per-model cost detail, useful when running multiple provider configurations simultaneously

Runtime Provider Switching

It is now possible to switch the active LLM provider for a running flow without restarting the application. To switch providers: pause the flow using the stop button, navigate to provider settings to change the active provider or update configuration, then resume the flow with a message directing the agent. The backend applies conditional chain normalization to preserve the reasoning cache when the provider is unchanged, and converts tool call IDs when switching providers. The modelProvider parameter has been added to the relevant GraphQL mutations to support this workflow.

Agent Supervision System (Beta)

Two optional supervision mechanisms are now available and disabled by default:

  • AGENT_PLANNING_STEP_ENABLED=true — enables a planning step before each specialist agent starts work, where a planner generates a 3–7 step execution plan to scope the subtask and prevent drift.
  • EXECUTION_MONITOR_ENABLED=true — enables automatic detection of unproductive agent behavior: consecutive identical tool calls (EXECUTION_MONITOR_SAME_TOOL_LIMIT, default 5) and excessive exploration (EXECUTION_MONITOR_TOTAL_TOOL_LIMIT, default 10) trigger automatic mentor intervention to redirect the agent.
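The two monitor thresholds can be pictured as a simple counter over the tool-call stream. The sketch below is illustrative only — the struct and method names are hypothetical, not PentAGI's actual implementation:

```go
package main

import "fmt"

// monitor tracks tool-call activity for a single agent invocation.
// Field names and defaults mirror the documented environment variables,
// but the type itself is a hypothetical sketch.
type monitor struct {
	sameToolLimit  int // EXECUTION_MONITOR_SAME_TOOL_LIMIT (default 5)
	totalToolLimit int // EXECUTION_MONITOR_TOTAL_TOOL_LIMIT (default 10)
	lastTool       string
	sameCount      int
	totalCount     int
}

// observe records one tool call and reports whether a mentor
// intervention should be triggered.
func (m *monitor) observe(tool string) bool {
	m.totalCount++
	if tool == m.lastTool {
		m.sameCount++
	} else {
		m.lastTool = tool
		m.sameCount = 1
	}
	return m.sameCount >= m.sameToolLimit || m.totalCount >= m.totalToolLimit
}

func main() {
	m := &monitor{sameToolLimit: 5, totalToolLimit: 10}
	intervene := false
	for i := 0; i < 5; i++ {
		intervene = m.observe("terminal") // five identical calls in a row
	}
	fmt.Println(intervene) // true
}
```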

Reworked Langfuse Observability

The Langfuse integration has been significantly overhauled for clearer visualization of agent activity. Observation types are now separated into Spans, Generations, Agents, Tools, Chains, Retrievers, Evaluators, Embeddings, and Guardrails. Each agent's tool calls, LLM calls, and intermediate results are tracked as distinct observations, making it straightforward to trace why an agent made a particular decision, what input it received, and what output it produced. Score metrics, timing data, and variable tracking have been improved across all observation types.


New Capabilities

Docker Host Network Mode

Setting DOCKER_NETWORK=host instructs PentAGI to create worker containers using the host network stack instead of a bridge network. This gives containers direct access to local network interfaces, which is necessary for OOB (out-of-band) attack techniques that require binding to local interfaces — such as setting up reverse shells where the listener must be reachable from the target network. Agent prompts include mandatory guidance on OOB port allocation for this mode.

HTTP Client Timeout

A new environment variable HTTP_CLIENT_TIMEOUT (default: 600 seconds / 10 minutes) applies a timeout to all outbound HTTP connections — including every LLM provider, search tool, and external API call. Previously, connections to unresponsive backends could hang indefinitely, blocking agent goroutines. When the config is nil, a client with the default timeout is returned instead of Go's http.DefaultClient (which has no timeout).

Agent Tool Call Limits

Hard limits are now enforced on the number of tool calls per agent invocation:

  • MAX_GENERAL_AGENT_TOOL_CALLS (default: 100) — applies to primary and specialist agents
  • MAX_LIMITED_AGENT_TOOL_CALLS (default: 20) — applies to focused agents (reflector, mentor, etc.)

When the limit is reached, the agent is guided to a graceful completion using barrier tools rather than being abruptly terminated.

Tool Call ID Generation

A configurable tool call ID template mechanism has been added for LLM backends that require a specific tool call ID format. This prevents validation errors from providers with strict ID format requirements and is set per provider configuration.
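One way such a template mechanism could look, sketched with Go's text/template — the template syntax and the `.Index`/`.Rand` fields here are assumptions for illustration, not PentAGI's actual format:

```go
package main

import (
	"bytes"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"text/template"
)

// idData holds values available to a hypothetical ID template.
type idData struct {
	Index int    // position of the tool call in the turn
	Rand  string // random hex suffix for uniqueness
}

// renderToolCallID renders a provider-specific tool call ID from a
// template string, e.g. "call_{{.Index}}_{{.Rand}}".
func renderToolCallID(tmpl string, index int) (string, error) {
	b := make([]byte, 4)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	t, err := template.New("id").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, idData{Index: index, Rand: hex.EncodeToString(b)}); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	id, _ := renderToolCallID("call_{{.Index}}_{{.Rand}}", 1)
	fmt.Println(len(id) > 0)
}
```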

vLLM Reference Configuration

A tested reference configuration for qwen3.5-27b under vLLM is included, aimed at fully air-gapped or isolated environments where cloud LLM providers are unavailable. The configuration covers model parameters optimized for the agent roles PentAGI uses.

Flow Templates

Flows can now be saved as templates and reused. The templates management interface is available in the frontend, with full GraphQL API support for creating, updating, and launching flows from templates.

Novita AI Provider (Optional)

Novita AI is available as an optional provider via custom provider YAML configuration (novita.provider.yml). The default model assignment uses moonshotai/kimi-k2.5 for primary agents.


LLM Provider Improvements

Updated Model Configurations

All built-in provider configurations have been updated to reflect the current model landscape:

  • OpenAI: updated to GPT-5.4 series with revised pricing and token limits
  • Anthropic: increased max_tokens limits and updated to latest Claude Sonnet/Opus variants
  • Gemini: updated model assignments including Gemini 2.5-class models
  • Bedrock: added support for Default AWS SDK credential chain, Bearer token, and static credentials (Access Key + Secret Key), in addition to the existing session token method

Function Call and Thinking Signatures

Provider-level support for function call signatures and thinking signatures ensures that reasoning-capable models (Claude extended thinking, Gemini thinking tokens, DeepSeek R1 reasoning mode) produce well-formed conversations that preserve reasoning context across multi-turn interactions. This is particularly important for chain summarization, which now retains thinking signatures when compressing long conversations.

Improved Token Caching

Token caching has been further optimized for Anthropic (ephemeral cache controls) and Gemini (pre-created content caching) to reduce costs in long-running flows. Cache hit and cache write token counts are tracked separately per turn and are visible in the analytics dashboard.


Bug Fixes

Bedrock Provider Compatibility

Two distinct Bedrock issues have been resolved:

  • Fixed ValidationException errors when using the Converse API with tool schemas generated by Go's jsonschema reflector. The $schema field is now automatically stripped from tool parameters before sending to Bedrock.
  • Fixed a runtime failure where toolConfig was undefined for message chains containing toolUse/toolResult blocks. WithTools is now applied last in both CallEx and CallWithTools to prevent provider config options from overwriting tool definitions.
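The `$schema` stripping in the first fix amounts to removing one top-level key before the tool parameters are sent. A minimal sketch, with a hypothetical helper name:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stripSchemaField removes the top-level "$schema" key that Go's
// jsonschema reflector emits, which Bedrock's Converse API rejects
// with a ValidationException. Helper name is illustrative.
func stripSchemaField(raw []byte) ([]byte, error) {
	var params map[string]any
	if err := json.Unmarshal(raw, &params); err != nil {
		return nil, err
	}
	delete(params, "$schema")
	return json.Marshal(params)
}

func main() {
	in := []byte(`{"$schema":"https://json-schema.org/draft/2020-12/schema","type":"object"}`)
	out, _ := stripSchemaField(in)
	fmt.Println(string(out)) // {"type":"object"}
}
```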

Detached Terminal Command Hangs

Background (detach mode) terminal commands previously inherited the parent agent context. When the parent context was cancelled (e.g., due to agent delegation timeout), the background goroutine was also terminated. Fixed by using context.WithoutCancel for detached goroutines, which preserves context values (tracing, logging) while preventing parent cancellation propagation. The command's own timeout continues to work as expected.

Infinite Agent Loop Prevention

Two complementary safeguards are now in place:

  • A hard cap of 100 iterations on the main agent chain loop prevents infinite execution when a model repeatedly calls the same tool.
  • After 3 consecutive identical tool calls, the agent receives a soft "please try another approach" message; after 7 consecutive identical calls, the loop terminates with an error.
  • Infinite reflector recursion has been independently fixed by moving retry guards to reflector entry points.

Logging and Log Worker Fixes

Several agent activity logging issues were resolved: log update propagation in the flow assistant log worker, empty log entries created when no updates occurred during assistant processing, excessive cache updates caused by insufficient streaming log throttling, and log worker initialization in assistant mode missing agent call limits and execution monitoring.

QA Summarization Double-Summarization

Fixed an issue where already-summarized sections could be summarized a second time, producing degraded summaries in long-running flows with many completed phases.

Search Tool HTTP Client Safety

tavily.go and traversaal.go were mutating Go's global http.DefaultClient.Transport when configuring proxy settings, creating a data race for concurrent requests. Both tools now create a new http.Client instance when a proxy is configured.

Browser Tool Screenshot Handling

A screenshot failure in the browser tool no longer discards successfully fetched page content. The screenshot is treated as a non-critical side effect: on failure, a warning is logged and the page content is returned with an empty screenshot reference.

Google Search Proxy Configuration

The Google search tool was constructing a proxy-configured options slice but then ignoring it and using a hardcoded option.WithAPIKey in the actual service creation call. The proxy configuration is now correctly applied.

User-Defined Provider Precedence

Fixed an issue (#220) where built-in provider configurations could override user-defined custom configurations with the same provider identifier. User-defined providers now always take precedence.

Security: CA Private Key Cleanup

After the server certificate is signed during container startup, the CA private key, CSR, and serial file are now immediately removed from disk. These files are not needed at runtime and their presence increases the attack surface if the container filesystem is compromised.

Auth Session Management

Improved session handling in the frontend: WebSocket connections on public pages are prevented, 401/403 errors in WebSocket, GraphQL, and HTTP requests trigger automatic session refresh, and the OAuth providers list is always fetched fresh on the login page to reflect newly added providers without cache clearing.


Test Coverage

Backend test coverage has been significantly expanded in this release. Key package coverage after new tests:

| Package | Coverage |
| --- | --- |
| pkg/terminal | 83.3% |
| pkg/queue | 89.4% |
| pkg/schema | 86.5% |
| pkg/server/auth | 87.7% |
| pkg/server/response | 100.0% |
| pkg/server/context | 100.0% |
| pkg/csum | 84.0% |
| pkg/cast | 87.3% |
| pkg/config | 75.7% |
| pkg/providers/bedrock | 81.1% |
| pkg/providers/custom | 79.2% |
| pkg/providers/embeddings | 74.0% |
| pkg/providers/deepseek | 71.4% |
| pkg/providers/glm | 71.4% |
| pkg/providers/kimi | 71.4% |
| pkg/providers/qwen | 71.4% |
| pkg/providers/tester | 78.7% |

Tests also cover agent context management, tool registry completeness, executor helpers, terminal formatting utilities, langfuse helpers and noop observer, graphiti disabled mode, server/models validation, and JSON Schema validation.


Infrastructure

  • Docker Compose healthcheck: pg_isready healthcheck added to the pgvector service so the application waits for the database to be fully ready before starting.
  • License compliance: CONTRIBUTING.md with license compliance guidelines for contributors, and Dockerfile tooling to generate frontend and backend dependency license reports.
  • Frontend terminal: Modular architecture refactor, Unicode rendering fix, and security hardening.

Documentation

  • Added CONTRIBUTORS.md recognizing all contributors across the full project history (see note below).
  • Updated README with Ollama Cloud setup instructions (Free Tier and Paid Tier).
  • Added reference documentation for Docker host network mode and OOB attack scenarios.
  • Documented agent supervision settings (AGENT_PLANNING_STEP_ENABLED, EXECUTION_MONITOR_ENABLED, MAX_GENERAL_AGENT_TOOL_CALLS, MAX_LIMITED_AGENT_TOOL_CALLS, HTTP_CLIENT_TIMEOUT).

A Note on Repository History

On March 29, 2026, the repository history prior to that date was rewritten into a single squash commit to resolve a licensing matter. Individual commit history from January 2025 through March 2026 is no longer visible in the GitHub interface. The CONTRIBUTORS.md file was created to permanently record all contributions made during that period.


Contributors

Core Team

  • @asdek (Dmitry Nagibin) — Architecture, backend infrastructure, agent system, provider integrations, observability, project coordination
  • @sirozha (Sergey Kozyrenko) — React UI, settings interfaces, GraphQL integration, frontend architecture, analytics dashboard, terminal component
  • @zavgorodnii (Andrei Zavgorodnii) — Graphiti integration, patch refiner, knowledge graph implementation

External Contributors


Full Changelog: v1.2.0...v2.0.0