Skip to content

Conversation

@JAORMX
Copy link
Collaborator

@JAORMX JAORMX commented Oct 13, 2025

Summary

This PR adds a comprehensive architecture documentation suite in docs/arch/ covering ToolHive's design, components, and concepts.

Documentation Added

  • 00-overview.md: High-level architecture overview, key components, and platform philosophy
  • 01-deployment-modes.md: Local CLI, UI, and Kubernetes deployment patterns
  • 02-core-concepts.md: Core terminology, abstractions, nouns/verbs, and design patterns
  • 03-transport-architecture.md: MCP transport protocols (stdio, SSE, streamable-http) and proxy architecture
  • 04-secrets-management.md: Secret handling and backend integrations (1Password, encrypted storage)
  • 05-runconfig-and-permissions.md: Configuration schema, permission profiles, and security model
  • 06-registry-system.md: Registry architecture, distribution, and server catalog management
  • 07-groups.md: Group management, virtual MCP servers, and logical organization
  • 08-workloads-lifecycle.md: Workload state management, lifecycle operations, and process model
  • 09-operator-architecture.md: Kubernetes operator design, CRDs, and reconciliation patterns
  • README.md: Navigation guide with quick links and documentation index

🤖 Generated with Claude Code

@JAORMX JAORMX force-pushed the docs/arch branch 5 times, most recently from 54cb6dc to 680158a Compare October 13, 2025 10:44
@codecov
Copy link

codecov bot commented Oct 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 53.36%. Comparing base (103266e) to head (721ed35).
⚠️ Report is 10 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2165      +/-   ##
==========================================
+ Coverage   53.31%   53.36%   +0.04%     
==========================================
  Files         231      231              
  Lines       29581    29581              
==========================================
+ Hits        15772    15785      +13     
+ Misses      12674    12658      -16     
- Partials     1135     1138       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JAORMX JAORMX changed the title Add comprehensive architecture documentation Add architecture documentation Oct 13, 2025
@JAORMX JAORMX force-pushed the docs/arch branch 2 times, most recently from 4d6d2bc to 309ebb5 Compare October 13, 2025 11:26
@ChrisJBurns
Copy link
Collaborator

Wondering if @danbarr has any thoughts on this, as there will seem to be overlaps in documentation between the docs website and this repo?

@JAORMX
Copy link
Collaborator Author

JAORMX commented Oct 13, 2025

@ChrisJBurns docs for a different purpose. these are for devs

@eleftherias
Copy link
Member

We have some other documentation throughout the codebase that overlap with this, for example https://github.com/stacklok/toolhive/blob/main/docs/middleware.md.
Should we move them all to this section or keep them spread out?

Also is there an opportunity to split up this PR? It's quite long and dense to fit into my mind at once.

@JAORMX
Copy link
Collaborator Author

JAORMX commented Oct 15, 2025

@eleftherias I can split it into multiple PRs... but then I'd have broken markdown references and incomplete parts 😕 I figured it might just be worth getting something started and iterating on top of this.

regarding middleware.md, that's a good idea! We could ditch that one and absorb it to the new arch docs.

JAORMX and others added 15 commits October 20, 2025 14:26
Replace full struct definition with link to pkg/runner/config.go
and categorized field summary to reduce maintenance burden.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Verified against source code and corrected:
- Export command syntax (requires 2 args: workload and path, no stdout)
- Cedar policy format (Client:: not User::, Action::call_tool not "tools/call")
- Group operations (thv list --group, not thv group list <name>)
- File locations (data files in ~/.local/share, state in ~/.local/state)
- Complete socket paths including macOS locations (Podman Machine, Docker Desktop, Rancher)

All changes verified against pkg/authz/cedar.go, cmd/thv/app/export.go,
pkg/container/docker/sdk/client_unix.go, and pkg state/workloads code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Verified against actual code:
- Scalar UI path is /api/doc not /scalar (pkg/api/docs.go:13, server.go:234)
- Fixed audit event types based on pkg/audit/mcp_events.go (15 total types)
- Corrected mcp_list_operation to actual types: mcp_tools_list, mcp_resources_list, mcp_prompts_list

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Project structure section was removed from overview, update index to match.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Clarify tool-filter and tool-call-filter middleware descriptions
- Separate tool filtering from tool overriding in documentation
- Rename "Filter" section to "Filter and Override" to reflect both operations
- Change "metrics" to "telemetry" for consistency with middleware naming
- Explain that both middlewares work together with shared configuration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Replace CRD examples with references to examples/operator/mcp-servers/ directory
- Fix export command syntax (thv export requires output path)
- Fix group commands documentation (thv list --group instead of thv group list)
- Refocus groups documentation on architecture rather than CLI usage
- Remove excessive CLI usage examples to reduce maintenance burden

All changes verified against actual codebase implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Clarify token storage security in remote authentication (AES-256-GCM encryption)
- Add Kubernetes Mode section to secrets documentation explaining native K8s Secret usage
- Note that Kubernetes uses SecretKeyRef, not the provider system used in CLI mode

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Add a new section to CLAUDE.md instructing agents to update
architecture documentation when making code changes. Includes
a mapping table of code areas to documentation files and
guidelines for keeping docs in sync with implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Fix all 12 unresolved review comments by improving architectural focus:

- Remove CLI command examples, focus on architectural concepts
- Update file path references to actual implementation files
- Fix middleware type name from 'authz' to 'authorization'
- Organize RunConfig fields by architectural categories
- Simplify audit events to categories instead of exhaustive list
- Simplify request flow diagram and reference middleware.md
- Correct file paths for registry, session, client, MCP, audit, monitor, healthcheck

These changes align the documentation with architectural best practices:
focusing on concepts, patterns, and system design rather than CLI
usage or exhaustive implementation details.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Address PR feedback by removing CLI examples and correcting technical details:

- Remove all CLI command examples (architecture docs should focus on design, not usage)
- Fix container monitor path: pkg/container/docker/monitor.go (not pkg/container/monitor.go)
- Correct OAuth token storage: tokens managed in-memory by TokenSource, not persisted
- Clarify MCP_HOST: defaults to 127.0.0.1 locally, 0.0.0.0 in Kubernetes
- Replace CLI examples with architectural descriptions of concepts
- Update port management to describe architecture, not command flags
- Document TokenSource pattern and client credential storage distinction

These changes align documentation with actual implementation and follow
architecture documentation best practices.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Address final round of PR feedback by removing CLI examples and correcting technical details:

- Remove all CLI command examples from architecture docs
- Fix 1Password implementation: SDK not CLI (diagram and text)
- Add missing secret providers: environment and none
- Document Environment provider security: ListSecrets disabled for security
- Correct environment variable merge order with architectural reasoning
- Fix Windows path handling: allowed as host paths only, not container paths
- Replace export/import CLI examples with architectural descriptions
- Update permission auditing, network isolation, secrets management sections
- Remove CLI flags from custom profiles section

All changes verified by toolhive-expert agent. Documentation now focuses on
architectural concepts and design patterns rather than CLI usage.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Fix architecture documentation inaccuracies identified in code review:

Registry System (06):
- Update file references to actual provider implementation files
- Remove reference to non-existent README
- Fix annotation keys to use correct toolhive.stacklok.dev domain
- Correct MCPRegistry phases (remove Degraded, add Terminating)
- Fix YAML examples (apiVersion, Git repository field, sync policy)
- Remove incomplete OAuth example
- Update CLI flags to match actual implementation
- Remove reference to non-existent converter command
- Simplify architecture diagram to reflect actual implementation

Groups (07):
- Clarify group move functionality is internal only
- Add note about empty default registry groups
- Remove stale PR reference, use generic description

Workloads Lifecycle (08):
- Remove all line number references per documentation guidelines
- Fix storage paths to match XDG directory structure
- Correct label format to simple prefix style

Operator Architecture (09):
- Fix MCPExternalAuthConfig filename reference
- Add missing controller reference
- Remove incorrect StatusCollector example code
- Fix sync trigger annotation key

All changes verified against actual code implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Juan Antonio Osorio <[email protected]>
Remove CLI-focused content and maintain architecture focus:
- Fix state transition: container exit goes to stopped (was already correct in diagram)
- Remove non-existent update command section
- Remove CLI examples from List section, describe architecture instead
- Rename 'Async Operations' to 'Batch Operations' for clarity
- Remove CLI flags from filtering, describe capability architecturally
- Expand label descriptions with purpose/meaning

Architecture docs should describe system design, not CLI usage.
Verified against pkg/workloads/manager.go and pkg/container/runtime/types.go

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Clarify that Rancher Desktop is Docker-compatible and detected as Docker runtime type
- Convert CRD API documentation reference to clickable link
- Convert operator DESIGN.md reference to clickable link

Rancher Desktop is included in the Docker section because it's detected as
runtime.TypeDocker in the code (pkg/container/docker/sdk/client_unix.go:188),
not as a separate runtime type like Colima or Podman.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Architecture documentation review - Batch 2: Core Concepts, Transport Architecture, and Secrets Management

02-core-concepts.md: 3 issues (mount format, network field names, session terminology)
03-transport-architecture.md: 3 issues (MCP_HOST question, merged PR status, trust proxy headers)
04-secrets-management.md: 2 issues (storage paths, prompting behavior)

Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Architecture documentation review - Batch 3: RunConfig and Permissions

05-runconfig-and-permissions.md: 9 issues (env var merge order, defaults, state paths, proxy description)

Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Architecture documentation review - Batch 4: Registry System and Operator Architecture

06-registry-system.md: 6 issues (provenance fields, file paths, ConfigMap structure, status fields, Sigstore status)
09-operator-architecture.md: 8 issues (status phases, missing CRD, resource lists, YAML examples)

Fix all unresolved PR review comments from PR #2165:

- Fix file paths and references to match actual codebase structure
- Correct API endpoint paths to use /api/v1beta/ prefix
- Update container runtime paths and support descriptions
- Fix environment variable merge order and documentation
- Correct secrets storage paths using XDG specification
- Update MCP_HOST behavior documentation (always 127.0.0.1)
- Fix permission profile selection priority and defaults
- Update Sigstore verification status to implemented
- Fix operator CRD status structures and phase names
- Add MCPRemoteProxy CRD documentation
- Complete operator-created resources list with RBAC details
- Fix ConfigMap storage format and sync policy documentation
- Clarify OIDC and authorization ConfigMap patterns
- Update egress proxy implementation details
- Fix state storage directory name to 'runconfigs'

All changes verified against actual codebase implementation
to ensure documentation accuracy.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@jhrozek
Copy link
Contributor

jhrozek commented Oct 20, 2025

@JAORMX I had claude resolve all the threads that were addressed. The remaining have inline suggestions, can you accept those so I can accept the PR?

We might also need to ask @ChrisJBurns to remove his changes requested since I don't think my approve will override his nack

@ChrisJBurns ChrisJBurns dismissed their stale review October 20, 2025 22:50

dismissing review

jhrozek
jhrozek previously approved these changes Oct 21, 2025
@JAORMX JAORMX enabled auto-merge (squash) October 21, 2025 06:45
@JAORMX JAORMX merged commit ca60962 into main Oct 21, 2025
26 checks passed
@JAORMX JAORMX deleted the docs/arch branch October 21, 2025 08:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants