For AI coders and human contributors alike. Read this before writing any code.
OneManCompany is an operating system for building AI-powered companies. One human CEO, a team of AI employees, a real company structure. It's not a chatbot wrapper or an agent demo — it's a full organizational simulation with hiring, task management, performance reviews, and company culture.
The codebase models a real company: employees have profiles (YAML), skills, departments, and work principles. They receive tasks, execute them via LangChain agents or Claude CLI, report progress, and get reviewed. The CEO (user) manages everything through a pixel-art office UI.
AI coding tools are powerful but opinionated. Left unchecked, they'll add "helpful" abstractions, swallow exceptions, cache data in memory, and create helpers for one-time operations. This guide exists to align AI coders with our engineering philosophy — which is opinionated in the opposite direction.
Three principles above all else:
- SSOT (Single Source of Truth) — Every piece of data has exactly one owner. Disk is truth. No caching. No duplication. If you find yourself storing the same data in two places, one of them is wrong.
- TDD (Test-Driven Development) — Write the test first. Watch it fail. Then implement. This isn't a suggestion — it's the workflow. Code without a failing test first is code that doesn't belong here.
- Modular, registry-based design — If you're writing `if/elif/else` for different types, you're doing it wrong. Use registries and dispatch. New types should be addable without touching existing code.
These aren't suggestions. They're load-bearing walls. Violate them and the system breaks in subtle, hard-to-debug ways.
Before you start, here's what AI coders consistently get wrong in this codebase:
- Adding in-memory caches — "I'll store the employee list in a dict for faster access." No. Read from disk every time. `store.load_*()` is the only read path.
- Creating unnecessary abstractions — "Let me extract a base class for this." If there's only one implementation, inline it. Extract on the third use.
- Swallowing exceptions — `except Exception: pass` is banned. Log it, re-raise `CancelledError`, handle it properly.
- Adding "just in case" error handling — Don't validate inputs that come from trusted internal code. Only validate at system boundaries.
- Improving code you weren't asked to touch — Don't refactor neighboring functions, add docstrings to existing code, or rename variables for "clarity". Touch only what the task requires.
- Mocking at the wrong level — Patch where the function is imported, not where it's defined. This catches real import-path bugs.
- Writing test files outside `tmp_path` — All unit test I/O must go to `tmp_path`. Never write to the repo, `company/`, or `.onemancompany/`. Tests that pollute the codebase are worse than no tests.
- Design Philosophy
- Architecture Patterns
- Code Style
- Testing
- Code Smells & How to Eliminate Them
- Development Guides
Every change must be a systematic design, never a patch. If a bug reveals a structural flaw, fix the structure. If a feature request doesn't fit the current architecture, evolve the architecture — don't duct-tape around it.
Bad: Adding if employee_id == "00003": ... to handle a special case.
Good: Extracting a protocol/registry that handles all cases uniformly.
Extract harnesses and protocols. Never hardcode case-by-case.
```python
# Bad: case-by-case in main.py
if tool_type == "gmail":
    render_gmail_ui()
elif tool_type == "roblox":
    render_roblox_ui()

# Good: registry-based, data-driven
_toolSectionRenderers = {
    "oauth": render_oauth_section,
    "env_vars": render_env_section,
}
for section in tool.sections:
    _toolSectionRenderers[section.type](section)
```

Any new state or work item must be designed as a complete data package:
- Serializable — can be persisted to disk (YAML/JSON)
- Recoverable — can be restored after a server restart
- Registered — tracked in both company state and the owning employee
- Terminable — has a clear lifecycle, will not be stuck forever
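As a hedged sketch of those four requirements (the names `WorkItem`, `to_dict`, and `from_dict` are illustrative, not the project's real task model):

```python
import time
from dataclasses import asdict, dataclass, field

@dataclass
class WorkItem:
    """Hypothetical work item satisfying all four requirements."""
    item_id: str
    owner_id: str                       # Registered: the owning employee
    status: str = "pending"             # Terminable: pending -> active -> done/failed
    deadline: float = field(default_factory=lambda: time.time() + 3600)  # expiry, never stuck

    def to_dict(self) -> dict:
        # Serializable: a plain dict, ready for YAML/JSON persistence
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "WorkItem":
        # Recoverable: rebuilt from disk after a server restart
        return cls(**data)

item = WorkItem(item_id="t-001", owner_id="00100")
restored = WorkItem.from_dict(item.to_dict())
```

Round-tripping through `to_dict`/`from_dict` is the litmus test: if the state survives that, it survives a restart.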
Never write except Exception: pass. Always log errors. Always re-raise asyncio.CancelledError.
```python
# Bad
try:
    await do_work()
except Exception:
    pass

# Good
try:
    await do_work()
except asyncio.CancelledError:
    raise
except Exception:
    logger.exception("do_work failed")
```

All business data lives in `.onemancompany/` disk files. Writes go to disk immediately via `core/store.py`. Memory holds only intermediate computation products (layout, counters) — never cached copies of business data.
Rules:
- Every piece of data has exactly one file that owns it and exactly one write function (`store.save_*()`)
- Reads always go to disk (`store.load_*()`) — no in-memory caching of business data
- Frontend is a pure render layer — no `this.state` or cached copies; fetches from REST API on demand
- Frontend-backend sync runs on a 3-second tick: backend accumulates dirty categories, broadcasts `state_changed`, frontend re-fetches
- Real-time chat messages are the exception — pushed immediately via WebSocket for low-latency UX
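As a hedged sketch of that 3-second tick (the names `mark_dirty` and `sync_tick` are hypothetical, not the real backend API):

```python
import asyncio

dirty: set[str] = set()              # categories touched since the last tick

def mark_dirty(category: str) -> None:
    # Write paths call this after saving to disk; memory holds only the flag.
    dirty.add(category)

async def sync_tick(broadcast, interval: float = 3.0) -> None:
    # Every tick: broadcast accumulated dirty categories, then reset.
    while True:
        await asyncio.sleep(interval)
        if dirty:
            await broadcast({"type": "state_changed", "categories": sorted(dirty)})
            dirty.clear()
```

The frontend reacts to `state_changed` by re-fetching the named categories from the REST API, so memory never becomes a second source of truth.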
```python
# Bad: in-memory cache that can diverge from disk
company_state.employees[emp_id].status = "working"

# Good: write to disk immediately, mark dirty for next tick
await store.save_employee_runtime(emp_id, status="working")
```

Every critical code path must have `logger.debug(...)` at key decision points: function entry with parameters, branching conditions, external call results, and error context. Users deploy with INFO level (default); `--debug` mode (`OMC_DEBUG=1`) enables DEBUG level to surface these logs for diagnosis.
What to log (DEBUG level):
- Function entry with key parameters (truncate long strings)
- Branch decisions: which path was taken and why
- External call inputs/outputs (MCP, LLM, API)
- State transitions and their triggers
- Loop iterations with item identifiers
What NOT to log at DEBUG:
- Every line of execution (that's tracing, not debugging)
- Full request/response bodies (truncate to key fields)
- Sensitive data (API keys, tokens — mask them)
```python
# Good: key decision points logged
logger.debug("[recruitment] search called, market_connected={}", talent_market.connected)
logger.debug("[recruitment] Talent Market candidate #{}: id={}, name={}", idx, tid, tname)

# Bad: no debug logs, impossible to diagnose in production
grouped = await talent_market.search(jd)  # what happened? who knows
```

Rule: If a bug required adding debug logs to diagnose, those logs stay in the codebase permanently. They cost nothing at INFO level and save hours on the next issue.
Don't over-engineer. The right amount of complexity is the minimum needed for the current task. Three similar lines are better than a premature abstraction.
- Don't add features, refactor, or "improve" beyond what was asked
- Don't add error handling for scenarios that can't happen
- Don't create helpers for one-time operations
- Don't design for hypothetical future requirements
The dominant pattern in this codebase. Used for snapshot providers, plugins, tools, event handlers, and UI section renderers.
```python
# Decorator-based registry (see core/snapshot.py)
_providers: dict[str, type] = {}

def snapshot_provider(name: str):
    def decorator(cls):
        _providers[name] = cls
        return cls
    return decorator

@snapshot_provider("my_module")
class MySnapshot:
    @staticmethod
    def save() -> dict: ...
    @staticmethod
    def restore(data: dict) -> None: ...
```

Where it's used:
- `core/snapshot.py` — `@snapshot_provider` for state persistence
- `core/plugin_registry.py` — plugin discovery from directories
- `agents/common_tools.py` — `BASE_TOOLS`, `GATED_TOOLS`, `COMMON_TOOLS` lists
- `frontend/app.js` — `_toolSectionRenderers` for dynamic UI
Async pub-sub for decoupled communication between backend modules and frontend.
```python
from onemancompany.core.events import event_bus, CompanyEvent

# Publish
await event_bus.publish(CompanyEvent(
    type="ceo_report",
    payload={"subject": "...", "report": "..."},
    agent="SYSTEM",
))

# Subscribe (WebSocket handler)
queue = event_bus.subscribe()
while True:
    event = await queue.get()
    await ws.send_json(event.payload)
```

UI sections are declared in data files, not hardcoded in templates. The backend resolves runtime state, the frontend renders by type.
tool.yaml (declares) → backend (resolves state) → sections[] → frontend (renders by type)
To add a new section type:
- Add the key to `tool.yaml`
- Add section builder in `routes.py:get_tool_definition()`
- Add renderer in `app.js:_toolSectionRenderers`
Employee task execution uses a launcher abstraction:
| Launcher | Hosting | How it works |
|---|---|---|
| `LangChainLauncher` | Company-hosted | `create_react_agent` with LangChain tools |
| `ClaudeSessionLauncher` | Self-hosted | `claude --print` CLI with MCP bridge |
| `ScriptLauncher` | Script | Runs a bash script |
BASE_TOOLS — always available (read, ls, write, edit, list_colleagues, ...)
GATED_TOOLS — need explicit tool_permissions (bash, use_tool, ...)
COMMON_TOOLS — founding employees only (all tools + admin tools)
| Tier | Trigger | Action |
|---|---|---|
| 1 | `company/` data files changed | Instant state reload |
| 1.5 | `frontend/` files changed | Browser reload notification |
| 2 | `src/` Python files changed | Graceful restart (snapshot → `os.execv`) |
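In sketch form, that tier table is a prefix-to-action dispatch, consistent with the registry principle above (the `TIERS` table and `action_for` name are hypothetical, not the real watcher code):

```python
from typing import Optional

# Hypothetical dispatch from a changed path to its reload tier action
TIERS = [
    ("company/",  "reload_state"),      # Tier 1: instant state reload
    ("frontend/", "notify_browser"),    # Tier 1.5: browser reload notification
    ("src/",      "graceful_restart"),  # Tier 2: snapshot, then os.execv
]

def action_for(path: str) -> Optional[str]:
    for prefix, action in TIERS:
        if path.startswith(prefix):
            return action
    return None                         # unwatched path: no action
```

Adding a new watched directory means adding a row to the table, not a new branch in the watcher.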
```python
# Imports: stdlib → third-party → local, separated by blank lines
from __future__ import annotations

import asyncio
import json
from pathlib import Path

from loguru import logger

from onemancompany.core.config import EMPLOYEES_DIR
from onemancompany.core.state import company_state
```

- Lazy imports inside functions for heavy or circular dependencies
- Type hints on function signatures, not on every local variable
- Dataclasses for structured data, not dicts
- f-strings for formatting, never `.format()` or `%`
- `loguru.logger` for logging, not `print()` or stdlib `logging`
- No SDK dependencies in tools when possible — prefer `urllib.request` for HTTP
- Vanilla JS — no frameworks, no build step
- Canvas 2D for the pixel art office
- Class-based controller (`AppController`) with method namespacing
- Event-driven via WebSocket messages
- Pixel-consistent styling: 7px font, monospace, CSS variables for theming
- `_escapeHtml()` for all user-provided content in innerHTML
- Configuration and data files use YAML, not JSON
- Employee profiles, tool manifests, workflow definitions — all YAML
- Keep YAML flat where possible, nested only when structurally necessary
```python
# Python
employee_id = "00002"        # snake_case for variables
def _private_helper(): ...   # underscore prefix for private
class EmployeeManager: ...   # PascalCase for classes
EMPLOYEES_DIR = Path(...)    # UPPER_SNAKE for constants
```

```javascript
// JavaScript
viewingEmployeeId        // camelCase for variables
_showCeoReport()         // underscore prefix for private methods
_toolSectionRenderers    // registry objects
```

```yaml
# YAML keys
employee_id: "00002"   # snake_case
allowed_users: []      # snake_case
```

Write tests first, then implement. This is a hard requirement.
```shell
# 1. Write the test
#    tests/unit/test_new_feature.py

# 2. Run it — should fail
.venv/bin/python -m pytest tests/unit/test_new_feature.py -x

# 3. Implement the feature

# 4. Run it — should pass
.venv/bin/python -m pytest tests/unit/test_new_feature.py -x

# 5. Verify no regressions
.venv/bin/python -m pytest tests/ -x
```

```
tests/
  unit/         — Fast (<1s), no external deps, no network
  integration/  — Mock LLM, <30s
  e2e/          — Running server, <120s
  conftest.py   — Shared fixtures
```
- Mock at the importing module level, not the source module:
```python
# Bad: patches where the function is defined
@patch("onemancompany.core.config.load_employee")

# Good: patches where the function is imported
@patch("onemancompany.agents.base.load_employee")
```
- WebSocket tests: Don't use Starlette `TestClient` (the `while True` loop hangs). Mock the WebSocket object directly and call async functions.
- Employee IDs: Avoid `00002`–`00005` in tests (these are founding executive IDs). Use `00100`+ for test employees.
- Disk isolation — ALL unit tests MUST run in `tmp_path`. Tests must never write to the real `.onemancompany/` directory, the project source tree, or any persistent location. A test that leaves files behind is a test that pollutes the codebase. The `tests/unit/conftest.py` provides autouse fixtures that redirect disk writes to `tmp_path`:
  - `persist_task` and `_append_progress` are auto-redirected via `vessel.EMPLOYEES_DIR` and `tp.EMPLOYEES_DIR` patches
  - `store.save_employee()`, `store.save_employee_runtime()`, `store.append_activity()` are auto-intercepted by the bridge fixture — they only write to disk when the test explicitly patches `store.EMPLOYEES_DIR` to `tmp_path`
  - If your test needs disk writes, explicitly `monkeypatch.setattr(store, "EMPLOYEES_DIR", tmp_path)` — this signals the bridge to allow writes to the controlled tmp directory
  - If your test does NOT set up in-memory `company_state.employees` and does NOT redirect `store.EMPLOYEES_DIR`, store write calls become no-ops (preventing leaks)
  - Bottom line: if your test creates files, they go in `tmp_path`. No exceptions. No "temporary" files in the repo root. No writing to `company/` or `.onemancompany/`.
- Compilation check: Always verify after editing:

```shell
.venv/bin/python -c "from onemancompany.api.routes import router; print('OK')"
```

```python
import pytest

@pytest.mark.asyncio
async def test_something():
    result = await some_async_function()
    assert result["status"] == "ok"
```

Smell: `if type == "X": ... elif type == "Y": ...`
Fix: Registry/dict dispatch. Map types to handlers.
Smell: A function doing 5+ unrelated things, 100+ lines. Fix: Extract into named sub-functions. Each function does one thing.
Smell: Passing dicts with magic string keys everywhere.
Fix: Use `@dataclass` with typed fields. A type checker catches typos.
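A minimal before/after sketch (the `Task` model is illustrative, not the project's real one):

```python
from dataclasses import dataclass

# Smell: magic string keys -- the "titel" typo fails silently at runtime
task = {"titel": "Fix login", "assignee": "00100"}

# Fix: typed fields -- the same typo is a TypeError at construction,
# and a type-checker error before the code ever runs
@dataclass
class Task:
    title: str
    assignee: str

fixed = Task(title="Fix login", assignee="00100")
```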
Smell: Reading/writing YAML in 10 different places with 10 different error handling patterns.
Fix: One _load_yaml(path) / _save_yaml(path, data) helper.
Smell: except Exception: pass
Fix: Always logger.exception(...). Re-raise CancelledError.
Smell: _old_var = new_var # backward compat lingering for months.
Fix: Delete it. If nothing breaks, it wasn't needed. If something breaks, fix the caller.
Smell: A state/task that can be created but never completed or cleaned up. Fix: Design the full lifecycle: create → active → complete/fail → cleanup. Add timeout/expiry.
Smell: Module A reads Module B's internal _private_dict directly.
Fix: Module B exposes a public API. Module A calls it.
Smell: Writing implementation first, tests later (or never). Fix: Write the test. Watch it fail. Implement. Watch it pass.
Smell: Abstract factory pattern for a function called once. Fix: Inline it. Three lines of repeated code is fine. Extract when you hit the third use.
Every PR must pass this three-phase review before merge. This is not optional — it's how we catch bugs that tests don't cover and design debt before it accumulates.
Design principles are documented in full at docs/design-principles.md.
Go through every changed line and ask:
| Check | What to look for |
|---|---|
| State mutation safety | Does the code modify shared state (node, tree, schedule)? Is the modification atomic? Can a restart mid-operation leave things inconsistent? |
| Restart recovery | If the server crashes right after this line, does the system recover correctly? Any in-memory-only state that's lost? |
| Edge cases | What if the input is empty/None/missing? What if the operation was already done (idempotency)? What if a concurrent operation modified the same data? |
| Error paths | What happens when the operation fails? Is the error logged? Does CancelledError propagate? Are there silent except blocks? |
| Timing/ordering | If multiple async operations run, does the order matter? Can one complete before another starts and cause issues? |
For each changed file, verify against the core principles:
| Principle | Question to ask |
|---|---|
| Systematic, not patching | Would a second similar request require touching this same code? Or can it be handled by just adding data/config? |
| Modular/generic | Is there a if type == "X" branch that could be a registry/field-based dispatch? Is the solution reusable or one-off? |
| Complete data package | Any new state introduced? Is it serializable, recoverable after restart, registered, and terminable? |
| SSOT (disk is truth) | Does this add in-memory caching of business data? Does it duplicate information that lives elsewhere? |
| Status via transition() | Any direct node.status = "..." assignments instead of set_status()? |
| No silent except | Any except Exception: pass or except: pass? |
Look beyond the changed lines:
| Check | What to look for |
|---|---|
| Downstream consumers | Who reads the fields/state you modified? Do they handle the new values correctly? |
| YAML/API contract | Did you add/remove/rename a field in to_dict()? Does from_dict() handle old files without the field? Does the frontend expect specific keys? |
| Performance | Does this add per-node/per-tick work that scales with the number of nodes/employees? |
| Watchdog/cron interaction | If you created a HOLDING state, is the watchdog behavior correct? Will it timeout-escalate when it shouldn't? Is the resume path guaranteed to fire? |
| Test coverage | Are the new paths tested? Are edge cases (empty, duplicate, restart) covered? |
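For the YAML/API contract check above, a hedged sketch of the backward-compatible pattern (the `EmployeeRecord` model is illustrative, not the project's real one):

```python
from dataclasses import dataclass

@dataclass
class EmployeeRecord:
    """Illustrative record for the old-file compatibility pattern."""
    employee_id: str
    status: str = "idle"   # newly added field

    def to_dict(self) -> dict:
        return {"employee_id": self.employee_id, "status": self.status}

    @classmethod
    def from_dict(cls, data: dict) -> "EmployeeRecord":
        # Old YAML files predate "status": .get() with a default keeps
        # them loadable instead of crashing on KeyError.
        return cls(
            employee_id=data["employee_id"],
            status=data.get("status", "idle"),
        )
```

The rule of thumb: every field added to `to_dict()` gets a matching default in `from_dict()`, so files written before the change still load after it.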
When reviewing, categorize findings:
- Critical (C): Bug or data loss. Must fix before merge.
- Important (I): Correctness risk in edge cases. Should fix.
- Suggestion (S): Nice to have, not blocking.
```
## Review: PR #10

### C1: [title]
[description + location + fix suggestion]

### I1: [title]
[description + location]

### S1: [title]
[description]
```
Detailed guides for specific subsystems:
| Guide | Location | Description |
|---|---|---|
| Design Principles | docs/design-principles.md | The 8 load-bearing principles — read this first |
| Tool Development | company/assets/tools/README.md | Creating custom LangChain tools with OAuth, env vars, and dynamic UI |
| Workflow Rules | company_rules/README.md | Writing workflow definitions parsed by the workflow engine |
| Plugin Development | plugins/README.md | Creating frontend plugins (kanban, timeline, etc.) |
```
src/onemancompany/
  core/          — Business logic, state, config, events
  agents/        — LangChain agent definitions + tools
  api/           — FastAPI routes + WebSocket
  tools/mcp/     — MCP server bridge for self-hosted employees
  talent_market/ — Hiring + talent system
company/   — Runtime data (employees, projects, assets)
frontend/  — Vanilla JS + Canvas 2D
tests/     — pytest (unit / integration / e2e)
```
All changes must go through a Pull Request. Never push directly to main.
```shell
# Correct workflow:
git checkout -b fix/my-bugfix
# ... make changes, commit ...
git push -u origin fix/my-bugfix
gh pr create --title "fix: description" --body "..."

# NEVER do this:
git push origin main   # ← BANNED
```

- Every code change (bugfix, feature, refactor) requires a new branch → commit → PR
- PR must pass all tests (pre-commit hook runs full test suite)
- Review before merge — no exceptions
```shell
# Start server
.venv/bin/python -m onemancompany.main

# Verify compilation
.venv/bin/python -c "from onemancompany.api.routes import router; print('OK')"

# Run tests
.venv/bin/python -m pytest tests/unit/ -x

# Check frontend syntax
node -c frontend/app.js
```