feat: add observability, memory, resilience, and guidance modules#329

Open
reh3376 wants to merge 2 commits into karpathy:master from reh3376:feat/mdemg-observability-and-memory

Conversation


@reh3376 reh3376 commented Mar 18, 2026

Summary

Adds four optional, non-breaking Python modules that improve the effectiveness of autonomous experiment runs by bringing production-grade observability, learning, and resilience patterns to autoresearch. Inspired by mdemg — a persistent memory system for AI coding agents built on Neo4j, Hebbian learning, and Prometheus metrics.

Key principle: These modules enhance the infrastructure around the experiment loop — they never touch train.py or prepare.py. The core 5-minute experiment loop works exactly as before. All modules use only Python stdlib (json, time, math, dataclasses) — zero new dependencies.

New Modules

monitor.py — Experiment Metrics & Observability

Inspired by mdemg's internal/metrics/ Prometheus pipeline and 10-panel Grafana dashboard

  • ExperimentTracker class with full experiment lifecycle tracking (start_experiment → record_step → end_experiment)
  • Per-step loss curve capture (sampled at configurable intervals to bound memory)
  • Session-level aggregates: keep rate, improvement velocity (BPB/hour), training hours
  • Real-time alerting with configurable thresholds (env vars):
    • Loss spike detection (smoothed EMA vs raw loss ratio)
    • VRAM pressure warnings
    • Consecutive crash streak alerts
    • Improvement plateau detection
  • Prometheus text exposition format export (get_prometheus_text()) — compatible with node_exporter textfile collector or direct scraping
  • JSON export for external dashboard consumption (Grafana, custom UIs)
  • Terminal dashboard (format_dashboard()) for quick-glance session status
  • Crash-resilient: persists session state to disk, recovers on restart
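
The loss-spike alert (smoothed EMA vs raw loss ratio) can be sketched in a few lines. Function names, the smoothing factor, and the default ratio threshold below are illustrative, not the module's actual API:

```python
def ema_update(prev_ema, loss, alpha=0.1):
    """Exponential moving average of the loss; seeds with the first raw value."""
    return loss if prev_ema is None else alpha * loss + (1 - alpha) * prev_ema

def is_loss_spike(raw_loss, ema_loss, ratio_threshold=1.5):
    """Alert when the raw loss jumps well above the smoothed trend."""
    return ema_loss is not None and raw_loss > ratio_threshold * ema_loss

# Walking a short loss curve: spiked stays False until the final 9.0 step,
# where raw loss exceeds 1.5x the smoothed EMA (about 3.92 at that point).
ema = None
for step_loss in [4.0, 3.8, 3.7, 3.6, 9.0]:
    spiked = is_loss_spike(step_loss, ema)
    ema = ema_update(ema, step_loss)
```

Comparing against the EMA rather than the previous raw loss is what keeps a single noisy step from triggering the alert.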

memory.py — Cross-Session Experiment Memory with Hebbian Learning

Inspired by mdemg's Conversation Memory System (internal/conversation/) and Hebbian learning engine (internal/learning/)

  • ExperimentMemory class — persistent knowledge base across research sessions
  • Hebbian association tracking: strengthens connections between change categories and positive/negative outcomes using mdemg's tanh soft-capping formula (w = wmax * tanh((w + eta * signal) / wmax)) — smooth saturation instead of hard clamping, allowing continuous learning near weight limits
  • 15 standardized change categories: architecture, attention, activation, optimizer, learning_rate, schedule, batch_size, initialization, regularization, normalization, embedding, numerical, simplification, combination, radical
  • Auto-tagging: keyword-based classifier automatically assigns categories from experiment descriptions
  • Temporal decay: exponential weight decay with cautious skipping of recently-reinforced associations (mirrors mdemg's cautious decay window)
  • Surprise-weighted storage: unexpected results (contradicting Hebbian expectations) receive higher surprise scores and stronger learning signals — inspired by mdemg's CMS which retains novel observations longer than routine ones
  • Pattern extraction APIs:
    • get_promising_directions() — ranked categories blending Hebbian weight with exploration bonus
    • get_dead_ends() — categories that consistently fail (agent should avoid)
    • get_plateaus() — velocity-based plateau detection comparing recent vs earlier improvement rates
    • get_surprise_highlights() — most unexpected results for investigation
  • Persists to .autoresearch/memory/memory.json
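
The tanh soft-capping rule quoted above is concrete enough to sketch directly; the eta and wmax defaults here are illustrative:

```python
import math

def hebbian_update(w, signal, eta=0.1, wmax=1.0):
    """mdemg-style soft cap: w = wmax * tanh((w + eta * signal) / wmax).

    Positive signals strengthen the association, negative signals weaken it,
    and |w| approaches wmax asymptotically instead of hitting a hard wall.
    """
    return wmax * math.tanh((w + eta * signal) / wmax)
```

Because tanh saturates smoothly, repeated reinforcement still moves the weight slightly even near the cap, which is the "continuous learning near weight limits" behavior the summary describes.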

resilience.py — Circuit Breakers & Anomaly Detection

Inspired by mdemg's internal/circuitbreaker/, internal/anomaly/, and internal/backpressure/

  • CircuitBreaker — prevents wasting GPU time on repeated failures:
    • State machine: CLOSED → OPEN (after N crashes) → HALF_OPEN (probe) → CLOSED/OPEN
    • Exponential backoff on probe failures (configurable multiplier and max cooldown)
    • Mirrors mdemg's per-endpoint circuit breaker with half-open recovery
  • AnomalyDetector — multi-pattern detection across experiment history:
    • Plateau: no improvements in configurable window
    • VRAM creep: monotonically increasing memory usage across experiments
    • Systematic regression: BPB worsening over consecutive experiments
    • Crash clustering: high crash rate in recent experiments
  • BackpressureMonitor — VRAM pressure tracking:
    • Warning/critical thresholds based on GPU VRAM capacity
    • Trend analysis (increasing/stable/decreasing)
    • Actionable suggestions (reduce batch size, reduce model depth)
  • ExperimentGuard — unified pre/post experiment safety wrapper:
    • pre_experiment() → checks circuit breaker, VRAM pressure, anomalies → returns PreExperimentVerdict with allowed, blocked, warnings, suggestions
    • post_experiment() → updates all safety systems
    • Single integration point for all resilience features
  • Persists state to .autoresearch/resilience/state.json
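
A self-contained sketch of the CLOSED → OPEN → HALF_OPEN state machine with exponential backoff; the real CircuitBreaker's constructor arguments and method names may differ:

```python
import time

class CircuitBreakerSketch:
    """Illustrative state machine, not the module's actual class."""

    def __init__(self, failure_threshold=3, cooldown=60.0,
                 backoff=2.0, max_cooldown=3600.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.base_cooldown = cooldown
        self.cooldown = cooldown
        self.backoff = backoff
        self.max_cooldown = max_cooldown
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = None

    def allow(self):
        """Gate an experiment; OPEN blocks until the cooldown elapses."""
        if self.state == "OPEN" and self.clock() - self.opened_at >= self.cooldown:
            self.state = "HALF_OPEN"  # let one probe experiment through
        return self.state != "OPEN"

    def record_success(self):
        self.state = "CLOSED"
        self.failures = 0
        self.cooldown = self.base_cooldown  # reset backoff on recovery

    def record_failure(self):
        if self.state == "HALF_OPEN":
            # Probe failed: reopen with a progressively longer cooldown.
            self.cooldown = min(self.cooldown * self.backoff, self.max_cooldown)
            self._open()
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self):
        self.state = "OPEN"
        self.failures = 0
        self.opened_at = self.clock()
```

Injecting the clock makes the backoff behavior testable without real waits, which is also what makes the state easy to persist and restore across restarts.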

guidance.py — Proactive Experiment Suggestions

Inspired by mdemg's Jiminy inner voice (internal/jiminy/) and RSIC (internal/ape/)

  • ExperimentAdvisor — synthesizes signals from memory, monitoring, and resilience:
    • Suggestion generation (4 sources, mirroring Jiminy's parallel fan-out):
      1. Hebbian memory associations → promising categories
      2. Plateau detection → radical change recommendations
      3. Surprise analysis → revisit unexpectedly bad results (try the opposite)
      4. Dead end avoidance → categories to skip
    • Contradiction detection: finds pairs where the same change category produced opposite outcomes — suggests context-dependent dynamics worth investigating
    • Strategy assessment (simplified RSIC assess phase):
      • Phase detection: exploring | exploiting | plateaued | recovering
      • Effectiveness score (0-1) blending keep rate with velocity trend
      • High-level strategy recommendations
    • Formatted guidance (get_guidance()["formatted"]): human-readable text block designed for injection into agent context before each experiment decision
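
One way the effectiveness score and phase detection could be composed; the blend weight and phase thresholds below are guesses for illustration, not the module's actual constants:

```python
def effectiveness_score(keep_rate, velocity_trend, w_keep=0.6):
    """Blend keep rate (fraction of experiments kept, in [0, 1]) with a
    velocity trend (recent vs. earlier BPB/hour, clipped to [-1, 1]),
    mapped into [0, 1]."""
    trend01 = 0.5 * (1.0 + max(-1.0, min(1.0, velocity_trend)))
    return w_keep * keep_rate + (1.0 - w_keep) * trend01

def detect_phase(keep_rate, velocity_trend, crash_streak=0):
    """Rough phase classifier: exploring | exploiting | plateaued | recovering."""
    if crash_streak >= 3:
        return "recovering"
    if velocity_trend < -0.25:  # improvement rate falling off sharply
        return "plateaued"
    if keep_rate >= 0.5:        # most changes are sticking: exploit the vein
        return "exploiting"
    return "exploring"
```

The point of blending the two signals is that a high keep rate with collapsing velocity still scores poorly, which is exactly when the advisor should start recommending radical changes.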

Modified Files

program.md

  • Added comprehensive "Observability & Intelligence Modules" section (+205 lines)
  • Full usage documentation for all four modules with code examples
  • Environment variable reference for alert thresholds
  • Recommended integration pattern showing complete experiment loop
  • State directory documentation

analysis.ipynb

  • Change Category Effectiveness cell: builds Hebbian associations from results.tsv, plots horizontal bar charts of category weights and success rates
  • Monitoring Dashboard cell: loads session.json, plots VRAM trends and training durations, displays alert timeline
  • Guidance Report cell: generates and displays formatted guidance with suggestion table and contradictions
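
The Change Category Effectiveness cell could build its associations roughly like this (plotting omitted). The TSV rows, column names, and keyword table below are invented for illustration; the real results.tsv schema and the module's 15-category keyword map will differ:

```python
import csv, io, math

# Made-up rows standing in for results.tsv.
SAMPLE = """description\tdelta_bpb
rotary embedding variant\t-0.004
adamw beta2 sweep\t0.002
learning rate warmup tweak\t-0.001
remove bias terms\t-0.002
"""

KEYWORDS = {  # tiny subset of the 15 categories, for illustration
    "embedding": ("embedding", "rotary"),
    "optimizer": ("adamw", "optimizer"),
    "learning_rate": ("learning rate",),
    "simplification": ("remove", "simplif"),
}

def auto_tag(description):
    """Keyword-based classifier assigning categories from a description."""
    desc = description.lower()
    return [cat for cat, kws in KEYWORDS.items()
            if any(kw in desc for kw in kws)]

def hebbian_update(w, signal, eta=0.1, wmax=1.0):
    return wmax * math.tanh((w + eta * signal) / wmax)

weights = {}
for row in csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"):
    signal = 1.0 if float(row["delta_bpb"]) < 0 else -1.0  # lower BPB is better
    for cat in auto_tag(row["description"]):
        weights[cat] = hebbian_update(weights.get(cat, 0.0), signal)
# weights now maps each tagged category to a signed Hebbian weight,
# ready for a horizontal bar chart.
```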

.gitignore

  • Added .autoresearch/ to exclude module state directories from version control

Design Decisions

  1. Zero new dependencies — all modules use Python stdlib only. No changes to pyproject.toml.
  2. Non-intrusive — modules never modify train.py or prepare.py. They observe and advise.
  3. Opt-in — every module is independently usable. You can use just monitor.py without memory.py, etc.
  4. Crash-resilient — all state persists to disk and gracefully handles corrupted state files.
  5. Hebbian tanh soft-capping over hard clamping — continuous learning without saturation walls.
  6. Circuit breaker with exponential backoff — progressively longer cooldowns prevent thrashing.

Test plan

  • Verify train.py and prepare.py are completely unmodified (zero diff)
  • Verify no new dependencies added to pyproject.toml
  • Import each module independently: python -c "import monitor", python -c "import memory", etc.
  • Run ExperimentTracker lifecycle: start_experiment → record_step → end_experiment
  • Run ExperimentMemory.store_experiment() and verify .autoresearch/memory/memory.json is created
  • Run CircuitBreaker through CLOSED → OPEN → HALF_OPEN → CLOSED transition
  • Run ExperimentAdvisor.get_guidance() and verify formatted output
  • Run existing analysis.ipynb cells (original cells unchanged, new cells gracefully handle missing data)
  • Verify .autoresearch/ is gitignored

Authored-by: reh3376

Inspired by mdemg's production-grade AI memory infrastructure, adds four
optional modules that improve experiment effectiveness without modifying
the core train.py loop:

- monitor.py: Prometheus-compatible metrics, loss curve tracking, alerting
- memory.py: Hebbian association learning across experiment sessions
- resilience.py: circuit breakers, anomaly detection, VRAM backpressure
- guidance.py: proactive experiment suggestions (Jiminy-style inner voice)

Updates program.md with full integration documentation and enhances
analysis.ipynb with category effectiveness charts and guidance reports.

Authored-by: reh3376
… work

Comprehensive handoff document covering completed work (research,
implementation, documentation), suggested future work prioritized by
impact, architecture reference, module dependency graph, and onboarding
reading order.

Authored-by: reh3376