diff --git a/.gitignore b/.gitignore
index ed88e063..d6d83b86 100644
--- a/.gitignore
+++ b/.gitignore
@@ -141,3 +141,4 @@ experiments/arc_bench/batch_logs/
 # README is tracked; any local checkout placed here stays local.
 external/agents/ColliderAgent/*
 !external/agents/ColliderAgent/README.md
+result/*
diff --git a/docs/CHANGELOG_ANTHROPIC_ADAPTER.md b/docs/CHANGELOG_ANTHROPIC_ADAPTER.md
deleted file mode 100644
index a7510b83..00000000
--- a/docs/CHANGELOG_ANTHROPIC_ADAPTER.md
+++ /dev/null
@@ -1,332 +0,0 @@
-# Anthropic Messages API Adapter: Change Notes
-
-> This document describes the changes that add native Anthropic Messages API support to the ResearchClaw LLM module,
-> and uses architecture diagrams to show that the change **does not affect the behavior of existing providers such as OpenAI, OpenRouter, and DeepSeek**.
-
----
-
-## Contents
-
-1. [Background](#1-background)
-2. [Architecture Overview: Before and After](#2-architecture-overview-before-and-after)
-3. [Core Design: The Adapter Pattern](#3-core-design-the-adapter-pattern)
-4. [Call Flow in Detail](#4-call-flow-in-detail)
-5. [Zero-Impact Guarantees for Existing Providers](#5-zero-impact-guarantees-for-existing-providers)
-6. [Changed Files](#6-changed-files)
-7. [Error Handling and Retries](#7-error-handling-and-retries)
-8. [Configuration Examples](#8-configuration-examples)
-9. [New Dependencies](#9-new-dependencies)
-
----
-
-## 1. Background
-
-ResearchClaw's LLM module previously supported only the **OpenAI Chat Completions API format** (including providers compatible with it, such as OpenRouter and DeepSeek).
-Anthropic's Claude models use a separate **Messages API**, whose request and response structures differ significantly from the OpenAI format:
-
-| Difference | OpenAI format | Anthropic format |
-|---|---|---|
-| Authentication | `Authorization: Bearer <key>` | `x-api-key: <key>` |
-| System message | Placed in the `messages` array | Separate `system` field |
-| Endpoint path | `/v1/chat/completions` | `/v1/messages` |
-| Response structure | `choices[0].message.content` | `content[0].text` |
-| Token counts | `prompt_tokens` / `completion_tokens` | `input_tokens` / `output_tokens` |
-
-To support the Anthropic API natively without affecting existing functionality, we adopted the **Adapter Pattern**.
-
----
-
-## 2. Architecture Overview: Before and After
-
-### Before
-
-```mermaid
-graph TB
-    subgraph "create_llm_client (factory function)"
-        A[config.llm.provider] -->|"acp"| B[ACPClient]
-        A -->|"all others"| C["Inline LLMClient construction<br/>base_url filled from PROVIDER_PRESETS"]
-    end
-
-    C --> D["_raw_call()<br/>urllib → OpenAI /chat/completions"]
-    D --> E[LLMResponse]
-
-    style B fill:#e1f5fe
-    style C fill:#e8f5e9
-    style D fill:#e8f5e9
-```
-
-### After
-
-```mermaid
-graph TB
-    subgraph "create_llm_client (factory function)"
-        A[config.llm.provider] -->|"acp"| B[ACPClient]
-        A -->|"all others"| C["LLMClient.from_rc_config()<br/>base_url filled from PROVIDER_PRESETS"]
-    end
-
-    C -->|"provider == anthropic"| F["Attach AnthropicAdapter"]
-    C -->|"other providers"| G["_anthropic = None"]
-
-    subgraph "Branch inside _raw_call()"
-        H{"Is self._anthropic<br/>set?"}
-        H -->|"yes (Anthropic)"| I["AnthropicAdapter.chat_completion()<br/>httpx → Anthropic /v1/messages"]
-        H -->|"no (OpenAI etc.)"| J["Existing logic unchanged<br/>urllib → OpenAI /chat/completions"]
-    end
-
-    F --> H
-    G --> H
-    I --> K["Returns OpenAI-compatible dict"]
-    J --> K
-    K --> L["Unified parsing → LLMResponse"]
-
-    style B fill:#e1f5fe
-    style F fill:#fff3e0
-    style I fill:#fff3e0
-    style G fill:#e8f5e9
-    style J fill:#e8f5e9
-    style L fill:#f3e5f5
-```
-
-> Green = existing logic (unmodified), orange = new Anthropic path, purple = shared unified exit.
-
----
-
-## 3. Core Design: The Adapter Pattern
-
-```mermaid
-classDiagram
-    class LLMClient {
-        -LLMConfig config
-        -AnthropicAdapter _anthropic
-        +chat(messages, ...) LLMResponse
-        +preflight() tuple
-        -_call_with_retry(model, ...) LLMResponse
-        -_raw_call(model, ...) LLMResponse
-    }
-
-    class AnthropicAdapter {
-        -str base_url
-        -str api_key
-        -int timeout_sec
-        +chat_completion(model, messages, ...) dict
-    }
-
-    class LLMResponse {
-        +str content
-        +str model
-        +int prompt_tokens
-        +int completion_tokens
-    }
-
-    LLMClient "1" *-- "0..1" AnthropicAdapter : _anthropic
-    LLMClient ..> LLMResponse : returns
-    AnthropicAdapter ..> LLMResponse : "Returns OpenAI-compatible dict\nparsed uniformly by LLMClient"
-
-    note for AnthropicAdapter "Instantiated only when provider=='anthropic'\nfor other providers _anthropic = None"
-```
-
-**Key design decisions:**
-
-- `AnthropicAdapter` is an **optional internal component** of `LLMClient`, not a standalone client class
-- The adapter returns an **OpenAI-compatible dict**, which the unified exit of `_raw_call()` parses into an `LLMResponse`
-- When `_anthropic is None`, `_raw_call()` takes the **completely unchanged original OpenAI path**
-
----
-
-## 4. Call Flow in Detail
-
-The sequence diagrams below show the full call chain for each of the two provider types:
-
-### OpenAI / OpenRouter / DeepSeek (existing flow, zero changes)
-
-```mermaid
-sequenceDiagram
-    participant Caller as Caller
-    participant Client as LLMClient
-    participant Raw as _raw_call()
-    participant API as OpenAI API
-
-    Caller->>Client: chat(messages)
-    Client->>Client: _call_with_retry(model, ...)
-    Client->>Raw: _raw_call(model, ...)
-    Note over Raw: self._anthropic is None<br/>→ takes the else branch (existing logic)
-    Raw->>API: urllib POST /chat/completions
-    API-->>Raw: {"choices": [...], "usage": {...}}
-    Raw-->>Client: LLMResponse
-    Client-->>Caller: LLMResponse
-```
-
-### Anthropic (new flow)
-
-```mermaid
-sequenceDiagram
-    participant Caller as Caller
-    participant Client as LLMClient
-    participant Raw as _raw_call()
-    participant Adapter as AnthropicAdapter
-    participant API as Anthropic API
-
-    Caller->>Client: chat(messages)
-    Client->>Client: _call_with_retry(model, ...)
-    Client->>Raw: _raw_call(model, ...)
-    Note over Raw: self._anthropic is set<br/>→ takes the if branch
-    Raw->>Adapter: chat_completion(model, messages, ...)
-    Note over Adapter: 1. Extract system messages<br/>2. Build the Anthropic request body<br/>3. httpx POST /v1/messages
-    Adapter->>API: httpx POST /v1/messages
-    API-->>Adapter: {"content": [...], "usage": {...}}
-    Note over Adapter: Convert to OpenAI-compatible format
-    Adapter-->>Raw: {"choices": [...], "usage": {...}}
-    Note over Raw: Unified parsing (identical to the OpenAI path)
-    Raw-->>Client: LLMResponse
-    Client-->>Caller: LLMResponse
-```
-
----
-
-## 5. Zero-Impact Guarantees for Existing Providers
-
-```mermaid
-graph LR
-    subgraph "Code path when provider != 'anthropic'"
-        A["from_rc_config()"] --> B["PROVIDER_PRESETS fills base_url ✅"]
-        B --> C["LLMClient.__init__()"]
-        C --> D["self._anthropic = None"]
-        D --> E["_raw_call()"]
-        E --> F{"self._anthropic?"}
-        F -->|"None → False"| G["else branch<br/>existing OpenAI logic<br/>(code unmodified)"]
-    end
-
-    style G fill:#e8f5e9,stroke:#4caf50,stroke-width:3px
-    style F fill:#fff9c4
-```
-
-**Five zero-impact guarantees:**
-
-| # | Mechanism | Description |
-|---|---|---|
-| 1 | **Conditional initialization** | `AnthropicAdapter` is instantiated only when `provider == "anthropic"`; other providers trigger no new code |
-| 2 | **`_anthropic = None`** | Defaults to `None` in `__init__`, so non-Anthropic providers never enter the adapter branch |
-| 3 | **else branch = original code** | The else branch of `_raw_call()` contains the **unmodified** OpenAI urllib call logic |
-| 4 | **PROVIDER_PRESETS retained** | The preset base_url fallback logic is restored; automatic URL filling for `openai` / `openrouter` / `deepseek` behaves as before |
-| 5 | **Unified exit** | Both paths produce a dict of the same structure, which the same code parses into an `LLMResponse` |
-
-### PROVIDER_PRESETS Reference
-
-```mermaid
-graph TD
-    subgraph "PROVIDER_PRESETS (automatic base_url filling)"
-        P1["openai → https://api.openai.com/v1"]
-        P2["openrouter → https://openrouter.ai/api/v1"]
-        P3["deepseek → https://api.deepseek.com/v1"]
-        P4["anthropic → https://api.anthropic.com"]
-        P5["openai-compatible → user-defined base_url"]
-    end
-
-    P1 --> |"unchanged ✅"| OK1[" "]
-    P2 --> |"unchanged ✅"| OK2[" "]
-    P3 --> |"unchanged ✅"| OK3[" "]
-    P4 --> |"new"| OK4[" "]
-    P5 --> |"unchanged ✅"| OK5[" "]
-
-    style P1 fill:#e8f5e9
-    style P2 fill:#e8f5e9
-    style P3 fill:#e8f5e9
-    style P4 fill:#fff3e0
-    style P5 fill:#e8f5e9
-```
-
----
-
-## 6. Changed Files
-
-| File path | Change type | Description |
-|---|---|---|
-| `researchclaw/llm/__init__.py` | Modified | Added the `"anthropic"` preset; simplified the factory function to delegate to `from_rc_config()` |
-| `researchclaw/llm/client.py` | Modified | `from_rc_config()` restores the PRESETS logic and conditionally attaches the adapter; `_raw_call()` gains an if/else branch |
-| `researchclaw/llm/anthropic_adapter.py` | **Added** | `AnthropicAdapter` class: converts the Anthropic Messages API to an OpenAI-compatible format |
-| `tests/test_anthropic.py` | **Added** | Anthropic API connectivity test script |
-| `pyproject.toml` | Modified | Added `httpx` as an optional dependency (`[anthropic]` extra) |
-| `.gitignore` | Modified | Added `run.log` |
-
----
-
-## 7. Error Handling and Retries
-
-Internally, the Anthropic adapter **converts httpx exceptions into standard urllib exceptions**, so the upper-layer retry logic needs no changes:
-
-```mermaid
-graph TD
-    subgraph "Inside AnthropicAdapter"
-        A["httpx.HTTPStatusError<br/>(4xx/5xx)"] -->|converts to| B["urllib.error.HTTPError<br/>(status_code preserved)"]
-        C["httpx.ConnectError<br/>httpx.TimeoutException"] -->|converts to| D["urllib.error.URLError"]
-    end
-
-    subgraph "_call_with_retry() (unchanged)"
-        B --> E{"status code?"}
-        E -->|"429/500/502/503/504"| F["Retry with exponential backoff ✅"]
-        E -->|"400"| G["Raise immediately (Bad Request)"]
-        E -->|"403 + model forbidden"| H["Skip to the next fallback model"]
-        D --> I["Retry until exhausted ✅"]
-    end
-
-    style A fill:#fff3e0
-    style C fill:#fff3e0
-    style B fill:#e8f5e9
-    style D fill:#e8f5e9
-```
-
-This means the Anthropic path gets **exactly the same retry strategy** as the OpenAI path: exponential backoff + jitter + model fallback chain.
-
----
-
-## 8. Configuration Examples
-
-### Using Anthropic (new)
-
-```yaml
-llm:
-  provider: anthropic
-  # base_url may be omitted; defaults to https://api.anthropic.com
-  api_key_env: ANTHROPIC_API_KEY
-  primary_model: claude-sonnet-4-20250514
-  fallback_models:
-    - claude-haiku-4-5-20251001
-```
-
-### Using OpenAI (unchanged)
-
-```yaml
-llm:
-  provider: openai
-  # base_url may be omitted; defaults to https://api.openai.com/v1
-  api_key_env: OPENAI_API_KEY
-  primary_model: gpt-4o
-  fallback_models:
-    - gpt-4.1
-    - gpt-4o-mini
-```
-
-### Using OpenRouter (unchanged)
-
-```yaml
-llm:
-  provider: openrouter
-  api_key_env: OPENROUTER_API_KEY
-  primary_model: anthropic/claude-sonnet-4-20250514
-```
-
----
-
-## 9. New Dependencies
-
-| Dependency | Version requirement | Installation | Notes |
-|---|---|---|---|
-| `httpx` | `>=0.24` | `pip install researchclaw[anthropic]` | **Optional dependency**; needed only by the Anthropic provider |
-
-Users who do not use the Anthropic provider **do not need to install httpx**; the behavior of `pip install researchclaw` is completely unchanged.
-
----
-
-> **Summary**: This change uses the adapter pattern to add an Anthropic-specific path inside `_raw_call()`.
-> When the provider is not `"anthropic"`, `self._anthropic` is `None` and the execution path is **identical** to the one before the change:
-> it touches no new code and introduces no new dependencies.
diff --git a/docs/DOMAIN_INTEGRATION_GUIDE.md b/docs/DOMAIN_INTEGRATION_GUIDE.md
deleted file mode 100644
index f29d15dd..00000000
--- a/docs/DOMAIN_INTEGRATION_GUIDE.md
+++ /dev/null
@@ -1,868 +0,0 @@
-# Domain Integration Guide
-
-> Audience: domain experts (chemistry, neuroscience, biology, materials, …) who already have a curated set of prompts and want to plug their domain into AutoResearchClaw end-to-end.
-> Working example throughout: the existing **`hep_ph`** integration (ColliderAgent + JHEP).
-
-## 1. Elevator Summary
-
-AutoResearchClaw runs a **fixed 23-stage research pipeline** (topic init → literature → hypothesis → experiment design → code generation → execution → analysis → paper draft → review → revision → export). The pipeline runner, gates, evaluators, LLM dispatch, and the experiment-config plumbing are **domain-agnostic**. To integrate a new domain you do not modify any of that. You add a small **plug-in surface** consisting of (at minimum) a profile YAML and a detector keyword tuple, and (at most) a new prompt bank, an adapter class, an experiment-mode sandbox, and a LaTeX template. The selected domain id flows through `PromptManager(domain=...)`, `get_adapter(...)`, and (optionally) `create_sandbox(config.mode == "<id>_agent")`. Every other stage handler reads the same generic API and never needs to know your domain exists.
-
-## 2. Architecture (5 Layers)
-
-```
-   ┌─────────────────────────────────────────────────────────────────┐
-   │  USER:  --profile <id>   OR   topic auto-detected by keywords    │
-   └─────────────────────────────┬────────────────────────────────────┘
-                                 │
-                                 ▼
- ┌──────────────────────────────────────────────────────────────────────┐
- │ LAYER 1 — Profile YAML (declarative metadata)                        │
- │   researchclaw/domains/profiles/<id>.yaml                            │
- │   • preferred_experiment_mode, preferred_target_conference           │
- │   • condition_terminology, typical_file_structure, baselines, …     │
- │   Loaded by: researchclaw.domains.detector.load_all_profiles()       │
- │   Consumed by: deploy.py (defaults), DomainProfile fields everywhere │
- └─────────────────────────────────┬────────────────────────────────────┘
-                                   │
-                                   ▼
- ┌──────────────────────────────────────────────────────────────────────┐
- │ LAYER 2 — Prompt Adapter (per-stage block overlay, Python class)     │
- │   researchclaw/domains/adapters/<id>.py                              │
- │   class <Id>PromptAdapter(PromptAdapter):                            │
- │       get_code_generation_blocks(ctx)  -> PromptBlocks(...)          │
- │       get_experiment_design_blocks(ctx) -> PromptBlocks(...)         │
- │       get_result_analysis_blocks(ctx)   -> PromptBlocks(...)         │
- │       get_export_publish_blocks(ctx)    -> preferred_template, …    │
- │   Registered in: prompt_adapter.py:_build_adapter_registry           │
- └─────────────────────────────────┬────────────────────────────────────┘
-                                   │  (optional — only if narrative
-                                   │   prose differs from ML defaults)
-                                   ▼
- ┌──────────────────────────────────────────────────────────────────────┐
- │ LAYER 3 — Prompt Bank (full STAGES dict, Python module)              │
- │   researchclaw/prompts/<id>.py                                       │
- │   STAGES = { "topic_init": {...}, ..., "export_publish": {...} }     │
- │   DEBATE_ROLES_HYPOTHESIS, DEBATE_ROLES_ANALYSIS                     │
- │   Loaded by: prompts/manager.py:_load_bank(domain)                   │
- │   MUST share stage keys + placeholders with prompts/ml.py            │
- │   (parity test in tests/test_prompt_bank_parity.py enforces this)    │
- └─────────────────────────────────┬────────────────────────────────────┘
-                                   │
-                                   ▼
- ┌──────────────────────────────────────────────────────────────────────┐
- │ LAYER 4 — Detector Keyword Rule (auto-routing)                       │
- │   researchclaw/domains/detector.py:_KEYWORD_RULES (lines ~245-351)   │
- │   ([... keyword phrases ...], "<id>")                                │
- │   Most-specific-first: HEP rule comes BEFORE generic "particle       │
- │   physics" so dark-matter topics route to hep_ph not physics_*       │
- └─────────────────────────────────┬────────────────────────────────────┘
-                                   │
-                                   ▼ (optional, only when domain wraps
-                                     an external Claude-Code subagent)
- ┌──────────────────────────────────────────────────────────────────────┐
- │ LAYER 5 — Experiment Mode + Sandbox + LaTeX Template                 │
- │   researchclaw/config.py: EXPERIMENT_MODES set + <Id>AgentConfig      │
- │   researchclaw/experiment/<id>_agent_sandbox.py: SandboxProtocol     │
- │   researchclaw/experiment/factory.py: dispatch in create_sandbox()    │
- │   researchclaw/templates/conference.py: LaTeX template registry     │
- │   HEP example: collider_agent / ColliderAgentConfig / JHEP          │
- └──────────────────────────────────────────────────────────────────────┘
-```
-
-The narrow interface is intentional. All 23 stages of the pipeline runner stay generic; the only stage code that knows your domain exists is the prompt manager (which returns a `RenderedPrompt`) and, if you add Layer 5, the sandbox factory.
-
-## 3. The 7-File Checklist
-
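-When adding a new domain, these are the places that may need touching, in order. Rows 1–7 cover the core plug-in surface; rows 8–11 apply only to Layer 5 or a domain-native LaTeX template: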
-
-| # | File | Action | Mandatory? |
-|---|---|---|---|
-| 1 | `researchclaw/domains/profiles/<id>.yaml` | **CREATE** — declarative metadata, deploy defaults | yes |
-| 2 | `researchclaw/domains/adapters/<id>.py` | **CREATE** — `<Id>PromptAdapter` subclass | yes |
-| 3 | `researchclaw/domains/adapters/__init__.py` **or** `researchclaw/domains/prompt_adapter.py:_build_adapter_registry` (lines 276–328) | **REGISTER** — add lazy import + prefix mapping | yes |
-| 4 | `researchclaw/prompts/<id>.py` | **CREATE** — full `STAGES` dict + debate roles | only if you need stage-level prose forks (otherwise the ML bank is reused) |
-| 5 | `researchclaw/prompts/manager.py:_load_bank` (lines 74–92) | **REGISTER** — `elif domain == "<id>": from researchclaw.prompts import <id> as _bank` | only if you added file 4 |
-| 6 | `researchclaw/prompts/manager.py:SUPPORTED_DOMAINS` (line 31) | **APPEND** — add `"<id>"` to the tuple | only if you added file 4 |
-| 7 | `researchclaw/domains/detector.py:_KEYWORD_RULES` (lines 245–351) | **APPEND** — `(["kw1", "kw2", ...], "<id>")` tuple | yes (otherwise auto-routing won't find you) |
-| 8 (opt) | `researchclaw/config.py:EXPERIMENT_MODES` (lines 98–106) + new `<Id>AgentConfig` dataclass + `_parse_<id>_agent_config` | **APPEND** | only if Layer 5 |
-| 9 (opt) | `researchclaw/experiment/<id>_agent_sandbox.py` | **CREATE** mirroring `collider_agent_sandbox.py` | only if Layer 5 |
-| 10 (opt) | `researchclaw/experiment/factory.py:create_sandbox` (around line 85) | **APPEND** `if config.mode == "<id>_agent": ...` | only if Layer 5 |
-| 11 (opt) | `researchclaw/templates/conference.py` | **APPEND** template entry + registry alias | only if you have a domain-native LaTeX style |
-
-For the simplest possible new domain (e.g. plain Python analysis), files 1, 2, 3, 7 are enough. The full HEP integration uses all 11.
-
-## 4. Profile YAML Skeleton
-
-Modeled on `researchclaw/domains/profiles/hep_ph.yaml`. Every field carries a comment naming where it is consumed.
-
-```yaml
-# ── Identity ──────────────────────────────────────────────────────────────
-domain_id: my_domain                # MUST equal the filename stem; used as
-                                    #   the registry key in prompt_adapter,
-                                    #   manager.py, factory.py, etc.
-display_name: My Domain Name        # Human-readable; surfaces in prompts
-                                    #   via DomainProfile.display_name
-parent_domain: my_domain            # Free-form taxonomy parent; used by
-                                    #   evaluator/grouping logic only
-
-# ── Deployment defaults ───────────────────────────────────────────────────
-# Consumed by researchclaw.domains.deploy when this profile is selected.
-# Each key is applied only if the user's config.yaml leaves the slot blank.
-preferred_experiment_mode: sandbox          # → experiment.mode (one of
-                                            #   EXPERIMENT_MODES in config.py:98)
-preferred_project_mode: full-auto           # → project.mode (PROJECT_MODES)
-preferred_target_conference: neurips        # → export.target_conference
-default_time_budget_sec: 1800               # → experiment.time_budget_sec
-default_max_iterations: 5                   # → experiment.max_iterations
-default_metric_key: primary_metric          # → experiment.metric_key
-default_metric_direction: maximize          # → experiment.metric_direction
-
-# ── Optional: external-agent block (mirrors collider_agent: in hep_ph.yaml)
-# Only set this if you added a LAYER 5 sandbox + EXPERIMENT_MODE.
-# my_domain_agent:
-#   timeout_sec: 3600
-#   max_turns: 100
-#   install_skills: true
-#   extra_args:
-#     - "--dangerously-skip-permissions"
-
-# ── Experiment paradigm ───────────────────────────────────────────────────
-experiment_paradigm: simulation     # one of: simulation, convergence,
-                                    #   progressive_spec, benchmark, …
-                                    # GenericPromptAdapter switches default
-                                    #   code-gen blurbs based on this value
-                                    #   (see prompt_adapter.py:204-235)
-
-# ── Domain vocabulary mapping (drives prompt phrasing) ────────────────────
-condition_terminology:              # Used by GenericPromptAdapter to
-                                    #   render experiment-design context.
-                                    #   HEP example: "BSM model" instead
-                                    #   of "method", "exclusion limit"
-                                    #   instead of "accuracy".
-  baseline: existing literature baseline / control measurement
-  proposed: new method or model under test
-  variant: parameter / hyperparameter variation
-  input: dataset / sample / system being studied
-  metric: primary success quantity for this domain
-
-# ── Code-gen file scaffold (consumed by adapter.get_blueprint_context) ────
-typical_file_structure:             # Renders into the blueprint prompt
-                                    #   as a Recommended File Structure
-                                    #   block (see prompt_adapter.py:94-97)
-  model.py: "Core algorithm or model definition"
-  analysis.py: "Run experiments and gather statistics"
-  main.py: "Entry point: orchestrate model + analysis + report"
-
-entry_point: main.py                # Documented as the main script the
-                                    #   sandbox runner will invoke
-
-core_libraries:                     # Listed in blueprint prompt and used
-                                    #   by GenericPromptAdapter as the
-                                    #   "Core libraries: ..." line.
-  - numpy
-  - scipy
-  - matplotlib
-
-docker_image: researchclaw/sandbox-generic:latest   # Consumed by deploy.py
-gpu_required: false                                  # Consumed by deploy.py
-
-pip_packages:                       # Auto-installed in sandbox / Docker
-  - numpy
-  - scipy
-  - matplotlib
-
-# ── Result shape ──────────────────────────────────────────────────────────
-metric_types:                       # Drives output_format_guidance in the
-                                    #   GenericPromptAdapter
-                                    #   (see prompt_adapter.py:252-267).
-                                    #   Allowed: scalar, table, structured,
-                                    #   convergence
-  - scalar
-  - structured
-
-standard_baselines:                 # Surfaces in experiment_design context
-                                    #   via GenericPromptAdapter:170-178.
-  - first canonical baseline name (with citation tag)
-  - second baseline
-  - …
-
-evaluation_protocol: >              # Multi-line prose; injected into the
-                                    #   experiment_design stage as part of
-                                    #   the rendered context.
-  Free-form description of how a successful experiment is measured: what
-  is computed, what is compared against, what is the success criterion.
-
-statistical_tests:                  # Listed in result_analysis stage via
-                                    #   GenericPromptAdapter:191-202.
-  - t_test
-  - bootstrap_ci
-
-output_formats:                     # Consumed by writing/export stage
-  - latex_table
-  - line_plot
-
-figure_types:                       # Hints for the figure agent
-  - bar_chart_metric_vs_baseline
-  - learning_curve
-
-github_search_terms:                # Used by literature search heuristics
-  - <plain English search snippet>
-
-paper_keywords:                     # Used in paper_outline / metadata
-  - keyword 1
-  - keyword 2
-
-# ── Final blueprint hints (the only narrative text in this YAML) ──────────
-# This is read by PromptAdapter.get_blueprint_context() and concatenated
-# verbatim onto the code-blueprint prompt (prompt_adapter.py:101-103).
-# Keep it short and concrete — instructions, not philosophy.
-code_generation_hints: |
-  Domain code requirements:
-  1. <do this>
-  2. <use that library>
-  3. <write outputs to results.json with these keys>
-
-  ANTI-PATTERNS for <my_domain> (DO NOT do these):
-  - <thing the model gets wrong by default>
-  - <another thing>
-```
-
-> **Tip — narrative belongs in the prompt bank, not the YAML.** After the Phase-B refactor (see `researchclaw/prompts/hep.py:1-23`), per-stage prose moved out of the YAML and into a dedicated bank module. Only `code_generation_hints` is left in YAML because it feeds the blueprint context.
-
-## 5. Prompt Adapter Skeleton
-
-Inherit from `PromptAdapter` (defined at `researchclaw/domains/prompt_adapter.py:52`). Each method returns a `PromptBlocks` dataclass (`prompt_adapter.py:26-49`); empty fields fall back to defaults.
-
-```python
-# researchclaw/domains/adapters/my_domain.py
-"""My-domain prompt adapter."""
-from __future__ import annotations
-from typing import Any
-from researchclaw.domains.prompt_adapter import PromptAdapter, PromptBlocks
-
-
-class MyDomainPromptAdapter(PromptAdapter):
-    """Adapter for <description of domain>."""
-
-    # ── Stage 11 (code generation) ──────────────────────────────────────
-    def get_code_generation_blocks(self, context: dict[str, Any]) -> PromptBlocks:
-        # Return PromptBlocks() (all empty) if you have a full prompt bank
-        # in researchclaw/prompts/<id>.py — the bank already covers this
-        # stage natively. (HEPPhPromptAdapter at hep_ph.py:24-25 does this.)
-        #
-        # If you don't have a bank, the GenericPromptAdapter pattern is to
-        # populate the four common blocks from DomainProfile fields:
-        domain = self.domain
-        return PromptBlocks(
-            compute_budget=domain.compute_budget_guidance or "",
-            dataset_guidance=domain.dataset_guidance or "",
-            hp_reporting=domain.hp_reporting_guidance or "",
-            code_generation_hints=domain.code_generation_hints or "",
-            output_format_guidance=(
-                'Output results to results.json:\n'
-                '{"conditions": {"method": {"metric": value}}, '
-                '"metadata": {"domain": "my_domain"}}'
-            ),
-        )
-
-    # ── Stage 10 (experiment design) ────────────────────────────────────
-    def get_experiment_design_blocks(self, context: dict[str, Any]) -> PromptBlocks:
-        domain = self.domain
-        design_context = (
-            f"This is a **{domain.display_name}** experiment.\n\n"
-            "Key principles:\n"
-            "1. <first principle>\n"
-            "2. <use canonical metric M>\n"
-            "3. <compare against established baselines B1, B2>\n"
-        )
-        return PromptBlocks(
-            experiment_design_context=design_context,
-            statistical_test_guidance=(
-                "Use <test family> for significance; apply <correction> "
-                "for multiple comparisons."
-            ),
-        )
-
-    # ── Stage 13 (result analysis) ──────────────────────────────────────
-    def get_result_analysis_blocks(self, context: dict[str, Any]) -> PromptBlocks:
-        return PromptBlocks(
-            result_analysis_hints=(
-                "My-domain result analysis:\n"
-                "- Report <metric A>, <metric B>, runtime\n"
-                "- Distinguish primary vs secondary findings\n"
-            ),
-        )
-
-    # ── Stage 22 (export / publish) ─────────────────────────────────────
-    def get_export_publish_blocks(self, context: dict[str, Any]) -> PromptBlocks:
-        # Used when domain has its own LaTeX template and final-pass
-        # formatting rules. HEP example: hep_ph.py:33-44.
-        guidance = (
-            "This is a <my_domain> manuscript; the export pass must "
-            "preserve <domain conventions>. Do NOT insert "
-            "<inappropriate template artifacts>."
-        )
-        return PromptBlocks(
-            export_publish_guidance=guidance,
-            preferred_template="my_template",   # registered in
-                                                #   templates/conference.py
-        )
-
-    # ── Blueprint hint (called from blueprint stage) ────────────────────
-    # Inherits the default from PromptAdapter.get_blueprint_context (lines
-    # 87-105), which auto-renders typical_file_structure + core_libraries
-    # + code_generation_hints from the YAML. Override only if you need
-    # something dynamic.
-```
-
-### How the `PromptBlocks` fields map to rendered prompts
-
-| `PromptBlocks` field | Stage that reads it | Effect |
-|---|---|---|
-| `compute_budget` | code generation | Replaces the default budget paragraph |
-| `dataset_guidance` | code generation | Replaces the default dataset paragraph |
-| `hp_reporting` | code generation | Replaces the default hyperparameter format |
-| `code_generation_hints` | code generation | Replaces "domain hints" block |
-| `output_format_guidance` | code generation | Replaces results.json schema example |
-| `experiment_design_context` | experiment design | Replaces the high-level domain pitch |
-| `statistical_test_guidance` | experiment design + result analysis | Names the appropriate stat tests |
-| `result_analysis_hints` | result analysis | Replaces the default analysis checklist |
-| `export_publish_guidance` | export/publish (stage 22) | Final-pass formatting rule (no NeurIPS checklists in JHEP, etc.) |
-| `preferred_template` | export/publish | Selects a `templates/conference.py` entry when user hasn't set `export.target_conference` |
-
-Real-world examples to crib from:
-- `researchclaw/domains/adapters/biology.py` — full `GenericPromptAdapter`-style overlay (recommended starting point for bio/chem/neuro).
-- `researchclaw/domains/adapters/hep_ph.py` — minimal adapter that returns empty blocks because all narrative lives in the prompt bank `prompts/hep.py`.
-
-## 6. Prompt Bank Skeleton (optional — file 4)
-
-If your domain's per-stage prose deviates substantially from ML defaults (the HEP-ph case), create a dedicated bank module. The contract:
-
-1. Module exposes `STAGES: dict[str, dict[str, Any]]` whose keys **exactly match** those in `researchclaw/prompts/ml.py`. The parity test `tests/test_prompt_bank_parity.py` will fail if you add or drop a stage.
-2. Each stage value is `{"system": str, "user": str, "json_mode": bool, "max_tokens": int}` (the latter two are optional; defaults `False` / `None`).
-3. The `user` template uses `{placeholder}` substitution (the regex in `prompts/manager.py:39-51` is `r"\{(\w+)\}"`). The set of placeholders for each stage **must match the corresponding ML stage** so the call sites in the pipeline (which pass the same kwargs regardless of domain) work unchanged.
-4. Optionally export `DEBATE_ROLES_HYPOTHESIS` and `DEBATE_ROLES_ANALYSIS` — dicts of `{role_name: {"system": ..., "user": ...}}` used by the multi-agent debate at the hypothesis/analysis stages. ML uses innovator/pragmatist/contrarian; HEP-ph uses theorist/phenomenologist/experimentalist (`prompts/hep.py:34-192`).
-
-The full canonical stage list (from `prompts/hep.py`):
-
-```
-topic_init, problem_decompose, search_strategy, literature_collect,
-literature_screen, knowledge_extract, synthesis, hypothesis_gen,
-experiment_design, code_generation, resource_planning, result_analysis,
-research_decision, paper_outline, paper_draft, peer_review,
-paper_revision, quality_gate, knowledge_archive, export_publish
-```
-
-Example stage entry — `hypothesis_gen` from `prompts/hep.py:437-488`:
-
-```python
-STAGES["hypothesis_gen"] = {
-    "system": (
-        "You formulate testable HEP-phenomenology hypotheses that address "
-        "gaps NOT covered by existing experimental results. Every "
-        "hypothesis must be:\n"
-        "1. NOVEL: Not replicating a published recast or an existing "
-        "collaboration exclusion.\n"
-        "..."
-    ),
-    "user": (
-        "Generate at least 2 falsifiable HEP-ph hypotheses from the "
-        "synthesis below.\n"
-        "For each hypothesis provide:\n"
-        "- **Hypothesis statement**: A clear claim in physics language, "
-        "  naming the BSM model / operator and the observable.\n"
-        "..."
-        "{domain_context}"
-        "Synthesis:\n{synthesis}"
-    ),
-}
-```
-
-The two placeholders `{domain_context}` and `{synthesis}` are exactly the kwargs the hypothesis stage handler passes — so the same handler works for both `prompts/ml.py` and `prompts/hep.py`.
-
-### Wiring the bank into the manager
-
-After creating `researchclaw/prompts/my_domain.py`, two single-line edits:
-
-**`researchclaw/prompts/manager.py:31`**
-```python
-SUPPORTED_DOMAINS = ("ml", "hep_ph", "my_domain")
-```
-
-**`researchclaw/prompts/manager.py:74-92`** (`_load_bank`)
-```python
-def _load_bank(domain: str) -> tuple[...]:
-    if domain == "hep_ph":
-        from researchclaw.prompts import hep as _bank
-    elif domain == "my_domain":
-        from researchclaw.prompts import my_domain as _bank
-    else:
-        from researchclaw.prompts import ml as _bank
-    ...
-```
-
-Anything not in `SUPPORTED_DOMAINS` falls back to the ML bank — see line 119:  `self._domain = domain if domain in SUPPORTED_DOMAINS else "ml"`. So if you forget to register, your domain silently uses ML prose.
-
-## 7. Detector Keyword Rule
-
-Auto-routing happens via a flat list of `(phrases, domain_id)` tuples scanned top-to-bottom in `researchclaw/domains/detector.py:_KEYWORD_RULES` (lines 245–351). The first rule whose keyword list intersects the topic text wins.
-
-Rule format — one line per domain, most-specific-first:
-
-```python
-(["dark matter", "wimp", "direct detection", "dark photon",
-  "axion", "neutralino", "bsm", "beyond standard model",
-  "effective field theory", "relic density", "annihilation cross section",
-  "hep-ph", "hep-ex", "madgraph5", "feynrules", "delphes", "pythia8",
-  "collider phenomenology", "monojet", "mono-x", "missing et",
-  "simplified model", "mediator mass", "portal interaction",
-  "spin-independent", "spin-dependent", "xenon1t", "pandax", "lz experiment",
-  "exclusion contour", "atlas dark matter", "cms dark matter"],
- "hep_ph"),
-```
-
-**Ordering rule (most-specific-first).** Place narrower domains before broader ones. The HEP-ph rule sits at lines 283–292 of `detector.py`, **before** the generic physics rules at lines 298–307, so a topic about "dark matter direct detection" hits `hep_ph` rather than being swallowed by the broader `physics_*` block. Similarly the neuroscience rules (lines 264–276) precede the ML catch-all (lines 279–281) so "spiking neural" routes to `neuroscience_computational` instead of ML.
-
-When you add your domain, ask: *which already-listed rule could accidentally swallow my topic?* Place your tuple **above** that rule.
-
-## 8. External-Agent Integration (the ColliderAgent pattern)
-
-When your domain pipeline isn't a single Python script but a multi-tool toolchain (ColliderAgent, for example, runs Lagrangian → FeynRules → UFO → MadGraph5 → Delphes → MadAnalysis5), wrap it in an experiment mode. The plug-in surface is covered by the steps below:
-
-### Step 1 — Register the mode
-
-`researchclaw/config.py:98-106`
-```python
-EXPERIMENT_MODES = {
-    "simulated",
-    "sandbox",
-    "docker",
-    "ssh_remote",
-    "colab_drive",
-    "agentic",
-    "collider_agent",     # ← existing example
-    "my_domain_agent",    # ← your addition
-}
-```
-
-### Step 2 — Define the config dataclass
-
-Mirror `ColliderAgentConfig` at `config.py:296-333`:
-
-```python
-@dataclass(frozen=True)
-class MyDomainAgentConfig:
-    """Configuration for my-domain external-agent experiment mode."""
-
-    # Path to your toolchain repo (used to install skills/agents)
-    my_domain_agent_dir: str = "/path/to/MyDomainAgent"
-    working_dir: str = "my_domain_workspace"
-    timeout_sec: int = 3600
-    claude_binary: str = ""                        # auto-detect if empty
-    extra_args: tuple[str, ...] = ("--dangerously-skip-permissions",)
-    install_skills: bool = True                    # copy skills → ~/.claude
-    max_turns: int = 100
-    # Add cloud creds, GPU flags, etc. as needed
-    incremental: bool = False
-```
-
-### Step 3 — Wire it into `ExperimentConfig`
-
-`researchclaw/config.py:475` — append a field next to `collider_agent`:
-
-```python
-@dataclass(frozen=True)
-class ExperimentConfig:
-    ...
-    collider_agent: ColliderAgentConfig = field(default_factory=ColliderAgentConfig)
-    my_domain_agent: MyDomainAgentConfig = field(default_factory=MyDomainAgentConfig)   # ← add
-    ...
-```
-
-### Step 4 — Add the parser
-
-`researchclaw/config.py:1094-1110` — mirror `_parse_collider_agent_config`:
-
-```python
-def _parse_my_domain_agent_config(data: dict[str, Any]) -> MyDomainAgentConfig:
-    if not data:
-        return MyDomainAgentConfig()
-    extra_raw = data.get("extra_args", ("--dangerously-skip-permissions",))
-    if isinstance(extra_raw, str):
-        extra_raw = [extra_raw]
-    return MyDomainAgentConfig(
-        my_domain_agent_dir=data.get("my_domain_agent_dir", "/path/to/MyDomainAgent"),
-        working_dir=data.get("working_dir", "my_domain_workspace"),
-        timeout_sec=_safe_int(data.get("timeout_sec"), 3600),
-        claude_binary=data.get("claude_binary", ""),
-        extra_args=tuple(extra_raw),
-        install_skills=bool(data.get("install_skills", True)),
-        max_turns=_safe_int(data.get("max_turns"), 100),
-    )
-```
-
-Then call it from `_parse_experiment_config` (around line 1179):
-
-```python
-my_domain_agent=_parse_my_domain_agent_config(data.get("my_domain_agent") or {}),
-```
-
-### Step 5 — Create the sandbox class
-
-`researchclaw/experiment/my_domain_agent_sandbox.py` — mirror `collider_agent_sandbox.py` (≈656 lines). The minimum viable shape:
-
-```python
-"""MyDomainAgent sandbox — runs <domain> experiments via Claude Code."""
-from __future__ import annotations
-import json, os, shutil, subprocess, time
-from pathlib import Path
-from typing import Any
-
-from researchclaw.config import MyDomainAgentConfig
-from researchclaw.experiment.sandbox import SandboxResult
-
-_PROMPT_FILENAME = "my_domain_plan.md"
-_CLAUDE_INSTRUCTION = "Execute the analysis following " + _PROMPT_FILENAME
-
-
-class MyDomainAgentSandbox:
-    def __init__(self, config: MyDomainAgentConfig, workdir: Path) -> None:
-        self.config = config
-        self.workdir = workdir
-
-    def run(self, prompt_text: str, *, timeout_sec: int | None = None) -> SandboxResult:
-        timeout_sec = timeout_sec if timeout_sec is not None else self.config.timeout_sec
-        workspace = self._prepare_workspace(prompt_text)
-        cmd = self._build_command()
-
-        start = time.monotonic()
-        try:
-            proc = subprocess.run(
-                cmd, cwd=str(workspace), env=self._build_env(),
-                capture_output=True, text=True, timeout=timeout_sec,
-            )
-            returncode, stdout, stderr, timed_out = proc.returncode, proc.stdout, proc.stderr, False
-        except subprocess.TimeoutExpired as exc:
-            returncode, stdout, stderr, timed_out = -1, exc.stdout or "", exc.stderr or "", True
-
-        elapsed = time.monotonic() - start
-        artifacts = self._collect_artifacts(workspace)
-        metrics = {
-            "my_domain_agent_success": 1.0 if returncode == 0 and not timed_out else 0.0,
-            "figures_produced": float(len(artifacts.get("figures", []))),
-            "primary_metric": float(len(artifacts.get("figures", []))) / max(1.0, ...),
-        }
-        self._write_summary(workspace, returncode, elapsed, artifacts, timed_out)
-        return SandboxResult(
-            returncode=returncode, stdout=stdout, stderr=stderr,
-            elapsed_sec=elapsed, metrics=metrics, timed_out=timed_out,
-        )
-
-    def _prepare_workspace(self, prompt_text: str) -> Path:
-        ws = self.workdir
-        ws.mkdir(parents=True, exist_ok=True)
-        (ws / _PROMPT_FILENAME).write_text(prompt_text, encoding="utf-8")
-        for sub in ("models", "scripts", "output/figures", "output/data", "progress"):
-            (ws / sub).mkdir(parents=True, exist_ok=True)
-        if self.config.install_skills:
-            self._install_skills(ws)
-        return ws
-
-    def _install_skills(self, workspace: Path) -> None:
-        # Mirror collider_agent_sandbox.py:445-505 — copy
-        # <repo>/src/skills and <repo>/src/agents into both
-        # ~/.claude/{skills,agents} (global) and workspace/.claude/{...}
-        # (project-scoped, takes precedence in the CWD).
-        ...
-
-    def _build_command(self) -> list[str]:
-        binary = self.config.claude_binary or shutil.which("claude") or "claude"
-        cmd = [binary, "-p", _CLAUDE_INSTRUCTION]
-        if self.config.max_turns > 0:
-            cmd += ["--max-turns", str(self.config.max_turns)]
-        cmd += [a for a in self.config.extra_args if a]
-        return cmd
-
-    def _build_env(self) -> dict[str, str]:
-        env = os.environ.copy()
-        # add credentials, paths, etc.
-        return env
-
-    def _collect_artifacts(self, workspace: Path) -> dict[str, list[str]]:
-        # Walk workspace/output/** and categorize
-        artifacts: dict[str, list[str]] = {"figures": [], "data": [], "scripts": [], "models": [], "logs": []}
-        for p in sorted(workspace.glob("output/figures/*.pdf")):
-            artifacts["figures"].append(str(p.relative_to(workspace)))
-        # ...
-        return artifacts
-
-    def _write_summary(self, workspace, returncode, elapsed, artifacts, timed_out) -> None:
-        # Merge sandbox metadata into workspace/results.json without
-        # clobbering keys that the agent itself wrote (see
-        # collider_agent_sandbox.py:610-655 for the merge contract).
-        ...
-```
-
-The two design rules to copy from `ColliderAgentSandbox`:
-
-1. **Existing keys win** for "soft" fields — the agent's own `results.json` is authoritative for things like `metrics` and `structured_results`; the sandbox only fills in `source`, `artifacts`, `status`. (`collider_agent_sandbox.py:626-655`)
-2. **`returncode`, `elapsed_sec`, `timed_out` are sandbox-authoritative** — always overwritten regardless of what the agent wrote.
-
-### Step 6 — Add the factory dispatch
-
-`researchclaw/experiment/factory.py:85-89` — a dispatch case parallel to `collider_agent`:
-
-```python
-if config.mode == "my_domain_agent":
-    from researchclaw.experiment.my_domain_agent_sandbox import MyDomainAgentSandbox
-    return MyDomainAgentSandbox(config.my_domain_agent, workdir)
-```
-
-### Step 7 — Set the preference in your profile YAML
-
-```yaml
-preferred_experiment_mode: my_domain_agent
-
-my_domain_agent:
-  timeout_sec: 3600
-  max_turns: 100
-  install_skills: true
-  extra_args:
-    - "--dangerously-skip-permissions"
-```
-
-That's the entire Layer 5 surface. The pipeline runner doesn't change.
-
-### Step 8 — Mandate a canonical `results.json` in the agent prompt
-
-**This is non-optional for bench scoring.** Agent-based pipelines are atomic: the agent runs end-to-end inside one Claude Code session and exits. The downstream pipeline (stage 14 RESULT_ANALYSIS, stage 15 RESEARCH_DECISION, the bench rubric judge) only reads what's on disk afterwards. If the agent's scientific numbers don't reach a known-location structured file, the rubric scores all-null and the bench is meaningless.
-
-In your `_prepare_workspace` (Step 5), append a **MANDATORY CANONICAL OUTPUT** footer to the prompt that instructs the agent to write `results.json` at the workspace root with this fixed schema:
-
-```json
-{
-  "primary_metric": <number>,
-  "metric_key": "<string>",
-  "metrics": { "<domain_key_1>": <number>, "<domain_key_2>": <number>, ... },
-  "hypotheses": {
-    "h1": {"supported": true|false, "value": <number>, "details": "..."},
-    "h2": {"supported": true|false, "details": "..."},
-    "h3": {"supported": true|false, "details": "..."}
-  },
-  "summary": "human-readable narrative",
-  "structured_results": {"artifacts": {"figures": [...], "data": [...]}}
-}
-```
-
-See `biology_agent_sandbox.py:_prepare_workspace` and `collider_agent_sandbox.py:_prepare_workspace` for the exact instruction text — copy it verbatim.
-
-In `_build_metrics`, read this file via a static `_read_agent_results(workspace)` helper and **merge `metrics.*` into the SandboxResult.metrics dict**, plus convert `hypotheses.<id>.supported` flags into `hypothesis_<id>_supported` 0/1 metrics. This makes the agent's scientific numbers visible to stage 14 and the rubric. See `biology_agent_sandbox.py:374-470` for the full pattern.
-
-Crucial guard: the sandbox itself writes a meta stub to `results.json` (returncode, elapsed_sec, artifacts). `_read_agent_results` must skip that stub by requiring at least one of `metrics`, `primary_metric`, `hypotheses`, `structured_results` to be present, otherwise it picks up its own stub and forwards garbage. The fallback chain (`analysis/summary.json`, `analysis/flux_analysis_summary.json`) lets the agent use older conventions without breaking.
-
-### Step 9 — Implement `run_project()` (SandboxProtocol parity)
-
-The pipeline's stage-14 repair loop calls `sandbox.run_project(project_dir)` to re-execute. Without this method, repair cycles fail silently with `'XYZAgentSandbox' object has no attribute 'run_project'`. For agent-based sandboxes the implementation is trivial — a single Claude Code session IS the project — so dispatch to `run()` after reading the existing plan markdown:
-
-```python
-def run_project(self, project_dir, *, entry_point="main.py",
-                timeout_sec=300, args=None, env_overrides=None):
-    del entry_point, args, env_overrides  # SandboxProtocol parity only
-    for cand in (project_dir / "REPAIR_PROMPT.md",
-                 project_dir / _PROMPT_FILENAME,
-                 self.workdir / _PROMPT_FILENAME):
-        if cand.is_file():
-            return self.run(cand.read_text(encoding="utf-8"),
-                            timeout_sec=timeout_sec)
-    return SandboxResult(returncode=-1, stdout="",
-                         stderr="no plan found", elapsed_sec=0.0,
-                         metrics={}, timed_out=False)
-```
-
-### Step 10 — Skip stage 13 + stage-14 repair for your mode
-
-For agent-based modes, the python-code refinement loop in stage 13 ITERATIVE_REFINE and the stage-14 repair cycles are dead code: they iterate on python files the agent never executed, then re-spawn the agent atomically anyway. Two one-line edits skip them cleanly:
-
-* `researchclaw/pipeline/stage_impls/_execution.py:519` — extend the existing `if config.experiment.mode == "collider_agent"` guard at the top of `_execute_iterative_refine` to also include `"my_domain_agent"`.
-* `researchclaw/pipeline/runner.py:670` — extend the gate above `_run_experiment_diagnosis` / `_run_experiment_repair` so it skips when `config.experiment.mode in ("collider_agent", "biology_agent", "my_domain_agent")`.
-
-After these edits the pipeline reduces to: stage 12 (agent runs, writes `results.json`) → stage 13 (no-op, copies artifacts forward) → stage 14 (reads `results.json`, builds summary) → stage 15 (proceed-or-reject decision based on the summary). No abstract code refinement.
-
-### Step 11 — Declare requirements + plug into the LLM gate
-
-For agent modes the pipeline replaces the python-style numeric-threshold repair with an **LLM-driven proceed/rerun gate** at stage 15 RESEARCH_DECISION. Each manifest declares:
-
-```yaml
-requirements:
-  - id: req_<short_name>
-    type: numeric | discussion | artifact   # advisory; LLM uses freely
-    description: "natural-language statement of what must be true post-run"
-    must_pass: true                          # true → unmet ⇒ rerun once
-```
-
-Mechanics:
-
-1. `experiments/arc_bench/scripts/prepare_run.py:write_requirements()` copies the list to `run_dir/stage-09/requirements.json` and stashes the full manifest under `run_dir/stage-07/topic_manifest.json` for fallback lookup.
-2. At stage 15, `researchclaw.pipeline.stage_impls._analysis._agent_requirements_decision()` fires (only when `experiment.mode in ("collider_agent", "biology_agent")`):
-   - reads `requirements.json`
-   - reads the most recent `experiment_summary.json` and the agent's canonical `results.json` (with the same fallback chain as the sandbox: `analysis/summary.json`, `analysis/flux_analysis_summary.json`, `output/data/results.json`)
-   - calls `researchclaw.pipeline.requirements_judge.judge_requirements()` — LLM produces `{verdict: proceed|reject|partial, per_requirement: [...], delta_feedback}`
-   - normalizes verdict from `per_requirement.met` (defends against LLM envelope inconsistency)
-3. **On `reject` AND retry budget remains** (default 1 retry): writes `REPAIR_PROMPT.md` to the stage-12 sandbox workspace listing the unmet must_pass items, sets `decision = "refine"`. Runner-side override at `runner.py:718` redirects `refine` → `EXPERIMENT_RUN` for agent modes (not the python-refine `ITERATIVE_REFINE`), so the agent re-runs atomically. The sandbox's `_prepare_workspace()` consumes `REPAIR_PROMPT.md` (deletes it) and prepends it as a **FOLLOWUP DELTA** section ahead of the original plan.
-4. **On `reject` with retry exhausted**, `proceed`, or `partial`: sets `decision = "proceed"`. The `requirements_unmet` flag (when present) flows into `requirements_verdict.json` at run root and downstream stages can surface caveats.
-
-To raise the retry budget, change `_REQUIREMENTS_MAX_RETRIES` (default 1) in `_analysis.py`. To enable the gate for your own mode, add `"my_domain_agent"` to the `experiment.mode in ("collider_agent", "biology_agent")` guard at the top of `_execute_research_decision`; to gate ML modes the same way, drop that guard.
-
-Tight requirements work best — keep the must_pass set to **2-5 items** that the agent can unambiguously satisfy or fail. Use must_pass=false for nice-to-haves (mechanistic discussion, seed documentation) so the LLM can flag them without forcing a rerun.
-
-### Step 12 — Paper-quality meta-rubric (applies to ALL topics)
-
-Per-topic rubrics in `config/<domain>/rubrics/<id>.json` grade the **science**.
-A separate file — `experiments/arc_bench/config/_meta_paper_quality.json` —
-grades the **paper output** uniformly across every topic. You do NOT need to
-write a separate paper-quality rubric for your domain: the same 19 leaves
-(paper-content / code-orchestration / visual-layout / content-accuracy) apply
-to ML, physics, biology, and any future domain.
-
-The meta-rubric is graded **manually** via `scripts/judge_paper_manual.sh`,
-which launches a vision-equipped Claude Code session against the run's
-deliverable directory. The bench pipeline does NOT auto-invoke this — paper
-quality is too expensive (~$0.5 + 10 min per run) and too subtle (figure
-inspection, code review) to bake into every CI cycle.
-
-Your domain only needs to make sure its **deliverables are present** for the
-manual grader to find:
-- `paper_final.md` (or `paper_revised.md`, or `paper_draft.md`) under any
-  `stage-22/`, `stage-19/`, `stage-17/`, or `deliverables/`
-- A `charts/` directory (PNG / PDF) under `stage-22/` or `deliverables/`
-- A `code/` or `experiment_final/` directory with your domain's source
-
-These are produced by the standard pipeline stages 16-22, so no extra wiring
-is required.
-
-### Step 13 — Place the agent repo under `external/agents/` with attribution
-
-Don't reference absolute paths like `/home/<user>/MyDomainAgent` in the default config. Instead:
-
-```bash
-mkdir -p external/agents
-ln -s /path/to/MyDomainAgent external/agents/MyDomainAgent
-```
-
-Then in `MyDomainAgentConfig`:
-
-```python
-my_domain_agent_dir: str = "external/agents/MyDomainAgent"
-```
-
-(Resolved relative to the repo root, which is the cwd when researchclaw runs.) Add an entry to `external/agents/README.md` crediting the upstream — for `ColliderAgent` the upstream is `https://github.com/HET-AGI/ColliderAgent`. The bench's per-run README must mention which external agent produced the results so reviewers can attribute correctly.
-
-## 9. Validation Steps
-
-Run these in order. Each command prints success/failure for one layer.
-
-```bash
-# 1. Profile loads (Layer 1)
-python -c "from researchclaw.domains.detector import load_all_profiles; \
-           print([p.domain_id for p in load_all_profiles()])"
-# Expected: a list including "my_domain"
-
-# 2. Detector matches your topic (Layer 4)
-python -c "from researchclaw.domains.detector import detect_domain; \
-           print(detect_domain('your topic with my_domain keyword'))"
-# Expected: DomainProfile(domain_id="my_domain", ...)
-
-# 3. Adapter dispatches (Layer 2)
-python -c "from researchclaw.domains.detector import detect_domain; \
-           from researchclaw.domains.prompt_adapter import get_adapter; \
-           p = detect_domain('your topic with my_domain keyword'); \
-           print(type(get_adapter(p)).__name__)"
-# Expected: MyDomainPromptAdapter
-
-# 4. Prompt bank loads (Layer 3 — only if you added one)
-python -c "from researchclaw.prompts.manager import PromptManager; \
-           pm = PromptManager(domain='my_domain'); \
-           print(pm.domain, pm.stage_names()[:5])"
-# Expected: my_domain ['topic_init', 'problem_decompose', ...]
-
-# 5. Parity test passes (catches missing stages or placeholder mismatches)
-pytest tests/test_prompt_bank_parity.py -v
-# Expected: all green; failures call out exactly which stage/placeholder
-# diverged from the ML reference.
-
-# 6. End-to-end smoke (single iteration, full 23 stages)
-python -m researchclaw run \
-    --profile my_domain \
-    --topic "your domain-relevant topic" \
-    --auto-approve \
-    --max-iterations 1
-# Expected: pipeline runs to completion; stage outputs land in the
-# configured run_dir; the export stage emits a paper PDF in your
-# preferred template.
-```
-
-If step 5 fails with "extra stage in my_domain" or "missing placeholder `{X}` in stage Y", fix the bank module — those are the precise contract violations the parity test catches.
-
-## 10. Worked Example: `hep_ph` (ColliderAgent)
-
-How the six layers fit together for the existing HEP integration.
-
-| Layer | File / line | What it contributes |
-|---|---|---|
-| 1. Profile | `researchclaw/domains/profiles/hep_ph.yaml` (131 lines) | `domain_id: hep_ph`, `preferred_experiment_mode: collider_agent`, `preferred_target_conference: jhep`, baselines = LZ/XENON1T/PandaX/ATLAS/CMS, statistical_tests = `cls_exclusion`, condition_terminology maps `metric` → "cross section / exclusion limit / signal significance", `code_generation_hints` injects the natural-units + anti-ML-pattern guidance into the blueprint stage. |
-| 2. Adapter | `researchclaw/domains/adapters/hep_ph.py` | `HEPPhPromptAdapter` — minimal class. Three stage methods return empty `PromptBlocks()` because the prompt bank covers them; only `get_export_publish_blocks` is non-trivial — it returns `preferred_template="jhep"` and a guidance string telling the export pass to keep natural units and skip NeurIPS-style broader-impact paragraphs (`hep_ph.py:33-44`). |
-| 2. Adapter registration | `researchclaw/domains/prompt_adapter.py:322-327` | Lazy import inside `_build_adapter_registry`; both exact key `"hep_ph"` and prefix `"hep_ph_"` map to `HEPPhPromptAdapter`. |
-| 3. Prompt bank | `researchclaw/prompts/hep.py` (1404 lines) | Full STAGES dict with 20 entries (`topic_init` through `export_publish`); HEP-native debate roles `theorist / phenomenologist / experimentalist` for hypothesis (lines 34-118) and `model_builder / phenomenologist / experimentalist` for analysis (lines 121-192); HYPOTHESIS_GEN system prompt at lines 437-488 demands BSM Lagrangians, natural units, falsification numbers in cm²/pb. |
-| 3. Bank registration | `researchclaw/prompts/manager.py:31, 85-86` | `SUPPORTED_DOMAINS = ("ml", "hep_ph")`; `_load_bank` branches on `domain == "hep_ph"` to import `researchclaw.prompts.hep`. |
-| 4. Detector | `researchclaw/domains/detector.py:283-291` (with backup tuple at 293-296) | 27 keyword phrases (`dark matter`, `wimp`, `madgraph5`, `feynrules`, `delphes`, `xenon1t`, `pandax`, `monojet`, `mono-x`, `exclusion contour`, …) routed to `hep_ph`. Placed BEFORE the generic `physics_simulation` rule at line 299 so dark-matter topics don't get reclassified as molecular dynamics. |
-| 5. Experiment mode | `researchclaw/config.py:105` (`"collider_agent"` in `EXPERIMENT_MODES`) | Adds the mode token. |
-| 5. Config dataclass | `researchclaw/config.py:296-333` (`ColliderAgentConfig`) | Holds `collider_agent_dir`, `working_dir`, `timeout_sec=7200`, `extra_args=("--dangerously-skip-permissions",)`, `install_skills=True`, `max_turns=150`, optional `magnus_address`/`magnus_token`, and `incremental` re-entry flag. |
-| 5. Wired into ExperimentConfig | `researchclaw/config.py:475` | `collider_agent: ColliderAgentConfig = field(default_factory=ColliderAgentConfig)` |
-| 5. Parser | `researchclaw/config.py:1094-1110` (`_parse_collider_agent_config`) | Reads `experiment.collider_agent.*` from user YAML. |
-| 5. Sandbox | `researchclaw/experiment/collider_agent_sandbox.py` (656 lines) | `ColliderAgentSandbox.run(prompt_text)` writes `collider_plan.md`, `mkdir`s the canonical subtree (`models/`, `scripts/`, `events/`, `analysis/`, `output/figures/`, `output/data/`, `progress/`) (line 261), copies `<ColliderAgentDir>/src/{skills,agents}` into both `~/.claude/` (global, line 459) and `workspace/.claude/` (project-scoped, line 492), invokes `claude -p "Execute the analysis following collider_plan.md" --max-turns 150 --dangerously-skip-permissions` (lines 507-524), then collects artifacts (figures/data/scripts/models/logs) and merges them into `results.json` without clobbering ColliderAgent's own structured output (lines 538-655). Also implements an `incremental` re-entry mode that snapshots prior stage-12 runs into `stage-12_v{N}/`, builds a workspace manifest, and prepends a CONTINUATION CONTEXT block so the next run touches only the deltas (lines 169-258, 270-409). |
-| 5. Factory dispatch | `researchclaw/experiment/factory.py:85-89` | `if config.mode == "collider_agent": return ColliderAgentSandbox(config.collider_agent, workdir)` |
-| 6. LaTeX template | `researchclaw/templates/conference.py:345-373` | `JHEP` template (`name="jhep"`, `style_package="jheppub"`, `author_format="jhep"`, points to the official JHEP TeXclass download URL); registered under alias `"jhep"` at line 531. Selected automatically via `HEPPhPromptAdapter.get_export_publish_blocks(...).preferred_template == "jhep"` whenever the user hasn't manually set `export.target_conference`. |
-
-End-to-end flow when a user runs `python -m researchclaw run --profile hep_ph --topic "dark photon mediator dark matter"`:
-
-1. `deploy.py` reads `profiles/hep_ph.yaml` → fills `experiment.mode = "collider_agent"`, `export.target_conference = "jhep"`, time budget 7200 s, etc.
-2. `PromptManager(domain="hep_ph")` loads `prompts/hep.py` STAGES.
-3. Pipeline runs stages 0–10. Hypothesis generation uses the theorist/phenomenologist/experimentalist debate roles. Code generation stage's blueprint context is enriched with `typical_file_structure` and `code_generation_hints` from the YAML via `HEPPhPromptAdapter.get_blueprint_context()`.
-4. Stage 12 (experiment execution) calls `create_sandbox(config)` → factory dispatches to `ColliderAgentSandbox` → writes the assembled physics plan to `collider_plan.md`, installs skills, invokes Claude Code, collects artifacts back into `metrics`.
-5. Stage 14 result analysis uses the model_builder/phenomenologist/experimentalist debate roles.
-6. Stage 22 export reads `HEPPhPromptAdapter.get_export_publish_blocks()` → selects the JHEP template → renders the paper using `jheppub.cls`.
-
-The 23-stage runner code is unchanged from the ML pipeline.
-
-## 11. What You Do NOT Need to Touch
-
-The plug-in surface is intentionally narrow. **None of the following ever needs domain-specific edits**:
-
-- The pipeline runner (`researchclaw/pipeline/runner.py`) — it iterates stages by name and calls `pm.for_stage(name, **vars)`.
-- Stage handlers (`researchclaw/pipeline/stages/*.py`) — they request a `RenderedPrompt` from the manager and pass the same kwarg set regardless of domain.
-- LLM dispatch (`researchclaw/llm/*.py`) — model selection, retries, token accounting.
-- Gates and judges (`researchclaw/pipeline/gates/*.py`, `researchclaw/judge/*`) — they evaluate outputs against generic structural and quality rubrics.
-- Knowledge base writer (`researchclaw/kb/*.py`) — markdown/Obsidian backends are domain-agnostic.
-- Evaluators / scoreboards (`researchclaw/evaluator/*`).
-- Experiment auto-repair (`researchclaw/experiment/repair*.py`).
-- Code-generation agent core (`researchclaw/code_agent/*`) — it consumes the blueprint context the adapter produced.
-- The CLI (`researchclaw/__main__.py`, `researchclaw/cli/*`).
-
-If you find yourself editing any of these to make a new domain work, stop — that's a sign the plug-in surface needs widening (or you're doing too much). The five layers above are the contract.
-
----
-
-### Quick reference: minimum viable new domain
-
-```text
-1. researchclaw/domains/profiles/<id>.yaml         — write the YAML
-2. researchclaw/domains/adapters/<id>.py           — copy biology.py, edit
-3. researchclaw/domains/adapters/__init__.py       — import + __all__
-   researchclaw/domains/prompt_adapter.py          — add to _build_adapter_registry
-4. researchclaw/domains/detector.py:_KEYWORD_RULES — append your tuple
-5. pytest tests/test_prompt_bank_parity.py         — must pass (no-op
-                                                      unless you added a bank)
-6. python -m researchclaw run --profile <id> ...   — smoke test
-```
-
-That's the whole thing. The HEP integration adds layers 5/6 (external agent + JHEP template) on top of the same skeleton.
diff --git a/docs/HITL_GUIDE.md b/docs/HITL_GUIDE.md
deleted file mode 100644
index 8409b596..00000000
--- a/docs/HITL_GUIDE.md
+++ /dev/null
@@ -1,620 +0,0 @@
-# Human-in-the-Loop Co-Pilot Guide
-
-> **AutoResearchClaw v0.4.0** transforms the pipeline from purely autonomous to a human-AI collaborative research engine. This guide covers everything you need to know.
-
----
-
-## Table of Contents
-
-1. [Why Co-Pilot?](#1-why-co-pilot)
-2. [Quick Start](#2-quick-start)
-3. [Intervention Modes](#3-intervention-modes)
-4. [The Co-Pilot Workflow](#4-the-co-pilot-workflow)
-5. [CLI Commands](#5-cli-commands)
-6. [Stage-by-Stage Intervention Guide](#6-stage-by-stage-intervention-guide)
-7. [Workshops](#7-workshops)
-8. [Detached Operation](#8-detached-operation)
-9. [Safety & Guardrails](#9-safety--guardrails)
-10. [Intelligence Layer](#10-intelligence-layer)
-11. [Pipeline Branching](#11-pipeline-branching)
-12. [Adapters (CLI / WebSocket / MCP)](#12-adapters)
-13. [Configuration Reference](#13-configuration-reference)
-14. [FAQ](#14-faq)
-
----
-
-## 1. Why Co-Pilot?
-
-Fully autonomous research pipelines produce papers fast, but testing reveals consistent quality gaps:
-
-| Problem | Root Cause |
-|---------|-----------|
-| Weak research ideas | AI lacks taste for what's truly novel and impactful |
-| Missing baselines | AI doesn't know which comparisons reviewers expect |
-| Fragile experiment code | No human sanity check before execution |
-| Thin analysis | AI draws superficial conclusions from results |
-| Generic paper writing | AI produces correct-but-bland academic prose |
-
-The HITL Co-Pilot system solves this by letting you **intervene exactly where your expertise matters most**, while the AI handles the heavy lifting everywhere else.
-
-**The result**: papers that combine AI speed with human judgment.
-
----
-
-## 2. Quick Start
-
-### Option A: Co-Pilot Mode (Recommended)
-
-```bash
-researchclaw run --topic "Your research idea" --mode co-pilot
-```
-
-The pipeline will run automatically and pause at key decision points for your input. At each pause, you'll see an interactive prompt with available actions.
-
-### Option B: Express Mode (Minimal Interruption)
-
-```bash
-researchclaw run --topic "Your research idea" --mode express
-```
-
-Only pauses at 3 critical gates: hypothesis approval (Stage 8), experiment design (Stage 9), and final quality check (Stage 20).
-
-### Option C: Full Auto (Original Behavior)
-
-```bash
-researchclaw run --topic "Your research idea" --auto-approve
-```
-
-No human intervention. Identical to pre-v0.4.0 behavior.
-
----
-
-## 3. Intervention Modes
-
-| Mode | Flag | Pauses At | Best For |
-|------|------|-----------|----------|
-| **Full Auto** | `--auto-approve` | Never | Quick exploration, low-stakes experiments |
-| **Gate Only** | `--mode gate-only` | 3 gate stages (5, 9, 20) | Light oversight |
-| **Checkpoint** | `--mode checkpoint` | End of each phase (8 points) | Phase-level review |
-| **Co-Pilot** | `--mode co-pilot` | Critical stages + SmartPause triggers | **Recommended for production** |
-| **Step-by-Step** | `--mode step-by-step` | After every stage (23 pauses) | Learning the pipeline |
-| **Express** | `--mode express` | 3 most critical gates only (Stages 8, 9, 20) | Experienced users |
-| **Custom** | `--mode custom` | User-defined per-stage policies | Advanced configuration |
-
-### How to Choose
-
-- **First time using the pipeline?** Start with `step-by-step` to understand each stage.
-- **Publishing a real paper?** Use `co-pilot` for the best quality.
-- **Running overnight?** Use `gate-only` or `express` — fewer interruptions.
-- **Batch processing many topics?** Use `full-auto`.
-
----
-
-## 4. The Co-Pilot Workflow
-
-When the pipeline pauses, you'll see an interactive panel:
-
-```
-──────────────────────────────────────────────────────────
-  HITL | Stage 08: HYPOTHESIS_GEN
-  Post-stage review
-──────────────────────────────────────────────────────────
-
-  Stage 8 (HYPOTHESIS_GEN) — done
-
-  Hypotheses generated. This is a CRITICAL decision point —
-  review each hypothesis for novelty, feasibility, and significance.
-
-  Outputs:
-    hypotheses.md (1,247 bytes)
-      → ## Hypothesis 1: Quantum gate noise as structured regularization
-    novelty_report.json (892 bytes)
-
-  Novelty score: 0.72 (moderate)
-
-  Available actions:
-    [a] Approve and continue
-    [r] Reject and rollback
-    [e] Edit stage output
-    [c] Start collaborative chat
-    [i] Inject guidance / direction
-    [s] Skip this stage
-    [q] Abort pipeline
-    [v] View full stage output
-
-Action >
-```
-
-### Available Actions at Every Pause
-
-| Key | Action | What Happens |
-|-----|--------|-------------|
-| `a` | **Approve** | Accept the output and continue to the next stage |
-| `r` | **Reject** | Reject the output; pipeline rolls back to an earlier stage |
-| `e` | **Edit** | Opens the output file in your `$EDITOR` (vim, nano, VS Code, etc.) |
-| `c` | **Collaborate** | Start a multi-turn chat with the AI to refine the output together |
-| `i` | **Inject Guidance** | Provide direction that will be incorporated into subsequent stages |
-| `s` | **Skip** | Skip this stage entirely (use with caution) |
-| `b` | **Rollback** | Jump back to a specific earlier stage |
-| `q` | **Abort** | Stop the pipeline entirely |
-| `v` | **View** | Display the full contents of output files |
-
----
-
-## 5. CLI Commands
-
-### Starting a Run
-
-```bash
-# Co-Pilot mode
-researchclaw run --topic "Quantum noise as neural network regularization" --mode co-pilot
-
-# With explicit config
-researchclaw run --config config.arc.yaml --topic "..." --mode co-pilot
-
-# Resume a previous run in co-pilot mode
-researchclaw run --config config.arc.yaml --resume --mode co-pilot
-```
-
-### Detached Interaction
-
-These commands let you interact with a paused pipeline from a separate terminal:
-
-```bash
-# Check status
-researchclaw status artifacts/rc-2026-0328-abc123
-
-# Attach interactively (full TUI)
-researchclaw attach artifacts/rc-2026-0328-abc123
-
-# Quick approve (non-interactive)
-researchclaw approve artifacts/rc-2026-0328-abc123 --message "Looks good"
-
-# Quick reject
-researchclaw reject artifacts/rc-2026-0328-abc123 --reason "Missing ResNet baseline"
-
-# Inject guidance for a specific stage
-researchclaw guide artifacts/rc-2026-0328-abc123 --stage 9 --message "Add Dropout as baseline"
-```
-
----
-
-## 6. Stage-by-Stage Intervention Guide
-
-### Where Your Input Matters Most
-
-| Stage | Name | Co-Pilot Behavior | Your Role |
-|-------|------|-------------------|-----------|
-| 1-2 | Scoping | Pause after | Confirm research direction and scope |
-| 3 | Search Strategy | Pause after | Add missing search terms or sources |
-| 5 | Literature Screen | **Approval required** | Verify important papers aren't filtered out |
-| 7 | Synthesis | Pause after | Check if the identified gaps match your understanding |
-| **8** | **Hypothesis Gen** | **Collaboration** | **Review, discuss, and refine the core research idea** |
-| **9** | **Experiment Design** | **Collaboration + Approval** | **Verify baselines, benchmarks, metrics, ablations** |
-| 10 | Code Generation | Pause after | Spot-check code quality |
-| 12 | Experiment Run | Stream output | Monitor training metrics in real-time |
-| 13 | Iterative Refine | Pause after | Decide if refinement should continue |
-| **15** | **Research Decision** | **Approval required** | **Choose PROCEED, PIVOT, or REFINE** |
-| 16 | Paper Outline | Pause after | Adjust section structure |
-| **17** | **Paper Draft** | **Collaboration** | **Co-write key sections** |
-| 18 | Peer Review | Pause after | Prioritize which review comments to address |
-| **20** | **Quality Gate** | **Approval required** | **Final publication decision** |
-| 23 | Citation Verify | Pause after | Review flagged citations |
-
-### Guidance Injection
-
-You can inject guidance for **any stage at any time**, even before it runs:
-
-```bash
-researchclaw guide artifacts/rc-xxx --stage 17 --message "Emphasize the theoretical contribution in the Introduction. Keep the Method section concise."
-```
-
-Guidance is automatically included in the LLM's context when that stage executes.
-
----
-
-## 7. Workshops
-
-Workshops are specialized collaboration tools for the most critical stages.
-
-### Idea Workshop (Stage 7-8)
-
-When you enter collaboration mode at Stage 8, the Idea Workshop activates:
-
-```
-You > c  (start collaboration)
-
-AI > I generated 3 candidate hypotheses. Here's my evaluation:
-     1. "Quantum gate noise as structured regularization" — Novelty: 8/10, Feasibility: 6/10
-     2. "Entanglement-based feature selection" — Novelty: 7/10, Feasibility: 4/10
-     3. "Quantum sampling for data augmentation" — Novelty: 5/10, Feasibility: 8/10
-
-     Which direction interests you most?
-
-You > Hypothesis 1 is the most promising, but we need to be more specific.
-     The noise should be modeled as a stochastic regularizer, and we need
-     to compare against Dropout, Label Smoothing, and MixUp.
-
-AI > Updated hypothesis:
-     "We theoretically characterize quantum gate noise as a structured
-     stochastic regularizer and empirically demonstrate it outperforms
-     Dropout, Label Smoothing, and MixUp on small-sample classification."
-
-     Baselines updated: + Dropout, + Label Smoothing, + MixUp, + CutMix
-
-     Shall I proceed with this direction?
-
-You > Yes, approve.
-```
-
-The workshop automatically:
-- Saves your refined hypothesis to `hypotheses.md`
-- Records the conversation in `hitl/chat_stage_08.jsonl`
-- Persists candidate evaluations in `hitl/idea_workshop.json`
-
-### Baseline Navigator (Stage 9)
-
-The Baseline Navigator helps you build a rigorous experiment design:
-
-```
-AI > Here are the suggested baselines:
-     [AI] ResNet-50 (standard image classification baseline)
-     [AI] ViT-B/16 (transformer baseline)
-     [AI] Dropout (regularization baseline)
-
-     Experiment Design Checklist:
-     [+] Baselines: 3
-     [ ] Benchmarks: CIFAR-10, CIFAR-100
-     [ ] Metrics: accuracy, F1
-
-You > Add Label Smoothing and MixUp as baselines. Also add STL-10.
-
-AI > Updated. Current checklist:
-     [+] Baselines: 5 (ResNet-50, ViT-B/16, Dropout, Label Smoothing, MixUp)
-     [+] Benchmarks: 3 (CIFAR-10, CIFAR-100, STL-10)
-     [+] Metrics: accuracy, F1
-
-     Review questions:
-     [+] Are all standard baselines for this domain included?
-     [ ] Is there at least one ablation planned? — suggest: "noise magnitude sweep"
-```
-
-### Paper Co-Writer (Stage 16-17)
-
-The Paper Co-Writer supports three workflows:
-
-1. **AI-first** (default): AI writes the draft, you edit and refine
-2. **Human-first**: You write key paragraphs, AI expands and polishes
-3. **Interleaved**: You alternate — you write the Method, AI writes Related Work
-
-```
-AI > Paper draft completed (5,200 words, 7 sections).
-     Which sections would you like to review or co-edit?
-
-You > The Introduction is too generic. Let me rewrite the first paragraph.
-
-AI > Opening your editor... (saves backup to hitl/snapshots/)
-
-     [Editor opens with Introduction section]
-
-You > [Saves edited file]
-
-AI > Your edits look great. I'll polish the transitions and ensure
-     the notation is consistent with the Method section.
-
-     Updated Introduction: 420 words → 380 words (tighter, more specific).
-     Change summary: +3 added, -5 deleted, ~8 changed, 22 unchanged
-```
-
----
-
-## 8. Detached Operation
-
-Research runs can take hours. You don't need to sit and watch.
-
-### How It Works
-
-1. Pipeline pauses → writes `hitl/waiting.json`
-2. Pipeline enters file-polling mode (checks every 2 seconds for `response.json`)
-3. You respond whenever you're ready via `attach`, `approve`, or the web dashboard
-4. Pipeline picks up your response and resumes
-
-### Scenario: Overnight Run
-
-```bash
-# Start the run at 6 PM
-researchclaw run --topic "..." --mode co-pilot &
-
-# Pipeline runs Stages 1-7, pauses at Stage 8...
-# You go home
-
-# Next morning, check status
-researchclaw status artifacts/rc-2026-xxx
-# Output: "WAITING for input at Stage 8 — HYPOTHESIS_GEN (since 18:42)"
-
-# Review and approve
-researchclaw attach artifacts/rc-2026-xxx
-# Interactive review → approve → pipeline resumes
-```
-
-### Timeout Behavior
-
-By default, the pipeline waits 24 hours for a response. You can configure this:
-
-```yaml
-hitl:
-  timeouts:
-    default_human_timeout_sec: 86400   # 24h (default)
-    auto_proceed_on_timeout: false     # true = auto-approve after timeout
-```
-
----
-
-## 9. Safety & Guardrails
-
-### Cost Budget
-
-Set a spending limit to prevent runaway API costs:
-
-```yaml
-hitl:
-  cost_budget_usd: 50.0   # Pipeline pauses at 50%, 80%, and 100% of budget
-```
-
-When a threshold is breached, the pipeline pauses with a cost summary:
-```
-Cost budget alert: Cost: $42.50 / $50.00 [████████████████░░░░] 85%
-```
-
-### Claim Verification
-
-The Claim Verifier automatically checks AI-generated text against your collected literature:
-
-- **Citation claims**: Are cited papers in your shortlist, or are they fabricated?
-- **Numerical claims**: Do reported numbers match actual experiment data?
-- **Factual claims**: Are "it has been shown that..." statements grounded?
-
-Unverified claims are flagged in the review summary, letting you decide what to keep.
-
-### SHA256 Artifact Checksums
-
-Every stage output gets a SHA256 manifest (`manifest.json`) for reproducibility. If an artifact is modified outside the pipeline, verification will detect it.
-
-### Escalation Policy
-
-For team/production use, configure tiered notification escalation:
-
-```yaml
-hitl:
-  escalation:
-    levels:
-      - delay_sec: 0       # Immediate terminal notification
-        channel: terminal
-      - delay_sec: 1800    # After 30 min → Slack
-        channel: slack
-        message: "Pipeline needs attention"
-      - delay_sec: 7200    # After 2h → email
-        channel: email
-      - delay_sec: 86400   # After 24h → auto-abort
-        channel: terminal
-        auto_action: abort
-```
-
-### Extensible Hooks
-
-Run custom scripts before/after any stage:
-
-```bash
-# Create a hook script
-cat > artifacts/rc-xxx/hooks/post_stage_10.sh << 'EOF'
-#!/bin/sh
-echo "Checking syntax of generated code..."
-cd "$RC_RUN_DIR/stage-10/experiment" && python -m py_compile main.py
-EOF
-chmod +x artifacts/rc-xxx/hooks/post_stage_10.sh
-```
-
-Hooks receive environment variables: `RC_STAGE_NUM`, `RC_STAGE_NAME`, `RC_RUN_DIR`, `RC_HOOK_NAME`.
-
----
-
-## 10. Intelligence Layer
-
-### SmartPause
-
-SmartPause goes beyond fixed gate stages. It dynamically decides whether to pause based on:
-
-- **Quality score** (from PRM or heuristics): Low quality → pause for review
-- **Stage criticality**: High-impact stages (hypotheses, experiment design) have lower thresholds
-- **Historical rejection rate**: Stages you frequently reject get paused more often
-- **Confidence**: When the AI is uncertain, it asks for help
-
-You don't need to configure SmartPause — it works automatically in co-pilot mode.
-
-### Intervention Learning (ALHF)
-
-Every time you approve, reject, or edit, the system learns:
-
-- Stages you always approve → future runs auto-approve them
-- Stages you frequently reject → future runs pause more aggressively
-- Your edit patterns → inform SmartPause thresholds
-
-After 5+ runs, the system adapts to your review style.
-
-### Quality Predictor
-
-At any pause point, the system estimates the final paper quality based on current artifacts:
-
-- Literature coverage (number and diversity of papers)
-- Hypothesis specificity and falsifiability
-- Experiment design completeness (baselines, ablations, metrics)
-- Result strength (improvement over baselines)
-- Draft quality (length, structure, section coverage)
-- Citation integrity
-
-Risk factors are highlighted so you know where to focus your attention.
-
----
-
-## 11. Pipeline Branching
-
-When you're unsure which research direction to pursue, branch the pipeline:
-
-```
-# At Stage 8, you see 3 promising hypotheses
-Action > b  (branch)
-
-# Fork to explore Hypothesis A
-researchclaw branch create --run-dir artifacts/rc-xxx --name "quantum-noise" --stage 8
-
-# Fork to explore Hypothesis B
-researchclaw branch create --run-dir artifacts/rc-xxx --name "entanglement" --stage 8
-```
-
-Each branch gets its own copy of the pipeline state. Run them independently, then compare:
-
-```bash
-# Compare branches at Stage 14 (after experiments)
-researchclaw branch compare --run-dir artifacts/rc-xxx --stage 14
-```
-
-```
-Branch Comparison — Stage 14: RESULT_ANALYSIS
-
-  main:
-    artifacts: 3, quality: 0.72
-    → Best accuracy: 78.3%
-
-  quantum-noise:
-    artifacts: 3, quality: 0.85
-    → Best accuracy: 82.1%
-
-  entanglement:
-    artifacts: 2, quality: 0.61
-    → Best accuracy: 74.5%
-```
-
-Merge the winner:
-
-```bash
-researchclaw branch merge --run-dir artifacts/rc-xxx --branch "quantum-noise" --from-stage 9
-```
-
----
-
-## 12. Adapters
-
-The HITL system supports three interaction channels:
-
-### CLI Adapter (Default)
-
-Terminal-based interaction with ANSI colors, `$EDITOR` integration, and multi-line input. Works over SSH.
-
-### WebSocket Adapter
-
-For the web dashboard. Provides real-time updates via WebSocket:
-
-```
-Browser → WebSocket → ws_adapter.py → waiting.json / response.json → Pipeline
-```
-
-Message types: `get_status`, `approve`, `reject`, `edit`, `inject_guidance`, `chat_message`.
-
-### MCP Adapter
-
-External AI agents (Claude, OpenClaw) can interact with the HITL system via MCP tool calls:
-
-- `hitl_get_status` — Check if the pipeline is waiting
-- `hitl_approve_stage` — Approve the current gate
-- `hitl_reject_stage` — Reject with reason
-- `hitl_inject_guidance` — Provide direction
-- `hitl_view_output` — Read stage artifacts
-
-This enables **agent-in-the-loop** workflows where another AI system reviews and approves the pipeline's work.
-
----
-
-## 13. Configuration Reference
-
-```yaml
-hitl:
-  enabled: true                        # Master switch (default: false)
-  mode: co-pilot                       # Intervention mode (see table above)
-  cost_budget_usd: 0.0                 # Cost limit in USD (0 = unlimited)
-
-  notifications:
-    on_pause: true                     # Notify on pipeline pause
-    on_quality_drop: true              # Notify on quality issues
-    on_error: true                     # Notify on stage errors
-    channels: ["terminal"]             # terminal | slack | email | webhook
-
-  collaboration:
-    llm_model: ""                      # Model for chat (default: primary model)
-    max_chat_turns: 50                 # Max turns per collaboration session
-    save_chat_history: true            # Persist chat logs to hitl/
-
-  timeouts:
-    default_human_timeout_sec: 86400   # Wait time for human input (24h)
-    auto_proceed_on_timeout: false     # Auto-approve on timeout
-
-  # Per-stage policies (for 'custom' mode)
-  stage_policies:
-    8:
-      require_approval: true           # Must approve before continuing
-      enable_collaboration: true       # Enable chat mode
-      pause_before: false              # Pause before execution
-      pause_after: true                # Pause after execution
-      allow_edit_output: true          # Allow editing output files
-      allow_inject_prompt: true        # Allow guidance injection
-      stream_output: false             # Stream LLM output in real-time
-      min_quality_score: 0.0           # Pause if quality below threshold
-      max_auto_retries: 2              # Auto-retry count before pausing
-      human_timeout_sec: 86400         # Per-stage timeout override
-      auto_proceed_on_timeout: false   # Per-stage auto-proceed override
-```
-
-### Environment Variables
-
-| Variable | Purpose |
-|----------|---------|
-| `EDITOR` | Editor for file editing (default: nano on Unix, notepad on Windows) |
-| `RESEARCHCLAW_SLACK_WEBHOOK` | Slack webhook URL for notifications |
-| `RESEARCHCLAW_WEBHOOK_URL` | Generic webhook URL for notifications |
-
----
-
-## 14. FAQ
-
-### Does HITL slow down the pipeline?
-
-Only at the stages where you choose to intervene. In co-pilot mode, ~15 of 23 stages run automatically. Typical human time is 30-60 minutes per run, compared to 2-4 hours of autonomous execution.
-
-### Can I switch modes mid-run?
-
-Not currently, but you can resume a paused run with a different mode:
-
-```bash
-researchclaw run --resume --output artifacts/rc-xxx --mode step-by-step
-```
-
-### What if I'm not sure what to do at a pause?
-
-Press `v` to view the full output, then `c` to chat with the AI about it. The AI can explain what it did and why, and suggest what to focus on.
-
-### Does HITL work with ACP/OpenClaw?
-
-Yes. The MCP adapter exposes HITL tools that any ACP-compatible agent can call. OpenClaw can automatically review and approve gates.
-
-### What data does HITL store?
-
-Everything goes in `{run_dir}/hitl/`:
-- `session.json` — Session state
-- `interventions.jsonl` — All interventions (append log)
-- `chat_stage_NN.jsonl` — Chat histories
-- `snapshots/` — File backups before edits
-- `guidance/` — Injected guidance per stage
-- `notifications.jsonl` — Notification log
-
-### Is it backward compatible?
-
-Yes. Without `hitl.enabled: true` or `--mode`, the pipeline behaves identically to v0.3.x. The `--auto-approve` flag still works and takes precedence over HITL settings.
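The precedence stated in the backward-compatibility answer above can be sketched in Python. This is a minimal illustration, not the project's actual implementation: `HitlSettings`, `resolve_mode`, and the `"legacy"` label are hypothetical names, and the rule that an explicit `--mode` overrides `hitl.mode` from the config is an assumption the guide does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HitlSettings:
    """Hypothetical stand-in for the `hitl:` block of the config file."""
    enabled: bool = False      # mirrors `hitl.enabled` (default: false)
    mode: str = "co-pilot"     # mirrors `hitl.mode`


def resolve_mode(auto_approve: bool, cli_mode: Optional[str], hitl: HitlSettings) -> str:
    """Return the effective intervention mode for a run (illustrative only)."""
    # --auto-approve takes precedence over every HITL setting.
    if auto_approve:
        return "full-auto"
    # Assumption: an explicit --mode wins over the mode in the config file.
    if cli_mode is not None:
        return cli_mode
    # With no flag, HITL applies only when the config enables it.
    if hitl.enabled:
        return hitl.mode
    # Neither flag nor config: hypothetical label for "behaves like v0.3.x".
    return "legacy"
```

For example, `resolve_mode(True, "co-pilot", HitlSettings(enabled=True))` yields `"full-auto"`, reflecting that `--auto-approve` overrides the HITL settings.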
diff --git a/docs/README_AR.md b/docs/README_AR.md
deleted file mode 100644
index 260cec83..00000000
--- a/docs/README_AR.md
+++ /dev/null
@@ -1,790 +0,0 @@
-<p align="center">
-  <img src="../image/logo.png" width="700" alt="AutoResearchClaw Logo">
-</p>
-
-<h2 align="center"><b>Share an idea. Get a paper. Autonomous, collaborative, and self-evolving.</b></h2>
-
-
-
-<p align="center">
-  <b><i><font size="5">Chat with <a href="#-تكامل-openclaw">OpenClaw</a>: "Research X" → done.</font></i></b>
-</p>
-
-<p align="center">
-  📄 <b>Our paper is now available on arXiv. Come and read it!</b> <a href="https://arxiv.org/abs/2605.20025"><i>AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration</i></a>
-</p>
-
-<p align="center">
-  <img src="../image/framework_v2.png" width="100%" alt="AutoResearchClaw Framework">
-</p>
-
-
-<p align="center">
-  <a href="https://arxiv.org/abs/2605.20025"><img src="https://img.shields.io/badge/arXiv-2605.20025-b31b1b?logo=arxiv&logoColor=white" alt="arXiv"></a>
-  <a href="https://huggingface.co/datasets/AIMING-Lab-UNC/ARC-Bench"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-ARC--Bench-yellow" alt="ARC-Bench on Hugging Face"></a>
-  <a href="../LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="MIT License"></a>
-  <a href="https://python.org"><img src="https://img.shields.io/badge/Python-3.11%2B-3776AB?logo=python&logoColor=white" alt="Python 3.11+"></a>
-  <a href="#الاختبار"><img src="https://img.shields.io/badge/Tests-2699%20passed-brightgreen?logo=pytest&logoColor=white" alt="2699 Tests Passed"></a>
-  <a href="https://github.com/aiming-lab/AutoResearchClaw"><img src="https://img.shields.io/badge/GitHub-AutoResearchClaw-181717?logo=github" alt="GitHub"></a>
-  <a href="#-تكامل-openclaw"><img src="https://img.shields.io/badge/OpenClaw-Compatible-ff4444?logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNCAyNCI+PHBhdGggZD0iTTEyIDJDNi40OCAyIDIgNi40OCAyIDEyczQuNDggMTAgMTAgMTAgMTAtNC40OCAxMC0xMFMxNy41MiAyIDEyIDJ6IiBmaWxsPSJ3aGl0ZSIvPjwvc3ZnPg==" alt="OpenClaw Compatible"></a>
-  <a href="https://discord.gg/u4ksqW5P"><img src="https://img.shields.io/badge/Discord-Join%20Community-5865F2?logo=discord&logoColor=white" alt="Discord"></a>
-</p>
-
-<p align="center">
-  <a href="../README.md">🇺🇸 English</a> ·
-  <a href="README_CN.md">🇨🇳 中文</a> ·
-  <a href="README_JA.md">🇯🇵 日本語</a> ·
-  <a href="README_KO.md">🇰🇷 한국어</a> ·
-  <a href="README_FR.md">🇫🇷 Français</a> ·
-  <a href="README_DE.md">🇩🇪 Deutsch</a> ·
-  <a href="README_ES.md">🇪🇸 Español</a> ·
-  <a href="README_PT.md">🇧🇷 Português</a> ·
-  <a href="README_RU.md">🇷🇺 Русский</a> ·
-  <a href="README_AR.md">🇸🇦 العربية</a>
-</p>
-
-<p align="center">
-  <a href="showcase/SHOWCASE.md">🏆 Paper Showcase</a> · <a href="HITL_GUIDE.md">🧑‍✈️ Co-Pilot Guide</a> · <a href="integration-guide.md">📖 Integration Guide</a> · <a href="https://discord.gg/u4ksqW5P">💬 Discord Community</a>
-</p>
-
----
-
-<table>
-<tr>
-<td width="18%">
-<a href="showcase/SHOWCASE.md"><img src="showcase/thumbnails/paper_I_random_matrix-01.png" width="120" alt="Sample paper"/></a>
-</td>
-<td valign="middle">
-<b>🏆 Generated Paper Showcase</b><br><br>
-<b>8 papers across 8 domains</b> (mathematics, statistics, biology, computing, NLP, RL, computer vision, robustness), generated fully autonomously or with Human-in-the-Loop Co-Pilot guidance.<br><br>
-<a href="showcase/SHOWCASE.md"><img src="https://img.shields.io/badge/عرض_المعرض_الكامل_→-جميع_الأوراق_الـ8-d73a49?style=for-the-badge" alt="View Showcase"></a>
-</td>
-</tr>
-</table>
-
----
-
-> **🧪 We're looking for testers!** Try the pipeline with your own research idea, from any field, and [tell us what you think](TESTER_GUIDE.md). Your feedback directly shapes the next release. **[→ Testing Guide](TESTER_GUIDE.md)** | **[→ 中文测试指南](TESTER_GUIDE_CN.md)** | **[→ 日本語テストガイド](TESTER_GUIDE_JA.md)**
-
----
-
-## 🔥 News
-- **[05/19/2026]** **v0.5.0** — **Multi-Domain Experiment Agents + ARC-Bench** — Two major updates. **(1) Domain-specialized execution agents:** the experiment phase (Stages 10–13) now routes tasks to discipline-specific agents instead of only the default ML environment: **high-energy physics** (ColliderAgent: FeynRules, MadGraph5, Delphes via the Magnus cloud), **biology** (genome-scale metabolic modeling with COBRApy), and **statistics** (a simulation-study agent), plus a general Docker executor for chemistry/materials. The execution path selects the appropriate executor automatically based on the research domain. **(2) ARC-Bench:** an open benchmark for autonomous research with **55 topics** covering **ML (25), high-energy physics (10), quantum (10), biology (7), and statistics (3)**, with a manifest and a rubric for each topic (`experiments/arc_bench/`, also available on [🤗 Hugging Face](https://huggingface.co/datasets/AIMING-Lab-UNC/ARC-Bench)). **[→ Domain Integration Guide](DOMAIN_INTEGRATION_GUIDE.md)**
-- **[04/01/2026]** **v0.4.0** — **Human-in-the-Loop Co-Pilot System** — AutoResearchClaw is no longer fully autonomous only. The new HITL system adds 6 intervention modes (`full-auto`, `gate-only`, `checkpoint`, `step-by-step`, `co-pilot`, `custom`), per-stage policies, and deep human-AI collaboration. Includes: Idea Workshop for collaborative hypothesis development, Baseline Navigator for experiment design review, Paper Co-Writer for collaborative drafting, SmartPause (confidence-driven dynamic intervention), ALHF intervention learning, claim verification to counter hallucination, cost budget guardrails, pipeline branching for parallel hypothesis exploration, and CLI commands (`attach`/`status`/`approve`/`reject`/`guide`). **[→ Full HITL Guide](HITL_GUIDE.md)**
-- **[03/30/2026]** **Flexible Skill Loading** — AutoResearchClaw now supports loading open-source and custom skills from any discipline to enhance your research experience. 20 pre-loaded skills are included as ready-to-use references, covering scientific writing, experiment design, chemistry, biology, and more, including a community-contributed [A-Evolve](https://github.com/A-EVO-Lab/a-evolve) skill for intelligent evolution. Load your own skills via `researchclaw skills install` or place a `SKILL.md` in `.claude/skills/`. See the [Skills Library](#-مكتبة-المهارات).
-- **[03/22/2026]** [v0.3.2](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.2) — **Cross-Platform Support + Major Stability** — AutoResearchClaw now works with any ACP-compatible agent (Claude Code, Codex CLI, Copilot CLI, Gemini CLI, Kimi CLI) and supports messaging platforms (Discord, Telegram, Lark, WeChat) through the OpenClaw bridge. A new CLI-agent code generation backend delegates Stages 10 and 13 to external CLI agents with budget control and timeout management. Includes an anti-fabrication system (VerifiedRegistry + an experiment diagnose-and-repair loop), 100+ bug fixes, a modular executor refactor, automatic `--resume` detection, hardened LLM retries, and community fixes.
-
-<details>
-<summary>Previous releases</summary>
-
-- **[03/18/2026]** [v0.3.1](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.1) — **OpenCode Beast Mode + Community Contributions** — New "Beast Mode" routes complex code generation to [OpenCode](https://github.com/anomalyco/opencode) with automatic complexity scoring and graceful fallback. Added Novita AI provider support, thread-safety hardening, improved LLM output parsing robustness, and 20+ bug fixes from community PRs and internal audit.
-- **[03/17/2026]** [v0.3.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.0) — **MetaClaw Integration** — AutoResearchClaw now supports [MetaClaw](https://github.com/aiming-lab/MetaClaw) cross-run learning: pipeline failures → structured lessons → reusable skills, injected into all 23 stages. **+18.3%** robustness in controlled experiments. Opt-in (`metaclaw_bridge.enabled: true`), fully backward-compatible. See [Integration Guide](#-metaclaw-integration).
-- **[03/16/2026]** [v0.2.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.2.0) — Three multi-agent subsystems (CodeAgent, BenchmarkAgent, FigureAgent), hardened Docker sandbox with network-policy-aware execution, 4-round paper quality audit (AI-slop detection, 7-dim review scoring, NeurIPS checklist), and 15+ bug fixes from production runs.
-- **[03/15/2026]** [v0.1.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.1.0) — We release AutoResearchClaw: a fully autonomous 23-stage research pipeline that turns a single research idea into a conference-ready paper. No human intervention required.
-
-</details>
-
----
-
-## ⚡ One Command. One Paper.
-
-```bash
-# Fully autonomous: no human intervention
-pip install -e . && researchclaw setup && researchclaw init && researchclaw run --topic "Your research idea here" --auto-approve
-
-# Co-Pilot mode: collaborate with the AI at key decision points
-researchclaw run --topic "Your research idea here" --mode co-pilot
-```
-
-
----
-
-## 🤔 What Is This?
-
-**You think. AutoResearchClaw writes. You steer the key decisions.**
-
-Give it a research topic and get back a complete academic paper with real literature from OpenAlex, Semantic Scholar, and arXiv, hardware-aware sandboxed experiments (automatic GPU/MPS/CPU detection), statistical analysis, multi-agent peer review, and conference-ready LaTeX targeting NeurIPS/ICML/ICLR. Run it fully autonomously, or use **Co-Pilot mode** to guide the AI at critical decision points: choose research directions, review experiment designs, and co-write the paper. No fabricated references.
-
-<table>
-<tr><td>📄</td><td><code>paper_draft.md</code></td><td>Full academic paper (Introduction, Related Work, Method, Experiments, Results, Conclusion)</td></tr>
-<tr><td>📐</td><td><code>paper.tex</code></td><td>Conference-ready LaTeX (NeurIPS / ICLR / ICML templates)</td></tr>
-<tr><td>📚</td><td><code>references.bib</code></td><td>Real BibTeX references from OpenAlex, Semantic Scholar, and arXiv, pruned automatically to match inline citations</td></tr>
-<tr><td>🔍</td><td><code>verification_report.json</code></td><td>4-layer citation integrity check plus relevance verification (arXiv, CrossRef, DataCite, LLM)</td></tr>
-<tr><td>🧪</td><td><code>experiment runs/</code></td><td>Generated code + sandbox results + structured JSON metrics</td></tr>
-<tr><td>📊</td><td><code>charts/</code></td><td>Auto-generated condition-comparison charts with error bars and confidence intervals</td></tr>
-<tr><td>📝</td><td><code>reviews.md</code></td><td>Multi-agent peer review with methodology-evidence consistency checks</td></tr>
-<tr><td>🧬</td><td><code>evolution/</code></td><td>Self-learning lessons extracted from each run</td></tr>
-<tr><td>📦</td><td><code>deliverables/</code></td><td>All final outputs in one folder, ready to compile on Overleaf</td></tr>
-</table>
-
-The pipeline runs **end to end**, either fully autonomously or with human-in-the-loop collaboration. When experiments fail, it repairs itself. When hypotheses don't hold, it pivots. When citations are fabricated, it removes them. When you want to steer, it pauses and listens.
-
-🌍 **Run it from anywhere.** AutoResearchClaw isn't tied to a single platform. Use it standalone via the CLI, connect it to [OpenClaw](https://github.com/openclaw/openclaw), or integrate it with any ACP-compatible agent: 🤖 Claude Code, 💻 Codex CLI, 🐙 Copilot CLI, ♊ Gemini CLI, 🌙 Kimi CLI, and others. Thanks to OpenClaw's message bridge, you can launch a full research run from 💬 Discord, ✈️ Telegram, 🐦 Lark (飞书), 💚 WeChat, or whatever platform your team already uses. One topic in, one paper out, no matter where you type it.
-
----
-
-## 🚀 Quick Start
-
-```bash
-# 1. Clone and install
-git clone https://github.com/aiming-lab/AutoResearchClaw.git
-cd AutoResearchClaw
-python3 -m venv .venv && source .venv/bin/activate
-pip install -e .
-
-# 2. Setup (interactive: installs OpenCode beast mode, checks Docker/LaTeX)
-researchclaw setup
-
-# 3. Configure
-researchclaw init          # Interactive: choose an LLM provider, creates config.arc.yaml
-# Or manually: cp config.researchclaw.example.yaml config.arc.yaml
-
-# 4. Run
-export OPENAI_API_KEY="sk-..."
-researchclaw run --config config.arc.yaml --topic "Your research idea" --auto-approve
-```
-
-Output → `artifacts/rc-YYYYMMDD-HHMMSS-<hash>/deliverables/`: LaTeX, BibTeX, experiment code, and charts, ready to compile.
-
-<details>
-<summary>📝 Minimum required configuration</summary>
-
-```yaml
-project:
-  name: "my-research"
-
-research:
-  topic: "Your research topic here"
-
-llm:
-  base_url: "https://api.openai.com/v1"
-  api_key_env: "OPENAI_API_KEY"
-  primary_model: "gpt-4o"
-  fallback_models: ["gpt-4o-mini"]
-
-experiment:
-  mode: "sandbox"
-  sandbox:
-    python_path: ".venv/bin/python"
-```
-
-</details>
-
----
-
-## 🧠 What Makes It Different
-
-| Capability | How It Works |
-|-----------|-------------|
-| **🧑‍✈️ Co-Pilot Mode** | 6 intervention modes, from fully autonomous to step-by-step. Guide the AI on critical decisions (hypotheses, baselines, paper writing) or let it run freely. SmartPause automatically detects when human intervention would help. |
-| **🔄 PIVOT / REFINE Loop** | Stage 15 decides autonomously: PROCEED, REFINE (adjust parameters), or PIVOT (new direction). Outputs are versioned automatically. |
-| **🤖 Multi-Agent Debate** | Hypothesis generation, result analysis, and peer review use structured debate across multiple perspectives. |
-| **🧬 Self-Learning** | Lessons are extracted from each run (decision rationales, runtime warnings, metric anomalies) and decay over time with a 30-day half-life. Future runs learn from past mistakes. |
-| **📚 Knowledge Base** | Every run builds a structured knowledge base across 6 categories (decisions, experiments, findings, literature, questions, reviews). |
-| **🛡️ Sentinel Watchdog** | Background quality monitor: NaN/Inf detection, paper-evidence consistency, citation relevance scoring, anti-fabrication guard. |
-| **🔍 Claim Verification** | Built-in fact checking: extracts claims from AI-generated text and verifies them against the collected literature. Flags ungrounded citations and fabricated numbers. |
-| **🌿 Branch Exploration** | Branch the pipeline to explore multiple research directions at once, compare results side by side, and merge the best path. |
-
----
-
-## 🦞 OpenClaw Integration
-
-<table>
-<tr>
-
-**AutoResearchClaw is an [OpenClaw](https://github.com/openclaw/openclaw)-compatible service.** Install it in OpenClaw and launch autonomous research with a single message, or use it standalone via the CLI, Claude Code, or any AI coding assistant.
-
-</tr>
-</table>
-
-### 🚀 Use with OpenClaw (Recommended)
-
-If you already use [OpenClaw](https://github.com/openclaw/openclaw) as your AI assistant:
-
-```
-1️⃣  Share the GitHub repo URL with OpenClaw
-2️⃣  OpenClaw automatically reads RESEARCHCLAW_AGENTS.md → understands the pipeline
-3️⃣  Say: "Research [your topic]"
-4️⃣  Done: OpenClaw clones, installs, configures, runs, and returns the results
-```
-
-**That's it.** OpenClaw handles `git clone`, `pip install`, config setup, and pipeline execution automatically. You just chat.
-
-<details>
-<summary>💡 What happens under the hood</summary>
-
-1. OpenClaw reads `RESEARCHCLAW_AGENTS.md` → learns the research orchestrator role
-2. OpenClaw reads `README.md` → understands installation and pipeline structure
-3. OpenClaw copies `config.researchclaw.example.yaml` → `config.yaml`
-4. Asks for your LLM API key (or uses the environment variable)
-5. Runs `pip install -e .` + `researchclaw run --topic "..." --auto-approve`
-6. Returns the paper, LaTeX, experiments, and citations
-
-</details>
-
-### 🔌 OpenClaw Bridge (Advanced)
-
-For deeper integration, AutoResearchClaw includes a **bridge adapter system** with 6 optional capabilities:
-
-```yaml
-# config.arc.yaml
-openclaw_bridge:
-  use_cron: true              # ⏰ Scheduled research runs
-  use_message: true           # 💬 Progress notifications (Discord/Slack/Telegram)
-  use_memory: true            # 🧠 Cross-session knowledge persistence
-  use_sessions_spawn: true    # 🔀 Spawn parallel sub-sessions for concurrent stages
-  use_web_fetch: true         # 🌐 Live web search during literature review
-  use_browser: false          # 🖥️ Browser-based paper collection
-```
-
-Each flag activates a typed adapter protocol. When OpenClaw provides these capabilities, the adapters consume them without code changes. See [`integration-guide.md`](integration-guide.md) for full details.
-
-### ACP (Agent Client Protocol)
-
-AutoResearchClaw can use **any ACP-compatible coding agent** as its LLM backend, with no API keys required. The agent communicates via [acpx](https://github.com/openclaw/acpx) and maintains a single persistent session across all 23 pipeline stages.
-
-| Agent | Command | Notes |
-|-------|---------|-------|
-| Claude Code | `claude` | Anthropic |
-| Codex CLI | `codex` | OpenAI |
-| Copilot CLI | `gh` | GitHub |
-| Gemini CLI | `gemini` | Google |
-| OpenCode | `opencode` | SST |
-| Kimi CLI | `kimi` | Moonshot |
-
-```yaml
-# config.yaml: ACP example
-llm:
-  provider: "acp"
-  acp:
-    agent: "claude"   # Any ACP-compatible agent CLI command
-    cwd: "."          # Working directory for the agent
-  # No base_url or api_key needed: the agent handles its own authentication.
-```
-
-```bash
-# Just run: the agent uses its own credentials
-researchclaw run --config config.yaml --topic "Your research idea" --auto-approve
-```
-
-### 🛠️ Other Ways to Run
-
-| Method | How |
-|--------|-----|
-| **Standalone CLI** | `researchclaw run --topic "..." --auto-approve` (autonomous) or `--mode co-pilot` (collaborative) |
-| **Python API** | `from researchclaw.pipeline import Runner; Runner(config).run()` |
-| **Claude Code** | Reads `RESEARCHCLAW_CLAUDE.md`; just say *"Run research on [topic]"* |
-| **Copilot CLI** | `researchclaw run --topic "..."` with `llm.acp.agent: "gh"` |
-| **OpenCode** | Reads `.claude/skills/`; same natural-language interface |
-| **Any AI CLI** | Provide `RESEARCHCLAW_AGENTS.md` as context → the agent bootstraps automatically |
-
----
-
-## 🔬 Pipeline: 23 Stages, 8 Phases
-
-```
-Phase A: Research Scoping            Phase E: Experiment Execution
-  1. TOPIC_INIT                      12. EXPERIMENT_RUN
-  2. PROBLEM_DECOMPOSE               13. ITERATIVE_REFINE  ← self-healing
-
-Phase B: Literature Discovery        Phase F: Analysis & Decision
-  3. SEARCH_STRATEGY                 14. RESULT_ANALYSIS    ← multi-agent
-  4. LITERATURE_COLLECT  ← real API  15. RESEARCH_DECISION  ← PIVOT/REFINE
-  5. LITERATURE_SCREEN   [gate]
-  6. KNOWLEDGE_EXTRACT               Phase G: Paper Writing
-                                     16. PAPER_OUTLINE
-Phase C: Knowledge Synthesis         17. PAPER_DRAFT
-  7. SYNTHESIS                       18. PEER_REVIEW        ← evidence check
-  8. HYPOTHESIS_GEN    ← debate      19. PAPER_REVISION
-
-Phase D: Experiment Design           Phase H: Finalization
-  9. EXPERIMENT_DESIGN   [gate]      20. QUALITY_GATE      [gate]
- 10. CODE_GENERATION                 21. KNOWLEDGE_ARCHIVE
- 11. RESOURCE_PLANNING               22. EXPORT_PUBLISH     ← LaTeX
-                                     23. CITATION_VERIFY    ← relevance check
-```
-
-> **Gate stages** (5, 9, 20) pause for human approval, or auto-approve with `--auto-approve`. On rejection, the pipeline rolls back.
-
-> **Co-Pilot mode** (`--mode co-pilot`): deep human-AI collaboration at stages 7-8 (Idea Workshop), stage 9 (Baseline Navigator), and stages 16-17 (Paper Co-Writer). Other stages run automatically under SmartPause monitoring.
-
-> **Decision loops**: Stage 15 can trigger REFINE (→ Stage 13) or PIVOT (→ Stage 8), with automatic artifact versioning.
-
-<details>
-<summary>📋 What each phase does</summary>
-
-| Phase | What Happens |
-|-------|-------------|
-| **A: Scoping** | The LLM decomposes the topic into a structured problem tree with research questions |
-| **A+: Hardware** | Auto-detects the GPU (NVIDIA CUDA / Apple MPS / CPU-only), warns if local hardware is limited, and adapts code generation accordingly |
-| **B: Literature** | Multi-source search (OpenAlex → Semantic Scholar → arXiv) for real papers, relevance screening, knowledge card extraction |
-| **C: Synthesis** | Clusters findings, identifies research gaps, generates testable hypotheses via multi-agent debate |
-| **D: Design** | Designs the experiment plan, generates hardware-aware runnable Python (GPU tier → package selection), estimates resource needs |
-| **E: Execution** | Runs experiments in a sandbox, detects NaN/Inf and runtime errors, self-heals code via targeted LLM repair |
-| **F: Analysis** | Multi-agent analysis of results; autonomous PROCEED / REFINE / PIVOT decision with rationale |
-| **G: Writing** | Outline → section-by-section drafting (5,000-6,500 words) → peer review (with methodology-evidence consistency) → revision with length guard |
-| **H: Finalization** | Quality gate, knowledge archival, LaTeX export with conference template, citation integrity + relevance verification |
-
-</details>
-
----
-
-## ✨ Key Features
-
-| Feature | Description |
-|---------|------------|
-| **📚 Multi-Source Literature** | Real papers from OpenAlex, Semantic Scholar, and arXiv: query expansion, deduplication, circuit breaker with graceful degradation |
-| **🔍 4-Layer Citation Verification** | arXiv ID check → CrossRef/DataCite DOI → Semantic Scholar title match → LLM relevance scoring. Fabricated references are removed automatically. |
-| **🖥️ Hardware-Aware Execution** | Auto-detects the GPU (NVIDIA CUDA / Apple MPS / CPU-only) and adapts code generation, imports, and experiment scale |
-| **🦾 OpenCode Beast Mode** | Complex experiments are routed automatically to [OpenCode](https://github.com/anomalyco/opencode), which generates multi-file projects with custom architectures, training loops, and ablation studies. Install via `researchclaw setup`. |
-| **🧪 Sandbox Experiments** | AST-validated code, immutable harness, NaN/Inf fast-fail, self-healing repair, iterative refinement (up to 10 rounds), partial result capture |
-| **📝 Conference-Grade Writing** | NeurIPS/ICML/ICLR templates, section-by-section drafting (5,000-6,500 words), anti-fabrication guard, revision length guard, anti-disclaimer enforcement |
-| **📐 Template Switching** | `neurips_2025`, `iclr_2026`, `icml_2026`: Markdown → LaTeX with math, tables, figures, cross-references, and `\cite{}` |
-| **🛡️ Anti-Fabrication** | VerifiedRegistry enforces real experiment data in papers. Failed experiments are diagnosed and repaired automatically before writing. Unverified numbers are sanitized. |
-| **🚦 Quality Gates** | 3 human-in-the-loop gates (stages 5, 9, 20) with rollback. Skip with `--auto-approve`. |
-| **🧑‍✈️ HITL Co-Pilot** | 6 intervention modes with per-stage policies. Idea Workshop, Baseline Navigator, and Paper Co-Writer for deep collaboration. SmartPause, cost guardrails, escalation policies, and intervention learning for production safety. CLI/WebSocket/MCP adapters. |
-| **💰 Cost Guardrails** | Budget monitoring with configurable threshold alerts (50%/80%/100%). The pipeline pauses automatically when the budget is exceeded. |
-| **🔐 Reproducibility** | SHA256 checksums for all stage outputs. Immutable manifests for verification. Multi-level rollback with versioned snapshots. |
-
----
-
-## 🧑‍✈️ Human-in-the-Loop Co-Pilot
-
-**AutoResearchClaw v0.4.0 introduces a complete Human-in-the-Loop (HITL) system** that turns the pipeline from fully autonomous into a collaborative human-AI research engine. Choose your level of involvement:
-
-### Intervention Modes
-
-| Mode | Command | What It Does |
-|------|---------|-------------|
-| **Full Auto** | `--auto-approve` | Original behavior: no human intervention |
-| **Gate Only** | `--mode gate-only` | Pauses at the 3 gate stages (5, 9, 20) for approval |
-| **Checkpoint** | `--mode checkpoint` | Pauses at every phase boundary (8 checkpoints) |
-| **Co-Pilot** | `--mode co-pilot` | Deep collaboration at critical stages, automatic for the rest |
-| **Step-by-Step** | `--mode step-by-step` | Pauses after every stage, useful for learning the pipeline |
-| **Express** | `--mode express` | Quick review: only the 3 most critical gates |
-
-### Co-Pilot Workflow
-
-```
-You: researchclaw run --topic "Quantum noise as neural network regularization" --mode co-pilot
-
-Pipeline runs stages 1-7 automatically...
-
-  ┌─────────────────────────────────────────────────────────────┐
-  │  HITL | Stage 08: HYPOTHESIS_GEN                            │
-  │  Post-stage review                                          │
-  │                                                             │
-  │  Hypotheses generated: 3                                    │
-  │  Novelty score: 0.72 (moderate)                             │
-  │                                                             │
-  │  [a] Approve  [r] Reject  [e] Edit  [c] Collaborate         │
-  │  [i] Inject guidance  [v] View output  [q] Abort            │
-  └─────────────────────────────────────────────────────────────┘
-
-You: c  (start a collaborative chat)
-You: Hypothesis 3 is interesting but needs Dropout/Label Smoothing as baselines
-AI:  Updated: added Dropout, Label Smoothing, MixUp, CutMix as baselines...
-You: approve
-
-Pipeline continues with your refined hypothesis...
-```
-
-### CLI Commands
-
-```bash
-# Start in HITL mode
-researchclaw run --topic "..." --mode co-pilot
-
-# Attach to a paused pipeline (from another terminal)
-researchclaw attach artifacts/rc-2026-xxx
-
-# Check pipeline and HITL status
-researchclaw status artifacts/rc-2026-xxx
-
-# Approve/reject from another terminal or a script
-researchclaw approve artifacts/rc-2026-xxx --message "LGTM"
-researchclaw reject artifacts/rc-2026-xxx --reason "Missing key baseline"
-
-# Inject guidance for a stage (even before it runs)
-researchclaw guide artifacts/rc-2026-xxx --stage 9 --message "Use ResNet-50 as primary baseline"
-```
-
-### Key Capabilities
-
-| Feature | Description |
-|---------|------------|
-| **Idea Workshop** | Collaboratively brainstorm, evaluate, and refine hypotheses (stages 7-8) |
-| **Baseline Navigator** | AI suggests baselines + human adds/removes + reproducibility checklist (stage 9) |
-| **Paper Co-Writer** | Section-by-section drafting with human editing and AI polishing (stages 16-19) |
-| **SmartPause** | Confidence-driven dynamic pausing: automatically detects when human input would help |
-| **Claim Verification** | Built-in fact checking against the collected literature; flags unsupported claims |
-| **Cost Guardrails** | Budget monitoring with 50%/80%/100% threshold alerts |
-| **Intervention Learning** | ALHF: learns from your review patterns to improve future pause decisions |
-| **Branch Exploration** | Fork the pipeline to explore multiple hypotheses, compare, and merge the best |
-| **Escalation Policy** | Tiered notifications (terminal → Slack → email → auto-halt) when no one responds |
-| **3 Adapters** | CLI (terminal), WebSocket (web dashboard), MCP (external agents) |
-
-### Configuration
-
-```yaml
-# config.arc.yaml
-hitl:
-  enabled: true
-  mode: co-pilot                     # full-auto | gate-only | checkpoint | step-by-step | co-pilot | custom
-  cost_budget_usd: 50.0              # Pause when cost exceeds the budget (0 = no limit)
-
-  notifications:
-    on_pause: true
-    on_quality_drop: true
-    channels: ["terminal"]            # terminal | slack | webhook
-
-  timeouts:
-    default_human_timeout_sec: 86400  # Default wait of 24 hours
-    auto_proceed_on_timeout: false
-
-  collaboration:
-    max_chat_turns: 50
-    save_chat_history: true
-
-  # Custom per-stage policies (optional, for 'custom' mode)
-  stage_policies:
-    8: { require_approval: true, enable_collaboration: true }
-    9: { require_approval: true, allow_edit_output: true }
-```
-
-### Backward Compatibility
-
-- **Default: off.** Without `hitl.enabled: true` or `--mode`, the pipeline behaves exactly as before.
-- **`--auto-approve` still works.** It overrides HITL mode.
-- **All 2,699 existing tests pass** with the HITL code present.
-
----
-
-## 🧠 MetaClaw Integration
-
-**AutoResearchClaw + [MetaClaw](https://github.com/aiming-lab/MetaClaw) = a pipeline that learns from every run.**
-
-MetaClaw adds **cross-run knowledge transfer** to AutoResearchClaw. When enabled, the pipeline automatically captures lessons from failures and warnings, converts them into reusable skills, and injects them into all 23 pipeline stages in subsequent runs, so the same mistakes are not repeated.
-
-### How It Works
-
-```
-Run N executes → failures/warnings are captured as Lessons
-                      ↓
-          MetaClaw Lesson → Skill conversion
-                      ↓
-          arc-* Skill files are stored in ~/.metaclaw/skills/
-                      ↓
-Run N+1 → build_overlay() injects skills into every LLM prompt
-                      ↓
-          LLM avoids known pitfalls → higher quality, fewer retries
-```
-
-### Quick Setup
-
-```bash
-# 1. Install MetaClaw (if not already installed)
-pip install metaclaw
-
-# 2. Enable it in the config
-```
-
-```yaml
-# config.arc.yaml
-metaclaw_bridge:
-  enabled: true
-  proxy_url: "http://localhost:30000"        # MetaClaw proxy (optional)
-  skills_dir: "~/.metaclaw/skills"          # Where skills are stored
-  fallback_url: "https://api.openai.com/v1" # Direct LLM fallback
-  fallback_api_key: ""                      # API key for the fallback URL
-  lesson_to_skill:
-    enabled: true
-    min_severity: "warning"                 # Convert warnings + errors
-    max_skills_per_run: 3
-```
-
-```bash
-# 3. Run as usual: MetaClaw works transparently
-researchclaw run --config config.arc.yaml --topic "Your idea" --auto-approve
-```
-
-After each run, check `~/.metaclaw/skills/arc-*/SKILL.md` to see the skills your pipeline has learned.
-
-### Experiment Results
-
-In controlled A/B experiments (same topic, same LLM, same configuration):
-
-| Metric | Baseline | With MetaClaw | Improvement |
-|---------|----------|---------------|----------|
-| Stage retry rate | 10.5% | 7.9% | **-24.8%** |
-| REFINE cycle count | 2.0 | 1.2 | **-40.0%** |
-| Pipeline stage completion | 18/19 | 19/19 | **+5.3%** |
-| Overall robustness score (composite) | 0.714 | 0.845 | **+18.3%** |
-
-> The composite robustness score is a weighted average of stage completion rate (40%), retry reduction (30%), and REFINE cycle efficiency (30%).
-
-### Backward Compatibility
-
-- **Default: off.** If `metaclaw_bridge` is absent or `enabled: false`, the pipeline behaves exactly as before.
-- **No new dependencies.** MetaClaw is optional; the core pipeline works without it.
-- **All 2,699 existing tests pass** with the integration code present.
-
----
-
-## 🧩 Skill Library
-
-AutoResearchClaw now supports loading **open-source and custom skills** to strengthen your research workflow. It also ships **20 preloaded built-in skills** (scientific writing, literature search, chemistry, biology, and more) as ready-to-use references that work out of the box. Disable any skill by adding `enabled: false` to its YAML frontmatter.
-
-**Sample built-in skills:**
-
-| Category | Skill | Description |
-|----------|-------|-------------|
-| **Writing** | `scientific-writing` | IMRAD structure, citation formatting, reporting guidelines |
-| **Domain** | `chemistry-rdkit` | Molecular analysis, SMILES, fingerprints, drug discovery |
-| **Experiment** | `literature-search` | Systematic review, PRISMA methodology |
-
-> See all 20 skills with `researchclaw skills list`.
-
-### Load Your Own Skills
-
-```bash
-# Option 1: Install a skill (persists across projects)
-researchclaw skills install /path/to/my-skill/
-
-# Option 2: Put a SKILL.md in the project
-mkdir -p .claude/skills/my-custom-skill
-# Then create SKILL.md with YAML frontmatter (name, description, trigger-keywords, applicable-stages)
-
-# Option 3: Configure shared skill directories in config.arc.yaml
-# skills:
-#   custom_dirs:
-#     - /path/to/team-shared-skills
-```
-
-### Using Skills
-
-Skills are loaded and injected into LLM prompts automatically; no manual activation is needed. Use the CLI to inspect them:
-
-```bash
-researchclaw skills list               # Show all loaded skills with their sources
-researchclaw skills validate ./my-skill # Validate the SKILL.md format
-```
-
-Browse community skills: [K-Dense-AI/claude-scientific-skills](https://github.com/K-Dense-AI/claude-scientific-skills) (150+ scientific skills across multiple disciplines).
-
----
-
-## ⚙️ Configuration Reference
-
-<details>
-<summary>Click to expand the full configuration reference</summary>
-
-```yaml
-# === Project ===
-project:
-  name: "my-research"              # Project identifier
-  mode: "docs-first"               # docs-first | semi-auto | full-auto
-
-# === Research ===
-research:
-  topic: "..."                     # Research topic (required)
-  domains: ["ml", "nlp"]           # Research domains for literature search
-  daily_paper_count: 8             # Target number of papers per search query
-  quality_threshold: 4.0           # Minimum paper quality score
-
-# === Runtime ===
-runtime:
-  timezone: "America/New_York"     # For timestamps
-  max_parallel_tasks: 3            # Concurrent experiment limit
-  approval_timeout_hours: 12       # Gate stage timeout
-  retry_limit: 2                   # Retries on stage failure
-
-# === LLM ===
-llm:
-  provider: "openai-compatible"    # openai | openrouter | deepseek | minimax | acp | openai-compatible
-  base_url: "https://..."          # API endpoint (required for openai-compatible)
-  api_key_env: "OPENAI_API_KEY"    # Env var holding the API key (required for openai-compatible)
-  api_key: ""                      # Or set the key here directly
-  primary_model: "gpt-4o"          # Primary model
-  fallback_models: ["gpt-4o-mini"] # Fallback model chain
-  s2_api_key: ""                   # Semantic Scholar API key (optional, higher rate limits)
-  acp:                             # Only used when provider: "acp"
-    agent: "claude"                # ACP agent CLI command (claude, codex, gemini, etc.)
-    cwd: "."                       # Working directory for the agent
-
-# === Experiment ===
-experiment:
-  mode: "sandbox"                  # simulated | sandbox | docker | ssh_remote
-  time_budget_sec: 300             # Max execution time per run (default: 300 s)
-  max_iterations: 10               # Max refinement iterations
-  metric_key: "val_loss"           # Primary metric name
-  metric_direction: "minimize"     # minimize | maximize
-  sandbox:
-    python_path: ".venv/bin/python"
-    gpu_required: false
-    allowed_imports: [math, random, json, csv, numpy, torch, sklearn]
-    max_memory_mb: 4096
-  docker:
-    image: "researchclaw/experiment:latest"
-    network_policy: "setup_only"   # none | setup_only | pip_only | full
-    gpu_enabled: true
-    memory_limit_mb: 8192
-    auto_install_deps: true        # Auto-detect imports → requirements.txt
-  ssh_remote:
-    host: ""                       # GPU server hostname
-    gpu_ids: []                    # Available GPU IDs
-    remote_workdir: "/tmp/researchclaw_experiments"
-  opencode:                          # OpenCode Beast Mode (installed automatically via `researchclaw setup`)
-    enabled: true                    # Master switch (default: true)
-    auto: true                       # Trigger automatically without confirmation (default: true)
-    complexity_threshold: 0.2        # 0.0-1.0; higher = only trigger on complex experiments
-    model: ""                        # Model override (empty = uses llm.primary_model)
-    timeout_sec: 600                 # Max seconds for OpenCode generation
-    max_retries: 1                   # Retries on failure
-    workspace_cleanup: true          # Delete the temp workspace after collection
-  code_agent:                        # CodeAgent v2: multi-phase code generation
-    enabled: true                    # Use CodeAgent instead of legacy single-prompt generation
-    architecture_planning: true      # Generate a detailed implementation blueprint before coding
-    sequential_generation: true      # Generate files one at a time following the dependency DAG
-    hard_validation: true            # AST-based validation gates (block identical ablations and hardcoded metrics)
-    hard_validation_max_repairs: 2   # Max repair attempts when validation fails
-    exec_fix_max_iterations: 3       # Repair attempts during execution
-    exec_fix_timeout_sec: 60         # Timeout per repair attempt
-  benchmark_agent:                   # BenchmarkAgent: automatic dataset and baseline selection
-    enabled: true                    # Enable the 4-agent pipeline (Surveyor→Selector→Acquirer→Validator)
-    enable_hf_search: true           # Search HuggingFace Datasets
-    enable_web_search: true          # Search Google Scholar for benchmarks
-    tier_limit: 2                    # Dataset tier filter (1=small/cached, 2=medium, 3=large)
-    min_benchmarks: 1                # Minimum datasets required
-    min_baselines: 2                 # Minimum baseline methods required
-  figure_agent:                      # FigureAgent: academic figure generation
-    enabled: true                    # Enable the 5-agent pipeline (Planner→CodeGen→Renderer→Critic→Integrator)
-    min_figures: 3                   # Minimum figures to generate
-    max_figures: 8                   # Maximum figures
-    max_iterations: 3                # Critic-driven refinement iterations
-    dpi: 300                         # Output resolution
-    strict_mode: false               # Fail the pipeline if figure generation fails
-  repair:                            # Anti-fabrication: experiment repair
-    enabled: true                    # Diagnose and repair failed experiments automatically
-    max_cycles: 3                    # Repair cycles
-    min_completion_rate: 0.5         # >=50% of conditions must complete to proceed
-    min_conditions: 2                # At least 2 conditions for a valid experiment
-    use_opencode: true               # Route repairs through OpenCode Beast Mode
-
-# === Web Search (optional) ===
-web_search:
-  enabled: true                      # Enable web-augmented literature search
-  tavily_api_key_env: "TAVILY_API_KEY"  # Env var holding the Tavily API key (optional)
-  enable_scholar: true               # Search Google Scholar
-  enable_pdf_extraction: true        # Extract text from PDFs
-  max_web_results: 10                # Max web results per query
-
-# === Export ===
-export:
-  target_conference: "neurips_2025"  # neurips_2025 | iclr_2026 | icml_2026
-  authors: "Anonymous"
-  bib_file: "references"
-
-# === Prompts ===
-prompts:
-  custom_file: ""                  # Path to a custom prompts YAML file (empty = defaults)
-
-# === HITL Co-Pilot (new in v0.4.0) ===
-hitl:
-  enabled: false                     # Set to true to enable HITL
-  mode: co-pilot                     # full-auto | gate-only | checkpoint | step-by-step | co-pilot | custom
-  cost_budget_usd: 0.0              # Cost limit in USD (0 = no limit)
-  notifications:
-    on_pause: true                   # Notify when the pipeline pauses
-    on_quality_drop: true            # Notify on quality issues
-    channels: ["terminal"]           # terminal | slack | webhook
-  timeouts:
-    default_human_timeout_sec: 86400 # Wait up to 24 hours for human input
-    auto_proceed_on_timeout: false   # If true, auto-approve on timeout
-  collaboration:
-    max_chat_turns: 50               # Max turns per collaboration session
-    save_chat_history: true          # Save chat logs
-  stage_policies: {}                 # Per-stage overrides (for 'custom' mode)
-
-# === Security ===
-security:
-  hitl_required_stages: [5, 9, 20] # Stages that require human approval
-  allow_publish_without_approval: false
-  redact_sensitive_logs: true
-
-# === Knowledge Base ===
-knowledge_base:
-  backend: "markdown"              # markdown | obsidian
-  root: "docs/kb"
-
-# === Notifications ===
-notifications:
-  channel: "console"               # console | discord | slack
-  target: ""
-
-# === MetaClaw Bridge (optional) ===
-metaclaw_bridge:
-  enabled: false                   # Set to true to enable cross-run learning
-  proxy_url: "http://localhost:30000"  # MetaClaw proxy URL
-  skills_dir: "~/.metaclaw/skills" # Where arc-* skills are stored
-  fallback_url: ""                 # Direct LLM fallback when the proxy is unavailable
-  fallback_api_key: ""             # API key for the fallback endpoint
-  lesson_to_skill:
-    enabled: true                  # Convert lessons to skills automatically
-    min_severity: "warning"        # Minimum severity to convert
-    max_skills_per_run: 3          # Max new skills per run
-  prm:                             # Process Reward Model quality gate (optional)
-    enabled: false                 # Use LLM-as-judge to score stage outputs
-    model: "gpt-5.4"              # PRM judge model
-    votes: 3                       # Number of majority votes
-    gate_stages: [5, 9, 15, 20]   # Stages where PRM gates apply
-
-# === OpenClaw Bridge ===
-openclaw_bridge:
-  use_cron: false                  # Scheduled research runs
-  use_message: false               # Progress notifications
-  use_memory: false                # Cross-session knowledge persistence
-  use_sessions_spawn: false        # Spawn parallel sub-sessions
-  use_web_fetch: false             # Live web search
-  use_browser: false               # Browser-based paper collection
-```
-
-</details>
-
----
-
-## 🙏 Acknowledgments
-
-Inspired by:
-
-- 🔬 [AI Scientist](https://github.com/SakanaAI/AI-Scientist) (Sakana AI): automated research pioneer
-- 🧠 [AutoResearch](https://github.com/karpathy/autoresearch) (Andrej Karpathy): end-to-end research automation
-- 🌐 [FARS](https://analemma.ai/blog/introducing-fars/) (Analemma): fully automated research system
-
----
-
-## 📄 License
-
-MIT. See [LICENSE](../LICENSE) for details.
-
----
-
-## 📌 Citation
-
-If you find AutoResearchClaw useful, please cite:
-
-```bibtex
-@misc{liu2026autoresearchclawselfreinforcingautonomousresearch,
-      title={AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration},
-      author={Jiaqi Liu and Shi Qiu and Mairui Li and Bingzhou Li and Haonian Ji and Siwei Han and Xinyu Ye and Peng Xia and Zihan Dong and Congyu Zhang and Letian Zhang and Guiming Chen and Haoqin Tu and Xinyu Yang and Lu Feng and Xujiang Zhao and Haifeng Chen and Jiawei Zhou and Xiao Wang and Weitong Zhang and Hongtu Zhu and Yun Li and Jieru Mei and Hongliang Fei and Jiaheng Zhang and Linjie Li and Linjun Zhang and Yuyin Zhou and Sheng Wang and Caiming Xiong and James Zou and Zeyu Zheng and Cihang Xie and Mingyu Ding and Huaxiu Yao},
-      year={2026},
-      eprint={2605.20025},
-      archivePrefix={arXiv},
-      primaryClass={cs.AI},
-      url={https://arxiv.org/abs/2605.20025},
-}
-```
-
-<p align="center">
-  <sub>Built with 🦞 by the AutoResearchClaw team</sub>
-</p>
diff --git a/docs/README_CN.md b/docs/README_CN.md
deleted file mode 100644
index a1af0d9e..00000000
--- a/docs/README_CN.md
+++ /dev/null
@@ -1,790 +0,0 @@
-<p align="center">
-  <img src="../image/logo.png" width="700" alt="AutoResearchClaw Logo">
-</p>
-
-<h2 align="center"><b>Chat an Idea. Get a Paper. Fully Autonomous, Collaborative & Self-Evolving.</b></h2>
-
-
-
-<p align="center">
-  <b><i><font size="5">Just chat with <a href="#openclaw-集成">OpenClaw</a>: "Research X" → done.</font></i></b>
-</p>
-
-<p align="center">
-  📄 <b>Our paper is now on arXiv. Give it a read!</b> <a href="https://arxiv.org/abs/2605.20025"><i>AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration</i></a>
-</p>
-
-<p align="center">
-  <img src="../image/framework_v2.png" width="100%" alt="AutoResearchClaw Framework">
-</p>
-
-
-<p align="center">
-  <a href="https://arxiv.org/abs/2605.20025"><img src="https://img.shields.io/badge/arXiv-2605.20025-b31b1b?logo=arxiv&logoColor=white" alt="arXiv"></a>
-  <a href="https://huggingface.co/datasets/AIMING-Lab-UNC/ARC-Bench"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-ARC--Bench-yellow" alt="ARC-Bench on Hugging Face"></a>
-  <a href="../LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="MIT License"></a>
-  <a href="https://python.org"><img src="https://img.shields.io/badge/Python-3.11%2B-3776AB?logo=python&logoColor=white" alt="Python 3.11+"></a>
-  <a href="#测试"><img src="https://img.shields.io/badge/Tests-2699%20passed-brightgreen?logo=pytest&logoColor=white" alt="2699 Tests Passed"></a>
-  <a href="https://github.com/aiming-lab/AutoResearchClaw"><img src="https://img.shields.io/badge/GitHub-AutoResearchClaw-181717?logo=github" alt="GitHub"></a>
-  <a href="#openclaw-集成"><img src="https://img.shields.io/badge/OpenClaw-Compatible-ff4444?logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNCAyNCI+PHBhdGggZD0iTTEyIDJDNi40OCAyIDIgNi40OCAyIDEyczQuNDggMTAgMTAgMTAgMTAtNC40OCAxMC0xMFMxNy41MiAyIDEyIDJ6IiBmaWxsPSJ3aGl0ZSIvPjwvc3ZnPg==" alt="OpenClaw Compatible"></a>
-  <a href="https://discord.gg/u4ksqW5P"><img src="https://img.shields.io/badge/Discord-Join%20Community-5865F2?logo=discord&logoColor=white" alt="Discord"></a>
-</p>
-
-<p align="center">
-  <a href="../README.md">🇺🇸 English</a> ·
-  <a href="README_CN.md">🇨🇳 中文</a> ·
-  <a href="README_JA.md">🇯🇵 日本語</a> ·
-  <a href="README_KO.md">🇰🇷 한국어</a> ·
-  <a href="README_FR.md">🇫🇷 Français</a> ·
-  <a href="README_DE.md">🇩🇪 Deutsch</a> ·
-  <a href="README_ES.md">🇪🇸 Español</a> ·
-  <a href="README_PT.md">🇧🇷 Português</a> ·
-  <a href="README_RU.md">🇷🇺 Русский</a> ·
-  <a href="README_AR.md">🇸🇦 العربية</a>
-</p>
-
-<p align="center">
-  <a href="showcase/SHOWCASE.md">🏆 Paper Showcase</a> · <a href="HITL_GUIDE.md">🧑‍✈️ Co-Pilot Guide</a> · <a href="integration-guide.md">📖 Integration Guide</a> · <a href="https://discord.gg/u4ksqW5P">💬 Discord Community</a>
-</p>
-
----
-
-<table>
-<tr>
-<td width="18%">
-<a href="showcase/SHOWCASE.md"><img src="showcase/thumbnails/paper_I_random_matrix-01.png" width="120" alt="Sample Paper"/></a>
-</td>
-<td valign="middle">
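-<b>🏆 Generated Paper Showcase</b><br><br>
-<b>8 papers across 8 domains</b> (math, statistics, biology, computing, NLP, RL, vision, robustness), generated fully autonomously or guided through human-AI Co-Pilot collaboration.<br><br>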
-<a href="showcase/SHOWCASE.md"><img src="https://img.shields.io/badge/View_Full_Showcase_→-All_8_Papers-d73a49?style=for-the-badge" alt="View Showcase"></a>
-</td>
-</tr>
-</table>
-
----
-
-> **🧪 We're looking for testers!** Try the pipeline with your own research idea, from any field, then [tell us what you think](TESTER_GUIDE.md). Your feedback directly shapes the next release. **[→ Testing Guide](TESTER_GUIDE.md)** | **[→ 中文测试指南](TESTER_GUIDE_CN.md)** | **[→ 日本語テストガイド](TESTER_GUIDE_JA.md)**
-
----
-
-## 🔥 News
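-- **[05/19/2026]** **v0.5.0**: **Multi-Domain Experiment Agents + ARC-Bench.** Two major updates. **(1) Domain-expert execution agents:** the experiment stages (Stages 10-13) are no longer limited to the default ML sandbox. They route by discipline to specialized agents: **high-energy physics** (ColliderAgent: Lagrangian → FeynRules → MadGraph5 → Delphes, via the Magnus cloud), **biology** (COBRApy genome-scale metabolic modeling), and **statistics** (a simulation-study agent), with a generic Docker executor covering chemistry/materials. The pipeline selects the executor automatically from the research domain. **(2) ARC-Bench:** an open-ended autonomous research benchmark of **55 topics** spanning **ML (25), high-energy physics (10), quantum (10), biology (7), and statistics (3)**. Each topic ships with a manifest and a scoring rubric, lives in `experiments/arc_bench/`, and is published on [🤗 Hugging Face](https://huggingface.co/datasets/AIMING-Lab-UNC/ARC-Bench). **[→ Domain Integration Guide](DOMAIN_INTEGRATION_GUIDE.md)**
-- **[04/01/2026]** **v0.4.0**: **Human-in-the-Loop Co-Pilot System.** AutoResearchClaw is no longer a purely autonomous tool. The new HITL system supports 6 intervention modes (`full-auto`, `gate-only`, `checkpoint`, `step-by-step`, `co-pilot`, `custom`) with per-stage policy configuration and deep human-AI collaboration. It includes Idea Workshop (hypothesis co-creation), Baseline Navigator (experiment design review), Paper Co-Writer (collaborative paper writing), SmartPause (confidence-driven dynamic pausing), ALHF intervention learning, anti-hallucination claim verification, cost budget guardrails, pipeline branching for exploring hypotheses in parallel, and CLI commands (`attach`/`status`/`approve`/`reject`/`guide`). **[→ Full HITL Guide](HITL_GUIDE.md)**
-- **[03/30/2026]** **Flexible Skill Loading.** AutoResearchClaw now supports loading open-source and custom skills from any discipline. 20 preloaded built-in skills serve as ready-to-use references, covering scientific writing, experiment design, chemistry, biology, and more, including the community-contributed [A-Evolve](https://github.com/A-EVO-Lab/a-evolve) self-evolving skill. Load skills via `researchclaw skills install` or by placing a `SKILL.md` in `.claude/skills/`. See the [Skill Library](#-技能库).
-- **[03/22/2026]** [v0.3.2](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.2): **Cross-Platform Support + Major Stability Update.** AutoResearchClaw now supports any ACP-compatible AI agent backend (Claude Code, Codex CLI, Copilot CLI, Gemini CLI, Kimi CLI) and, through the OpenClaw bridge, messaging platforms (Discord, Telegram, Lark, WeChat). A new CLI-agent code generation backend delegates Stages 10 and 13 to an external CLI agent, with budget control and timeout management. This release also includes an anti-fabrication system (VerifiedRegistry + experiment diagnosis and repair loop), 100+ bug fixes, a modular executor refactor, `--resume` auto-detection, LLM retry hardening, and fixes from community feedback.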
-
-<details>
-<summary>Earlier releases</summary>
-
-- **[03/18/2026]** [v0.3.1](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.1) — **OpenCode Beast Mode + Community Contributions** — New "Beast Mode" routes complex code generation to [OpenCode](https://github.com/anomalyco/opencode) with automatic complexity scoring and graceful fallback. Added Novita AI provider support, thread-safety hardening, improved LLM output parsing robustness, and 20+ bug fixes from community PRs and internal audit.
-- **[03/17/2026]** [v0.3.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.0) — **MetaClaw Integration** — AutoResearchClaw now supports [MetaClaw](https://github.com/aiming-lab/MetaClaw) cross-run learning: pipeline failures → structured lessons → reusable skills, injected into all 23 stages. **+18.3%** robustness in controlled experiments. Opt-in (`metaclaw_bridge.enabled: true`), fully backward-compatible. See [Integration Guide](#-metaclaw-integration).
-- **[03/16/2026]** [v0.2.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.2.0) — Three multi-agent subsystems (CodeAgent, BenchmarkAgent, FigureAgent), hardened Docker sandbox with network-policy-aware execution, 4-round paper quality audit (AI-slop detection, 7-dim review scoring, NeurIPS checklist), and 15+ bug fixes from production runs.
-- **[03/15/2026]** [v0.1.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.1.0) — We release AutoResearchClaw: a fully autonomous 23-stage research pipeline that turns a single research idea into a conference-ready paper. No human intervention required.
-
-</details>
-
----
-
-## ⚡ One Command. One Paper.
-
-```bash
-# Fully autonomous: no human intervention
-pip install -e . && researchclaw setup && researchclaw init && researchclaw run --topic "Your research idea here" --auto-approve
-
-# Co-Pilot mode: collaborate with the AI at key decision points
-researchclaw run --topic "Your research idea here" --mode co-pilot
-```
-
-
----
-
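-## 🤔 What Is This?
-
-**You think it. AutoResearchClaw writes it. You steer the key decisions.**
-
-Enter a research topic and get back a full academic paper: real literature from OpenAlex, Semantic Scholar, and arXiv; hardware-aware sandbox experiments (GPU/MPS/CPU auto-detected); statistical analysis; multi-agent peer review; and conference-grade LaTeX targeting NeurIPS/ICML/ICLR. Run it fully autonomously, or use **Co-Pilot mode** to guide the AI at key decision points: choosing the research direction, reviewing the experiment design, and co-writing the paper. Hallucinated references are detected and removed.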
-
-<table>
-<tr><td>📄</td><td><code>paper_draft.md</code></td><td>Full academic paper (introduction, related work, method, experiments, results, conclusion)</td></tr>
-<tr><td>📐</td><td><code>paper.tex</code></td><td>LaTeX source fitted to a conference template (NeurIPS / ICLR / ICML)</td></tr>
-<tr><td>📚</td><td><code>references.bib</code></td><td>Real BibTeX entries from OpenAlex, Semantic Scholar, and arXiv, automatically pruned to match the citations in the text</td></tr>
-<tr><td>🔍</td><td><code>verification_report.json</code></td><td>Four-layer citation integrity + relevance check (arXiv, CrossRef, DataCite, LLM)</td></tr>
-<tr><td>🧪</td><td><code>experiment runs/</code></td><td>Generated code + sandbox results + structured JSON metrics</td></tr>
-<tr><td>📊</td><td><code>charts/</code></td><td>Auto-generated condition-comparison charts (with error bars and confidence intervals)</td></tr>
-<tr><td>📝</td><td><code>reviews.md</code></td><td>Multi-agent peer review (with methodology-evidence consistency checks)</td></tr>
-<tr><td>🧬</td><td><code>evolution/</code></td><td>Self-learning lessons extracted from each run</td></tr>
-<tr><td>📦</td><td><code>deliverables/</code></td><td>All final outputs in one folder, ready to upload to Overleaf and compile</td></tr>
-</table>
-
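-The pipeline runs **end to end**, fully autonomously or with a human in the loop. When an experiment fails, it repairs itself. When a hypothesis does not hold, it pivots on its own. Fake citation? Removed automatically. Want to step in? It pauses and waits.
-
-🌍 **Runs anywhere.** AutoResearchClaw is not tied to a single platform. Run it standalone from the CLI, plug it into [OpenClaw](https://github.com/openclaw/openclaw), or connect any ACP-compatible AI agent: 🤖 Claude Code, 💻 Codex CLI, 🐙 Copilot CLI, ♊ Gemini CLI, 🌙 Kimi CLI, and more. Through OpenClaw's message bridge you can also launch a full research run from 💬 Discord, ✈️ Telegram, 🐦 Lark, 💚 WeChat, or whatever platform your team already uses. One topic in, one paper out, wherever you type it.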
-
----
-
-## 🚀 Quick Start
-
-```bash
-# 1. Clone & install
-git clone https://github.com/aiming-lab/AutoResearchClaw.git
-cd AutoResearchClaw
-python3 -m venv .venv && source .venv/bin/activate
-pip install -e .
-
-# 2. Set up (interactive: installs OpenCode Beast Mode, checks Docker/LaTeX)
-researchclaw setup
-
-# 3. Configure
-researchclaw init          # Interactive: choose an LLM provider, create config.arc.yaml
-# Or manually: cp config.researchclaw.example.yaml config.arc.yaml
-
-# 4. Run
-export OPENAI_API_KEY="sk-..."
-researchclaw run --config config.arc.yaml --topic "Your research idea" --auto-approve
-```
-
-Output → `artifacts/rc-YYYYMMDD-HHMMSS-<hash>/deliverables/`: compile-ready LaTeX, BibTeX, experiment code, and figures.
-
-<details>
-<summary>📝 Minimum required configuration</summary>
-
-```yaml
-project:
-  name: "my-research"
-
-research:
-  topic: "Your research topic here"
-
-llm:
-  base_url: "https://api.openai.com/v1"
-  api_key_env: "OPENAI_API_KEY"
-  primary_model: "gpt-4o"
-  fallback_models: ["gpt-4o-mini"]
-
-experiment:
-  mode: "sandbox"
-  sandbox:
-    python_path: ".venv/bin/python"
-```
-
-</details>
-
----
-
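-## 🧠 What Makes It Different
-
-| Capability | How it works |
-|------|----------|
-| **🧑‍✈️ Co-Pilot mode** | 6 intervention modes, from fully autonomous to step-by-step guidance. Guide the AI at key decisions (hypotheses, baselines, paper writing) or let it run freely. SmartPause detects when human input is needed. |
-| **🔄 PIVOT / REFINE loop** | Stage 15 decides autonomously: PROCEED, REFINE (tune parameters), or PIVOT (new direction). Artifacts are versioned automatically. |
-| **🤖 Multi-agent debate** | Hypothesis generation, result analysis, and peer review all use structured multi-perspective debate. |
-| **🧬 Self-learning** | Each run extracts lessons (decision rationale, runtime warnings, metric anomalies) with a 30-day time decay. Future runs learn from past mistakes. |
-| **📚 Knowledge base** | Each run builds a structured knowledge base across 6 categories (decisions, experiments, findings, literature, questions, reviews). |
-| **🛡️ Sentinel watchdog** | Background quality monitoring: NaN/Inf detection, paper-evidence consistency, citation relevance scoring, anti-fabrication guard. |
-| **🔍 Claim verification** | Inline fact checking: extracts claims from AI-generated text and cross-checks them against the collected literature. Flags unsupported citations and fabricated numbers. |
-| **🌿 Branch exploration** | Fork the pipeline to explore several research directions at once, compare results side by side, and merge the best path to continue. |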
-
----
-
-## 🦞 OpenClaw 集成
-
-<table>
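-<tr><td>
-
-**AutoResearchClaw is an [OpenClaw](https://github.com/openclaw/openclaw)-compatible service.** Once installed in OpenClaw, a single message starts an autonomous research run; it can also be used standalone through the CLI, Claude Code, or any other AI coding assistant.
-
-</td></tr>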
-</table>
-
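-### 🚀 Use with OpenClaw (Recommended)
-
-If you already use [OpenClaw](https://github.com/openclaw/openclaw) as your AI assistant:
-
-```
-1️⃣  Share the GitHub repository URL with OpenClaw
-2️⃣  OpenClaw reads RESEARCHCLAW_AGENTS.md → understands the pipeline
-3️⃣  Tell it: "Research [your topic]"
-4️⃣  Done: OpenClaw clones, installs, configures, runs, and returns the results
-```
-
-**That's it.** OpenClaw handles `git clone`, `pip install`, configuration, and pipeline execution. You just chat.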
-
-<details>
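-<summary>💡 What happens under the hood</summary>
-
-1. OpenClaw reads `RESEARCHCLAW_AGENTS.md` → learns the research-orchestrator role
-2. OpenClaw reads `README.md` → understands installation and the pipeline structure
-3. OpenClaw copies `config.researchclaw.example.yaml` → `config.yaml`
-4. It asks you for an LLM API key (or uses an environment variable)
-5. It runs `pip install -e .` + `researchclaw run --topic "..." --auto-approve`
-6. It returns the paper, LaTeX, experiment results, and citations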
-
-</details>
-
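-### 🔌 OpenClaw Bridge (Advanced)
-
-AutoResearchClaw includes a **bridge adapter system** with 6 optional integration capabilities:
-
-```yaml
-# config.arc.yaml
-openclaw_bridge:
-  use_cron: true              # ⏰ Scheduled research runs
-  use_message: true           # 💬 Progress notifications (Discord/Slack/Telegram)
-  use_memory: true            # 🧠 Cross-session knowledge persistence
-  use_sessions_spawn: true    # 🔀 Spawn sub-sessions for parallel stages
-  use_web_fetch: true         # 🌐 Live web search during literature review
-  use_browser: false          # 🖥️ Browser-based paper collection
-```
-
-Each flag activates a typed adapter protocol. When OpenClaw provides the matching capability, the adapter consumes it with no code changes. See [`integration-guide.md`](integration-guide.md) for details.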
-
-### ACP (Agent Client Protocol)
-
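-AutoResearchClaw can use **any ACP-compatible coding agent** as its LLM backend, with no API key required. The agent communicates through [acpx](https://github.com/openclaw/acpx) and keeps a single persistent session across all 23 pipeline stages.
-
-| Agent | Command | Notes |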
-|-------|------|------|
-| Claude Code | `claude` | Anthropic |
-| Codex CLI | `codex` | OpenAI |
-| Copilot CLI | `gh` | GitHub |
-| Gemini CLI | `gemini` | Google |
-| OpenCode | `opencode` | SST |
-| Kimi CLI | `kimi` | Moonshot |
-
-```yaml
-# config.yaml: ACP example
-llm:
-  provider: "acp"
-  acp:
-    agent: "claude"   # Any ACP-compatible agent CLI command
-    cwd: "."          # Working directory for the agent
-  # No base_url or api_key needed: the agent handles its own authentication.
-```
-
-```bash
-# Just run it: the agent uses its own credentials
-researchclaw run --config config.yaml --topic "Your research idea" --auto-approve
-```
-
-### 🛠️ Other Ways to Run
-
-| Method | How |
-|------|--------|
-| **Standalone CLI** | `researchclaw run --topic "..." --auto-approve` (autonomous) or `--mode co-pilot` (collaborative) |
-| **Python API** | `from researchclaw.pipeline import Runner; Runner(config).run()` |
-| **Claude Code** | Reads `RESEARCHCLAW_CLAUDE.md`; just say *"Run research on [topic]"* |
-| **Copilot CLI** | `researchclaw run --topic "..."` with `llm.acp.agent: "gh"` |
-| **OpenCode** | Reads `.claude/skills/`; same natural-language interaction |
-| **Any AI CLI** | Provide `RESEARCHCLAW_AGENTS.md` as context → the agent bootstraps itself |
-
----
-
-## 🔬 Pipeline: 23 Stages, 8 Phases
-
-```
-Phase A: Research Scoping          Phase E: Experiment Execution
-  1. TOPIC_INIT                      12. EXPERIMENT_RUN
-  2. PROBLEM_DECOMPOSE               13. ITERATIVE_REFINE  ← self-healing
-
-Phase B: Literature Discovery      Phase F: Analysis & Decision
-  3. SEARCH_STRATEGY                 14. RESULT_ANALYSIS    ← multi-agent
-  4. LITERATURE_COLLECT  ← real API  15. RESEARCH_DECISION  ← PIVOT/REFINE
-  5. LITERATURE_SCREEN   [gate]
-  6. KNOWLEDGE_EXTRACT             Phase G: Paper Writing
-                                     16. PAPER_OUTLINE
-Phase C: Knowledge Synthesis         17. PAPER_DRAFT
-  7. SYNTHESIS                       18. PEER_REVIEW        ← evidence check
-  8. HYPOTHESIS_GEN    ← debate      19. PAPER_REVISION
-
-Phase D: Experiment Design         Phase H: Finalization
-  9. EXPERIMENT_DESIGN   [gate]      20. QUALITY_GATE       [gate]
- 10. CODE_GENERATION                 21. KNOWLEDGE_ARCHIVE
- 11. RESOURCE_PLANNING               22. EXPORT_PUBLISH     ← LaTeX
-                                     23. CITATION_VERIFY    ← relevance check
-```
-
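-> **Gate stages** (5, 9, 20) can pause for human approval or pass automatically with `--auto-approve`. On rejection, the pipeline rolls back.
-
-> **Co-Pilot mode** (`--mode co-pilot`): deep human-AI collaboration at Stages 7-8 (Idea Workshop), Stage 9 (Baseline Navigator), and Stages 16-17 (Paper Co-Writer). All other stages run automatically while SmartPause keeps monitoring.
-
-> **Decision loop**: Stage 15 can trigger REFINE (→ Stage 13) or PIVOT (→ Stage 8), automatically versioning the earlier artifacts.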
-
-<details>
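-<summary>📋 What each phase does</summary>
-
-| Phase | What it does |
-|--------|--------|
-| **A: Scoping** | The LLM decomposes the topic into a structured problem tree and research questions |
-| **A+: Hardware detection** | Auto-detects the GPU (NVIDIA CUDA / Apple MPS / CPU only), warns the user when performance is insufficient, and adapts code generation accordingly |
-| **B: Literature** | Multi-source search (OpenAlex → Semantic Scholar → arXiv) for real papers, relevance screening, and knowledge-card extraction |
-| **C: Synthesis** | Clusters findings, identifies research gaps, and generates testable hypotheses through multi-agent debate |
-| **D: Design** | Designs the experiment plan, generates hardware-aware runnable Python code (GPU tier → package selection), and estimates resource needs |
-| **E: Execution** | Runs experiments in the sandbox, detects NaN/Inf and runtime bugs, and self-heals code through targeted LLM repair |
-| **F: Analysis** | Multi-agent analysis of experiment results; autonomous PROCEED / REFINE / PIVOT decision with rationale |
-| **G: Writing** | Outline → section-by-section draft (5,000-6,500 words) → peer review (with methodology-evidence consistency) → revision with a length guard |
-| **H: Finalization** | Quality gate, knowledge archiving, LaTeX export (fitted to conference templates), citation integrity + relevance check |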
-
-</details>
-
----
-
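-## ✨ Key Features
-
-| Feature | Description |
-|------|------|
-| **📚 Multi-source literature** | Real papers from OpenAlex, Semantic Scholar, and arXiv, with query expansion, deduplication, a three-state circuit breaker, and graceful degradation |
-| **🔍 Four-layer citation check** | arXiv ID validation → CrossRef/DataCite DOI → Semantic Scholar title match → LLM relevance scoring. Hallucinated citations are removed automatically. |
-| **🖥️ Hardware-aware execution** | Auto-detects the GPU (NVIDIA CUDA / Apple MPS / CPU only) and adapts code generation, imports, and experiment scale accordingly |
-| **🦾 OpenCode Beast Mode** | Complex experiments are routed automatically to [OpenCode](https://github.com/anomalyco/opencode), which generates multi-file projects with custom architectures, training loops, and ablations. Installed via `researchclaw setup`. |
-| **🧪 Sandbox experiments** | AST-validated code, immutable harness, NaN/Inf fail-fast, self-repair, iterative refinement (up to 10 rounds), partial-result capture |
-| **📝 Conference-grade writing** | NeurIPS/ICML/ICLR templates, section-by-section drafting (5,000-6,500 words), anti-fabrication guard, revision length guard, anti-disclaimer enforcement |
-| **📐 Template switching** | `neurips_2025`, `iclr_2026`, `icml_2026`: Markdown → LaTeX with math, tables, figures, cross-references, and `\cite{}` |
-| **🛡️ Anti-fabrication** | VerifiedRegistry enforces that the paper uses verified experiment data. Failed experiments are diagnosed and repaired before writing. Unverified numbers are scrubbed. |
-| **🚦 Quality gates** | 3 human-approval gates (Stages 5, 9, 20) with rollback. Skip them with `--auto-approve`. |
-| **🧑‍✈️ HITL Co-Pilot** | 6 intervention modes with per-stage policies. Idea Workshop, Baseline Navigator, and Paper Co-Writer provide deep collaboration. SmartPause, cost guardrails, escalation policy, and intervention learning keep production runs safe. CLI/WebSocket/MCP adapters. |
-| **💰 Cost guardrails** | Budget monitoring with configurable threshold alerts (50%/80%/100%). The pipeline pauses automatically when the budget is exceeded. |
-| **🔐 Reproducibility** | SHA256 checksums for all stage artifacts. Immutable manifest for verification. Multi-level undo with versioned snapshots. |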
-
----
-
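-## 🧑‍✈️ Human-in-the-Loop Co-Pilot
-
-**AutoResearchClaw v0.4.0 introduces a full human-in-the-loop (HITL) system** that turns the pipeline from pure automation into a human-AI collaborative research engine. Choose your level of involvement:
-
-### Intervention Modes
-
-| Mode | Command | What it does |
-|------|------|--------|
-| **Full Auto** | `--auto-approve` | Original behavior: no human intervention |
-| **Gate Only** | `--mode gate-only` | Pauses for approval at the 3 gate stages (5, 9, 20) |
-| **Checkpoint** | `--mode checkpoint` | Pauses at every phase boundary (8 checkpoints) |
-| **Co-Pilot** | `--mode co-pilot` | Deep collaboration at key stages, automatic elsewhere |
-| **Step-by-Step** | `--mode step-by-step` | Pauses after every stage; useful for learning the pipeline |
-| **Express** | `--mode express` | Quick review: only the 3 most critical gates |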
-
-### Co-Pilot Workflow
-
-```
-You: researchclaw run --topic "Quantum noise as neural network regularization" --mode co-pilot
-
-Pipeline runs Stages 1-7 automatically...
-
-  ┌─────────────────────────────────────────────────────────────┐
-  │  HITL | Stage 08: HYPOTHESIS_GEN                            │
-  │  Post-stage review                                          │
-  │                                                             │
-  │  Hypotheses generated: 3                                    │
-  │  Novelty score: 0.72 (moderate)                             │
-  │                                                             │
-  │  [a] Approve  [r] Reject  [e] Edit  [c] Collaborate         │
-  │  [i] Inject guidance  [v] View output  [q] Abort            │
-  └─────────────────────────────────────────────────────────────┘
-
-You: c  (start a collaborative chat)
-You: Hypothesis 3 is interesting, but it needs Dropout/Label Smoothing as baselines
-AI:  Updated: added Dropout, Label Smoothing, MixUp, and CutMix as baselines...
-You: approve
-
-Pipeline continues with your refined hypotheses...
-```
-
-### CLI Commands
-
-```bash
-# Start in HITL mode
-researchclaw run --topic "..." --mode co-pilot
-
-# Attach to a paused pipeline (from another terminal)
-researchclaw attach artifacts/rc-2026-xxx
-
-# Check pipeline and HITL status
-researchclaw status artifacts/rc-2026-xxx
-
-# Approve or reject from another terminal or a script
-researchclaw approve artifacts/rc-2026-xxx --message "LGTM"
-researchclaw reject artifacts/rc-2026-xxx --reason "Missing key baselines"
-
-# Inject guidance for a stage (even before it runs)
-researchclaw guide artifacts/rc-2026-xxx --stage 9 --message "Use ResNet-50 as the primary baseline"
-```
-
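-### Core Capabilities
-
-| Feature | Description |
-|------|------|
-| **Idea Workshop** | Collaboratively brainstorm, evaluate, and refine hypotheses (Stages 7-8) |
-| **Baseline Navigator** | AI-suggested baselines + human additions and removals + reproducibility checklist (Stage 9) |
-| **Paper Co-Writer** | Section-by-section drafting that combines human edits with AI polishing (Stages 16-19) |
-| **SmartPause** | Confidence-driven dynamic pausing that detects when human input is needed |
-| **Claim verification** | Inline fact checking against the collected literature; flags unsupported claims |
-| **Cost guardrails** | Budget monitoring with 50%/80%/100% threshold alerts |
-| **Intervention learning** | ALHF: learns from your review patterns to improve future pause decisions |
-| **Branch exploration** | Fork the pipeline to explore multiple hypotheses, compare them, and merge the best path |
-| **Escalation policy** | Tiered notifications (terminal → Slack → email → auto-pause), triggered when no one responds |
-| **3 adapters** | CLI (terminal), WebSocket (web dashboard), MCP (external agents) |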
-
-### Configuration
-
-```yaml
-# config.arc.yaml
-hitl:
-  enabled: true
-  mode: co-pilot                     # full-auto | gate-only | checkpoint | step-by-step | co-pilot | custom
-  cost_budget_usd: 50.0              # Pause when the budget is exceeded (0 = unlimited)
-
-  notifications:
-    on_pause: true
-    on_quality_drop: true
-    channels: ["terminal"]            # terminal | slack | webhook
-
-  timeouts:
-    default_human_timeout_sec: 86400  # Wait up to 24 hours by default
-    auto_proceed_on_timeout: false
-
-  collaboration:
-    max_chat_turns: 50
-    save_chat_history: true
-
-  # Per-stage custom policies (optional, for 'custom' mode)
-  stage_policies:
-    8: { require_approval: true, enable_collaboration: true }
-    9: { require_approval: true, allow_edit_output: true }
-```
-
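-### Backward Compatibility
-
-- **Off by default.** Without `hitl.enabled: true` or `--mode`, the pipeline behaves exactly as before.
-- **`--auto-approve` still works.** It overrides the HITL mode.
-- **All 2,699 existing tests pass** (with the HITL code included).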
-
----
-
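-## 🧠 MetaClaw Integration
-
-**AutoResearchClaw + [MetaClaw](https://github.com/aiming-lab/MetaClaw) = a pipeline that learns from every run.**
-
-MetaClaw adds **cross-run knowledge transfer** to AutoResearchClaw. When enabled, the pipeline automatically extracts lessons from failures and warnings, converts them into reusable skills, and injects them into all 23 stages on later runs, so the same mistake is not repeated.
-
-### How It Works
-
-```
-Run N executes → failures/warnings are captured as Lessons
-                      ↓
-          MetaClaw Lesson → Skill conversion
-                      ↓
-          arc-* Skill files stored in ~/.metaclaw/skills/
-                      ↓
-Run N+1 → build_overlay() injects skills into every LLM prompt
-                      ↓
-          LLM avoids known pitfalls → higher quality, fewer retries
-```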
-
-### Quick Setup
-
-```bash
-# 1. Install MetaClaw (if not already installed)
-pip install metaclaw
-
-# 2. Enable it in your config
-```
-
-```yaml
-# config.arc.yaml
-metaclaw_bridge:
-  enabled: true
-  proxy_url: "http://localhost:30000"        # MetaClaw proxy (optional)
-  skills_dir: "~/.metaclaw/skills"          # Where skills are stored
-  fallback_url: "https://api.openai.com/v1" # Direct LLM fallback
-  fallback_api_key: ""                      # API key for the fallback URL
-  lesson_to_skill:
-    enabled: true
-    min_severity: "warning"                 # Convert warnings + errors
-    max_skills_per_run: 3
-```
-
-```bash
-# 3. Run as usual: MetaClaw works transparently
-researchclaw run --config config.arc.yaml --topic "Your idea" --auto-approve
-```
-
-After each run, check `~/.metaclaw/skills/arc-*/SKILL.md` to see which skills the pipeline learned.
-
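-### Experimental Results
-
-In a controlled A/B experiment (same topic, same LLM, same configuration):
-
-| Metric | Baseline | With MetaClaw | Change |
-|------|------|---------------|------|
-| Stage retry rate | 10.5% | 7.9% | **-24.8%** |
-| Refine cycle count | 2.0 | 1.2 | **-40.0%** |
-| Pipeline stage completion | 18/19 | 19/19 | **+5.3%** |
-| Overall robustness score (composite) | 0.714 | 0.845 | **+18.3%** |
-
-> The composite robustness score is a weighted average of stage completion rate (40%), retry reduction (30%), and Refine-cycle efficiency (30%).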
-
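-### Backward Compatibility
-
-- **Off by default.** If `metaclaw_bridge` is absent or `enabled: false`, the pipeline behaves exactly as before.
-- **No new dependencies.** MetaClaw is optional; the core pipeline runs without it.
-- **All 2,699 existing tests pass** (with the integration code included).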
-
----
-
-## 🧩 技能库
-
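-AutoResearchClaw can load **open-source and custom skills**. It also ships **20 preloaded skills** (scientific writing, literature search, chemistry, biology, and more) as ready-to-use references. To disable any skill, add `enabled: false` to its frontmatter.
-
-**Built-in skill examples:**
-
-| Category | Skill | Description |
-|------|------|------|
-| **Writing** | `scientific-writing` | IMRAD structure, citation formats, reporting guidelines |
-| **Domain** | `chemistry-rdkit` | Molecular analysis, SMILES, fingerprints, drug discovery |
-| **Experiment** | `literature-search` | Systematic reviews, PRISMA methodology |
-
-> Run `researchclaw skills list` to see all 20 skills.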
-
-### Loading Custom Skills
-
-```bash
-# Option 1: install a skill (persists across projects)
-researchclaw skills install /path/to/my-skill/
-
-# Option 2: put a SKILL.md inside the project
-mkdir -p .claude/skills/my-custom-skill
-# Then create a SKILL.md with YAML frontmatter (name, description, trigger-keywords, applicable-stages)
-
-# Option 3: configure a shared skills directory in config.arc.yaml
-# skills:
-#   custom_dirs:
-#     - /path/to/team-shared-skills
-```
-
-### Using Skills
-
-Skills are loaded automatically and injected into LLM prompts; no manual activation is needed. Inspect them from the CLI:
-
-```bash
-researchclaw skills list               # Show all loaded skills and their sources
-researchclaw skills validate ./my-skill # Check the SKILL.md format
-```
-
-Browse community skills: [K-Dense-AI/claude-scientific-skills](https://github.com/K-Dense-AI/claude-scientific-skills) (150+ scientific skills across disciplines).
-
----
-
-## ⚙️ Configuration Reference
-
-<details>
-<summary>Click to expand the full configuration reference</summary>
-
-```yaml
-# === Project ===
-project:
-  name: "my-research"              # Project identifier
-  mode: "docs-first"               # docs-first | semi-auto | full-auto
-
-# === Research ===
-research:
-  topic: "..."                     # Research topic (required)
-  domains: ["ml", "nlp"]           # Research domains for literature search
-  daily_paper_count: 8             # Target number of papers per search query
-  quality_threshold: 4.0           # Minimum paper quality score
-
-# === Runtime ===
-runtime:
-  timezone: "America/New_York"     # Used for timestamps
-  max_parallel_tasks: 3            # Concurrent experiment limit
-  approval_timeout_hours: 12       # Gate-stage timeout
-  retry_limit: 2                   # Retries on stage failure
-
-# === LLM ===
-llm:
-  provider: "openai-compatible"    # openai | openrouter | deepseek | minimax | acp | openai-compatible
-  base_url: "https://..."          # API endpoint (required for openai-compatible)
-  api_key_env: "OPENAI_API_KEY"    # Env var holding the API key (required for openai-compatible)
-  api_key: ""                      # Or set the key directly
-  primary_model: "gpt-4o"          # Primary model
-  fallback_models: ["gpt-4o-mini"] # Fallback chain
-  s2_api_key: ""                   # Semantic Scholar API key (optional, higher rate limits)
-  acp:                             # Only used when provider: "acp"
-    agent: "claude"                # ACP agent CLI command (claude, codex, gemini, etc.)
-    cwd: "."                       # Working directory for the agent
-
-# === Experiment ===
-experiment:
-  mode: "sandbox"                  # simulated | sandbox | docker | ssh_remote
-  time_budget_sec: 300             # Max execution time per run (default: 300 s)
-  max_iterations: 10               # Max refinement iterations
-  metric_key: "val_loss"           # Primary metric name
-  metric_direction: "minimize"     # minimize | maximize
-  sandbox:
-    python_path: ".venv/bin/python"
-    gpu_required: false
-    allowed_imports: [math, random, json, csv, numpy, torch, sklearn]
-    max_memory_mb: 4096
-  docker:
-    image: "researchclaw/experiment:latest"
-    network_policy: "setup_only"   # none | setup_only | pip_only | full
-    gpu_enabled: true
-    memory_limit_mb: 8192
-    auto_install_deps: true        # Auto-detect imports → requirements.txt
-  ssh_remote:
-    host: ""                       # GPU server hostname
-    gpu_ids: []                    # Available GPU IDs
-    remote_workdir: "/tmp/researchclaw_experiments"
-  opencode:                          # OpenCode Beast Mode (installed automatically by `researchclaw setup`)
-    enabled: true                    # Master switch (default: true)
-    auto: true                       # Trigger without confirmation (default: true)
-    complexity_threshold: 0.2        # 0.0-1.0; higher = trigger only for complex experiments
-    model: ""                        # Model override (empty = use llm.primary_model)
-    timeout_sec: 600                 # Max seconds for OpenCode generation
-    max_retries: 1                   # Retries on failure
-    workspace_cleanup: true          # Clean up the temporary workspace after collection
-  code_agent:                        # CodeAgent v2: multi-phase code generation
-    enabled: true                    # Use CodeAgent instead of legacy single-prompt code generation
-    architecture_planning: true      # Generate a detailed implementation blueprint before coding
-    sequential_generation: true      # Generate files one by one following the dependency DAG
-    hard_validation: true            # AST-based validation gate (blocks identical ablations, hard-coded metrics)
-    hard_validation_max_repairs: 2   # Max repair attempts when validation fails
-    exec_fix_max_iterations: 3       # Max iterations of the execution-fix loop
-    exec_fix_timeout_sec: 60         # Timeout per execution fix (seconds)
-  benchmark_agent:                   # BenchmarkAgent: automatic dataset and baseline selection
-    enabled: true                    # Enable the 4-agent benchmark pipeline (Surveyor→Selector→Acquirer→Validator)
-    enable_hf_search: true           # Search HuggingFace Datasets
-    enable_web_search: true          # Search Google Scholar for benchmarks
-    tier_limit: 2                    # Dataset tier filter (1=small/cached, 2=medium, 3=large)
-    min_benchmarks: 1                # Minimum number of datasets required
-    min_baselines: 2                 # Minimum number of baseline methods required
-  figure_agent:                      # FigureAgent: academic figure generation
-    enabled: true                    # Enable the 5-agent figure pipeline (Planner→CodeGen→Renderer→Critic→Integrator)
-    min_figures: 3                   # Minimum number of figures to generate
-    max_figures: 8                   # Maximum number of figures to generate
-    max_iterations: 3                # Critic-driven refinement iterations
-    dpi: 300                         # Output resolution
-    strict_mode: false               # Whether a figure-generation failure blocks the pipeline
-  repair:                            # Anti-fabrication experiment repair
-    enabled: true                    # Automatically diagnose and repair failed experiments
-    max_cycles: 3                    # Number of repair retry cycles
-    min_completion_rate: 0.5         # >=50% of conditions must complete to proceed
-    min_conditions: 2                # A valid experiment needs at least 2 conditions
-    use_opencode: true               # Route repairs through OpenCode Beast Mode
-
-# === Web Search (optional) ===
-web_search:
-  enabled: true                      # Enable web-augmented literature search
-  tavily_api_key_env: "TAVILY_API_KEY"  # Env var holding the Tavily API key (optional)
-  enable_scholar: true               # Google Scholar search
-  enable_pdf_extraction: true        # Extract text from PDFs
-  max_web_results: 10                # Max web results per query
-
-# === Export ===
-export:
-  target_conference: "neurips_2025"  # neurips_2025 | iclr_2026 | icml_2026
-  authors: "Anonymous"
-  bib_file: "references"
-
-# === Prompts ===
-prompts:
-  custom_file: ""                  # Path to a custom prompt YAML (empty = use defaults)
-
-# === HITL Co-Pilot (new in v0.4.0) ===
-hitl:
-  enabled: false                     # Set to true to enable HITL
-  mode: co-pilot                     # full-auto | gate-only | checkpoint | step-by-step | co-pilot | custom
-  cost_budget_usd: 0.0               # Cost limit in USD (0 = unlimited)
-  notifications:
-    on_pause: true                   # Notify when the pipeline pauses
-    on_quality_drop: true            # Notify when quality drops
-    channels: ["terminal"]           # terminal | slack | webhook
-  timeouts:
-    default_human_timeout_sec: 86400 # Wait up to 24 hours for human input
-    auto_proceed_on_timeout: false   # If true, auto-approve after the timeout
-  collaboration:
-    max_chat_turns: 50               # Max turns per collaboration session
-    save_chat_history: true          # Persist the chat history
-  stage_policies: {}                 # Per-stage overrides (for 'custom' mode)
-
-# === Security ===
-security:
-  hitl_required_stages: [5, 9, 20] # Stages that require human approval
-  allow_publish_without_approval: false
-  redact_sensitive_logs: true
-
-# === Knowledge Base ===
-knowledge_base:
-  backend: "markdown"              # markdown | obsidian
-  root: "docs/kb"
-
-# === Notifications ===
-notifications:
-  channel: "console"               # console | discord | slack
-  target: ""
-
-# === MetaClaw Bridge (optional) ===
-metaclaw_bridge:
-  enabled: false                   # Set to true to enable cross-run learning
-  proxy_url: "http://localhost:30000"  # MetaClaw proxy URL
-  skills_dir: "~/.metaclaw/skills" # Where arc-* skills are stored
-  fallback_url: ""                 # Direct LLM fallback when the proxy is unavailable
-  fallback_api_key: ""             # API key for the fallback endpoint
-  lesson_to_skill:
-    enabled: true                  # Automatically convert lessons into skills
-    min_severity: "warning"        # Minimum severity to convert
-    max_skills_per_run: 3          # Max new skills per pipeline run
-  prm:                             # Process Reward Model quality gate (optional)
-    enabled: false                 # Score stage outputs with an LLM-as-judge
-    model: "gpt-5.4"              # PRM judge model
-    votes: 3                       # Number of majority-vote samples
-    gate_stages: [5, 9, 15, 20]   # Stages where the PRM gate applies
-
-# === OpenClaw Bridge ===
-openclaw_bridge:
-  use_cron: false                  # Scheduled research runs
-  use_message: false               # Progress notifications
-  use_memory: false                # Cross-session knowledge persistence
-  use_sessions_spawn: false        # Spawn parallel sub-sessions
-  use_web_fetch: false             # Live web search
-  use_browser: false               # Browser-based paper collection
-```
-
-</details>
-
----
-
-## 🙏 Acknowledgments
-
-Inspired by:
-
-- 🔬 [AI Scientist](https://github.com/SakanaAI/AI-Scientist) (Sakana AI): a pioneer of automated research
-- 🧠 [AutoResearch](https://github.com/karpathy/autoresearch) (Andrej Karpathy): end-to-end research automation
-- 🌐 [FARS](https://analemma.ai/blog/introducing-fars/) (Analemma): a fully automated research system
-
----
-
-## 📄 License
-
-MIT. See [LICENSE](../LICENSE) for details.
-
----
-
-## 📌 Citation
-
-If you find AutoResearchClaw useful, please cite:
-
-```bibtex
-@misc{liu2026autoresearchclawselfreinforcingautonomousresearch,
-      title={AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration},
-      author={Jiaqi Liu and Shi Qiu and Mairui Li and Bingzhou Li and Haonian Ji and Siwei Han and Xinyu Ye and Peng Xia and Zihan Dong and Congyu Zhang and Letian Zhang and Guiming Chen and Haoqin Tu and Xinyu Yang and Lu Feng and Xujiang Zhao and Haifeng Chen and Jiawei Zhou and Xiao Wang and Weitong Zhang and Hongtu Zhu and Yun Li and Jieru Mei and Hongliang Fei and Jiaheng Zhang and Linjie Li and Linjun Zhang and Yuyin Zhou and Sheng Wang and Caiming Xiong and James Zou and Zeyu Zheng and Cihang Xie and Mingyu Ding and Huaxiu Yao},
-      year={2026},
-      eprint={2605.20025},
-      archivePrefix={arXiv},
-      primaryClass={cs.AI},
-      url={https://arxiv.org/abs/2605.20025},
-}
-```
-
-<p align="center">
-  <sub>Built with 🦞 by the AutoResearchClaw team</sub>
-</p>
diff --git a/docs/README_DE.md b/docs/README_DE.md
deleted file mode 100644
index e6d08f92..00000000
--- a/docs/README_DE.md
+++ /dev/null
@@ -1,790 +0,0 @@
-<p align="center">
-  <img src="../image/logo.png" width="700" alt="AutoResearchClaw Logo">
-</p>
-
-<h2 align="center"><b>Chat an Idea. Get a Paper. Autonomous, Collaborative & Self-Evolving.</b></h2>
-
-
-
-<p align="center">
-  <b><i><font size="5">Just chat with <a href="#-openclaw-integration">OpenClaw</a>: "Research X" → done.</font></i></b>
-</p>
-
-<p align="center">
-  📄 <b>Our paper is on arXiv. Check it out!</b> <a href="https://arxiv.org/abs/2605.20025"><i>AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration</i></a>
-</p>
-
-<p align="center">
-  <img src="../image/framework_v2.png" width="100%" alt="AutoResearchClaw Framework">
-</p>
-
-
-<p align="center">
-  <a href="https://arxiv.org/abs/2605.20025"><img src="https://img.shields.io/badge/arXiv-2605.20025-b31b1b?logo=arxiv&logoColor=white" alt="arXiv"></a>
-  <a href="https://huggingface.co/datasets/AIMING-Lab-UNC/ARC-Bench"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-ARC--Bench-yellow" alt="ARC-Bench on Hugging Face"></a>
-  <a href="../LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="MIT License"></a>
-  <a href="https://python.org"><img src="https://img.shields.io/badge/Python-3.11%2B-3776AB?logo=python&logoColor=white" alt="Python 3.11+"></a>
-  <a href="#testing"><img src="https://img.shields.io/badge/Tests-2699%20passed-brightgreen?logo=pytest&logoColor=white" alt="2699 Tests Passed"></a>
-  <a href="https://github.com/aiming-lab/AutoResearchClaw"><img src="https://img.shields.io/badge/GitHub-AutoResearchClaw-181717?logo=github" alt="GitHub"></a>
-  <a href="#-openclaw-integration"><img src="https://img.shields.io/badge/OpenClaw-Compatible-ff4444?logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNCAyNCI+PHBhdGggZD0iTTEyIDJDNi40OCAyIDIgNi40OCAyIDEyczQuNDggMTAgMTAgMTAgMTAtNC40OCAxMC0xMFMxNy41MiAyIDEyIDJ6IiBmaWxsPSJ3aGl0ZSIvPjwvc3ZnPg==" alt="OpenClaw Compatible"></a>
-  <a href="https://discord.gg/u4ksqW5P"><img src="https://img.shields.io/badge/Discord-Join%20Community-5865F2?logo=discord&logoColor=white" alt="Discord"></a>
-</p>
-
-<p align="center">
-  <a href="../README.md">🇺🇸 English</a> ·
-  <a href="README_CN.md">🇨🇳 中文</a> ·
-  <a href="README_JA.md">🇯🇵 日本語</a> ·
-  <a href="README_KO.md">🇰🇷 한국어</a> ·
-  <a href="README_FR.md">🇫🇷 Français</a> ·
-  <a href="README_DE.md">🇩🇪 Deutsch</a> ·
-  <a href="README_ES.md">🇪🇸 Español</a> ·
-  <a href="README_PT.md">🇧🇷 Português</a> ·
-  <a href="README_RU.md">🇷🇺 Русский</a> ·
-  <a href="README_AR.md">🇸🇦 العربية</a>
-</p>
-
-<p align="center">
-  <a href="showcase/SHOWCASE.md">🏆 Paper Showcase</a> · <a href="HITL_GUIDE.md">🧑‍✈️ Co-Pilot Guide</a> · <a href="integration-guide.md">📖 Integration Guide</a> · <a href="https://discord.gg/u4ksqW5P">💬 Discord Community</a>
-</p>
-
----
-
-<table>
-<tr>
-<td width="18%">
-<a href="showcase/SHOWCASE.md"><img src="showcase/thumbnails/paper_I_random_matrix-01.png" width="120" alt="Sample Paper"/></a>
-</td>
-<td valign="middle">
-<b>🏆 Generated Paper Showcase</b><br><br>
-<b>8 papers across 8 disciplines</b> (mathematics, statistics, biology, computer science, NLP, RL, vision, robustness), generated fully autonomously or with human-in-the-loop Co-Pilot guidance.<br><br>
-<a href="showcase/SHOWCASE.md"><img src="https://img.shields.io/badge/View_Full_Showcase_→-All_8_Papers-d73a49?style=for-the-badge" alt="View Showcase"></a>
-</td>
-</tr>
-</table>
-
----
-
-> **🧪 We are looking for testers!** Try the pipeline on your own research idea, from any field, and [tell us what you think](TESTER_GUIDE.md). Your feedback directly shapes the next release. **[→ Testing Guide](TESTER_GUIDE.md)** | **[→ 中文测试指南](TESTER_GUIDE_CN.md)** | **[→ 日本語テストガイド](TESTER_GUIDE_JA.md)**
-
----
-
-## 🔥 News
-- **[05/19/2026]** **v0.5.0** — **Multi-Domain-Experimentagenten + ARC-Bench** — Zwei wesentliche Updates. **(1) Domaenenspezifische Ausfuehrungsagenten:** Die Experimentphase (Stufen 10–13) leitet nun ueber die Standard-ML-Sandbox hinaus an Fachagenten weiter — **Hochenergiephysik** (ColliderAgent: FeynRules → MadGraph5 → Delphes ueber die Magnus-Cloud), **Biologie** (COBRApy-Stoffwechselmodellierung im Genommassstab) und **Statistik** (Simulationsstudien-Agent), mit einem generischen Docker-Executor fuer Chemie/Materialien. Die Pipeline waehlt den passenden Executor automatisch anhand der Forschungsdomaene. **(2) ARC-Bench:** ein offener Benchmark fuer autonome Forschung mit **55 Themen**, der **ML (25), Hochenergiephysik (10), Quanten (10), Biologie (7) und Statistik (3)** abdeckt — jedes Thema mit Manifest und Bewertungsrubrik (`experiments/arc_bench/`, jetzt auch auf [🤗 Hugging Face](https://huggingface.co/datasets/AIMING-Lab-UNC/ARC-Bench)). **[→ Leitfaden zur Domaenenintegration](DOMAIN_INTEGRATION_GUIDE.md)**
-- **[04/01/2026]** **v0.4.0** — **Human-in-the-Loop Co-Pilot-System** — AutoResearchClaw ist nicht mehr rein autonom. Das neue HITL-System fuegt 6 Interventionsmodi hinzu (`full-auto`, `gate-only`, `checkpoint`, `step-by-step`, `co-pilot`, `custom`), stufenspezifische Richtlinien und tiefe Mensch-KI-Kollaboration. Enthalten: Ideen-Workshop zur gemeinsamen Hypothesenerstellung, Baseline-Navigator zur Ueberpruefung des Experimentdesigns, Paper-Co-Writer fuer kollaboratives Verfassen, SmartPause (konfidenzgesteuerte dynamische Intervention), ALHF-Interventionslernen, Anti-Halluzinations-Behauptungsverifikation, Kostenbudget-Leitplanken, Pipeline-Verzweigung fuer parallele Hypothesenerkundung und CLI-Befehle (`attach`/`status`/`approve`/`reject`/`guide`). **[→ Vollstaendige HITL-Anleitung](HITL_GUIDE.md)**
-- **[03/30/2026]** **Flexibles Skill-Laden** — AutoResearchClaw unterstuetzt jetzt das Laden von Open-Source- und benutzerdefinierten Skills aus jeder Disziplin, um Ihr Forschungserlebnis weiter zu verbessern. 20 vorinstallierte Skills sind als sofort einsetzbare Referenzen enthalten, die wissenschaftliches Schreiben, Experimentdesign, Chemie, Biologie und mehr abdecken — einschliesslich eines [A-Evolve](https://github.com/A-EVO-Lab/a-evolve) Agentic-Evolution-Skills, der von der Community beigesteuert wurde. Laden Sie eigene ueber `researchclaw skills install` oder legen Sie eine `SKILL.md` in `.claude/skills/` ab. Siehe [Skills-Bibliothek](#-skills-bibliothek).
-- **[03/22/2026]** [v0.3.2](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.2) — **Plattformuebergreifende Unterstuetzung + deutlich verbesserte Stabilitaet** — AutoResearchClaw laeuft jetzt mit jedem ACP-kompatiblen Agenten-Backend (Claude Code, Codex CLI, Copilot CLI, Gemini CLI, Kimi CLI) und unterstuetzt Messaging-Plattformen (Discord, Telegram, Lark, WeChat) ueber die OpenClaw-Bruecke. Neues CLI-Agent-Code-Generierungs-Backend delegiert Stages 10 und 13 an externe CLI-Agenten mit Budgetkontrolle und Timeout-Management. Enthaelt Anti-Fabrication-System (VerifiedRegistry + Experiment-Diagnose- und Reparaturschleife), 100+ Bugfixes, modulares Executor-Refactoring, `--resume` Auto-Erkennung, LLM-Retry-Haertung und Community-Fixes.
-
-<details>
-<summary>Frühere Versionen</summary>
-
-- **[03/18/2026]** [v0.3.1](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.1) — **OpenCode Beast Mode + Community Contributions** — New "Beast Mode" routes complex code generation to [OpenCode](https://github.com/anomalyco/opencode) with automatic complexity scoring and graceful fallback. Added Novita AI provider support, thread-safety hardening, improved LLM output parsing robustness, and 20+ bug fixes from community PRs and internal audit.
-- **[03/17/2026]** [v0.3.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.0) — **MetaClaw Integration** — AutoResearchClaw now supports [MetaClaw](https://github.com/aiming-lab/MetaClaw) cross-run learning: pipeline failures → structured lessons → reusable skills, injected into all 23 stages. **+18.3%** robustness in controlled experiments. Opt-in (`metaclaw_bridge.enabled: true`), fully backward-compatible. See [Integration Guide](#-metaclaw-integration).
-- **[03/16/2026]** [v0.2.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.2.0) — Three multi-agent subsystems (CodeAgent, BenchmarkAgent, FigureAgent), hardened Docker sandbox with network-policy-aware execution, 4-round paper quality audit (AI-slop detection, 7-dim review scoring, NeurIPS checklist), and 15+ bug fixes from production runs.
-- **[03/15/2026]** [v0.1.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.1.0) — We release AutoResearchClaw: a fully autonomous 23-stage research pipeline that turns a single research idea into a conference-ready paper. No human intervention required.
-
-</details>
-
----
-
-## ⚡ Ein Befehl. Ein Paper.
-
-```bash
-# Vollstaendig autonom — kein menschliches Eingreifen
-pip install -e . && researchclaw setup && researchclaw init && researchclaw run --topic "Your research idea here" --auto-approve
-
-# Co-Pilot-Modus — an wichtigen Entscheidungspunkten mit KI zusammenarbeiten
-researchclaw run --topic "Your research idea here" --mode co-pilot
-```
-
-
----
-
-## 🤔 Was ist das?
-
-**Du denkst es. AutoResearchClaw schreibt es. Du triffst die wichtigen Entscheidungen.**
-
-Gib ein Forschungsthema ein — erhalte ein vollstaendiges wissenschaftliches Paper mit echter Literatur von OpenAlex, Semantic Scholar und arXiv, hardwarebewussten Sandbox-Experimenten (automatische GPU/MPS/CPU-Erkennung), statistischer Analyse, Multi-Agenten-Peer-Review und konferenzfertigem LaTeX fuer NeurIPS/ICML/ICLR. Fuehre es vollstaendig autonom aus, oder verwende den **Co-Pilot-Modus**, um die KI an kritischen Entscheidungspunkten zu lenken — waehle Forschungsrichtungen, pruefe Experimentdesigns und verfasse das Paper gemeinsam. Keine halluzinierten Referenzen.
-
-<table>
-<tr><td>📄</td><td><code>paper_draft.md</code></td><td>Vollstaendiges wissenschaftliches Paper (Einleitung, Verwandte Arbeiten, Methode, Experimente, Ergebnisse, Fazit)</td></tr>
-<tr><td>📐</td><td><code>paper.tex</code></td><td>Konferenzfertiges LaTeX (NeurIPS / ICLR / ICML Templates)</td></tr>
-<tr><td>📚</td><td><code>references.bib</code></td><td>Echte BibTeX-Referenzen von OpenAlex, Semantic Scholar und arXiv — automatisch bereinigt, um Inline-Zitationen zu entsprechen</td></tr>
-<tr><td>🔍</td><td><code>verification_report.json</code></td><td>4-Schicht-Zitationsintegritaets- und Relevanzpruefung (arXiv, CrossRef, DataCite, LLM)</td></tr>
-<tr><td>🧪</td><td><code>experiment runs/</code></td><td>Generierter Code + Sandbox-Ergebnisse + strukturierte JSON-Metriken</td></tr>
-<tr><td>📊</td><td><code>charts/</code></td><td>Automatisch generierte Vergleichsdiagramme mit Fehlerbalken und Konfidenzintervallen</td></tr>
-<tr><td>📝</td><td><code>reviews.md</code></td><td>Multi-Agenten-Peer-Review mit Methodik-Evidenz-Konsistenzpruefungen</td></tr>
-<tr><td>🧬</td><td><code>evolution/</code></td><td>Selbstlernende Erkenntnisse aus jedem Durchlauf</td></tr>
-<tr><td>📦</td><td><code>deliverables/</code></td><td>Alle finalen Ergebnisse in einem Ordner — kompilierbereit fuer Overleaf</td></tr>
-</table>
-
-Die Pipeline laeuft **End-to-End** — vollstaendig autonom oder mit Human-in-the-Loop-Kollaboration. Wenn Experimente fehlschlagen, repariert sie sich selbst. Wenn Hypothesen nicht bestaetigt werden, schwenkt sie um. Wenn Zitationen gefaelscht sind, entfernt sie diese. Wenn du steuern moechtest, pausiert sie und hoert zu.
-
-🌍 **Ueberall ausfuehrbar.** AutoResearchClaw ist nicht an eine einzelne Plattform gebunden. Nutzen Sie es eigenstaendig ueber die CLI, verbinden Sie es mit [OpenClaw](https://github.com/openclaw/openclaw), oder integrieren Sie es mit jedem ACP-kompatiblen AI-Agenten — 🤖 Claude Code, 💻 Codex CLI, 🐙 Copilot CLI, ♊ Gemini CLI, 🌙 Kimi CLI und mehr. Dank der Messaging-Bruecke von OpenClaw koennen Sie eine komplette Forschung von 💬 Discord, ✈️ Telegram, 🐦 Lark (飞书), 💚 WeChat oder jeder anderen Plattform starten, die Ihr Team bereits nutzt. Ein Thema rein, ein Paper raus — egal wo Sie tippen.
-
----
-
-## 🚀 Schnellstart
-
-```bash
-# 1. Klonen & installieren
-git clone https://github.com/aiming-lab/AutoResearchClaw.git
-cd AutoResearchClaw
-python3 -m venv .venv && source .venv/bin/activate
-pip install -e .
-
-# 2. Setup (interaktiv — installiert OpenCode Beast Mode, prueft Docker/LaTeX)
-researchclaw setup
-
-# 3. Konfigurieren
-researchclaw init          # Interaktiv: LLM-Anbieter waehlen, erstellt config.arc.yaml
-# Oder manuell: cp config.researchclaw.example.yaml config.arc.yaml
-
-# 4. Ausfuehren
-export OPENAI_API_KEY="sk-..."
-researchclaw run --config config.arc.yaml --topic "Your research idea" --auto-approve
-```
-
-Ausgabe → `artifacts/rc-YYYYMMDD-HHMMSS-<hash>/deliverables/` — kompilierfertiges LaTeX, BibTeX, Experimentcode, Diagramme.
-
-<details>
-<summary>📝 Minimale erforderliche Konfiguration</summary>
-
-```yaml
-project:
-  name: "my-research"
-
-research:
-  topic: "Your research topic here"
-
-llm:
-  base_url: "https://api.openai.com/v1"
-  api_key_env: "OPENAI_API_KEY"
-  primary_model: "gpt-4o"
-  fallback_models: ["gpt-4o-mini"]
-
-experiment:
-  mode: "sandbox"
-  sandbox:
-    python_path: ".venv/bin/python"
-```
-
-</details>
-
----
-
-## 🧠 Was macht es anders
-
-| Faehigkeit | Funktionsweise |
-|-----------|---------------|
-| **🧑‍✈️ Co-Pilot-Modus** | 6 Interventionsmodi — von vollstaendig autonom bis Schritt-fuer-Schritt. Lenke die KI bei kritischen Entscheidungen (Hypothesen, Baselines, Paper-Erstellung) oder lass sie frei laufen. SmartPause erkennt automatisch, wann menschlicher Input hilfreich waere. |
-| **🔄 PIVOT / REFINE Schleife** | Stufe 15 entscheidet autonom: PROCEED, REFINE (Parameter anpassen) oder PIVOT (neue Richtung). Artefakte automatisch versioniert. |
-| **🤖 Multi-Agenten-Debatte** | Hypothesengenerierung, Ergebnisanalyse und Peer-Review verwenden jeweils strukturierte Multi-Perspektiven-Debatten. |
-| **🧬 Selbstlernen** | Erkenntnisse pro Durchlauf extrahiert (Entscheidungsbegruendungen, Laufzeitwarnungen, Metrikanomalien) mit 30-Tage-Zeitabklingung. Zukuenftige Durchlaeufe lernen aus vergangenen Fehlern. |
-| **📚 Wissensdatenbank** | Jeder Durchlauf baut eine strukturierte KB ueber 6 Kategorien auf (Entscheidungen, Experimente, Ergebnisse, Literatur, Fragen, Reviews). |
-| **🛡️ Sentinel Watchdog** | Hintergrund-Qualitaetsmonitor: NaN/Inf-Erkennung, Paper-Evidenz-Konsistenz, Zitationsrelevanz-Bewertung, Anti-Fabrikationsschutz. |
-| **🔍 Behauptungsverifikation** | Inline-Faktencheck: extrahiert Behauptungen aus KI-generiertem Text und gleicht sie mit gesammelter Literatur ab. Markiert unbegruendete Zitationen und fabrizierte Zahlen. |
-| **🌿 Verzweigungserkundung** | Forke die Pipeline, um mehrere Forschungsrichtungen gleichzeitig zu erkunden, Ergebnisse nebeneinander zu vergleichen und den besten Pfad zusammenzufuehren. |
-
----
-
-## 🦞 OpenClaw-Integration
-
-<table>
-<tr>
-
-**AutoResearchClaw ist ein [OpenClaw](https://github.com/openclaw/openclaw)-kompatibler Dienst.** Installiere es in OpenClaw und starte autonome Forschung mit einer einzigen Nachricht — oder verwende es eigenstaendig ueber CLI, Claude Code oder jeden anderen KI-Coding-Assistenten.
-
-</tr>
-</table>
-
-### 🚀 Verwendung mit OpenClaw (empfohlen)
-
-Wenn du bereits [OpenClaw](https://github.com/openclaw/openclaw) als KI-Assistenten nutzt:
-
-```
-1️⃣  Teile die GitHub-Repo-URL mit OpenClaw
-2️⃣  OpenClaw liest automatisch RESEARCHCLAW_AGENTS.md → versteht die Pipeline
-3️⃣  Sage: "Research [dein Thema]"
-4️⃣  Fertig — OpenClaw klont, installiert, konfiguriert, fuehrt aus und liefert Ergebnisse
-```
-
-**Das war's.** OpenClaw uebernimmt `git clone`, `pip install`, Konfiguration und Pipeline-Ausfuehrung automatisch. Du chattest einfach.
-
-<details>
-<summary>💡 Was unter der Haube passiert</summary>
-
-1. OpenClaw liest `RESEARCHCLAW_AGENTS.md` → lernt die Forschungs-Orchestrator-Rolle
-2. OpenClaw liest `README.md` → versteht Installation und Pipeline-Struktur
-3. OpenClaw kopiert `config.researchclaw.example.yaml` → `config.yaml`
-4. Fragt nach deinem LLM-API-Schluessel (oder verwendet deine Umgebungsvariable)
-5. Fuehrt `pip install -e .` + `researchclaw run --topic "..." --auto-approve` aus
-6. Liefert Paper, LaTeX, Experimente und Zitationen zurueck
-
-</details>
-
-### 🔌 OpenClaw Bridge (Fortgeschritten)
-
-Fuer tiefere Integration enthaelt AutoResearchClaw ein **Bridge-Adapter-System** mit 6 optionalen Faehigkeiten:
-
-```yaml
-# config.arc.yaml
-openclaw_bridge:
-  use_cron: true              # ⏰ Geplante Forschungsdurchlaeufe
-  use_message: true           # 💬 Fortschrittsbenachrichtigungen (Discord/Slack/Telegram)
-  use_memory: true            # 🧠 Sitzungsuebergreifende Wissenspersistenz
-  use_sessions_spawn: true    # 🔀 Parallele Sub-Sessions fuer gleichzeitige Stufen
-  use_web_fetch: true         # 🌐 Live-Websuche waehrend der Literaturrecherche
-  use_browser: false          # 🖥️ Browserbasierte Paper-Sammlung
-```
-
-Jedes Flag aktiviert ein typisiertes Adapter-Protokoll. Wenn OpenClaw diese Faehigkeiten bereitstellt, nutzen die Adapter sie ohne Codeaenderungen. Siehe [`integration-guide.md`](integration-guide.md) fuer vollstaendige Details.
-
-### ACP (Agent Client Protocol)
-
-AutoResearchClaw kann **jeden ACP-kompatiblen Coding-Agenten** als LLM-Backend verwenden — keine API-Schluessel erforderlich. Der Agent kommuniziert ueber [acpx](https://github.com/openclaw/acpx) und haelt eine einzige persistente Sitzung ueber alle 23 Pipeline-Stufen aufrecht.
-
-| Agent | Befehl | Hinweise |
-|-------|--------|----------|
-| Claude Code | `claude` | Anthropic |
-| Codex CLI | `codex` | OpenAI |
-| Copilot CLI | `gh` | GitHub |
-| Gemini CLI | `gemini` | Google |
-| OpenCode | `opencode` | SST |
-| Kimi CLI | `kimi` | Moonshot |
-
-```yaml
-# config.yaml — ACP-Beispiel
-llm:
-  provider: "acp"
-  acp:
-    agent: "claude"   # Jeder ACP-kompatible Agent-CLI-Befehl
-    cwd: "."          # Arbeitsverzeichnis fuer den Agenten
-  # Kein base_url oder api_key noetig — der Agent verwaltet seine eigene Authentifizierung.
-```
-
-```bash
-# Einfach ausfuehren — der Agent verwendet seine eigenen Anmeldedaten
-researchclaw run --config config.yaml --topic "Your research idea" --auto-approve
-```
-
-### 🛠️ Weitere Ausfuehrungsmoeglichkeiten
-
-| Methode | Anleitung |
-|---------|-----------|
-| **Standalone CLI** | `researchclaw run --topic "..." --auto-approve` (autonom) oder `--mode co-pilot` (kollaborativ) |
-| **Python API** | `from researchclaw.pipeline import Runner; Runner(config).run()` |
-| **Claude Code** | Liest `RESEARCHCLAW_CLAUDE.md` — sage einfach *"Run research on [Thema]"* |
-| **Copilot CLI** | `researchclaw run --topic "..."` mit `llm.acp.agent: "gh"` |
-| **OpenCode** | Liest `.claude/skills/` — gleiche natuerliche Sprachschnittstelle |
-| **Jeder KI-CLI** | Uebergib `RESEARCHCLAW_AGENTS.md` als Kontext → Agent bootstrappt automatisch |
-
----
-
-## 🔬 Pipeline: 23 Stufen, 8 Phasen
-
-```
-Phase A: Forschungsplanung            Phase E: Experimentausfuehrung
-  1. TOPIC_INIT                          12. EXPERIMENT_RUN
-  2. PROBLEM_DECOMPOSE                   13. ITERATIVE_REFINE  ← Selbstheilung
-
-Phase B: Literaturrecherche            Phase F: Analyse & Entscheidung
-  3. SEARCH_STRATEGY                     14. RESULT_ANALYSIS    ← Multi-Agent
-  4. LITERATURE_COLLECT  ← echte API     15. RESEARCH_DECISION  ← PIVOT/REFINE
-  5. LITERATURE_SCREEN   [Gate]
-  6. KNOWLEDGE_EXTRACT                   Phase G: Paper-Erstellung
-                                         16. PAPER_OUTLINE
-Phase C: Wissenssynthese                 17. PAPER_DRAFT
-  7. SYNTHESIS                           18. PEER_REVIEW        ← Evidenzpruefung
-  8. HYPOTHESIS_GEN    ← Debatte         19. PAPER_REVISION
-
-Phase D: Experimentdesign             Phase H: Finalisierung
-  9. EXPERIMENT_DESIGN   [Gate]          20. QUALITY_GATE      [Gate]
- 10. CODE_GENERATION                     21. KNOWLEDGE_ARCHIVE
- 11. RESOURCE_PLANNING                   22. EXPORT_PUBLISH     ← LaTeX
-                                         23. CITATION_VERIFY    ← Relevanzpruefung
-```
-
-> **Gate-Stufen** (5, 9, 20) pausieren fuer menschliche Genehmigung oder werden mit `--auto-approve` automatisch genehmigt. Bei Ablehnung wird die Pipeline zurueckgesetzt.
-
-> **Co-Pilot-Modus** (`--mode co-pilot`): Tiefe Mensch-KI-Kollaboration in den Stufen 7-8 (Ideen-Workshop), Stufe 9 (Baseline-Navigator) und Stufen 16-17 (Paper-Co-Writer). Andere Stufen laufen automatisch mit SmartPause-Ueberwachung.
-
-> **Entscheidungsschleifen**: Stufe 15 kann REFINE (→ Stufe 13) oder PIVOT (→ Stufe 8) ausloesen, mit automatischer Artefakt-Versionierung.
-
-<details>
-<summary>📋 Was jede Phase bewirkt</summary>
-
-| Phase | Beschreibung |
-|-------|-------------|
-| **A: Planung** | LLM zerlegt das Thema in einen strukturierten Problembaum mit Forschungsfragen |
-| **A+: Hardware** | Automatische GPU-Erkennung (NVIDIA CUDA / Apple MPS / nur CPU), Warnung bei eingeschraenkter Hardware, Codegenerierung wird entsprechend angepasst |
-| **B: Literatur** | Multi-Source-Suche (OpenAlex → Semantic Scholar → arXiv) nach echten Papern, Relevanzscreening, Extraktion von Wissenskarten |
-| **C: Synthese** | Clustering der Ergebnisse, Identifizierung von Forschungsluecken, Generierung testbarer Hypothesen via Multi-Agenten-Debatte |
-| **D: Design** | Experimentplan entwerfen, hardwarebewussten ausfuehrbaren Python-Code generieren (GPU-Stufe → Paketauswahl), Ressourcenbedarf schaetzen |
-| **E: Ausfuehrung** | Experimente in Sandbox ausfuehren, NaN/Inf und Laufzeitfehler erkennen, Code via gezielter LLM-Reparatur selbst heilen |
-| **F: Analyse** | Multi-Agenten-Analyse der Ergebnisse; autonome PROCEED / REFINE / PIVOT Entscheidung mit Begruendung |
-| **G: Schreiben** | Gliederung → abschnittsweises Verfassen (5.000-6.500 Woerter) → Peer-Review (mit Methodik-Evidenz-Konsistenz) → Revision mit Laengenpruefung |
-| **H: Finalisierung** | Qualitaets-Gate, Wissensarchivierung, LaTeX-Export mit Konferenztemplate, Zitationsintegritaets- und Relevanzpruefung |
-
-</details>
-
----
-
-## ✨ Hauptfunktionen
-
-| Funktion | Beschreibung |
-|----------|-------------|
-| **📚 Multi-Source-Literatur** | Echte Paper von OpenAlex, Semantic Scholar und arXiv — Abfrageerweiterung, Deduplizierung, Circuit Breaker mit Graceful Degradation |
-| **🔍 4-Schicht-Zitationsverifikation** | arXiv-ID-Pruefung → CrossRef/DataCite-DOI → Semantic-Scholar-Titelabgleich → LLM-Relevanzbewertung. Halluzinierte Refs automatisch entfernt. |
-| **🖥️ Hardwarebewusste Ausfuehrung** | Automatische GPU-Erkennung (NVIDIA CUDA / Apple MPS / nur CPU) und Anpassung von Codegenerierung, Imports und Experimentumfang |
-| **🦾 OpenCode Beast Mode** | Komplexe Experimente werden automatisch an [OpenCode](https://github.com/anomalyco/opencode) weitergeleitet — generiert Multi-File-Projekte mit individuellen Architekturen, Trainingsschleifen und Ablationsstudien. Installation ueber `researchclaw setup`. |
-| **🧪 Sandbox-Experimente** | AST-validierter Code, unveraenderlicher Harness, NaN/Inf-Schnellabbruch, selbstheilende Reparatur, iterative Verfeinerung (bis zu 10 Runden), Teilergebnis-Erfassung |
-| **📝 Konferenzqualitaet** | NeurIPS/ICML/ICLR-Templates, abschnittsweises Verfassen (5.000-6.500 Woerter), Anti-Fabrikationsschutz, Revisions-Laengenschutz, Anti-Disclaimer-Durchsetzung |
-| **📐 Template-Umschaltung** | `neurips_2025`, `iclr_2026`, `icml_2026` — Markdown → LaTeX mit Mathematik, Tabellen, Abbildungen, Querverweisen, `\cite{}` |
-| **🛡️ Anti-Fabrikation** | VerifiedRegistry erzwingt Ground-Truth-Experimentdaten in Papern. Automatische Diagnose und Reparatur fehlgeschlagener Experimente vor dem Schreiben. Ungepruefte Zahlen bereinigt. |
-| **🚦 Qualitaets-Gates** | 3 Human-in-the-Loop-Gates (Stufen 5, 9, 20) mit Rollback. Ueberspringen mit `--auto-approve`. |
-| **🧑‍✈️ HITL-Co-Pilot** | 6 Interventionsmodi mit stufenspezifischen Richtlinien. Ideen-Workshop, Baseline-Navigator, Paper-Co-Writer fuer tiefe Kollaboration. SmartPause, Kostenbudget-Leitplanken, Eskalationsrichtlinien und Interventionslernen fuer Produktionssicherheit. CLI/WebSocket/MCP-Adapter. |
-| **💰 Kostenbudget-Leitplanken** | Budgetueberwachung mit konfigurierbaren Schwellenwert-Alarmen (50%/80%/100%). Pipeline pausiert automatisch, wenn die Kosten das Budget ueberschreiten. |
-| **🔐 Reproduzierbarkeit** | SHA256-Pruefsummen fuer alle Stufenartefakte. Unveraenderliche Manifeste zur Verifikation. Mehrstufiges Undo mit versionierten Snapshots. |
-
----
-
-## 🧑‍✈️ Human-in-the-Loop Co-Pilot
-
-**AutoResearchClaw v0.4.0 fuehrt ein vollstaendiges Human-in-the-Loop (HITL)-System ein**, das die Pipeline von rein autonom zu einer kollaborativen Mensch-KI-Forschungsmaschine transformiert. Waehle dein Beteiligungsniveau:
-
-### Interventionsmodi
-
-| Modus | Befehl | Beschreibung |
-|-------|--------|-------------|
-| **Full Auto** | `--auto-approve` | Urspruengliches Verhalten — kein menschliches Eingreifen |
-| **Gate Only** | `--mode gate-only` | Pause an 3 Gate-Stufen (5, 9, 20) zur Genehmigung |
-| **Checkpoint** | `--mode checkpoint` | Pause an jeder Phasengrenze (8 Checkpoints) |
-| **Co-Pilot** | `--mode co-pilot` | Tiefe Kollaboration an kritischen Stufen, sonst automatisch |
-| **Step-by-Step** | `--mode step-by-step` | Pause nach jeder Stufe — lerne die Pipeline kennen |
-| **Express** | `--mode express` | Schnellpruefung — nur die 3 kritischsten Gates |
-
-### Co-Pilot-Workflow
-
-```
-Du: researchclaw run --topic "Quantenrauschen als Regularisierung fuer neuronale Netze" --mode co-pilot
-
-Pipeline fuehrt Stufen 1-7 automatisch aus...
-
-  ┌─────────────────────────────────────────────────────────────┐
-  │  HITL | Stufe 08: HYPOTHESIS_GEN                            │
-  │  Post-Stage-Pruefung                                        │
-  │                                                             │
-  │  Erwaehnte Hypothesen: 3                                    │
-  │  Neuheitswert: 0.72 (moderat)                               │
-  │                                                             │
-  │  [a] Genehmigen  [r] Ablehnen  [e] Bearbeiten  [c] Kollaborieren │
-  │  [i] Anleitung einfuegen  [v] Ausgabe anzeigen  [q] Abbrechen    │
-  └─────────────────────────────────────────────────────────────┘
-
-Du: c  (kollaborativen Chat starten)
-Du: Hypothese 3 ist interessant, braucht aber Dropout/Label Smoothing als Baselines
-KI:  Aktualisiert — Dropout, Label Smoothing, MixUp, CutMix als Baselines hinzugefuegt...
-Du: genehmigen
-
-Pipeline setzt mit deiner verfeinerten Hypothese fort...
-```
-
-### CLI-Befehle
-
-```bash
-# Mit HITL-Modus starten
-researchclaw run --topic "..." --mode co-pilot
-
-# An eine pausierte Pipeline anhaengen (von einem anderen Terminal)
-researchclaw attach artifacts/rc-2026-xxx
-
-# Pipeline- und HITL-Status pruefen
-researchclaw status artifacts/rc-2026-xxx
-
-# Von einem anderen Terminal oder Skript genehmigen/ablehnen
-researchclaw approve artifacts/rc-2026-xxx --message "Sieht gut aus"
-researchclaw reject artifacts/rc-2026-xxx --reason "Wichtige Baseline fehlt"
-
-# Anleitung fuer eine Stufe einfuegen (auch bevor sie laeuft)
-researchclaw guide artifacts/rc-2026-xxx --stage 9 --message "ResNet-50 als primaere Baseline verwenden"
-```
-
-### Hauptfaehigkeiten
-
-| Funktion | Beschreibung |
-|----------|-------------|
-| **Ideen-Workshop** | Hypothesen gemeinsam brainstormen, bewerten und verfeinern (Stufe 7-8) |
-| **Baseline-Navigator** | KI schlaegt Baselines vor + Mensch fuegt hinzu/entfernt + Reproduzierbarkeitscheckliste (Stufe 9) |
-| **Paper-Co-Writer** | Abschnittsweises Verfassen mit menschlicher Bearbeitung und KI-Feinschliff (Stufe 16-19) |
-| **SmartPause** | Konfidenzgesteuerte dynamische Pausierung — erkennt automatisch, wann menschlicher Input hilfreich waere |
-| **Behauptungsverifikation** | Inline-Faktencheck gegen gesammelte Literatur — markiert unbegruendete Behauptungen |
-| **Kostenbudget-Leitplanken** | Budgetueberwachung mit 50%/80%/100% Schwellenwert-Alarmen |
-| **Interventionslernen** | ALHF — lernt aus deinen Review-Mustern, um zukuenftige Pausen-Entscheidungen zu optimieren |
-| **Verzweigungserkundung** | Forke die Pipeline, um mehrere Hypothesen zu erkunden, vergleiche und fuehre die beste zusammen |
-| **Eskalationsrichtlinie** | Gestufte Benachrichtigung (Terminal → Slack → E-Mail → Auto-Halt) bei Unbeaufsichtigung |
-| **3 Adapter** | CLI (Terminal), WebSocket (Web-Dashboard), MCP (externe Agenten) |
-
-### Konfiguration
-
-```yaml
-# config.arc.yaml
-hitl:
-  enabled: true
-  mode: co-pilot                     # full-auto | gate-only | checkpoint | step-by-step | co-pilot | custom
-  cost_budget_usd: 50.0              # Pausieren wenn Kosten das Budget ueberschreiten (0 = kein Limit)
-
-  notifications:
-    on_pause: true
-    on_quality_drop: true
-    channels: ["terminal"]            # terminal | slack | webhook
-
-  timeouts:
-    default_human_timeout_sec: 86400  # 24h Standard-Wartezeit
-    auto_proceed_on_timeout: false
-
-  collaboration:
-    max_chat_turns: 50
-    save_chat_history: true
-
-  # Stufenspezifische benutzerdefinierte Richtlinien (optional, fuer 'custom'-Modus)
-  stage_policies:
-    8: { require_approval: true, enable_collaboration: true }
-    9: { require_approval: true, allow_edit_output: true }
-```
-
-### Abwaertskompatibilitaet
-
-- **Standard: AUS.** Ohne `hitl.enabled: true` oder `--mode` verhaelt sich die Pipeline exakt wie zuvor.
-- **`--auto-approve` funktioniert weiterhin.** Es ueberschreibt den HITL-Modus.
-- **Alle 2.699 bestehenden Tests bestehen** mit vorhandenem HITL-Code.
-
----
-
-## 🧠 MetaClaw-Integration
-
-**AutoResearchClaw + [MetaClaw](https://github.com/aiming-lab/MetaClaw) = Eine Pipeline, die aus jedem Durchlauf lernt.**
-
-MetaClaw fuegt **durchlaufuebergreifenden Wissenstransfer** zu AutoResearchClaw hinzu. Wenn aktiviert, erfasst die Pipeline automatisch Erkenntnisse aus Fehlern und Warnungen, konvertiert sie in wiederverwendbare Skills und injiziert diese Skills in alle 23 Pipeline-Stufen bei nachfolgenden Durchlaeufen — damit dieselben Fehler nie wiederholt werden.
-
-### Funktionsweise
-
-```
-Durchlauf N wird ausgefuehrt → Fehler/Warnungen als Lektionen erfasst
-                      ↓
-          MetaClaw Lektion → Skill-Konvertierung
-                      ↓
-          arc-* Skill-Dateien in ~/.metaclaw/skills/ gespeichert
-                      ↓
-Durchlauf N+1 → build_overlay() injiziert Skills in jeden LLM-Prompt
-                      ↓
-          LLM vermeidet bekannte Fallstricke → hoehere Qualitaet, weniger Wiederholungen
-```
-
-### Schnelleinrichtung
-
-```bash
-# 1. MetaClaw installieren (falls nicht vorhanden)
-pip install metaclaw
-
-# 2. In der Konfiguration aktivieren
-```
-
-```yaml
-# config.arc.yaml
-metaclaw_bridge:
-  enabled: true
-  proxy_url: "http://localhost:30000"        # MetaClaw-Proxy (optional)
-  skills_dir: "~/.metaclaw/skills"          # Wo Skills gespeichert werden
-  fallback_url: "https://api.openai.com/v1" # Direkter LLM-Fallback
-  fallback_api_key: ""                      # API-Schluessel fuer Fallback-URL
-  lesson_to_skill:
-    enabled: true
-    min_severity: "warning"                 # Warnungen + Fehler konvertieren
-    max_skills_per_run: 3
-```
-
-```bash
-# 3. Wie gewohnt ausfuehren — MetaClaw arbeitet transparent
-researchclaw run --config config.arc.yaml --topic "Your idea" --auto-approve
-```
-
-Nach jedem Durchlauf kannst du `~/.metaclaw/skills/arc-*/SKILL.md` pruefen, um die erlernten Skills deiner Pipeline zu sehen.
-
-### Experimentergebnisse
-
-In kontrollierten A/B-Experimenten (gleiches Thema, gleiches LLM, gleiche Konfiguration):
-
-| Metrik | Baseline | Mit MetaClaw | Verbesserung |
-|--------|----------|--------------|--------------|
-| Stufen-Wiederholungsrate | 10.5% | 7.9% | **-24.8%** |
-| Anzahl REFINE-Zyklen | 2.0 | 1.2 | **-40.0%** |
-| Pipeline-Stufenabschluss | 18/19 | 19/19 | **+5.3%** |
-| Gesamtrobustheitswert (Komposit) | 0.714 | 0.845 | **+18.3%** |
-
-> Der Komposit-Robustheitswert ist ein gewichteter Durchschnitt aus Stufenabschlussrate (40%), Wiederholungsreduktion (30%) und REFINE-Zykluseffizienz (30%).
-
-### Abwaertskompatibilitaet
-
-- **Standard: AUS.** Wenn `metaclaw_bridge` fehlt oder `enabled: false`, verhaelt sich die Pipeline exakt wie zuvor.
-- **Keine neuen Abhaengigkeiten.** MetaClaw ist optional — die Kern-Pipeline funktioniert ohne.
-- **Alle 2.699 bestehenden Tests bestehen** mit dem Integrationscode.
-
----
-
-## 🧩 Skills-Bibliothek
-
-AutoResearchClaw unterstuetzt jetzt das Laden von **Open-Source- und benutzerdefinierten Skills**, um Ihr Forschungserlebnis weiter zu verbessern. Wir liefern ausserdem **20 vorinstallierte integrierte Skills** (wissenschaftliches Schreiben, Literatursuche, Chemie, Biologie und mehr) als sofort einsetzbare Referenzen mit, die von Anfang an ein hohes Mass an Flexibilitaet bieten. Deaktivieren Sie einen Skill, indem Sie `enabled: false` in seinen Frontmatter einfuegen.
-
-**Beispiele fuer integrierte Skills:**
-
-| Kategorie | Skill | Beschreibung |
-|-----------|-------|-------------|
-| **Schreiben** | `scientific-writing` | IMRAD-Struktur, Zitationsformatierung, Berichtsrichtlinien |
-| **Domaene** | `chemistry-rdkit` | Molekuelanalyse, SMILES, Fingerprints, Wirkstoffforschung |
-| **Experiment** | `literature-search` | Systematische Uebersicht, PRISMA-Methodik |
-
-> Alle 20 Skills anzeigen mit `researchclaw skills list`.
-
-### Eigene Skills laden
-
-```bash
-# Option 1: Skill installieren (projektuebergreifend persistent)
-researchclaw skills install /path/to/my-skill/
-
-# Option 2: SKILL.md ins Projekt legen
-mkdir -p .claude/skills/my-custom-skill
-# Dann eine SKILL.md mit YAML-Frontmatter erstellen (name, description, trigger-keywords, applicable-stages)
-
-# Option 3: Gemeinsame Skill-Verzeichnisse in config.arc.yaml konfigurieren
-# skills:
-#   custom_dirs:
-#     - /path/to/team-shared-skills
-```
-
-### Skills verwenden
-
-Skills werden automatisch geladen und in LLM-Prompts injiziert — keine manuelle Aktivierung noetig. Verwenden Sie die CLI zur Inspektion:
-
-```bash
-researchclaw skills list               # Alle geladenen Skills mit Quellen anzeigen
-researchclaw skills validate ./my-skill # SKILL.md-Format pruefen
-```
-
-Community-Skills durchsuchen: [K-Dense-AI/claude-scientific-skills](https://github.com/K-Dense-AI/claude-scientific-skills) (150+ wissenschaftliche Skills aus mehreren Disziplinen).
-
----
-
-## ⚙️ Konfigurationsreferenz
-
-<details>
-<summary>Klicken zum Aufklappen der vollstaendigen Konfigurationsreferenz</summary>
-
-```yaml
-# === Projekt ===
-project:
-  name: "my-research"              # Projektbezeichner
-  mode: "docs-first"               # docs-first | semi-auto | full-auto
-
-# === Forschung ===
-research:
-  topic: "..."                     # Forschungsthema (erforderlich)
-  domains: ["ml", "nlp"]           # Forschungsdomaenen fuer Literatursuche
-  daily_paper_count: 8             # Ziel-Paperzahl pro Suchabfrage
-  quality_threshold: 4.0           # Mindestqualitaetswert fuer Paper
-
-# === Laufzeit ===
-runtime:
-  timezone: "America/New_York"     # Fuer Zeitstempel
-  max_parallel_tasks: 3            # Limit gleichzeitiger Experimente
-  approval_timeout_hours: 12       # Gate-Stufen-Timeout
-  retry_limit: 2                   # Wiederholungsanzahl bei Stufenfehler
-
-# === LLM ===
-llm:
-  provider: "openai-compatible"    # openai | openrouter | deepseek | minimax | acp | openai-compatible
-  base_url: "https://..."          # API-Endpunkt (erforderlich fuer openai-compatible)
-  api_key_env: "OPENAI_API_KEY"    # Umgebungsvariable fuer API-Schluessel (erforderlich fuer openai-compatible)
-  api_key: ""                      # Oder Schluessel direkt eintragen
-  primary_model: "gpt-4o"          # Primaeres Modell
-  fallback_models: ["gpt-4o-mini"] # Fallback-Kette
-  s2_api_key: ""                   # Semantic Scholar API-Schluessel (optional, hoehere Rate-Limits)
-  acp:                             # Nur verwendet wenn provider: "acp"
-    agent: "claude"                # ACP-Agent-CLI-Befehl (claude, codex, gemini, etc.)
-    cwd: "."                       # Arbeitsverzeichnis fuer den Agenten
-
-# === Experiment ===
-experiment:
-  mode: "sandbox"                  # simulated | sandbox | docker | ssh_remote
-  time_budget_sec: 300             # Max. Ausfuehrungszeit pro Durchlauf (Standard: 300s)
-  max_iterations: 10               # Max. Optimierungsiterationen
-  metric_key: "val_loss"           # Primaerer Metrikname
-  metric_direction: "minimize"     # minimize | maximize
-  sandbox:
-    python_path: ".venv/bin/python"
-    gpu_required: false
-    allowed_imports: [math, random, json, csv, numpy, torch, sklearn]
-    max_memory_mb: 4096
-  docker:
-    image: "researchclaw/experiment:latest"
-    network_policy: "setup_only"   # none | setup_only | pip_only | full
-    gpu_enabled: true
-    memory_limit_mb: 8192
-    auto_install_deps: true        # Automatische Import-Erkennung → requirements.txt
-  ssh_remote:
-    host: ""                       # GPU-Server-Hostname
-    gpu_ids: []                    # Verfuegbare GPU-IDs
-    remote_workdir: "/tmp/researchclaw_experiments"
-  opencode:                          # OpenCode Beast Mode (auto-installiert ueber `researchclaw setup`)
-    enabled: true                    # Hauptschalter (Standard: true)
-    auto: true                       # Auto-Ausloesung ohne Bestaetigung (Standard: true)
-    complexity_threshold: 0.2        # 0.0-1.0 — hoeher = nur bei komplexen Experimenten ausloesen
-    model: ""                        # Modell ueberschreiben (leer = llm.primary_model verwenden)
-    timeout_sec: 600                 # Max. Sekunden fuer OpenCode-Generierung
-    max_retries: 1                   # Wiederholungsanzahl bei Fehler
-    workspace_cleanup: true          # Temporaeren Workspace nach Sammlung entfernen
-  code_agent:                        # CodeAgent v2 — Mehrphasen-Codegenerierung
-    enabled: true                    # CodeAgent statt Legacy-Einzelprompt-Codegen verwenden
-    architecture_planning: true      # Tiefe Implementierungsblaupause vor dem Codieren generieren
-    sequential_generation: true      # Dateien einzeln nach Abhaengigkeits-DAG generieren
-    hard_validation: true            # AST-basierte Validierungs-Gates (blockiert identische Ablationen, hardcodierte Metriken)
-    hard_validation_max_repairs: 2   # Max. Reparaturversuche bei fehlgeschlagener Validierung
-    exec_fix_max_iterations: 3       # Ausfuehrungs-Reparaturversuche
-    exec_fix_timeout_sec: 60         # Timeout pro Reparaturversuch
-  benchmark_agent:                   # BenchmarkAgent — automatisierte Datensatz- & Baseline-Auswahl
-    enabled: true                    # 4-Agenten-Benchmark-Pipeline aktivieren (Surveyor→Selector→Acquirer→Validator)
-    enable_hf_search: true           # HuggingFace Datasets durchsuchen
-    enable_web_search: true          # Google Scholar nach Benchmarks durchsuchen
-    tier_limit: 2                    # Datensatz-Stufen-Filter (1=klein/gecacht, 2=mittel, 3=gross)
-    min_benchmarks: 1                # Mindestanzahl Datensaetze
-    min_baselines: 2                 # Mindestanzahl Baseline-Methoden
-  figure_agent:                      # FigureAgent — akademische Abbildungserstellung
-    enabled: true                    # 5-Agenten-Abbildungs-Pipeline aktivieren (Planner→CodeGen→Renderer→Critic→Integrator)
-    min_figures: 3                   # Mindestanzahl Abbildungen
-    max_figures: 8                   # Maximalanzahl Abbildungen
-    max_iterations: 3                # Kritik-gesteuerte Verfeinerungsiterationen
-    dpi: 300                         # Ausgabeaufloesung
-    strict_mode: false               # Pipeline bei fehlgeschlagener Abbildungserstellung abbrechen
-  repair:                            # Anti-Fabrikations-Experiment-Reparatur
-    enabled: true                    # Fehlgeschlagene Experimente automatisch diagnostizieren und reparieren
-    max_cycles: 3                    # Reparatur-Wiederholungsschleifen
-    min_completion_rate: 0.5         # >=50% Bedingungen muessen abgeschlossen sein
-    min_conditions: 2                # Mindestens 2 Bedingungen fuer gueltiges Experiment
-    use_opencode: true               # Reparaturen ueber OpenCode Beast Mode leiten
-
-# === Websuche (Optional) ===
-web_search:
-  enabled: true                      # Web-erweiterte Literatursuche aktivieren
-  tavily_api_key_env: "TAVILY_API_KEY"  # Tavily API-Schluessel Umgebungsvariable (optional)
-  enable_scholar: true               # Google Scholar-Suche
-  enable_pdf_extraction: true        # Text aus PDFs extrahieren
-  max_web_results: 10                # Max. Webergebnisse pro Abfrage
-
-# === Export ===
-export:
-  target_conference: "neurips_2025"  # neurips_2025 | iclr_2026 | icml_2026
-  authors: "Anonymous"
-  bib_file: "references"
-
-# === Prompts ===
-prompts:
-  custom_file: ""                  # Pfad zur benutzerdefinierten Prompts-YAML (leer = Standardwerte)
-
-# === HITL Co-Pilot (NEU in v0.4.0) ===
-hitl:
-  enabled: false                     # Auf true setzen um HITL zu aktivieren
-  mode: co-pilot                     # full-auto | gate-only | checkpoint | step-by-step | co-pilot | custom
-  cost_budget_usd: 0.0              # Kostenlimit in USD (0 = kein Limit)
-  notifications:
-    on_pause: true                   # Benachrichtigung wenn Pipeline pausiert
-    on_quality_drop: true            # Benachrichtigung bei Qualitaetsproblemen
-    channels: ["terminal"]           # terminal | slack | webhook
-  timeouts:
-    default_human_timeout_sec: 86400 # Bis zu 24h auf menschlichen Input warten
-    auto_proceed_on_timeout: false   # Wenn true, automatisch genehmigen bei Timeout
-  collaboration:
-    max_chat_turns: 50               # Max. Turns pro Kollaborationssitzung
-    save_chat_history: true          # Chat-Protokolle speichern
-  stage_policies: {}                 # Stufenspezifische Ueberschreibungen (fuer 'custom'-Modus)
-
-# === Sicherheit ===
-security:
-  hitl_required_stages: [5, 9, 20] # Stufen, die menschliche Genehmigung erfordern
-  allow_publish_without_approval: false
-  redact_sensitive_logs: true
-
-# === Wissensdatenbank ===
-knowledge_base:
-  backend: "markdown"              # markdown | obsidian
-  root: "docs/kb"
-
-# === Benachrichtigungen ===
-notifications:
-  channel: "console"               # console | discord | slack
-  target: ""
-
-# === MetaClaw Bridge (Optional) ===
-metaclaw_bridge:
-  enabled: false                   # Auf true setzen fuer durchlaufuebergreifendes Lernen
-  proxy_url: "http://localhost:30000"  # MetaClaw-Proxy-URL
-  skills_dir: "~/.metaclaw/skills" # Wo arc-* Skills gespeichert werden
-  fallback_url: ""                 # Direkter LLM-Fallback wenn Proxy nicht erreichbar
-  fallback_api_key: ""             # API-Schluessel fuer Fallback-Endpunkt
-  lesson_to_skill:
-    enabled: true                  # Lektionen automatisch in Skills konvertieren
-    min_severity: "warning"        # Mindestschwere fuer Konvertierung
-    max_skills_per_run: 3          # Max. neue Skills pro Pipeline-Durchlauf
-  prm:                             # Process Reward Model Qualitaets-Gate (optional)
-    enabled: false                 # LLM-als-Juror zur Bewertung von Stufenausgaben verwenden
-    model: "gpt-5.4"              # PRM-Juror-Modell
-    votes: 3                       # Mehrheitsentscheidung-Anzahl
-    gate_stages: [5, 9, 15, 20]   # Stufen fuer PRM-Gates
-
-# === OpenClaw Bridge ===
-openclaw_bridge:
-  use_cron: false                  # Geplante Forschungsdurchlaeufe
-  use_message: false               # Fortschrittsbenachrichtigungen
-  use_memory: false                # Sitzungsuebergreifende Wissenspersistenz
-  use_sessions_spawn: false        # Parallele Sub-Sessions starten
-  use_web_fetch: false             # Live-Websuche
-  use_browser: false               # Browserbasierte Paper-Sammlung
-```
-
-</details>
-
----
-
-## 🙏 Danksagungen
-
-Inspiriert von:
-
-- 🔬 [AI Scientist](https://github.com/SakanaAI/AI-Scientist) (Sakana AI) — Pionier der automatisierten Forschung
-- 🧠 [AutoResearch](https://github.com/karpathy/autoresearch) (Andrej Karpathy) — End-to-End-Forschungsautomatisierung
-- 🌐 [FARS](https://analemma.ai/blog/introducing-fars/) (Analemma) — Fully Automated Research System
-
----
-
-## 📄 Lizenz
-
-MIT — siehe [LICENSE](../LICENSE) fuer Details.
-
----
-
-## 📌 Zitation
-
-Wenn du AutoResearchClaw nuetzlich findest, zitiere bitte:
-
-```bibtex
-@misc{liu2026autoresearchclawselfreinforcingautonomousresearch,
-      title={AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration},
-      author={Jiaqi Liu and Shi Qiu and Mairui Li and Bingzhou Li and Haonian Ji and Siwei Han and Xinyu Ye and Peng Xia and Zihan Dong and Congyu Zhang and Letian Zhang and Guiming Chen and Haoqin Tu and Xinyu Yang and Lu Feng and Xujiang Zhao and Haifeng Chen and Jiawei Zhou and Xiao Wang and Weitong Zhang and Hongtu Zhu and Yun Li and Jieru Mei and Hongliang Fei and Jiaheng Zhang and Linjie Li and Linjun Zhang and Yuyin Zhou and Sheng Wang and Caiming Xiong and James Zou and Zeyu Zheng and Cihang Xie and Mingyu Ding and Huaxiu Yao},
-      year={2026},
-      eprint={2605.20025},
-      archivePrefix={arXiv},
-      primaryClass={cs.AI},
-      url={https://arxiv.org/abs/2605.20025},
-}
-```
-
-<p align="center">
-  <sub>Gebaut mit 🦞 vom AutoResearchClaw-Team</sub>
-</p>
diff --git a/docs/README_ES.md b/docs/README_ES.md
deleted file mode 100644
index 26a9c219..00000000
--- a/docs/README_ES.md
+++ /dev/null
@@ -1,790 +0,0 @@
-<p align="center">
-  <img src="../image/logo.png" width="700" alt="AutoResearchClaw Logo">
-</p>
-
-<h2 align="center"><b>Comparte una idea. Obten un articulo. Autonomo, Colaborativo & Auto-evolutivo.</b></h2>
-
-
-
-<p align="center">
-  <b><i><font size="5">Chatea con <a href="#-integracion-con-openclaw">OpenClaw</a>: "Investiga X" → hecho.</font></i></b>
-</p>
-
-<p align="center">
-  📄 <b>Nuestro articulo esta en arXiv — ¡ven a leerlo!</b> <a href="https://arxiv.org/abs/2605.20025"><i>AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration</i></a>
-</p>
-
-<p align="center">
-  <img src="../image/framework_v2.png" width="100%" alt="AutoResearchClaw Framework">
-</p>
-
-
-<p align="center">
-  <a href="https://arxiv.org/abs/2605.20025"><img src="https://img.shields.io/badge/arXiv-2605.20025-b31b1b?logo=arxiv&logoColor=white" alt="arXiv"></a>
-  <a href="https://huggingface.co/datasets/AIMING-Lab-UNC/ARC-Bench"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-ARC--Bench-yellow" alt="ARC-Bench on Hugging Face"></a>
-  <a href="../LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="MIT License"></a>
-  <a href="https://python.org"><img src="https://img.shields.io/badge/Python-3.11%2B-3776AB?logo=python&logoColor=white" alt="Python 3.11+"></a>
-  <a href="#testing"><img src="https://img.shields.io/badge/Tests-2699%20passed-brightgreen?logo=pytest&logoColor=white" alt="2699 Tests Passed"></a>
-  <a href="https://github.com/aiming-lab/AutoResearchClaw"><img src="https://img.shields.io/badge/GitHub-AutoResearchClaw-181717?logo=github" alt="GitHub"></a>
-  <a href="#-integracion-con-openclaw"><img src="https://img.shields.io/badge/OpenClaw-Compatible-ff4444?logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNCAyNCI+PHBhdGggZD0iTTEyIDJDNi40OCAyIDIgNi40OCAyIDEyczQuNDggMTAgMTAgMTAgMTAtNC40OCAxMC0xMFMxNy41MiAyIDEyIDJ6IiBmaWxsPSJ3aGl0ZSIvPjwvc3ZnPg==" alt="OpenClaw Compatible"></a>
-  <a href="https://discord.gg/u4ksqW5P"><img src="https://img.shields.io/badge/Discord-Join%20Community-5865F2?logo=discord&logoColor=white" alt="Discord"></a>
-</p>
-
-<p align="center">
-  <a href="../README.md">🇺🇸 English</a> ·
-  <a href="README_CN.md">🇨🇳 中文</a> ·
-  <a href="README_JA.md">🇯🇵 日本語</a> ·
-  <a href="README_KO.md">🇰🇷 한국어</a> ·
-  <a href="README_FR.md">🇫🇷 Français</a> ·
-  <a href="README_DE.md">🇩🇪 Deutsch</a> ·
-  <a href="README_ES.md">🇪🇸 Español</a> ·
-  <a href="README_PT.md">🇧🇷 Português</a> ·
-  <a href="README_RU.md">🇷🇺 Русский</a> ·
-  <a href="README_AR.md">🇸🇦 العربية</a>
-</p>
-
-<p align="center">
-  <a href="showcase/SHOWCASE.md">🏆 Galeria de articulos</a> · <a href="HITL_GUIDE.md">🧑‍✈️ Guia de Co-Piloto</a> · <a href="integration-guide.md">📖 Guia de integracion</a> · <a href="https://discord.gg/u4ksqW5P">💬 Comunidad Discord</a>
-</p>
-
----
-
-<table>
-<tr>
-<td width="18%">
-<a href="showcase/SHOWCASE.md"><img src="showcase/thumbnails/paper_I_random_matrix-01.png" width="120" alt="Sample Paper"/></a>
-</td>
-<td valign="middle">
-<b>🏆 Galeria de articulos generados</b><br><br>
-<b>8 articulos en 8 dominios</b> — matematicas, estadistica, biologia, computacion, NLP, RL, vision, robustez — generados de forma completamente autonoma o con guia de co-piloto Human-in-the-Loop.<br><br>
-<a href="showcase/SHOWCASE.md"><img src="https://img.shields.io/badge/View_Full_Showcase_→-All_8_Papers-d73a49?style=for-the-badge" alt="View Showcase"></a>
-</td>
-</tr>
-</table>
-
----
-
-> **🧪 Buscamos testers!** Prueba el pipeline con tu propia idea de investigacion — de cualquier campo — y [cuentanos que piensas](TESTER_GUIDE.md). Tu feedback da forma directamente a la proxima version. **[→ Testing Guide](TESTER_GUIDE.md)** | **[→ 中文测试指南](TESTER_GUIDE_CN.md)** | **[→ 日本語テストガイド](TESTER_GUIDE_JA.md)**
-
----
-
-## 🔥 News
-- **[05/19/2026]** **v0.5.0** — **Agentes de experimentacion multidominio + ARC-Bench** — Dos actualizaciones principales. **(1) Agentes de ejecucion especializados por dominio:** la etapa de experimentos (etapas 10–13) ahora va mas alla del sandbox de ML por defecto y enruta a agentes especializados por campo — **fisica de altas energias** (ColliderAgent: FeynRules → MadGraph5 → Delphes via la nube Magnus), **biologia** (modelado metabolico a escala genomica con COBRApy) y **estadistica** (agente de estudios de simulacion), con un ejecutor Docker generico para quimica/materiales. El pipeline selecciona automaticamente el ejecutor adecuado segun el dominio. **(2) ARC-Bench:** un benchmark de investigacion autonoma abierta de **55 temas** que cubre **ML (25), fisica de altas energias (10), cuantica (10), biologia (7) y estadistica (3)**, cada uno con un manifiesto y una rubrica de evaluacion (`experiments/arc_bench/`, y también en [🤗 Hugging Face](https://huggingface.co/datasets/AIMING-Lab-UNC/ARC-Bench)). **[→ Guia de integracion de dominios](DOMAIN_INTEGRATION_GUIDE.md)**
-- **[04/01/2026]** **v0.4.0** — **Sistema Co-Piloto Human-in-the-Loop** — AutoResearchClaw ya no es puramente autonomo. El nuevo sistema HITL agrega 6 modos de intervencion (`full-auto`, `gate-only`, `checkpoint`, `step-by-step`, `co-pilot`, `custom`), politicas por etapa y colaboracion profunda humano-IA. Incluye: Taller de Ideas para co-creacion de hipotesis, Navegador de Baselines para revision del diseno experimental, Co-Escritor de Articulos para redaccion colaborativa, SmartPause (intervencion dinamica basada en confianza), aprendizaje de intervencion ALHF, verificacion de afirmaciones anti-alucinacion, guardias de presupuesto, ramificacion del pipeline para exploracion paralela de hipotesis, y comandos CLI (`attach`/`status`/`approve`/`reject`/`guide`). **[→ Guia HITL completa](HITL_GUIDE.md)**
-- **[03/30/2026]** **Carga Flexible de Habilidades** — AutoResearchClaw ahora soporta la carga de habilidades de codigo abierto y personalizadas de cualquier disciplina para mejorar aun mas tu experiencia de investigacion. Se incluyen 20 habilidades precargadas como referencias listas para usar, cubriendo redaccion cientifica, diseno experimental, quimica, biologia y mas — incluyendo una habilidad de evolucion agente [A-Evolve](https://github.com/A-EVO-Lab/a-evolve) contribuida por la comunidad. Carga las tuyas via `researchclaw skills install` o coloca un `SKILL.md` en `.claude/skills/`. Ver [Biblioteca de Habilidades](#-biblioteca-de-habilidades).
-- **[03/22/2026]** [v0.3.2](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.2) — **Soporte multiplataforma + estabilidad mayor** — AutoResearchClaw ahora funciona con cualquier agente compatible con ACP (Claude Code, Codex CLI, Copilot CLI, Gemini CLI, Kimi CLI) y soporta plataformas de mensajeria (Discord, Telegram, Lark, WeChat) via el puente OpenClaw. Nuevo backend de generacion de codigo CLI-agent que delega las Stages 10 y 13 a agentes CLI externos con control de presupuesto y gestion de timeouts. Incluye sistema anti-fabricacion (VerifiedRegistry + bucle de diagnostico y reparacion), 100+ correcciones de bugs, refactorizacion modular del executor, auto-deteccion de `--resume`, endurecimiento de reintentos LLM y correcciones de la comunidad.
-
-<details>
-<summary>Versiones anteriores</summary>
-
-- **[03/18/2026]** [v0.3.1](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.1) — **OpenCode Beast Mode + Community Contributions** — New "Beast Mode" routes complex code generation to [OpenCode](https://github.com/anomalyco/opencode) with automatic complexity scoring and graceful fallback. Added Novita AI provider support, thread-safety hardening, improved LLM output parsing robustness, and 20+ bug fixes from community PRs and internal audit.
-- **[03/17/2026]** [v0.3.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.0) — **MetaClaw Integration** — AutoResearchClaw now supports [MetaClaw](https://github.com/aiming-lab/MetaClaw) cross-run learning: pipeline failures → structured lessons → reusable skills, injected into all 23 stages. **+18.3%** robustness in controlled experiments. Opt-in (`metaclaw_bridge.enabled: true`), fully backward-compatible. See [Integration Guide](#-integracion-metaclaw).
-- **[03/16/2026]** [v0.2.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.2.0) — Three multi-agent subsystems (CodeAgent, BenchmarkAgent, FigureAgent), hardened Docker sandbox with network-policy-aware execution, 4-round paper quality audit (AI-slop detection, 7-dim review scoring, NeurIPS checklist), and 15+ bug fixes from production runs.
-- **[03/15/2026]** [v0.1.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.1.0) — We release AutoResearchClaw: a fully autonomous 23-stage research pipeline that turns a single research idea into a conference-ready paper. No human intervention required.
-
-</details>
-
----
-
-## ⚡ Un comando. Un articulo.
-
-```bash
-# Totalmente autonomo — sin intervencion humana
-pip install -e . && researchclaw setup && researchclaw init && researchclaw run --topic "Your research idea here" --auto-approve
-
-# Modo Co-Piloto — colabora con la IA en puntos de decision clave
-researchclaw run --topic "Your research idea here" --mode co-pilot
-```
-
-
----
-
-## 🤔 Que es esto?
-
-**Tu lo piensas. AutoResearchClaw lo escribe. Tu guias las decisiones clave.**
-
-Proporciona un tema de investigacion — recibe un articulo academico completo con literatura real de OpenAlex, Semantic Scholar y arXiv, experimentos en sandbox adaptados al hardware (deteccion automatica GPU/MPS/CPU), analisis estadistico, revision multi-agentes, y LaTeX listo para conferencia orientado a NeurIPS/ICML/ICLR. Ejecutalo completamente autonomo, o usa el **modo Co-Piloto** para guiar a la IA en puntos de decision criticos — elige direcciones de investigacion, revisa disenos experimentales y co-escribe el articulo. Sin referencias alucinadas.
-
-<table>
-<tr><td>📄</td><td><code>paper_draft.md</code></td><td>Articulo academico completo (Introduccion, Trabajo relacionado, Metodo, Experimentos, Resultados, Conclusion)</td></tr>
-<tr><td>📐</td><td><code>paper.tex</code></td><td>LaTeX listo para conferencia (plantillas NeurIPS / ICLR / ICML)</td></tr>
-<tr><td>📚</td><td><code>references.bib</code></td><td>Referencias BibTeX reales de OpenAlex, Semantic Scholar y arXiv — auto-depuradas para coincidir con las citas en linea</td></tr>
-<tr><td>🔍</td><td><code>verification_report.json</code></td><td>Verificacion de integridad + relevancia de citas en 4 capas (arXiv, CrossRef, DataCite, LLM)</td></tr>
-<tr><td>🧪</td><td><code>experiment runs/</code></td><td>Codigo generado + resultados en sandbox + metricas JSON estructuradas</td></tr>
-<tr><td>📊</td><td><code>charts/</code></td><td>Graficos de comparacion de condiciones auto-generados con barras de error e intervalos de confianza</td></tr>
-<tr><td>📝</td><td><code>reviews.md</code></td><td>Revision por pares multi-agente con verificacion de consistencia metodologia-evidencia</td></tr>
-<tr><td>🧬</td><td><code>evolution/</code></td><td>Lecciones de auto-aprendizaje extraidas de cada ejecucion</td></tr>
-<tr><td>📦</td><td><code>deliverables/</code></td><td>Todos los entregables finales en una sola carpeta — listos para compilar en Overleaf</td></tr>
-</table>
-
-El pipeline se ejecuta **de principio a fin** — completamente autonomo o con colaboracion human-in-the-loop. Cuando los experimentos fallan, se auto-repara. Cuando las hipotesis no se sostienen, pivotea. Cuando las citas son falsas, las elimina. Cuando quieres dirigir, se pausa y escucha.
-
-🌍 **Ejecutalo en cualquier lugar.** AutoResearchClaw no esta atado a una sola plataforma. Usalo de forma independiente por CLI, conectalo a [OpenClaw](https://github.com/openclaw/openclaw), o integralo con cualquier agente compatible con ACP — 🤖 Claude Code, 💻 Codex CLI, 🐙 Copilot CLI, ♊ Gemini CLI, 🌙 Kimi CLI, y mas. Gracias al puente de mensajeria de OpenClaw, puedes iniciar una investigacion completa desde 💬 Discord, ✈️ Telegram, 🐦 Lark (飞书), 💚 WeChat, o cualquier plataforma que tu equipo ya utilice. Un tema de entrada, un paper de salida — sin importar donde lo escribas.
-
----
-
-## 🚀 Inicio rapido
-
-```bash
-# 1. Clonar e instalar
-git clone https://github.com/aiming-lab/AutoResearchClaw.git
-cd AutoResearchClaw
-python3 -m venv .venv && source .venv/bin/activate
-pip install -e .
-
-# 2. Setup (interactivo — instala OpenCode beast mode, verifica Docker/LaTeX)
-researchclaw setup
-
-# 3. Configurar
-researchclaw init          # Interactivo: elegir proveedor LLM, crea config.arc.yaml
-# O manualmente: cp config.researchclaw.example.yaml config.arc.yaml
-
-# 4. Ejecutar
-export OPENAI_API_KEY="sk-..."
-researchclaw run --config config.arc.yaml --topic "Your research idea" --auto-approve
-```
-
-Salida → `artifacts/rc-YYYYMMDD-HHMMSS-<hash>/deliverables/` — LaTeX listo para compilar, BibTeX, codigo experimental, graficos.
-
-<details>
-<summary>📝 Configuracion minima requerida</summary>
-
-```yaml
-project:
-  name: "my-research"
-
-research:
-  topic: "Your research topic here"
-
-llm:
-  base_url: "https://api.openai.com/v1"
-  api_key_env: "OPENAI_API_KEY"
-  primary_model: "gpt-4o"
-  fallback_models: ["gpt-4o-mini"]
-
-experiment:
-  mode: "sandbox"
-  sandbox:
-    python_path: ".venv/bin/python"
-```
-
-</details>
-
----
-
-## 🧠 Que lo hace diferente
-
-| Capacidad | Como funciona |
-|-----------|--------------|
-| **🧑‍✈️ Modo Co-Piloto** | 6 modos de intervencion — desde completamente autonomo hasta paso a paso. Guia a la IA en decisiones criticas (hipotesis, baselines, redaccion del articulo) o dejala correr libre. SmartPause auto-detecta cuando la entrada humana ayudaria. |
-| **🔄 Bucle PIVOT / REFINE** | La etapa 15 decide de forma autonoma: PROCEED, REFINE (ajustar parametros) o PIVOT (nueva direccion). Artefactos auto-versionados. |
-| **🤖 Debate multi-agente** | La generacion de hipotesis, el analisis de resultados y la revision por pares utilizan cada uno debate estructurado multi-perspectiva. |
-| **🧬 Auto-aprendizaje** | Lecciones extraidas por ejecucion (justificacion de decisiones, advertencias de ejecucion, anomalias de metricas) con decaimiento temporal de 30 dias. Las ejecuciones futuras aprenden de errores pasados. |
-| **📚 Base de conocimiento** | Cada ejecucion construye una KB estructurada en 6 categorias (decisiones, experimentos, hallazgos, literatura, preguntas, revisiones). |
-| **🛡️ Vigilante Sentinel** | Monitor de calidad en segundo plano: deteccion NaN/Inf, consistencia articulo-evidencia, puntuacion de relevancia de citas, guardia anti-fabricacion. |
-| **🔍 Verificacion de afirmaciones** | Verificacion de hechos en linea: extrae afirmaciones del texto generado por IA y las cruza con la literatura recopilada. Marca citas infundadas y numeros fabricados. |
-| **🌿 Exploracion de ramas** | Bifurca el pipeline para explorar multiples direcciones de investigacion simultaneamente, compara resultados lado a lado y fusiona el mejor camino. |
-
----
-
-## 🦞 Integracion con OpenClaw
-
-<table>
-<tr>
-
-**AutoResearchClaw es un servicio compatible con [OpenClaw](https://github.com/openclaw/openclaw).** Instalalo en OpenClaw y lanza investigacion autonoma con un solo mensaje — o usalo de forma independiente via CLI, Claude Code o cualquier asistente de programacion con IA.
-
-</tr>
-</table>
-
-### 🚀 Uso con OpenClaw (Recomendado)
-
-Si ya usas [OpenClaw](https://github.com/openclaw/openclaw) como tu asistente de IA:
-
-```
-1️⃣  Comparte la URL del repositorio de GitHub con OpenClaw
-2️⃣  OpenClaw lee automaticamente RESEARCHCLAW_AGENTS.md → comprende el pipeline
-3️⃣  Di: "Research [tu tema]"
-4️⃣  Listo — OpenClaw clona, instala, configura, ejecuta y devuelve los resultados
-```
-
-**Eso es todo.** OpenClaw se encarga de `git clone`, `pip install`, configuracion y ejecucion del pipeline automaticamente. Tu solo chateas.
-
-<details>
-<summary>💡 Que sucede internamente</summary>
-
-1. OpenClaw lee `RESEARCHCLAW_AGENTS.md` → aprende el rol de orquestador de investigacion
-2. OpenClaw lee `README.md` → comprende la instalacion y la estructura del pipeline
-3. OpenClaw copia `config.researchclaw.example.yaml` → `config.yaml`
-4. Solicita tu clave API del LLM (o usa tu variable de entorno)
-5. Ejecuta `pip install -e .` + `researchclaw run --topic "..." --auto-approve`
-6. Devuelve el articulo, LaTeX, experimentos y citas
-
-</details>
-
-### 🔌 Bridge de OpenClaw (Avanzado)
-
-Para una integracion mas profunda, AutoResearchClaw incluye un **sistema de adaptadores bridge** con 6 capacidades opcionales:
-
-```yaml
-# config.arc.yaml
-openclaw_bridge:
-  use_cron: true              # ⏰ Ejecuciones de investigacion programadas
-  use_message: true           # 💬 Notificaciones de progreso (Discord/Slack/Telegram)
-  use_memory: true            # 🧠 Persistencia de conocimiento entre sesiones
-  use_sessions_spawn: true    # 🔀 Generar sub-sesiones paralelas para etapas concurrentes
-  use_web_fetch: true         # 🌐 Busqueda web en vivo durante la revision de literatura
-  use_browser: false          # 🖥️ Recopilacion de articulos basada en navegador
-```
-
-Cada flag activa un protocolo de adaptador tipado. Cuando OpenClaw proporciona estas capacidades, los adaptadores las consumen sin cambios en el codigo. Consulta [`integration-guide.md`](integration-guide.md) para mas detalles.
-
-### ACP (Agent Client Protocol)
-
-AutoResearchClaw puede usar **cualquier agente de programacion compatible con ACP** como backend LLM — sin necesidad de claves API. El agente se comunica via [acpx](https://github.com/openclaw/acpx), manteniendo una sola sesion persistente a traves de las 23 etapas del pipeline.
-
-| Agente | Comando | Notas |
-|--------|---------|-------|
-| Claude Code | `claude` | Anthropic |
-| Codex CLI | `codex` | OpenAI |
-| Copilot CLI | `gh` | GitHub |
-| Gemini CLI | `gemini` | Google |
-| OpenCode | `opencode` | SST |
-| Kimi CLI | `kimi` | Moonshot |
-
-```yaml
-# config.yaml — ejemplo ACP
-llm:
-  provider: "acp"
-  acp:
-    agent: "claude"   # Cualquier comando CLI de agente compatible con ACP
-    cwd: "."          # Directorio de trabajo para el agente
-  # No se necesita base_url ni api_key — el agente gestiona su propia autenticacion.
-```
-
-```bash
-# Solo ejecuta — el agente usa sus propias credenciales
-researchclaw run --config config.yaml --topic "Your research idea" --auto-approve
-```
-
-### 🛠️ Otras formas de ejecucion
-
-| Metodo | Como |
-|--------|------|
-| **CLI independiente** | `researchclaw run --topic "..." --auto-approve` (autonomo) o `--mode co-pilot` (colaborativo) |
-| **API de Python** | `from researchclaw.pipeline import Runner; Runner(config).run()` |
-| **Claude Code** | Lee `RESEARCHCLAW_CLAUDE.md` — solo di *"Run research on [tema]"* |
-| **Copilot CLI** | `researchclaw run --topic "..."` con `llm.acp.agent: "gh"` |
-| **OpenCode** | Lee `.claude/skills/` — la misma interfaz en lenguaje natural |
-| **Cualquier CLI de IA** | Proporciona `RESEARCHCLAW_AGENTS.md` como contexto → el agente se auto-configura |
-
----
-
-## 🔬 Pipeline: 23 etapas, 8 fases
-
-```
-Fase A: Alcance de investigacion     Fase E: Ejecucion de experimentos
-  1. TOPIC_INIT                        12. EXPERIMENT_RUN
-  2. PROBLEM_DECOMPOSE                 13. ITERATIVE_REFINE  ← auto-reparacion
-
-Fase B: Descubrimiento de literatura Fase F: Analisis y decision
-  3. SEARCH_STRATEGY                   14. RESULT_ANALYSIS    ← multi-agente
-  4. LITERATURE_COLLECT  ← API real    15. RESEARCH_DECISION  ← PIVOT/REFINE
-  5. LITERATURE_SCREEN   [compuerta]
-  6. KNOWLEDGE_EXTRACT                 Fase G: Redaccion del articulo
-                                       16. PAPER_OUTLINE
-Fase C: Sintesis de conocimiento       17. PAPER_DRAFT
-  7. SYNTHESIS                         18. PEER_REVIEW        ← verif. evidencia
-  8. HYPOTHESIS_GEN    ← debate        19. PAPER_REVISION
-
-Fase D: Diseno experimental          Fase H: Finalizacion
-  9. EXPERIMENT_DESIGN   [compuerta]   20. QUALITY_GATE      [compuerta]
- 10. CODE_GENERATION                   21. KNOWLEDGE_ARCHIVE
- 11. RESOURCE_PLANNING                 22. EXPORT_PUBLISH     ← LaTeX
-                                       23. CITATION_VERIFY    ← verif. relevancia
-```
-
-> Las **etapas con compuerta** (5, 9, 20) se pausan para aprobacion humana o se auto-aprueban con `--auto-approve`. Al rechazar, el pipeline retrocede.
-
-> **Modo Co-Piloto** (`--mode co-pilot`): Colaboracion profunda humano-IA en las Etapas 7-8 (Taller de Ideas), Etapa 9 (Navegador de Baselines) y Etapas 16-17 (Co-Escritor de Articulos). Las demas etapas se auto-ejecutan con monitoreo SmartPause.
-
-> **Bucles de decision**: La etapa 15 puede activar REFINE (→ Etapa 13) o PIVOT (→ Etapa 8), con versionado automatico de artefactos.
-
-<details>
-<summary>📋 Que hace cada fase</summary>
-
-| Fase | Que sucede |
-|------|-----------|
-| **A: Alcance** | El LLM descompone el tema en un arbol de problemas estructurado con preguntas de investigacion |
-| **A+: Hardware** | Deteccion automatica de GPU (NVIDIA CUDA / Apple MPS / solo CPU), advierte si el hardware local es limitado, adapta la generacion de codigo en consecuencia |
-| **B: Literatura** | Busqueda multi-fuente (OpenAlex → Semantic Scholar → arXiv) de articulos reales, filtrado por relevancia, extraccion de fichas de conocimiento |
-| **C: Sintesis** | Agrupa hallazgos, identifica brechas de investigacion, genera hipotesis comprobables mediante debate multi-agente |
-| **D: Diseno** | Disena plan experimental, genera Python ejecutable adaptado al hardware (nivel de GPU → seleccion de paquetes), estima necesidades de recursos |
-| **E: Ejecucion** | Ejecuta experimentos en sandbox, detecta NaN/Inf y errores en tiempo de ejecucion, auto-repara codigo mediante reparacion LLM dirigida |
-| **F: Analisis** | Analisis multi-agente de resultados; decision autonoma PROCEED / REFINE / PIVOT con justificacion |
-| **G: Redaccion** | Esquema → redaccion seccion por seccion (5,000-6,500 palabras) → revision por pares (con consistencia metodologia-evidencia) → revision con guardia de longitud |
-| **H: Finalizacion** | Compuerta de calidad, archivado de conocimiento, exportacion LaTeX con plantilla de conferencia, verificacion de integridad + relevancia de citas |
-
-</details>
-
----
-
-## ✨ Key features
-
-| Feature | Description |
-|---------|-------------|
-| **📚 Multi-source literature** | Real papers from OpenAlex, Semantic Scholar, and arXiv, with query expansion, deduplication, and a circuit breaker with graceful degradation |
-| **🔍 4-layer citation verification** | arXiv ID check → CrossRef/DataCite DOI → Semantic Scholar title match → LLM relevance scoring. Hallucinated references are removed automatically. |
-| **🖥️ Hardware-aware execution** | Automatically detects the GPU (NVIDIA CUDA / Apple MPS / CPU only) and adapts code generation, imports, and experiment scale |
-| **🦾 OpenCode Beast Mode** | Complex experiments are routed automatically to [OpenCode](https://github.com/anomalyco/opencode), which generates multi-file projects with custom architectures, training loops, and ablation studies. Installed via `researchclaw setup`. |
-| **🧪 Sandboxed experiments** | AST-validated code, immutable harness, NaN/Inf fail-fast, self-healing repair, iterative refinement (up to 10 rounds), partial-result capture |
-| **📝 Conference-quality writing** | NeurIPS/ICML/ICLR templates, section-by-section drafting (5,000-6,500 words), anti-fabrication guard, length guard during revision, anti-disclaimer enforcement |
-| **📐 Template switching** | `neurips_2025`, `iclr_2026`, `icml_2026`: Markdown → LaTeX with formulas, tables, figures, cross-references, `\cite{}` |
-| **🛡️ Anti-fabrication** | VerifiedRegistry enforces ground-truth experimental data in papers. Failed experiments are diagnosed and repaired automatically before writing. Unverified numbers are sanitized. |
-| **🚦 Quality gates** | 3 gates that allow human intervention (stages 5, 9, 20), with rollback. Skip them with `--auto-approve`. |
-| **🧑‍✈️ HITL Co-Pilot** | 6 intervention modes with per-stage policies. Idea Workshop, Baseline Navigator, and Paper Co-Writer for deep collaboration. SmartPause, budget guards, escalation policies, and intervention learning for production safety. CLI/WebSocket/MCP adapters. |
-| **💰 Budget guards** | Cost monitoring with configurable threshold alerts (50%/80%/100%). The pipeline pauses itself when cost exceeds the budget. |
-| **🔐 Reproducibility** | SHA256 checksums for all stage artifacts. Immutable manifests for verification. Multi-level undo with versioned snapshots. |
-
----
-
-## 🧑‍✈️ Human-in-the-Loop Co-Pilot
-
-**AutoResearchClaw v0.4.0 introduces a complete Human-in-the-Loop (HITL) system** that turns the pipeline from purely autonomous into a collaborative human-AI research engine. Choose your level of involvement:
-
-### Intervention modes
-
-| Mode | Command | What it does |
-|------|---------|--------------|
-| **Full Auto** | `--auto-approve` | Original behavior, with no human intervention |
-| **Gate Only** | `--mode gate-only` | Pauses at the 3 gated stages (5, 9, 20) for approval |
-| **Checkpoint** | `--mode checkpoint` | Pauses at every phase boundary (8 checkpoints) |
-| **Co-Pilot** | `--mode co-pilot` | Deep collaboration at critical stages, automatic elsewhere |
-| **Step-by-Step** | `--mode step-by-step` | Pauses after every stage, useful for learning the pipeline |
-| **Express** | `--mode express` | Quick review: only the 3 most critical gates |
-### Co-Pilot workflow
-
-```
-You: researchclaw run --topic "Quantum noise as neural network regularization" --mode co-pilot
-
-The pipeline runs Stages 1-7 automatically...
-
-  ┌─────────────────────────────────────────────────────────────┐
-  │  HITL | Stage 08: HYPOTHESIS_GEN                            │
-  │  Post-stage review                                          │
-  │                                                             │
-  │  Hypotheses listed: 3                                       │
-  │  Novelty score: 0.72 (moderate)                             │
-  │                                                             │
-  │  [a] Approve  [r] Reject  [e] Edit  [c] Collaborate         │
-  │  [i] Inject guidance  [v] View output  [q] Abort            │
-  └─────────────────────────────────────────────────────────────┘
-
-You: c  (start collaborative chat)
-You: Hypothesis 3 is interesting but needs Dropout/Label Smoothing as baselines
-AI:  Updated: added Dropout, Label Smoothing, MixUp, CutMix as baselines...
-You: approve
-
-The pipeline continues with your refined hypothesis...
-```
-
-### CLI commands
-
-```bash
-# Start with a HITL mode
-researchclaw run --topic "..." --mode co-pilot
-
-# Attach to a paused pipeline (from another terminal)
-researchclaw attach artifacts/rc-2026-xxx
-
-# Check pipeline and HITL status
-researchclaw status artifacts/rc-2026-xxx
-
-# Approve/reject from another terminal or script
-researchclaw approve artifacts/rc-2026-xxx --message "LGTM"
-researchclaw reject artifacts/rc-2026-xxx --reason "Missing key baseline"
-
-# Inject guidance for a stage (even before it runs)
-researchclaw guide artifacts/rc-2026-xxx --stage 9 --message "Use ResNet-50 as the main baseline"
-```
-
-### Key capabilities
-
-| Feature | Description |
-|---------|-------------|
-| **Idea Workshop** | Collaborative brainstorming, evaluation, and refinement of hypotheses (Stages 7-8) |
-| **Baseline Navigator** | The AI suggests baselines + the human adds/removes them + reproducibility checklist (Stage 9) |
-| **Paper Co-Writer** | Section-by-section drafting with human editing and AI polishing (Stages 16-19) |
-| **SmartPause** | Dynamic, confidence-based pausing that detects when human input would help |
-| **Claim verification** | Inline fact-checking against the collected literature; flags unsupported claims |
-| **Budget guards** | Cost monitoring with threshold alerts at 50%/80%/100% |
-| **Intervention learning** | ALHF: learns from your review patterns to optimize future pause decisions |
-| **Branch exploration** | Forks the pipeline to explore multiple hypotheses, then compares them and merges the best |
-| **Escalation policy** | Tiered notification (terminal → Slack → email → auto-stop) when unattended |
-| **3 adapters** | CLI (terminal), WebSocket (web dashboard), MCP (external agents) |
-
-### Configuration
-
-```yaml
-# config.arc.yaml
-hitl:
-  enabled: true
-  mode: co-pilot                     # full-auto | gate-only | checkpoint | step-by-step | co-pilot | custom
-  cost_budget_usd: 50.0              # Pause when cost exceeds the budget (0 = no limit)
-
-  notifications:
-    on_pause: true
-    on_quality_drop: true
-    channels: ["terminal"]            # terminal | slack | webhook
-
-  timeouts:
-    default_human_timeout_sec: 86400  # Wait 24h by default
-    auto_proceed_on_timeout: false
-
-  collaboration:
-    max_chat_turns: 50
-    save_chat_history: true
-
-  # Custom per-stage policies (optional, for 'custom' mode)
-  stage_policies:
-    8: { require_approval: true, enable_collaboration: true }
-    9: { require_approval: true, allow_edit_output: true }
-```
-
-### Backward compatibility
-
-- **Default: OFF.** Without `hitl.enabled: true` or `--mode`, the pipeline behaves exactly as before.
-- **`--auto-approve` still works.** It overrides HITL mode.
-- **All 2,699 existing tests pass** with the HITL code present.
-
----
-
-## 🧠 MetaClaw Integration
-
-**AutoResearchClaw + [MetaClaw](https://github.com/aiming-lab/MetaClaw) = A pipeline that learns from every run.**
-
-MetaClaw adds **cross-run knowledge transfer** to AutoResearchClaw. When enabled, the pipeline automatically captures lessons from failures and warnings, converts them into reusable skills, and injects those skills into all 23 pipeline stages in later runs, so the same mistakes are not repeated.
-
-### How it works
-
-```
-Run N executes → failures/warnings captured as Lessons
-                      ↓
-          MetaClaw Lesson → Skill conversion
-                      ↓
-          arc-* skill files stored in ~/.metaclaw/skills/
-                      ↓
-Run N+1 → build_overlay() injects skills into every LLM prompt
-                      ↓
-          The LLM avoids known pitfalls → higher quality, fewer retries
-```
-
-### Quick setup
-
-```bash
-# 1. Install MetaClaw (if not already installed)
-pip install metaclaw
-
-# 2. Enable it in your configuration
-```
-
-```yaml
-# config.arc.yaml
-metaclaw_bridge:
-  enabled: true
-  proxy_url: "http://localhost:30000"        # MetaClaw proxy (optional)
-  skills_dir: "~/.metaclaw/skills"          # Where skills are stored
-  fallback_url: "https://api.openai.com/v1" # Direct LLM fallback
-  fallback_api_key: ""                      # API key for the fallback URL
-  lesson_to_skill:
-    enabled: true
-    min_severity: "warning"                 # Convert warnings + errors
-    max_skills_per_run: 3
-```
-
-```bash
-# 3. Run as usual; MetaClaw works transparently
-researchclaw run --config config.arc.yaml --topic "Your idea" --auto-approve
-```
-
-After each run, check `~/.metaclaw/skills/arc-*/SKILL.md` to see the skills your pipeline has learned.
-
-### Experimental results
-
-In controlled A/B experiments (same topic, same LLM, same configuration):
-
-| Metric | Baseline | With MetaClaw | Improvement |
-|--------|----------|---------------|-------------|
-| Stage retry rate | 10.5% | 7.9% | **-24.8%** |
-| REFINE cycle count | 2.0 | 1.2 | **-40.0%** |
-| Pipeline stage completion | 18/19 | 19/19 | **+5.3%** |
-| Overall robustness score (composite) | 0.714 | 0.845 | **+18.3%** |
-
-> The composite robustness score is a weighted average of stage completion rate (40%), retry reduction (30%), and REFINE cycle efficiency (30%).
-
-### Backward compatibility
-
-- **Default: OFF.** If `metaclaw_bridge` is absent or `enabled: false`, the pipeline behaves exactly as before.
-- **No new dependencies.** MetaClaw is optional; the base pipeline works without it.
-- **All 2,699 existing tests pass** with the integration code present.
-
----
-
-## 🧩 Skills Library
-
-AutoResearchClaw now supports loading **open-source and custom skills** to further improve your research experience. It also ships with **20 preloaded built-in skills** (scientific writing, literature search, chemistry, biology, and more) as ready-to-use references. Disable any skill by adding `enabled: false` to its frontmatter.
-
-**Example built-in skills:**
-
-| Category | Skill | Description |
-|----------|-------|-------------|
-| **Writing** | `scientific-writing` | IMRAD structure, citation formatting, reporting guidelines |
-| **Domain** | `chemistry-rdkit` | Molecular analysis, SMILES, fingerprints, drug discovery |
-| **Experiment** | `literature-search` | Systematic review, PRISMA methodology |
-
-> See all 20 skills with `researchclaw skills list`.
-
-### Load your own skills
-
-```bash
-# Option 1: Install a skill (persists across projects)
-researchclaw skills install /path/to/my-skill/
-
-# Option 2: Place a SKILL.md in the project
-mkdir -p .claude/skills/my-custom-skill
-# Then create a SKILL.md with YAML frontmatter (name, description, trigger-keywords, applicable-stages)
-
-# Option 3: Configure shared skill directories in config.arc.yaml
-# skills:
-#   custom_dirs:
-#     - /path/to/team-shared-skills
-```
-
-### Using skills
-
-Skills are loaded and injected into LLM prompts automatically; no manual activation is needed. Use the CLI to inspect them:
-
-```bash
-researchclaw skills list               # Lists all loaded skills with their sources
-researchclaw skills validate ./my-skill # Checks the SKILL.md format
-```
-
-Explore community skills: [K-Dense-AI/claude-scientific-skills](https://github.com/K-Dense-AI/claude-scientific-skills) (150+ scientific skills across multiple disciplines).
-
----
-
-## ⚙️ Configuration reference
-
-<details>
-<summary>Click to expand the full configuration reference</summary>
-
-```yaml
-# === Project ===
-project:
-  name: "my-research"              # Project identifier
-  mode: "docs-first"               # docs-first | semi-auto | full-auto
-
-# === Research ===
-research:
-  topic: "..."                     # Research topic (required)
-  domains: ["ml", "nlp"]           # Research domains for literature search
-  daily_paper_count: 8             # Target papers per search query
-  quality_threshold: 4.0           # Minimum quality score for papers
-
-# === Runtime ===
-runtime:
-  timezone: "America/New_York"     # For timestamps
-  max_parallel_tasks: 3            # Concurrent experiment limit
-  approval_timeout_hours: 12       # Timeout for gated stages
-  retry_limit: 2                   # Number of retries on stage failure
-
-# === LLM ===
-llm:
-  provider: "openai-compatible"    # openai | openrouter | deepseek | minimax | acp | openai-compatible
-  base_url: "https://..."          # API endpoint (required for openai-compatible)
-  api_key_env: "OPENAI_API_KEY"    # Environment variable holding the API key (required for openai-compatible)
-  api_key: ""                      # Or hardcode the key here directly
-  primary_model: "gpt-4o"          # Primary model
-  fallback_models: ["gpt-4o-mini"] # Fallback chain
-  s2_api_key: ""                   # Semantic Scholar API key (optional, higher rate limits)
-  acp:                             # Only used when provider: "acp"
-    agent: "claude"                # ACP agent CLI command (claude, codex, gemini, etc.)
-    cwd: "."                       # Working directory for the agent
-
-# === Experiment ===
-experiment:
-  mode: "sandbox"                  # simulated | sandbox | docker | ssh_remote
-  time_budget_sec: 300             # Maximum execution time per run (default: 300s)
-  max_iterations: 10               # Maximum optimization iterations
-  metric_key: "val_loss"           # Name of the primary metric
-  metric_direction: "minimize"     # minimize | maximize
-  sandbox:
-    python_path: ".venv/bin/python"
-    gpu_required: false
-    allowed_imports: [math, random, json, csv, numpy, torch, sklearn]
-    max_memory_mb: 4096
-  docker:
-    image: "researchclaw/experiment:latest"
-    network_policy: "setup_only"   # none | setup_only | pip_only | full
-    gpu_enabled: true
-    memory_limit_mb: 8192
-    auto_install_deps: true        # Automatic import detection → requirements.txt
-  ssh_remote:
-    host: ""                       # Hostname of the GPU server
-    gpu_ids: []                    # Available GPU IDs
-    remote_workdir: "/tmp/researchclaw_experiments"
-  opencode:                          # OpenCode Beast Mode (auto-installed via `researchclaw setup`)
-    enabled: true                    # Master switch (default: true)
-    auto: true                       # Auto-trigger without confirmation (default: true)
-    complexity_threshold: 0.2        # 0.0-1.0; higher = triggers only for complex experiments
-    model: ""                        # Model to force (empty = uses llm.primary_model)
-    timeout_sec: 600                 # Maximum seconds for OpenCode generation
-    max_retries: 1                   # Number of retries on failure
-    workspace_cleanup: true          # Delete the temporary workspace after collection
-  code_agent:                        # CodeAgent v2: multi-phase code generation
-    enabled: true                    # Use CodeAgent instead of the legacy single-prompt codegen
-    architecture_planning: true      # Generate a deep implementation blueprint before coding
-    sequential_generation: true      # Generate files one at a time following the dependency DAG
-    hard_validation: true            # AST validation (blocks identical ablations, hardcoded metrics)
-    hard_validation_max_repairs: 2   # Max repair attempts when validation fails
-    exec_fix_max_iterations: 3       # Execution-fix attempts in the loop
-    exec_fix_timeout_sec: 60         # Timeout per exec-fix attempt
-  benchmark_agent:                   # BenchmarkAgent: automated dataset and baseline selection
-    enabled: true                    # Enable the 4-agent pipeline (Surveyor→Selector→Acquirer→Validator)
-    enable_hf_search: true           # Search HuggingFace Datasets
-    enable_web_search: true          # Search Google Scholar for benchmarks
-    tier_limit: 2                    # Dataset tier filter (1=small/cached, 2=medium, 3=large)
-    min_benchmarks: 1                # Minimum datasets required
-    min_baselines: 2                 # Minimum baseline methods required
-  figure_agent:                      # FigureAgent: academic figure generation
-    enabled: true                    # Enable the 5-agent pipeline (Planner→CodeGen→Renderer→Critic→Integrator)
-    min_figures: 3                   # Minimum figures to generate
-    max_figures: 8                   # Maximum figures
-    max_iterations: 3                # Critic-driven refinement iterations
-    dpi: 300                         # Output resolution
-    strict_mode: false               # Fail the pipeline if figure generation fails
-  repair:                            # Anti-fabrication experiment repair
-    enabled: true                    # Auto-diagnose and repair failed experiments
-    max_cycles: 3                    # Repair retry loops
-    min_completion_rate: 0.5         # >=50% of conditions must complete to continue
-    min_conditions: 2                # At least 2 conditions for a valid experiment
-    use_opencode: true               # Route repairs through OpenCode Beast Mode
-
-# === Web search (Optional) ===
-web_search:
-  enabled: true                      # Enable web-augmented literature search
-  tavily_api_key_env: "TAVILY_API_KEY"  # Environment variable for the Tavily API key (optional)
-  enable_scholar: true               # Google Scholar search
-  enable_pdf_extraction: true        # Extract text from PDFs
-  max_web_results: 10                # Maximum web results per query
-
-# === Export ===
-export:
-  target_conference: "neurips_2025"  # neurips_2025 | iclr_2026 | icml_2026
-  authors: "Anonymous"
-  bib_file: "references"
-
-# === Prompts ===
-prompts:
-  custom_file: ""                  # Path to a custom prompts YAML (empty = defaults)
-
-# === HITL Co-Pilot (NEW in v0.4.0) ===
-hitl:
-  enabled: false                     # Set to true to enable HITL
-  mode: co-pilot                     # full-auto | gate-only | checkpoint | step-by-step | co-pilot | custom
-  cost_budget_usd: 0.0              # Cost limit in USD (0 = no limit)
-  notifications:
-    on_pause: true                   # Notify when the pipeline pauses
-    on_quality_drop: true            # Notify on quality problems
-    channels: ["terminal"]           # terminal | slack | webhook
-  timeouts:
-    default_human_timeout_sec: 86400 # Wait up to 24h for human input
-    auto_proceed_on_timeout: false   # If true, auto-approve when the timeout expires
-  collaboration:
-    max_chat_turns: 50               # Max turns per collaboration session
-    save_chat_history: true          # Persist chat logs
-  stage_policies: {}                 # Per-stage overrides (for 'custom' mode)
-
-# === Security ===
-security:
-  hitl_required_stages: [5, 9, 20] # Stages that require human approval
-  allow_publish_without_approval: false
-  redact_sensitive_logs: true
-
-# === Knowledge base ===
-knowledge_base:
-  backend: "markdown"              # markdown | obsidian
-  root: "docs/kb"
-
-# === Notifications ===
-notifications:
-  channel: "console"               # console | discord | slack
-  target: ""
-
-# === MetaClaw bridge (Optional) ===
-metaclaw_bridge:
-  enabled: false                   # Set to true to enable cross-run learning
-  proxy_url: "http://localhost:30000"  # MetaClaw proxy URL
-  skills_dir: "~/.metaclaw/skills" # Where arc-* skills are stored
-  fallback_url: ""                 # Direct LLM fallback when the proxy is down
-  fallback_api_key: ""             # API key for the fallback endpoint
-  lesson_to_skill:
-    enabled: true                  # Convert lessons into skills automatically
-    min_severity: "warning"        # Minimum severity for conversion
-    max_skills_per_run: 3          # Max new skills per pipeline run
-  prm:                             # Process Reward Model quality gate (optional)
-    enabled: false                 # Use LLM-as-judge to score stage outputs
-    model: "gpt-5.4"              # PRM judge model
-    votes: 3                       # Majority-vote count
-    gate_stages: [5, 9, 15, 20]   # Stages where PRM gates apply
-
-# === OpenClaw bridge ===
-openclaw_bridge:
-  use_cron: false                  # Scheduled research runs
-  use_message: false               # Progress notifications
-  use_memory: false                # Cross-session knowledge persistence
-  use_sessions_spawn: false        # Spawn parallel sub-sessions
-  use_web_fetch: false             # Live web search
-  use_browser: false               # Browser-based paper collection
-```
-
-</details>
-
----
-
-## 🙏 Acknowledgments
-
-Inspired by:
-
-- 🔬 [AI Scientist](https://github.com/SakanaAI/AI-Scientist) (Sakana AI): automated research pioneer
-- 🧠 [AutoResearch](https://github.com/karpathy/autoresearch) (Andrej Karpathy): end-to-end research automation
-- 🌐 [FARS](https://analemma.ai/blog/introducing-fars/) (Analemma): fully automated research system
-
----
-
-## 📄 License
-
-MIT. See [LICENSE](../LICENSE) for details.
-
----
-
-## 📌 Citation
-
-If you find AutoResearchClaw useful, please cite:
-
-```bibtex
-@misc{liu2026autoresearchclawselfreinforcingautonomousresearch,
-      title={AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration},
-      author={Jiaqi Liu and Shi Qiu and Mairui Li and Bingzhou Li and Haonian Ji and Siwei Han and Xinyu Ye and Peng Xia and Zihan Dong and Congyu Zhang and Letian Zhang and Guiming Chen and Haoqin Tu and Xinyu Yang and Lu Feng and Xujiang Zhao and Haifeng Chen and Jiawei Zhou and Xiao Wang and Weitong Zhang and Hongtu Zhu and Yun Li and Jieru Mei and Hongliang Fei and Jiaheng Zhang and Linjie Li and Linjun Zhang and Yuyin Zhou and Sheng Wang and Caiming Xiong and James Zou and Zeyu Zheng and Cihang Xie and Mingyu Ding and Huaxiu Yao},
-      year={2026},
-      eprint={2605.20025},
-      archivePrefix={arXiv},
-      primaryClass={cs.AI},
-      url={https://arxiv.org/abs/2605.20025},
-}
-```
-
-<p align="center">
-  <sub>Built with 🦞 by the AutoResearchClaw team</sub>
-</p>
diff --git a/docs/README_FR.md b/docs/README_FR.md
deleted file mode 100644
index 8a42a0af..00000000
--- a/docs/README_FR.md
+++ /dev/null
@@ -1,790 +0,0 @@
-<p align="center">
-  <img src="../image/logo.png" width="700" alt="AutoResearchClaw Logo">
-</p>
-
-<h2 align="center"><b>Discuss an idea. Get a paper. Autonomous, Collaborative & Self-Evolving.</b></h2>
-
-
-
-<p align="center">
-  <b><i><font size="5">Chat with <a href="#-integration-openclaw">OpenClaw</a>: "Research X" → done.</font></i></b>
-</p>
-
-<p align="center">
-  📄 <b>Our paper is on arXiv. Come read it!</b> <a href="https://arxiv.org/abs/2605.20025"><i>AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration</i></a>
-</p>
-
-<p align="center">
-  <img src="../image/framework_v2.png" width="100%" alt="AutoResearchClaw Framework">
-</p>
-
-
-<p align="center">
-  <a href="https://arxiv.org/abs/2605.20025"><img src="https://img.shields.io/badge/arXiv-2605.20025-b31b1b?logo=arxiv&logoColor=white" alt="arXiv"></a>
-  <a href="https://huggingface.co/datasets/AIMING-Lab-UNC/ARC-Bench"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-ARC--Bench-yellow" alt="ARC-Bench on Hugging Face"></a>
-  <a href="../LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="MIT License"></a>
-  <a href="https://python.org"><img src="https://img.shields.io/badge/Python-3.11%2B-3776AB?logo=python&logoColor=white" alt="Python 3.11+"></a>
-  <a href="#testing"><img src="https://img.shields.io/badge/Tests-2699%20passed-brightgreen?logo=pytest&logoColor=white" alt="2699 Tests Passed"></a>
-  <a href="https://github.com/aiming-lab/AutoResearchClaw"><img src="https://img.shields.io/badge/GitHub-AutoResearchClaw-181717?logo=github" alt="GitHub"></a>
-  <a href="#-integration-openclaw"><img src="https://img.shields.io/badge/OpenClaw-Compatible-ff4444?logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNCAyNCI+PHBhdGggZD0iTTEyIDJDNi40OCAyIDIgNi40OCAyIDEyczQuNDggMTAgMTAgMTAgMTAtNC40OCAxMC0xMFMxNy41MiAyIDEyIDJ6IiBmaWxsPSJ3aGl0ZSIvPjwvc3ZnPg==" alt="OpenClaw Compatible"></a>
-  <a href="https://discord.gg/u4ksqW5P"><img src="https://img.shields.io/badge/Discord-Join%20Community-5865F2?logo=discord&logoColor=white" alt="Discord"></a>
-</p>
-
-<p align="center">
-  <a href="../README.md">🇺🇸 English</a> ·
-  <a href="README_CN.md">🇨🇳 中文</a> ·
-  <a href="README_JA.md">🇯🇵 日本語</a> ·
-  <a href="README_KO.md">🇰🇷 한국어</a> ·
-  <a href="README_FR.md">🇫🇷 Français</a> ·
-  <a href="README_DE.md">🇩🇪 Deutsch</a> ·
-  <a href="README_ES.md">🇪🇸 Español</a> ·
-  <a href="README_PT.md">🇧🇷 Português</a> ·
-  <a href="README_RU.md">🇷🇺 Русский</a> ·
-  <a href="README_AR.md">🇸🇦 العربية</a>
-</p>
-
-<p align="center">
-  <a href="showcase/SHOWCASE.md">🏆 Paper Showcase</a> · <a href="HITL_GUIDE.md">🧑‍✈️ Co-Pilot Guide</a> · <a href="integration-guide.md">📖 Integration Guide</a> · <a href="https://discord.gg/u4ksqW5P">💬 Discord Community</a>
-</p>
-
----
-
-<table>
-<tr>
-<td width="18%">
-<a href="showcase/SHOWCASE.md"><img src="showcase/thumbnails/paper_I_random_matrix-01.png" width="120" alt="Sample Paper"/></a>
-</td>
-<td valign="middle">
-<b>🏆 Generated Paper Showcase</b><br><br>
-<b>8 papers spanning 8 domains</b> (mathematics, statistics, biology, computer science, NLP, RL, vision, robustness), generated fully autonomously or with Human-in-the-Loop co-pilot guidance.<br><br>
-<a href="showcase/SHOWCASE.md"><img src="https://img.shields.io/badge/View_Full_Showcase_→-All_8_Papers-d73a49?style=for-the-badge" alt="View Showcase"></a>
-</td>
-</tr>
-</table>
-
----
-
-> **🧪 We are looking for testers!** Try the pipeline with your own research idea, in any field, and [tell us what you think](TESTER_GUIDE.md). Your feedback directly shapes the next release. **[→ Testing Guide](TESTER_GUIDE.md)** | **[→ 中文测试指南](TESTER_GUIDE_CN.md)** | **[→ 日本語テストガイド](TESTER_GUIDE_JA.md)**
-
----
-
-## 🔥 News
-- **[05/19/2026]** **v0.5.0**: **Multi-domain experiment agents + ARC-Bench.** Two major updates. **(1) Domain-specialized execution agents:** the experiment phase (stages 10 to 13) is no longer limited to the default ML sandbox and routes to specialized agents by domain: **high-energy physics** (ColliderAgent: FeynRules → MadGraph5 → Delphes via the Magnus cloud), **biology** (genome-scale metabolic modeling with COBRApy), and **statistics** (simulation-study agent), with a generic Docker executor for chemistry/materials. The pipeline automatically selects the right executor for the domain. **(2) ARC-Bench:** an open autonomous-research benchmark of **55 topics** covering **ML (25), high-energy physics (10), quantum (10), biology (7), and statistics (3)**, each shipped with a manifest and a grading rubric (`experiments/arc_bench/`, and also on [🤗 Hugging Face](https://huggingface.co/datasets/AIMING-Lab-UNC/ARC-Bench)). **[→ Domain Integration Guide](DOMAIN_INTEGRATION_GUIDE.md)**
-- **[04/01/2026]** **v0.4.0**: **Human-in-the-Loop Co-Pilot system.** AutoResearchClaw is no longer purely autonomous. The new HITL system adds 6 intervention modes (`full-auto`, `gate-only`, `checkpoint`, `step-by-step`, `co-pilot`, `custom`), per-stage policies, and deep human-AI collaboration. Includes: Idea Workshop for hypothesis co-creation, Baseline Navigator for experiment design review, Paper Co-Writer for collaborative drafting, SmartPause (dynamic, confidence-driven intervention), ALHF intervention learning, anti-hallucination claim verification, budget guards, pipeline branching for parallel hypothesis exploration, and CLI commands (`attach`/`status`/`approve`/`reject`/`guide`). **[→ Full HITL Guide](HITL_GUIDE.md)**
-- **[03/30/2026]** **Flexible skill loading.** AutoResearchClaw now supports loading open-source and custom skills from any discipline to enrich your research experience. 20 preloaded skills are included as ready-to-use references, covering scientific writing, experiment design, chemistry, biology, and more, including a community-contributed [A-Evolve](https://github.com/A-EVO-Lab/a-evolve) agentic evolution skill. Load your own via `researchclaw skills install` or drop a `SKILL.md` into `.claude/skills/`. See [Skills Library](#-bibliotheque-de-competences).
-- **[03/22/2026]** [v0.3.2](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.2): **Cross-platform support + major stability.** AutoResearchClaw now works with any ACP-compatible agent (Claude Code, Codex CLI, Copilot CLI, Gemini CLI, Kimi CLI) and supports messaging platforms (Discord, Telegram, Lark, WeChat) via the OpenClaw bridge. New CLI-agent code generation backend that delegates Stages 10 and 13 to external CLI agents with budget control and timeout handling. Includes the anti-fabrication system (VerifiedRegistry + diagnosis/repair loop), 100+ bug fixes, modular executor refactoring, `--resume` auto-detection, hardened LLM retries, and community fixes.
-
-<details>
-<summary>Earlier versions</summary>
-
-- **[03/18/2026]** [v0.3.1](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.1) — **OpenCode Beast Mode + Community Contributions** — New "Beast Mode" routes complex code generation to [OpenCode](https://github.com/anomalyco/opencode) with automatic complexity scoring and graceful fallback. Added Novita AI provider support, thread-safety hardening, improved LLM output parsing robustness, and 20+ bug fixes from community PRs and internal audit.
-- **[03/17/2026]** [v0.3.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.0) — **MetaClaw Integration** — AutoResearchClaw now supports [MetaClaw](https://github.com/aiming-lab/MetaClaw) cross-run learning: pipeline failures → structured lessons → reusable skills, injected into all 23 stages. **+18.3%** robustness in controlled experiments. Opt-in (`metaclaw_bridge.enabled: true`), fully backward-compatible. See [Integration Guide](#-integration-metaclaw).
-- **[03/16/2026]** [v0.2.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.2.0) — Three multi-agent subsystems (CodeAgent, BenchmarkAgent, FigureAgent), hardened Docker sandbox with network-policy-aware execution, 4-round paper quality audit (AI-slop detection, 7-dim review scoring, NeurIPS checklist), and 15+ bug fixes from production runs.
-- **[03/15/2026]** [v0.1.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.1.0) — We release AutoResearchClaw: a fully autonomous 23-stage research pipeline that turns a single research idea into a conference-ready paper. No human intervention required.
-
-</details>
-
----
-
-## ⚡ Une commande. Un article.
-
-```bash
-# Entierement autonome — aucune intervention humaine
-pip install -e . && researchclaw setup && researchclaw init && researchclaw run --topic "Your research idea here" --auto-approve
-
-# Co-Pilot mode: collaborate with the AI at key decision points
-researchclaw run --topic "Your research idea here" --mode co-pilot
-```
-
-
----
-
-## 🤔 What is it?
-
-**You think it. AutoResearchClaw writes it. You steer the key decisions.**
-
-Give it a research topic and get back a complete academic paper, with real literature from OpenAlex, Semantic Scholar, and arXiv, hardware-aware sandbox experiments (automatic GPU/MPS/CPU detection), statistical analysis, multi-agent review, and conference-ready LaTeX targeting NeurIPS/ICML/ICLR. Run it fully autonomously, or use **Co-Pilot mode** to guide the AI at critical decision points: choose research directions, review experimental designs, and co-write the paper. No hallucinated references.
-
-<table>
-<tr><td>📄</td><td><code>paper_draft.md</code></td><td>Complete academic paper (Introduction, Related Work, Method, Experiments, Results, Conclusion)</td></tr>
-<tr><td>📐</td><td><code>paper.tex</code></td><td>Conference-ready LaTeX (NeurIPS / ICLR / ICML templates)</td></tr>
-<tr><td>📚</td><td><code>references.bib</code></td><td>Real BibTeX references from OpenAlex, Semantic Scholar, and arXiv, auto-pruned to match the in-text citations</td></tr>
-<tr><td>🔍</td><td><code>verification_report.json</code></td><td>4-layer citation integrity and relevance verification (arXiv, CrossRef, DataCite, LLM)</td></tr>
-<tr><td>🧪</td><td><code>experiment runs/</code></td><td>Generated code + sandbox results + structured JSON metrics</td></tr>
-<tr><td>📊</td><td><code>charts/</code></td><td>Auto-generated condition-comparison charts with error bars and confidence intervals</td></tr>
-<tr><td>📝</td><td><code>reviews.md</code></td><td>Multi-agent review with methodology-evidence consistency checks</td></tr>
-<tr><td>🧬</td><td><code>evolution/</code></td><td>Self-learning lessons extracted from each run</td></tr>
-<tr><td>📦</td><td><code>deliverables/</code></td><td>All final deliverables in a single folder, ready to compile on Overleaf</td></tr>
-</table>
-
-The pipeline runs **end to end**, either fully autonomously or with human-in-the-loop collaboration. When experiments fail, it repairs itself. When hypotheses do not hold, it pivots. When citations are wrong, it removes them. When you want to step in, it pauses and listens.
-
-🌍 **Usable anywhere.** AutoResearchClaw is not locked to a single platform. Use it as a standalone CLI, connect it to [OpenClaw](https://github.com/openclaw/openclaw), or integrate it with any ACP-compatible agent: 🤖 Claude Code, 💻 Codex CLI, 🐙 Copilot CLI, ♊ Gemini CLI, 🌙 Kimi CLI, and many more. Through OpenClaw's messaging bridge, you can launch a full research run from 💬 Discord, ✈️ Telegram, 🐦 Lark (飞书), 💚 WeChat, or whichever platform your team already uses. One topic in, one paper out, no matter where you send it from.
-
----
-
-## 🚀 Quick start
-
-```bash
-# 1. Clone & install
-git clone https://github.com/aiming-lab/AutoResearchClaw.git
-cd AutoResearchClaw
-python3 -m venv .venv && source .venv/bin/activate
-pip install -e .
-
-# 2. Setup (interactive: installs OpenCode beast mode, checks Docker/LaTeX)
-researchclaw setup
-
-# 3. Configure
-researchclaw init          # Interactive: choose the LLM provider, creates config.arc.yaml
-# Or manually: cp config.researchclaw.example.yaml config.arc.yaml
-
-# 4. Run
-export OPENAI_API_KEY="sk-..."
-researchclaw run --config config.arc.yaml --topic "Your research idea" --auto-approve
-```
-
-Output → `artifacts/rc-YYYYMMDD-HHMMSS-<hash>/deliverables/`: compile-ready LaTeX, BibTeX, experiment code, and charts.
-
-<details>
-<summary>📝 Minimum required configuration</summary>
-
-```yaml
-project:
-  name: "my-research"
-
-research:
-  topic: "Your research topic here"
-
-llm:
-  base_url: "https://api.openai.com/v1"
-  api_key_env: "OPENAI_API_KEY"
-  primary_model: "gpt-4o"
-  fallback_models: ["gpt-4o-mini"]
-
-experiment:
-  mode: "sandbox"
-  sandbox:
-    python_path: ".venv/bin/python"
-```
-
-</details>
-
----
-
-## 🧠 What sets it apart
-
-| Capability | How it works |
-|----------|---------------|
-| **🧑‍✈️ Co-Pilot mode** | 6 intervention modes, from fully autonomous to step-by-step. Guide the AI at critical decisions (hypotheses, baselines, writing) or let it run. SmartPause automatically detects when human intervention would help. |
-| **🔄 PIVOT / REFINE loop** | Stage 15 decides autonomously: PROCEED, REFINE (adjust parameters), or PIVOT (new direction). Artifacts are versioned automatically. |
-| **🤖 Multi-agent debate** | Hypothesis generation, result analysis, and peer review each use a structured multi-perspective debate. |
-| **🧬 Self-learning** | Lessons are extracted from every run (decision rationale, runtime warnings, metric anomalies) with a 30-day time decay. Future runs learn from past mistakes. |
-| **📚 Knowledge base** | Every run builds a structured KB covering 6 categories (decisions, experiments, results, literature, questions, reviews). |
-| **🛡️ Sentinel Watchdog** | Background quality monitor: NaN/Inf detection, paper-evidence consistency, citation relevance scoring, anti-fabrication protection. |
-| **🔍 Claim verification** | Inline fact-checking: extracts claims from AI-generated text and cross-checks them against the collected literature. Flags unsupported citations and fabricated numbers. |
-| **🌿 Branch exploration** | Fork the pipeline to explore several research directions at once, compare the results side by side, and merge the best path. |
-
----
-
-## 🦞 OpenClaw integration
-
-<table>
-<tr>
-
-**AutoResearchClaw is an [OpenClaw](https://github.com/openclaw/openclaw)-compatible service.** Install it in OpenClaw and launch autonomous research with a single message, or use it standalone via the CLI, Claude Code, or any AI coding assistant.
-
-</tr>
-</table>
-
-### 🚀 Using it with OpenClaw (recommended)
-
-If you already use [OpenClaw](https://github.com/openclaw/openclaw) as your AI assistant:
-
-```
-1️⃣  Share the GitHub repository URL with OpenClaw
-2️⃣  OpenClaw automatically reads RESEARCHCLAW_AGENTS.md → understands the pipeline
-3️⃣  Say: "Research [your topic]"
-4️⃣  Done. OpenClaw clones, installs, configures, runs, and returns the results
-```
-
-**That's all.** OpenClaw handles `git clone`, `pip install`, configuration, and pipeline execution automatically. All you do is chat.
-
-<details>
-<summary>💡 What happens behind the scenes</summary>
-
-1. OpenClaw reads `RESEARCHCLAW_AGENTS.md` → learns the research-orchestrator role
-2. OpenClaw reads `README.md` → understands the installation and the pipeline structure
-3. OpenClaw copies `config.researchclaw.example.yaml` → `config.yaml`
-4. Asks for your LLM API key (or uses your environment variable)
-5. Runs `pip install -e .` + `researchclaw run --topic "..." --auto-approve`
-6. Returns the paper, the LaTeX, the experiments, and the citations
-
-</details>
-
-### 🔌 OpenClaw bridge (advanced)
-
-For deeper integration, AutoResearchClaw includes a **bridge adapter system** with 6 optional capabilities:
-
-```yaml
-# config.arc.yaml
-openclaw_bridge:
-  use_cron: true              # ⏰ Scheduled research runs
-  use_message: true           # 💬 Progress notifications (Discord/Slack/Telegram)
-  use_memory: true            # 🧠 Cross-session knowledge persistence
-  use_sessions_spawn: true    # 🔀 Spawn parallel sub-sessions for concurrent stages
-  use_web_fetch: true         # 🌐 Live web search during the literature review
-  use_browser: false          # 🖥️ Browser-based paper collection
-```
-
-Each flag activates a typed adapter protocol. When OpenClaw provides these capabilities, the adapters consume them with no code changes. See [`integration-guide.md`](integration-guide.md) for full details.
-
-### ACP (Agent Client Protocol)
-
-AutoResearchClaw can use **any ACP-compatible coding agent** as its LLM backend, with no API key required. The agent communicates via [acpx](https://github.com/openclaw/acpx) and keeps a single persistent session across all 23 pipeline stages.
-
-| Agent | Command | Notes |
-|-------|----------|-------|
-| Claude Code | `claude` | Anthropic |
-| Codex CLI | `codex` | OpenAI |
-| Copilot CLI | `gh` | GitHub |
-| Gemini CLI | `gemini` | Google |
-| OpenCode | `opencode` | SST |
-| Kimi CLI | `kimi` | Moonshot |
-
-```yaml
-# config.yaml: ACP example
-llm:
-  provider: "acp"
-  acp:
-    agent: "claude"   # Any ACP-compatible CLI agent
-    cwd: "."          # Working directory for the agent
-  # No base_url or api_key needed: the agent handles its own authentication.
-```
-
-```bash
-# Just run it: the agent uses its own credentials
-researchclaw run --config config.yaml --topic "Your research idea" --auto-approve
-```
-
-### 🛠️ Other ways to run it
-
-| Method | How |
-|---------|---------|
-| **Standalone CLI** | `researchclaw run --topic "..." --auto-approve` (autonomous) or `--mode co-pilot` (collaborative) |
-| **Python API** | `from researchclaw.pipeline import Runner; Runner(config).run()` |
-| **Claude Code** | Reads `RESEARCHCLAW_CLAUDE.md`; just say *"Run research on [topic]"* |
-| **Copilot CLI** | `researchclaw run --topic "..."` with `llm.acp.agent: "gh"` |
-| **OpenCode** | Reads `.claude/skills/`; same natural-language interface |
-| **Any AI CLI** | Provide `RESEARCHCLAW_AGENTS.md` as context → the agent bootstraps itself |
-
----
-
-## 🔬 Pipeline: 23 stages, 8 phases
-
-```
-Phase A: Research Scoping             Phase E: Experiment Execution
-  1. TOPIC_INIT                         12. EXPERIMENT_RUN
-  2. PROBLEM_DECOMPOSE                  13. ITERATIVE_REFINE  ← self-healing
-
-Phase B: Literature Discovery         Phase F: Analysis & Decision
-  3. SEARCH_STRATEGY                    14. RESULT_ANALYSIS    ← multi-agent
-  4. LITERATURE_COLLECT  ← real API     15. RESEARCH_DECISION  ← PIVOT/REFINE
-  5. LITERATURE_SCREEN   [gate]
-  6. KNOWLEDGE_EXTRACT                  Phase G: Paper Writing
-                                        16. PAPER_OUTLINE
-Phase C: Knowledge Synthesis            17. PAPER_DRAFT
-  7. SYNTHESIS                          18. PEER_REVIEW        ← evidence check
-  8. HYPOTHESIS_GEN    ← debate         19. PAPER_REVISION
-
-Phase D: Experiment Design            Phase H: Finalization
-  9. EXPERIMENT_DESIGN   [gate]         20. QUALITY_GATE      [gate]
- 10. CODE_GENERATION                    21. KNOWLEDGE_ARCHIVE
- 11. RESOURCE_PLANNING                  22. EXPORT_PUBLISH     ← LaTeX
-                                        23. CITATION_VERIFY    ← relevance check
-```
-
-> **Gate stages** (5, 9, 20): pause for human approval, or approve automatically with `--auto-approve`. On rejection, the pipeline rolls back.
-
-> **Co-Pilot mode** (`--mode co-pilot`): deep human-AI collaboration at stages 7-8 (Idea Workshop), stage 9 (Baseline Navigator), and stages 16-17 (Paper Co-Writer). The other stages run automatically under SmartPause monitoring.
-
-> **Decision loops**: stage 15 can trigger REFINE (→ stage 13) or PIVOT (→ stage 8), with automatic artifact versioning.
-
-<details>
-<summary>📋 What each phase does</summary>
-
-| Phase | What happens |
-|-------|-----------------|
-| **A: Scoping** | The LLM decomposes the topic into a structured problem tree with research questions |
-| **A+: Hardware** | Automatic GPU detection (NVIDIA CUDA / Apple MPS / CPU only), a warning if local hardware is limited, and code generation adapted accordingly |
-| **B: Literature** | Multi-source search (OpenAlex → Semantic Scholar → arXiv) for real papers, relevance filtering, extraction of knowledge cards |
-| **C: Synthesis** | Clustering of findings, identification of research gaps, generation of testable hypotheses via multi-agent debate |
-| **D: Design** | Design of the experiment plan, generation of hardware-aware runnable Python (GPU tier → package selection), estimation of resource needs |
-| **E: Execution** | Running experiments in the sandbox, detecting NaN/Inf and runtime bugs, self-healing the code via targeted LLM repair |
-| **F: Analysis** | Multi-agent analysis of results; autonomous PROCEED / REFINE / PIVOT decision with rationale |
-| **G: Writing** | Outline → section-by-section drafting (5,000-6,500 words) → review (with methodology-evidence consistency checks) → revision with length control |
-| **H: Finalization** | Quality gate, knowledge archiving, LaTeX export with a conference template, citation integrity and relevance verification |
-
-</details>
-
----
-
-## ✨ Key features
-
-| Feature | Description |
-|----------------|------------|
-| **📚 Multi-source literature** | Real papers from OpenAlex, Semantic Scholar, and arXiv, with query expansion, deduplication, and a circuit breaker with graceful degradation |
-| **🔍 4-layer citation verification** | arXiv ID check → CrossRef/DataCite DOI → Semantic Scholar title match → LLM relevance score. Hallucinated references are removed automatically. |
-| **🖥️ Hardware-aware execution** | Automatic GPU detection (NVIDIA CUDA / Apple MPS / CPU only), with code generation, imports, and experiment scale adapted accordingly |
-| **🦾 OpenCode Beast Mode** | Complex experiments are routed automatically to [OpenCode](https://github.com/anomalyco/opencode), which generates multi-file projects with custom architectures, training loops, and ablation studies. Install via `researchclaw setup`. |
-| **🧪 Sandbox experiments** | AST-validated code, immutable harness, NaN/Inf fail-fast, self-healing repair, iterative refinement (up to 10 rounds), partial-result capture |
-| **📝 Conference-grade writing** | NeurIPS/ICML/ICLR templates, section-by-section drafting (5,000-6,500 words), anti-fabrication protection, length control during revision, anti-disclaimer enforcement |
-| **📐 Template switching** | `neurips_2025`, `iclr_2026`, `icml_2026`: Markdown → LaTeX with formulas, tables, figures, cross-references, `\cite{}` |
-| **🛡️ Anti-fabrication** | VerifiedRegistry enforces ground-truth experimental data in papers. Failed experiments are diagnosed and repaired automatically before writing. Unverified numbers are sanitized. |
-| **🚦 Quality gates** | 3 human-in-the-loop gates (stages 5, 9, 20) with rollback. Skip them with `--auto-approve`. |
-| **🧑‍✈️ HITL Co-Pilot** | 6 intervention modes with per-stage policies. Idea Workshop, Baseline Navigator, and Paper Co-Writer for deep collaboration. SmartPause, budget guardrails, escalation policies, and intervention learning for production safety. CLI/WebSocket/MCP adapters. |
-| **💰 Budget guardrails** | Cost monitoring with alerts at configurable thresholds (50%/80%/100%). The pipeline pauses automatically when cost exceeds the budget. |
-| **🔐 Reproducibility** | SHA256 checksums for all stage artifacts. Immutable manifests for verification. Multi-level rollback with versioned snapshots. |
-
----
-
-## 🧑‍✈️ Human-in-the-Loop Co-Pilot
-
-**AutoResearchClaw v0.4.0 introduces a complete Human-in-the-Loop (HITL) system** that turns the pipeline from a purely autonomous mode into a collaborative human-AI research engine. Choose your level of involvement:
-
-### Intervention modes
-
-| Mode | Command | Description |
-|------|----------|------------|
-| **Full Auto** | `--auto-approve` | Original behavior, with no human intervention |
-| **Gate Only** | `--mode gate-only` | Pauses at the 3 gate stages (5, 9, 20) for approval |
-| **Checkpoint** | `--mode checkpoint` | Pauses at every phase boundary (8 checkpoints) |
-| **Co-Pilot** | `--mode co-pilot` | Deep collaboration at critical stages, automatic elsewhere |
-| **Step-by-Step** | `--mode step-by-step` | Pauses after every stage; useful for learning the pipeline |
-| **Express** | `--mode express` | Quick review of only the 3 most critical gates |
-
-### Co-Pilot workflow
-
-```
-You: researchclaw run --topic "Quantum noise as neural network regularization" --mode co-pilot
-
-The pipeline runs stages 1-7 automatically...
-
-  ┌─────────────────────────────────────────────────────────────┐
-  │  HITL | Stage 08: HYPOTHESIS_GEN                            │
-  │  Post-stage review                                          │
-  │                                                             │
-  │  Hypotheses mentioned: 3                                    │
-  │  Novelty score: 0.72 (moderate)                             │
-  │                                                             │
-  │  [a] Approve  [r] Reject  [e] Edit  [c] Collaborate         │
-  │  [i] Inject guidance  [v] View output  [q] Abort            │
-  └─────────────────────────────────────────────────────────────┘
-
-You: c  (start a collaborative chat)
-You: Hypothesis 3 is interesting, but it needs Dropout/Label Smoothing as baselines
-AI:  Updated: added Dropout, Label Smoothing, MixUp, and CutMix as baselines...
-You: approve
-
-The pipeline continues with your refined hypothesis...
-```
-
-### CLI commands
-
-```bash
-# Start in HITL mode
-researchclaw run --topic "..." --mode co-pilot
-
-# Attach to a paused pipeline (from another terminal)
-researchclaw attach artifacts/rc-2026-xxx
-
-# Check the pipeline and HITL status
-researchclaw status artifacts/rc-2026-xxx
-
-# Approve/reject from another terminal or script
-researchclaw approve artifacts/rc-2026-xxx --message "LGTM"
-researchclaw reject artifacts/rc-2026-xxx --reason "Missing key baseline"
-
-# Inject guidance for a stage (even before it runs)
-researchclaw guide artifacts/rc-2026-xxx --stage 9 --message "Use ResNet-50 as the primary baseline"
-```
-
-### Key capabilities
-
-| Feature | Description |
-|----------------|------------|
-| **Idea Workshop** | Collaborative brainstorming, evaluation, and refinement of hypotheses (stages 7-8) |
-| **Baseline Navigator** | The AI suggests baselines + the human adds/removes them + reproducibility checklist (stage 9) |
-| **Paper Co-Writer** | Section-by-section drafting with human editing and AI polishing (stages 16-19) |
-| **SmartPause** | Confidence-driven dynamic pausing that automatically detects when human intervention would help |
-| **Claim verification** | Inline fact-checking against the collected literature; flags unsupported claims |
-| **Budget guardrails** | Cost monitoring with alerts at the 50%/80%/100% thresholds |
-| **Intervention learning** | ALHF: learns from your review habits to optimize future pause decisions |
-| **Branch exploration** | Fork the pipeline to explore several hypotheses, compare them, and merge the best one |
-| **Escalation policy** | Tiered notification (terminal → Slack → email → auto-stop) when no one responds |
-| **3 adapters** | CLI (terminal), WebSocket (web dashboard), MCP (external agents) |
-
-### Configuration
-
-```yaml
-# config.arc.yaml
-hitl:
-  enabled: true
-  mode: co-pilot                     # full-auto | gate-only | checkpoint | co-pilot | custom
-  cost_budget_usd: 50.0              # Pause when cost exceeds the budget (0 = no limit)
-
-  notifications:
-    on_pause: true
-    on_quality_drop: true
-    channels: ["terminal"]            # terminal | slack | webhook
-
-  timeouts:
-    default_human_timeout_sec: 86400  # Wait 24h by default
-    auto_proceed_on_timeout: false
-
-  collaboration:
-    max_chat_turns: 50
-    save_chat_history: true
-
-  # Custom per-stage policies (optional, for 'custom' mode)
-  stage_policies:
-    8: { require_approval: true, enable_collaboration: true }
-    9: { require_approval: true, allow_edit_output: true }
-```
-
-### Backward compatibility
-
-- **Default: OFF.** Without `hitl.enabled: true` or `--mode`, the pipeline behaves exactly as before.
-- **`--auto-approve` still works.** It takes precedence over HITL mode.
-- **All 2,699 existing tests pass** with the HITL code present.
-
----
-
-## 🧠 Integration: MetaClaw
-
-**AutoResearchClaw + [MetaClaw](https://github.com/aiming-lab/MetaClaw) = a pipeline that learns from every run.**
-
-MetaClaw adds **cross-run knowledge transfer** to AutoResearchClaw. When enabled, the pipeline automatically captures lessons from failures and warnings, converts them into reusable skills, and injects those skills into all 23 pipeline stages on subsequent runs, so that the same mistakes are not repeated.
-
-### How it works
-
-```
-Run N executes → failures/warnings captured as Lessons
-                      ↓
-          MetaClaw Lesson → Skill conversion
-                      ↓
-          arc-* skill files stored in ~/.metaclaw/skills/
-                      ↓
-Run N+1 → build_overlay() injects the skills into every LLM prompt
-                      ↓
-          The LLM avoids known pitfalls → higher quality, fewer retries
-```
-
-### Quick setup
-
-```bash
-# 1. Install MetaClaw (if not already installed)
-pip install metaclaw
-
-# 2. Enable it in your configuration
-```
-
-```yaml
-# config.arc.yaml
-metaclaw_bridge:
-  enabled: true
-  proxy_url: "http://localhost:30000"        # MetaClaw proxy (optional)
-  skills_dir: "~/.metaclaw/skills"          # Where skills are stored
-  fallback_url: "https://api.openai.com/v1" # Direct fallback to the LLM
-  fallback_api_key: ""                      # API key for the fallback URL
-  lesson_to_skill:
-    enabled: true
-    min_severity: "warning"                 # Convert warnings + errors
-    max_skills_per_run: 3
-```
-
-```bash
-# 3. Run as usual: MetaClaw works transparently
-researchclaw run --config config.arc.yaml --topic "Your idea" --auto-approve
-```
-
-After each run, check `~/.metaclaw/skills/arc-*/SKILL.md` to see the skills your pipeline has learned.
-
-### Experimental results
-
-In controlled A/B experiments (same topic, same LLM, same configuration):
-
-| Metric | Baseline | With MetaClaw | Improvement |
-|----------|-----------|---------------|--------------|
-| Stage retry rate | 10.5% | 7.9% | **-24.8%** |
-| Number of REFINE cycles | 2.0 | 1.2 | **-40.0%** |
-| Pipeline stage completion | 18/19 | 19/19 | **+5.3%** |
-| Overall robustness score (composite) | 0.714 | 0.845 | **+18.3%** |
-
-> The composite robustness score is a weighted average of the stage completion rate (40%), retry reduction (30%), and REFINE cycle efficiency (30%).
-
-### Backward compatibility
-
-- **Default: OFF.** If `metaclaw_bridge` is absent or `enabled: false`, the pipeline behaves exactly as before.
-- **No new dependencies.** MetaClaw is optional; the core pipeline works without it.
-- **All 2,699 existing tests pass** with the integration code present.
-
----
-
-## 🧩 Skill library
-
-AutoResearchClaw now supports loading **open-source and custom skills** to enrich your research experience. We also ship **20 pre-loaded built-in skills** (scientific writing, literature search, chemistry, biology, and more) as ready-to-use references, giving a high degree of flexibility out of the box. Disable any skill by adding `enabled: false` to its frontmatter.
-
-**Examples of built-in skills:**
-
-| Category | Skill | Description |
-|-----------|------------|------------|
-| **Writing** | `scientific-writing` | IMRAD structure, citation formatting, reporting guidelines |
-| **Domain** | `chemistry-rdkit` | Molecular analysis, SMILES, fingerprints, drug discovery |
-| **Experiment** | `literature-search` | Systematic review, PRISMA methodology |
-
-> See all 20 skills with `researchclaw skills list`.
-
-### Loading your own skills
-
-```bash
-# Option 1: Install a skill (persists across projects)
-researchclaw skills install /path/to/my-skill/
-
-# Option 2: Drop a SKILL.md into the project
-mkdir -p .claude/skills/my-custom-skill
-# Then create a SKILL.md with YAML frontmatter (name, description, trigger-keywords, applicable-stages)
-
-# Option 3: Configure shared skill directories in config.arc.yaml
-# skills:
-#   custom_dirs:
-#     - /path/to/team-shared-skills
-```
-
-### Using skills
-
-Skills are loaded and injected into LLM prompts automatically; no manual activation is needed. Use the CLI to inspect them:
-
-```bash
-researchclaw skills list               # Show all loaded skills with their sources
-researchclaw skills validate ./my-skill # Check the SKILL.md format
-```
-
-Browse community skills: [K-Dense-AI/claude-scientific-skills](https://github.com/K-Dense-AI/claude-scientific-skills) (150+ scientific skills spanning multiple disciplines).
-
----
-
-## ⚙️ Configuration reference
-
-<details>
-<summary>Click to show the full configuration reference</summary>
-
-```yaml
-# === Project ===
-project:
-  name: "my-research"              # Project identifier
-  mode: "docs-first"               # docs-first | semi-auto | full-auto
-
-# === Research ===
-research:
-  topic: "..."                     # Research topic (required)
-  domains: ["ml", "nlp"]           # Research domains for the literature review
-  daily_paper_count: 8             # Target number of papers per search query
-  quality_threshold: 4.0           # Minimum quality score for papers
-
-# === Runtime ===
-runtime:
-  timezone: "America/New_York"     # For timestamps
-  max_parallel_tasks: 3            # Limit on concurrent experiments
-  approval_timeout_hours: 12       # Timeout for gate stages
-  retry_limit: 2                   # Number of retries when a stage fails
-
-# === LLM ===
-llm:
-  provider: "openai-compatible"    # openai | openrouter | deepseek | minimax | acp | openai-compatible
-  base_url: "https://..."          # API endpoint (required for openai-compatible)
-  api_key_env: "OPENAI_API_KEY"    # Env var for the API key (required for openai-compatible)
-  api_key: ""                      # Or hardcode the key here
-  primary_model: "gpt-4o"          # Primary model
-  fallback_models: ["gpt-4o-mini"] # Fallback chain
-  s2_api_key: ""                   # Semantic Scholar API key (optional, higher rate limits)
-  acp:                             # Only used when provider: "acp"
-    agent: "claude"                # ACP agent CLI command (claude, codex, gemini, etc.)
-    cwd: "."                       # Working directory for the agent
-
-# === Experiment ===
-experiment:
-  mode: "sandbox"                  # simulated | sandbox | docker | ssh_remote
-  time_budget_sec: 300             # Max run time per launch (default: 300s)
-  max_iterations: 10               # Max optimization iterations
-  metric_key: "val_loss"           # Name of the primary metric
-  metric_direction: "minimize"     # minimize | maximize
-  sandbox:
-    python_path: ".venv/bin/python"
-    gpu_required: false
-    allowed_imports: [math, random, json, csv, numpy, torch, sklearn]
-    max_memory_mb: 4096
-  docker:
-    image: "researchclaw/experiment:latest"
-    network_policy: "setup_only"   # none | setup_only | pip_only | full
-    gpu_enabled: true
-    memory_limit_mb: 8192
-    auto_install_deps: true        # Auto-detect imports → requirements.txt
-  ssh_remote:
-    host: ""                       # Hostname of the GPU server
-    gpu_ids: []                    # Available GPU IDs
-    remote_workdir: "/tmp/researchclaw_experiments"
-  opencode:                          # OpenCode Beast Mode (auto-installed via `researchclaw setup`)
-    enabled: true                    # Master switch (default: true)
-    auto: true                       # Trigger automatically without confirmation (default: true)
-    complexity_threshold: 0.2        # 0.0-1.0; higher = triggers only for complex experiments
-    model: ""                        # Model override (empty = uses llm.primary_model)
-    timeout_sec: 600                 # Max duration in seconds for OpenCode generation
-    max_retries: 1                   # Number of retries on failure
-    workspace_cleanup: true          # Delete the temporary workspace after collection
-  code_agent:                        # CodeAgent v2: multi-phase code generation
-    enabled: true                    # Use CodeAgent instead of the legacy single-prompt generation
-    architecture_planning: true      # Generate a detailed implementation plan before coding
-    sequential_generation: true      # Generate files one by one following the dependency DAG
-    hard_validation: true            # AST-based validation gates (blocks identical ablations, hardcoded metrics)
-    hard_validation_max_repairs: 2   # Max repair attempts when validation fails
-    exec_fix_max_iterations: 3       # Fix attempts in the execution loop
-    exec_fix_timeout_sec: 60         # Timeout per execution-fix attempt
-  benchmark_agent:                   # BenchmarkAgent: automated selection of datasets and baselines
-    enabled: true                    # Enable the 4-agent benchmark pipeline (Surveyor→Selector→Acquirer→Validator)
-    enable_hf_search: true           # Search HuggingFace Datasets
-    enable_web_search: true          # Search Google Scholar for benchmarks
-    tier_limit: 2                    # Dataset tier filter (1=small/cached, 2=medium, 3=large)
-    min_benchmarks: 1                # Minimum number of datasets required
-    min_baselines: 2                 # Minimum number of baseline methods required
-  figure_agent:                      # FigureAgent: academic figure generation
-    enabled: true                    # Enable the 5-agent figure pipeline (Planner→CodeGen→Renderer→Critic→Integrator)
-    min_figures: 3                   # Minimum number of figures to generate
-    max_figures: 8                   # Maximum number of figures
-    max_iterations: 3                # Critic-driven refinement iterations
-    dpi: 300                         # Output resolution
-    strict_mode: false               # Fail the pipeline if figure generation fails
-  repair:                            # Anti-fabrication experiment repair
-    enabled: true                    # Automatically diagnose and repair failed experiments
-    max_cycles: 3                    # Repair loops
-    min_completion_rate: 0.5         # >=50% of conditions must complete to continue
-    min_conditions: 2                # At least 2 conditions for a valid experiment
-    use_opencode: true               # Route repairs through OpenCode Beast Mode
-
-# === Recherche Web (Optionnel) ===
-web_search:
-  enabled: true                      # Activer la recherche de litterature augmentee par le web
-  tavily_api_key_env: "TAVILY_API_KEY"  # Variable d'env pour la cle API Tavily (optionnel)
-  enable_scholar: true               # Recherche Google Scholar
-  enable_pdf_extraction: true        # Extraire le texte des PDF
-  max_web_results: 10                # Resultats web max par requete
-
-# === Export ===
-export:
-  target_conference: "neurips_2025"  # neurips_2025 | iclr_2026 | icml_2026
-  authors: "Anonymous"
-  bib_file: "references"
-
-# === Prompts ===
-prompts:
-  custom_file: ""                  # Chemin vers un YAML de prompts personnalises (vide = defauts)
-
-# === Co-Pilote HITL (NOUVEAU dans v0.4.0) ===
-hitl:
-  enabled: false                     # Mettre a true pour activer le HITL
-  mode: co-pilot                     # full-auto | gate-only | checkpoint | step-by-step | co-pilot | custom
-  cost_budget_usd: 0.0              # Limite de cout en USD (0 = pas de limite)
-  notifications:
-    on_pause: true                   # Notifier quand le pipeline se met en pause
-    on_quality_drop: true            # Notifier en cas de baisse de qualite
-    channels: ["terminal"]           # terminal | slack | webhook
-  timeouts:
-    default_human_timeout_sec: 86400 # Attendre jusqu'a 24h pour une reponse humaine
-    auto_proceed_on_timeout: false   # Si true, approuver automatiquement au timeout
-  collaboration:
-    max_chat_turns: 50               # Max de tours par session de collaboration
-    save_chat_history: true          # Persister les logs de chat
-  stage_policies: {}                 # Surcharges par etape (pour le mode 'custom')
-
-# === Securite ===
-security:
-  hitl_required_stages: [5, 9, 20] # Etapes necessitant une approbation humaine
-  allow_publish_without_approval: false
-  redact_sensitive_logs: true
-
-# === Base de connaissances ===
-knowledge_base:
-  backend: "markdown"              # markdown | obsidian
-  root: "docs/kb"
-
-# === Notifications ===
-notifications:
-  channel: "console"               # console | discord | slack
-  target: ""
-
-# === Pont MetaClaw (Optionnel) ===
-metaclaw_bridge:
-  enabled: false                   # Mettre a true pour activer l'apprentissage inter-executions
-  proxy_url: "http://localhost:30000"  # URL du proxy MetaClaw
-  skills_dir: "~/.metaclaw/skills" # Ou les competences arc-* sont stockees
-  fallback_url: ""                 # Repli direct vers le LLM quand le proxy est indisponible
-  fallback_api_key: ""             # Cle API pour le point d'acces de repli
-  lesson_to_skill:
-    enabled: true                  # Conversion automatique des lecons en competences
-    min_severity: "warning"        # Severite minimum pour la conversion
-    max_skills_per_run: 3          # Max de nouvelles competences par execution
-  prm:                             # Porte qualite Process Reward Model (optionnel)
-    enabled: false                 # Utiliser LLM-as-judge pour noter les sorties d'etape
-    model: "gpt-5.4"              # Modele juge PRM
-    votes: 3                       # Nombre de votes majoritaires
-    gate_stages: [5, 9, 15, 20]   # Etapes auxquelles appliquer les portes PRM
-
-# === Pont OpenClaw ===
-openclaw_bridge:
-  use_cron: false                  # Executions de recherche planifiees
-  use_message: false               # Notifications de progression
-  use_memory: false                # Persistance des connaissances inter-sessions
-  use_sessions_spawn: false        # Lancement de sous-sessions paralleles
-  use_web_fetch: false             # Recherche web en direct
-  use_browser: false               # Collecte d'articles via navigateur
-```
-
-</details>
-
----
-
-## 🙏 Remerciements
-
-Inspire par :
-
-- 🔬 [AI Scientist](https://github.com/SakanaAI/AI-Scientist) (Sakana AI) — Pionnier de la recherche automatisee
-- 🧠 [AutoResearch](https://github.com/karpathy/autoresearch) (Andrej Karpathy) — Automatisation de la recherche de bout en bout
-- 🌐 [FARS](https://analemma.ai/blog/introducing-fars/) (Analemma) — Systeme de recherche entierement automatise
-
----
-
-## 📄 Licence
-
-MIT — voir [LICENSE](../LICENSE) pour les details.
-
----
-
-## 📌 Citation
-
-Si vous trouvez AutoResearchClaw utile, veuillez citer :
-
-```bibtex
-@misc{liu2026autoresearchclawselfreinforcingautonomousresearch,
-      title={AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration},
-      author={Jiaqi Liu and Shi Qiu and Mairui Li and Bingzhou Li and Haonian Ji and Siwei Han and Xinyu Ye and Peng Xia and Zihan Dong and Congyu Zhang and Letian Zhang and Guiming Chen and Haoqin Tu and Xinyu Yang and Lu Feng and Xujiang Zhao and Haifeng Chen and Jiawei Zhou and Xiao Wang and Weitong Zhang and Hongtu Zhu and Yun Li and Jieru Mei and Hongliang Fei and Jiaheng Zhang and Linjie Li and Linjun Zhang and Yuyin Zhou and Sheng Wang and Caiming Xiong and James Zou and Zeyu Zheng and Cihang Xie and Mingyu Ding and Huaxiu Yao},
-      year={2026},
-      eprint={2605.20025},
-      archivePrefix={arXiv},
-      primaryClass={cs.AI},
-      url={https://arxiv.org/abs/2605.20025},
-}
-```
-
-<p align="center">
-  <sub>Construit avec 🦞 par l'equipe AutoResearchClaw</sub>
-</p>
diff --git a/docs/README_JA.md b/docs/README_JA.md
deleted file mode 100644
index f8a3852d..00000000
--- a/docs/README_JA.md
+++ /dev/null
@@ -1,790 +0,0 @@
-<p align="center">
-  <img src="../image/logo.png" width="700" alt="AutoResearchClaw Logo">
-</p>
-
-<h2 align="center"><b>アイデアを話す。論文を手に入れる。自律的、協調的 & 自己進化。</b></h2>
-
-
-
-<p align="center">
-  <b><i><font size="5"><a href="#openclaw-統合">OpenClaw</a> にチャットするだけ：「Xを研究して」→ 完了。</font></i></b>
-</p>
-
-<p align="center">
-  📄 <b>私たちの論文が arXiv で公開されました — ぜひお読みください！</b> <a href="https://arxiv.org/abs/2605.20025"><i>AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration</i></a>
-</p>
-
-<p align="center">
-  <img src="../image/framework_v2.png" width="100%" alt="AutoResearchClaw Framework">
-</p>
-
-
-<p align="center">
-  <a href="https://arxiv.org/abs/2605.20025"><img src="https://img.shields.io/badge/arXiv-2605.20025-b31b1b?logo=arxiv&logoColor=white" alt="arXiv"></a>
-  <a href="https://huggingface.co/datasets/AIMING-Lab-UNC/ARC-Bench"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-ARC--Bench-yellow" alt="ARC-Bench on Hugging Face"></a>
-  <a href="../LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="MIT License"></a>
-  <a href="https://python.org"><img src="https://img.shields.io/badge/Python-3.11%2B-3776AB?logo=python&logoColor=white" alt="Python 3.11+"></a>
-  <a href="#テスト"><img src="https://img.shields.io/badge/Tests-2699%20passed-brightgreen?logo=pytest&logoColor=white" alt="2699 Tests Passed"></a>
-  <a href="https://github.com/aiming-lab/AutoResearchClaw"><img src="https://img.shields.io/badge/GitHub-AutoResearchClaw-181717?logo=github" alt="GitHub"></a>
-  <a href="#openclaw-統合"><img src="https://img.shields.io/badge/OpenClaw-Compatible-ff4444?logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNCAyNCI+PHBhdGggZD0iTTEyIDJDNi40OCAyIDIgNi40OCAyIDEyczQuNDggMTAgMTAgMTAgMTAtNC40OCAxMC0xMFMxNy41MiAyIDEyIDJ6IiBmaWxsPSJ3aGl0ZSIvPjwvc3ZnPg==" alt="OpenClaw Compatible"></a>
-  <a href="https://discord.gg/u4ksqW5P"><img src="https://img.shields.io/badge/Discord-Join%20Community-5865F2?logo=discord&logoColor=white" alt="Discord"></a>
-</p>
-
-<p align="center">
-  <a href="../README.md">🇺🇸 English</a> ·
-  <a href="README_CN.md">🇨🇳 中文</a> ·
-  <a href="README_JA.md">🇯🇵 日本語</a> ·
-  <a href="README_KO.md">🇰🇷 한국어</a> ·
-  <a href="README_FR.md">🇫🇷 Français</a> ·
-  <a href="README_DE.md">🇩🇪 Deutsch</a> ·
-  <a href="README_ES.md">🇪🇸 Español</a> ·
-  <a href="README_PT.md">🇧🇷 Português</a> ·
-  <a href="README_RU.md">🇷🇺 Русский</a> ·
-  <a href="README_AR.md">🇸🇦 العربية</a>
-</p>
-
-<p align="center">
-  <a href="showcase/SHOWCASE.md">🏆 論文ショーケース</a> · <a href="HITL_GUIDE.md">🧑‍✈️ コパイロットガイド</a> · <a href="integration-guide.md">📖 統合ガイド</a> · <a href="https://discord.gg/u4ksqW5P">💬 Discordコミュニティ</a>
-</p>
-
----
-
-<table>
-<tr>
-<td width="18%">
-<a href="showcase/SHOWCASE.md"><img src="showcase/thumbnails/paper_I_random_matrix-01.png" width="120" alt="Sample Paper"/></a>
-</td>
-<td valign="middle">
-<b>🏆 生成論文ショーケース</b><br><br>
-<b>8つの分野にわたる8本の論文</b> — 数学、統計、生物学、コンピューティング、NLP、RL、ビジョン、ロバスト性 — 完全自律生成、またはHuman-in-the-Loopコパイロットガイダンスによる。<br><br>
-<a href="showcase/SHOWCASE.md"><img src="https://img.shields.io/badge/View_Full_Showcase_→-All_8_Papers-d73a49?style=for-the-badge" alt="View Showcase"></a>
-</td>
-</tr>
-</table>
-
----
-
-> **🧪 テスターを募集しています！** あなた自身の研究アイデアで — どの分野からでも — パイプラインをお試しください。[ご意見をお聞かせください](TESTER_GUIDE.md)。あなたのフィードバックが次のバージョンに直接反映されます。 **[→ Testing Guide](TESTER_GUIDE.md)** | **[→ 中文测试指南](TESTER_GUIDE_CN.md)** | **[→ 日本語テストガイド](TESTER_GUIDE_JA.md)**
-
----
-
-## 🔥 News
-- **[05/19/2026]** **v0.5.0** — **マルチドメイン実験エージェント + ARC-Bench** — 2 つの主要アップデート。**(1) ドメイン特化型実行エージェント：** 実験ステージ（ステージ 10〜13）は、デフォルトの ML サンドボックスを超えて分野ごとの専門エージェントにルーティングされます——**高エネルギー物理**（ColliderAgent：FeynRules → MadGraph5 → Delphes、Magnus クラウド経由）、**生物学**（COBRApy ゲノムスケール代謝モデリング）、**統計学**（シミュレーション研究エージェント）。化学・材料は汎用 Docker エグゼキューターが担当します。パイプラインは研究領域から適切なエグゼキューターを自動選択します。**(2) ARC-Bench：** **55 トピック**のオープンエンド自律研究ベンチマーク。**ML（25）、高エネルギー物理（10）、量子（10）、生物（7）、統計（3）** を対象とし、各トピックにマニフェストと採点ルーブリックが付属します（`experiments/arc_bench/`、さらに [🤗 Hugging Face](https://huggingface.co/datasets/AIMING-Lab-UNC/ARC-Bench) でも公開）。**[→ ドメイン統合ガイド](DOMAIN_INTEGRATION_GUIDE.md)**
-- **[04/01/2026]** **v0.4.0** — **Human-in-the-Loop コパイロットシステム** — AutoResearchClawは完全自律だけではなくなりました。新しいHITLシステムにより、6つの介入モード（`full-auto`、`gate-only`、`checkpoint`、`step-by-step`、`co-pilot`、`custom`）、ステージごとのポリシー、人間とAIの深い協調が追加されます。仮説の共同作成のためのアイデアワークショップ、実験設計レビューのためのベースラインナビゲーター、協調的ドラフト作成のための論文コライター、SmartPause（信頼度駆動の動的介入）、ALHF介入学習、反幻覚クレーム検証、コスト予算ガードレール、並列仮説探索のためのパイプラインブランチ、CLIコマンド（`attach`/`status`/`approve`/`reject`/`guide`）を含みます。**[→ 完全HITLガイド](HITL_GUIDE.md)**
-- **[03/30/2026]** **フレキシブルスキルローディング** — AutoResearchClawは、研究体験をさらに向上させるために、オープンソースおよびカスタムスキルのロードに対応しました。科学的ライティング、実験設計、化学、生物学などをカバーする20のプリロードスキルがすぐに使えるリファレンスとして含まれており、コミュニティ提供の[A-Evolve](https://github.com/A-EVO-Lab/a-evolve)エージェント進化スキルも含まれています。`researchclaw skills install`でインストールするか、`.claude/skills/`に`SKILL.md`を配置してください。[スキルライブラリ](#-スキルライブラリ)を参照。
-- **[03/22/2026]** [v0.3.2](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.2) — **クロスプラットフォーム対応 + 安定性大幅向上** — ACP互換AIエージェントバックエンド（Claude Code、Codex CLI、Copilot CLI、Gemini CLI、Kimi CLI）に対応し、OpenClawブリッジ経由でメッセージングプラットフォーム（Discord、Telegram、Lark、WeChat）もサポート。新しいCLIエージェントコード生成バックエンドにより、ステージ10と13を外部CLIエージェントに委任し、予算制御とタイムアウト管理に対応。反データ捏造システム（VerifiedRegistry + 実験診断・修復ループ）、100件以上のバグ修正、モジュラーexecutorリファクタリング、`--resume`自動検出、LLMリトライ強化、コミュニティ報告の修正を含む。
-
-<details>
-<summary>過去のリリース</summary>
-
-- **[03/18/2026]** [v0.3.1](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.1) — **OpenCode Beast Mode + Community Contributions** — New "Beast Mode" routes complex code generation to [OpenCode](https://github.com/anomalyco/opencode) with automatic complexity scoring and graceful fallback. Added Novita AI provider support, thread-safety hardening, improved LLM output parsing robustness, and 20+ bug fixes from community PRs and internal audit.
-- **[03/17/2026]** [v0.3.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.0) — **MetaClaw Integration** — AutoResearchClaw now supports [MetaClaw](https://github.com/aiming-lab/MetaClaw) cross-run learning: pipeline failures → structured lessons → reusable skills, injected into all 23 stages. **+18.3%** robustness in controlled experiments. Opt-in (`metaclaw_bridge.enabled: true`), fully backward-compatible. See [Integration Guide](#-metaclaw-integration).
-- **[03/16/2026]** [v0.2.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.2.0) — Three multi-agent subsystems (CodeAgent, BenchmarkAgent, FigureAgent), hardened Docker sandbox with network-policy-aware execution, 4-round paper quality audit (AI-slop detection, 7-dim review scoring, NeurIPS checklist), and 15+ bug fixes from production runs.
-- **[03/15/2026]** [v0.1.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.1.0) — We release AutoResearchClaw: a fully autonomous 23-stage research pipeline that turns a single research idea into a conference-ready paper. No human intervention required.
-
-</details>
-
----
-
-## ⚡ ワンコマンド。ワンペーパー。
-
-```bash
-# 完全自律 — 人間の介入なし
-pip install -e . && researchclaw setup && researchclaw init && researchclaw run --topic "Your research idea here" --auto-approve
-
-# コパイロットモード — 重要な意思決定ポイントでAIと協調
-researchclaw run --topic "Your research idea here" --mode co-pilot
-```
-
-
----
-
-## 🤔 これは何？
-
-**あなたが考える。AutoResearchClawが書く。重要な判断はあなたが導く。**
-
-研究トピックを入力するだけで — OpenAlex、Semantic Scholar、arXivからの実際の文献、ハードウェア対応のサンドボックス実験（GPU/MPS/CPUを自動検出）、統計分析、マルチエージェント査読、NeurIPS/ICML/ICLR対応の学会グレードLaTeXを含む完全な学術論文が得られます。完全自律で実行するか、**コパイロットモード**を使って重要な意思決定ポイントでAIを導きます — 研究方向の選択、実験設計のレビュー、論文の共同執筆が可能です。幻覚された参考文献なし。
-
-<table>
-<tr><td>📄</td><td><code>paper_draft.md</code></td><td>完全な学術論文（序論、関連研究、手法、実験、結果、結論）</td></tr>
-<tr><td>📐</td><td><code>paper.tex</code></td><td>学会対応LaTeX（NeurIPS / ICLR / ICMLテンプレート）</td></tr>
-<tr><td>📚</td><td><code>references.bib</code></td><td>OpenAlex、Semantic Scholar、arXivからの実際のBibTeX参考文献 — 本文中の引用に合わせて自動整理</td></tr>
-<tr><td>🔍</td><td><code>verification_report.json</code></td><td>4層の引用整合性 + 関連性検証（arXiv、CrossRef、DataCite、LLM）</td></tr>
-<tr><td>🧪</td><td><code>experiment runs/</code></td><td>生成されたコード + サンドボックス実行結果 + 構造化JSONメトリクス</td></tr>
-<tr><td>📊</td><td><code>charts/</code></td><td>誤差棒と信頼区間付きの条件比較チャートを自動生成</td></tr>
-<tr><td>📝</td><td><code>reviews.md</code></td><td>手法-証拠の一貫性チェック付きマルチエージェント査読</td></tr>
-<tr><td>🧬</td><td><code>evolution/</code></td><td>各実行から抽出された自己学習の教訓</td></tr>
-<tr><td>📦</td><td><code>deliverables/</code></td><td>すべての最終成果物を1フォルダに集約 — Overleafですぐにコンパイル可能</td></tr>
-</table>
-
-パイプラインは**エンドツーエンドで実行**されます — 完全自律、またはhuman-in-the-loopの協調で。実験が失敗すれば自己修復します。仮説が成り立たなければ方向転換します。引用が偽物なら削除します。あなたが舵を取りたいときは、一時停止して待ちます。
-
-🌍 **どこでも実行可能。** AutoResearchClaw は特定のプラットフォームに縛られません。CLI でスタンドアロン実行、[OpenClaw](https://github.com/openclaw/openclaw) に接続、または ACP 互換の AI エージェント —— 🤖 Claude Code、💻 Codex CLI、🐙 Copilot CLI、♊ Gemini CLI、🌙 Kimi CLI など —— と連携できます。さらに OpenClaw のメッセージブリッジにより、💬 Discord、✈️ Telegram、🐦 Lark（飛書）、💚 WeChat など、チームが普段使っているプラットフォームから研究を開始できます。トピックを入力すれば、論文が出力されます —— どこからでも。
-
----
-
-## 🚀 クイックスタート
-
-```bash
-# 1. クローン & インストール
-git clone https://github.com/aiming-lab/AutoResearchClaw.git
-cd AutoResearchClaw
-python3 -m venv .venv && source .venv/bin/activate
-pip install -e .
-
-# 2. セットアップ（対話式 — OpenCode Beast Modeのインストール、Docker/LaTeXの確認）
-researchclaw setup
-
-# 3. 設定
-researchclaw init          # 対話式：LLMプロバイダーを選択、config.arc.yamlを作成
-# または手動：cp config.researchclaw.example.yaml config.arc.yaml
-
-# 4. 実行
-export OPENAI_API_KEY="sk-..."
-researchclaw run --config config.arc.yaml --topic "Your research idea" --auto-approve
-```
-
-出力先 → `artifacts/rc-YYYYMMDD-HHMMSS-<hash>/deliverables/` — コンパイル可能なLaTeX、BibTeX、実験コード、チャート。
-
-<details>
-<summary>📝 最小限の必要設定</summary>
-
-```yaml
-project:
-  name: "my-research"
-
-research:
-  topic: "Your research topic here"
-
-llm:
-  base_url: "https://api.openai.com/v1"
-  api_key_env: "OPENAI_API_KEY"
-  primary_model: "gpt-4o"
-  fallback_models: ["gpt-4o-mini"]
-
-experiment:
-  mode: "sandbox"
-  sandbox:
-    python_path: ".venv/bin/python"
-```
-
-</details>
-
----
-
-## 🧠 他と何が違うのか
-
-| 機能 | 仕組み |
-|------|--------|
-| **🧑‍✈️ コパイロットモード** | 6つの介入モード — 完全自律からステップバイステップまで。重要な判断（仮説、ベースライン、論文執筆）でAIを導くか、自由に実行させます。SmartPauseが人間の入力が有益な場面を自動検出。 |
-| **🔄 PIVOT / REFINE ループ** | ステージ15が自律的に判定：PROCEED、REFINE（パラメータ調整）、またはPIVOT（新方向）。成果物は自動バージョン管理。 |
-| **🤖 マルチエージェント討論** | 仮説生成、結果分析、査読のそれぞれで構造化された多視点討論を実施。 |
-| **🧬 自己学習** | 各実行から教訓を抽出（判定根拠、ランタイム警告、メトリクス異常）、30日の時間減衰付き。将来の実行が過去のミスから学習。 |
-| **📚 知識ベース** | 各実行で6カテゴリ（判定、実験、発見、文献、質問、レビュー）にわたる構造化知識ベースを構築。 |
-| **🛡️ Sentinel Watchdog** | バックグラウンド品質モニター：NaN/Inf検出、論文-証拠の一貫性、引用関連性スコアリング、捏造防止ガード。 |
-| **🔍 クレーム検証** | インラインファクトチェック：AI生成テキストからクレームを抽出し、収集した文献と照合。根拠のない引用や捏造された数値をフラグ。 |
-| **🌿 ブランチ探索** | パイプラインをフォークして複数の研究方向を同時に探索し、結果を並べて比較し、最良のパスをマージ。 |
-
----
-
-## 🦞 OpenClaw統合
-
-<table>
-<tr>
-
-**AutoResearchClawは[OpenClaw](https://github.com/openclaw/openclaw)互換サービスです。** OpenClawにインストールして、メッセージ1つで自律研究を開始できます — CLI、Claude Code、その他のAIコーディングアシスタントを使ってスタンドアロンでも利用可能です。
-
-</tr>
-</table>
-
-### 🚀 OpenClawで使う（推奨）
-
-[OpenClaw](https://github.com/openclaw/openclaw)をすでにAIアシスタントとしてお使いの場合：
-
-```
-1️⃣  GitHubリポジトリのURLをOpenClawに共有
-2️⃣  OpenClawがRESEARCHCLAW_AGENTS.mdを自動読み込み → パイプラインを理解
-3️⃣  「Research [あなたのトピック]」と話しかける
-4️⃣  完了 — OpenClawがクローン、インストール、設定、実行、結果の返却まですべて自動実行
-```
-
-**以上です。** OpenClawが`git clone`、`pip install`、設定、パイプライン実行を自動的に処理します。チャットするだけです。
-
-<details>
-<summary>💡 内部で何が起きているか</summary>
-
-1. OpenClawが`RESEARCHCLAW_AGENTS.md`を読み取り → 研究オーケストレーターの役割を学習
-2. OpenClawが`README.md`を読み取り → インストールとパイプライン構造を理解
-3. OpenClawが`config.researchclaw.example.yaml` → `config.yaml`にコピー
-4. LLMのAPIキーを要求（または環境変数を使用）
-5. `pip install -e .` + `researchclaw run --topic "..." --auto-approve`を実行
-6. 論文、LaTeX、実験、引用を返却
-
-</details>
-
-### 🔌 OpenClaw Bridge（上級）
-
-より深い統合のために、AutoResearchClawには6つのオプション機能を備えた**ブリッジアダプターシステム**が含まれています：
-
-```yaml
-# config.arc.yaml
-openclaw_bridge:
-  use_cron: true              # ⏰ スケジュール実行
-  use_message: true           # 💬 進捗通知（Discord/Slack/Telegram）
-  use_memory: true            # 🧠 セッション間の知識永続化
-  use_sessions_spawn: true    # 🔀 並列サブセッションの生成
-  use_web_fetch: true         # 🌐 文献レビュー中のライブWeb検索
-  use_browser: false          # 🖥️ ブラウザベースの論文収集
-```
-
-各フラグは型付きアダプタープロトコルをアクティブにします。OpenClawがこれらの機能を提供する場合、アダプターはコード変更なしにそれらを利用します。詳細は[`integration-guide.md`](integration-guide.md)をご覧ください。
-
-### ACP (Agent Client Protocol)
-
-AutoResearchClawは**任意のACP互換コーディングエージェント**をLLMバックエンドとして使用できます — APIキーは不要です。エージェントは[acpx](https://github.com/openclaw/acpx)を介して通信し、全23パイプラインステージにわたって単一の永続セッションを維持します。
-
-| エージェント | コマンド | 備考 |
-|-------------|---------|------|
-| Claude Code | `claude` | Anthropic |
-| Codex CLI | `codex` | OpenAI |
-| Copilot CLI | `gh` | GitHub |
-| Gemini CLI | `gemini` | Google |
-| OpenCode | `opencode` | SST |
-| Kimi CLI | `kimi` | Moonshot |
-
-```yaml
-# config.yaml — ACP例
-llm:
-  provider: "acp"
-  acp:
-    agent: "claude"   # 任意のACP互換エージェントCLIコマンド
-    cwd: "."          # エージェントの作業ディレクトリ
-  # base_urlやapi_keyは不要 — エージェントが独自の認証を処理します。
-```
-
-```bash
-# そのまま実行 — エージェントは独自の認証情報を使用
-researchclaw run --config config.yaml --topic "Your research idea" --auto-approve
-```
-
-### 🛠️ その他の実行方法
-
-| 方法 | 手順 |
-|------|------|
-| **スタンドアロンCLI** | `researchclaw run --topic "..." --auto-approve`（自律）または `--mode co-pilot`（協調） |
-| **Python API** | `from researchclaw.pipeline import Runner; Runner(config).run()` |
-| **Claude Code** | `RESEARCHCLAW_CLAUDE.md`を読み取り — *「Run research on [トピック]」*と言うだけ |
-| **Copilot CLI** | `researchclaw run --topic "..."` で `llm.acp.agent: "gh"` を使用 |
-| **OpenCode** | `.claude/skills/`を読み取り — 同じ自然言語インターフェース |
-| **任意のAI CLI** | `RESEARCHCLAW_AGENTS.md`をコンテキストとして提供 → エージェントが自動ブートストラップ |
-
----
-
-## 🔬 パイプライン：23ステージ、8フェーズ
-
-```
-フェーズ A: 研究スコーピング          フェーズ E: 実験実行
-  1. TOPIC_INIT                      12. EXPERIMENT_RUN
-  2. PROBLEM_DECOMPOSE               13. ITERATIVE_REFINE  ← 自己修復
-
-フェーズ B: 文献探索                フェーズ F: 分析と判定
-  3. SEARCH_STRATEGY                 14. RESULT_ANALYSIS    ← マルチエージェント
-  4. LITERATURE_COLLECT  ← 実API    15. RESEARCH_DECISION  ← PIVOT/REFINE
-  5. LITERATURE_SCREEN   [ゲート]
-  6. KNOWLEDGE_EXTRACT               フェーズ G: 論文執筆
-                                     16. PAPER_OUTLINE
-フェーズ C: 知識統合                  17. PAPER_DRAFT
-  7. SYNTHESIS                       18. PEER_REVIEW        ← 証拠チェック
-  8. HYPOTHESIS_GEN    ← 討論        19. PAPER_REVISION
-
-フェーズ D: 実験設計               フェーズ H: 最終処理
-  9. EXPERIMENT_DESIGN   [ゲート]     20. QUALITY_GATE      [ゲート]
- 10. CODE_GENERATION                 21. KNOWLEDGE_ARCHIVE
- 11. RESOURCE_PLANNING               22. EXPORT_PUBLISH     ← LaTeX
-                                     23. CITATION_VERIFY    ← 関連性チェック
-```
-
-> **ゲートステージ**（5, 9, 20）は人間の承認を待つか、`--auto-approve`で自動承認されます。却下時にはパイプラインがロールバックします。
-
-> **コパイロットモード**（`--mode co-pilot`）：ステージ7-8（アイデアワークショップ）、ステージ9（ベースラインナビゲーター）、ステージ16-17（論文コライター）で人間とAIの深い協調を実現。その他のステージはSmartPauseモニタリング下で自動実行。
-
-> **判定ループ**: ステージ15はREFINE（→ ステージ13）またはPIVOT（→ ステージ8）をトリガーでき、成果物のバージョン管理が自動的に行われます。
-
-<details>
-<summary>📋 各フェーズの詳細</summary>
-
-| フェーズ | 処理内容 |
-|---------|----------|
-| **A: スコーピング** | LLMがトピックを研究質問を含む構造化された問題ツリーに分解 |
-| **A+: ハードウェア** | GPU（NVIDIA CUDA / Apple MPS / CPUのみ）を自動検出、ローカルハードウェアが限定的な場合は警告、コード生成を適応 |
-| **B: 文献** | マルチソース検索（OpenAlex → Semantic Scholar → arXiv）で実際の論文を取得、関連性でスクリーニング、知識カードを抽出 |
-| **C: 統合** | 発見事項をクラスタリング、研究ギャップを特定、マルチエージェント討論で検証可能な仮説を生成 |
-| **D: 設計** | 実験計画を設計、ハードウェア対応の実行可能Python（GPUティア→パッケージ選択）を生成、リソース需要を推定 |
-| **E: 実行** | サンドボックスで実験を実行、NaN/Infとランタイムバグを検出、LLMによる的確な修復で自己修復 |
-| **F: 分析** | マルチエージェントによる結果分析；根拠付きの自律的PROCEED / REFINE / PIVOT判定 |
-| **G: 執筆** | アウトライン → セクション別ドラフト（5,000〜6,500語）→ 査読（手法-証拠の一貫性付き）→ 文字数ガード付き改訂 |
-| **H: 最終処理** | 品質ゲート、知識アーカイブ、学会テンプレート付きLaTeXエクスポート、引用の整合性 + 関連性検証 |
-
-</details>
-
----
-
-## ✨ 主な機能
-
-| 機能 | 説明 |
-|------|------|
-| **📚 マルチソース文献** | OpenAlex、Semantic Scholar、arXivからの実際の論文 — クエリ拡張、重複排除、三状態サーキットブレーカーとグレースフルデグラデーション |
-| **🔍 4層引用検証** | arXiv IDチェック → CrossRef/DataCite DOI → Semantic Scholarタイトルマッチ → LLM関連性スコアリング。幻覚された参考文献は自動削除。 |
-| **🖥️ ハードウェア対応実行** | GPU（NVIDIA CUDA / Apple MPS / CPUのみ）を自動検出し、コード生成、インポート、実験スケールを適応 |
-| **🦾 OpenCode Beast Mode** | 複雑な実験を自動的に[OpenCode](https://github.com/anomalyco/opencode)にルーティング — カスタムアーキテクチャ、トレーニングループ、アブレーション研究を含むマルチファイルプロジェクトを生成。`researchclaw setup`でインストール。 |
-| **🧪 サンドボックス実験** | AST検証済みコード、不変ハーネス、NaN/Inf早期停止、自己修復、反復的改良（最大10ラウンド）、部分結果の保持 |
-| **📝 学会グレード執筆** | NeurIPS/ICML/ICLRテンプレート、セクション別ドラフト（5,000〜6,500語）、捏造防止ガード、改訂文字数ガード、免責事項抑制 |
-| **📐 テンプレート切り替え** | `neurips_2025`、`iclr_2026`、`icml_2026` — Markdown → LaTeX（数式、表、図、相互参照、`\cite{}`対応） |
-| **🛡️ 捏造防止** | VerifiedRegistryが論文中で検証済みの実験データの使用を強制。失敗した実験を自動診断し、執筆前に修復。未検証の数値はサニタイズ。 |
-| **🚦 品質ゲート** | 3つのHuman-in-the-loopゲート（ステージ5, 9, 20）、ロールバック対応。`--auto-approve`でスキップ。 |
-| **🧑‍✈️ HITLコパイロット** | 6つの介入モードとステージごとのポリシー。アイデアワークショップ、ベースラインナビゲーター、論文コライターで深い協調を実現。SmartPause、コストガードレール、エスカレーションポリシー、介入学習でプロダクション環境の安全性を確保。CLI/WebSocket/MCPアダプター。 |
-| **💰 コストガードレール** | 設定可能な閾値アラート（50%/80%/100%）付きの予算モニタリング。コストが予算を超えるとパイプラインが自動一時停止。 |
-| **🔐 再現性** | 全ステージ成果物のSHA256チェックサム。検証のための不変マニフェスト。バージョン付きスナップショットによるマルチレベルのアンドゥ。 |
-
----
-
-## 🧑‍✈️ Human-in-the-Loop コパイロット
-
-**AutoResearchClaw v0.4.0は完全なHuman-in-the-Loop（HITL）システムを導入し**、パイプラインを純粋な自律実行から人間とAIの協調的研究エンジンに変革します。関与のレベルを選択してください：
-
-### 介入モード
-
-| モード | コマンド | 機能 |
-|--------|---------|------|
-| **完全自動** | `--auto-approve` | 従来の動作 — 人間の介入なし |
-| **ゲートのみ** | `--mode gate-only` | 3つのゲートステージ（5, 9, 20）で承認のため一時停止 |
-| **チェックポイント** | `--mode checkpoint` | 各フェーズ境界で一時停止（8つのチェックポイント） |
-| **コパイロット** | `--mode co-pilot` | 重要なステージで深い協調、その他は自動 |
-| **ステップバイステップ** | `--mode step-by-step` | 各ステージ後に一時停止 — パイプラインを学習 |
-| **エクスプレス** | `--mode express` | クイックレビュー — 最も重要な3つのゲートのみ |
-
-### コパイロットワークフロー
-
-```
-You: researchclaw run --topic "量子ノイズによるニューラルネットワーク正則化" --mode co-pilot
-
-パイプラインがステージ1-7を自動実行...
-
-  ┌─────────────────────────────────────────────────────────────┐
-  │  HITL | Stage 08: HYPOTHESIS_GEN                            │
-  │  Post-stage review                                          │
-  │                                                             │
-  │  Hypotheses mentioned: 3                                    │
-  │  Novelty score: 0.72 (moderate)                             │
-  │                                                             │
-  │  [a] Approve  [r] Reject  [e] Edit  [c] Collaborate         │
-  │  [i] Inject guidance  [v] View output  [q] Abort            │
-  └─────────────────────────────────────────────────────────────┘
-
-You: c  (協調チャットを開始)
-You: 仮説3は興味深いが、Dropout/Label Smoothingをベースラインに追加すべき
-AI:  更新しました — Dropout、Label Smoothing、MixUp、CutMixをベースラインに追加...
-You: approve
-
-あなたの改良した仮説でパイプラインが続行...
-```
-
-### CLIコマンド
-
-```bash
-# HITLモードで開始
-researchclaw run --topic "..." --mode co-pilot
-
-# 一時停止中のパイプラインにアタッチ（別のターミナルから）
-researchclaw attach artifacts/rc-2026-xxx
-
-# パイプラインとHITLのステータスを確認
-researchclaw status artifacts/rc-2026-xxx
-
-# 別のターミナルやスクリプトから承認/却下
-researchclaw approve artifacts/rc-2026-xxx --message "LGTM"
-researchclaw reject artifacts/rc-2026-xxx --reason "重要なベースラインが不足"
-
-# ステージへのガイダンスを注入（実行前でも可能）
-researchclaw guide artifacts/rc-2026-xxx --stage 9 --message "ResNet-50をプライマリベースラインとして使用"
-```
-
-### 主要機能
-
-| 機能 | 説明 |
-|------|------|
-| **アイデアワークショップ** | 仮説の共同ブレインストーミング、評価、改良（ステージ7-8） |
-| **ベースラインナビゲーター** | AIがベースラインを提案 + 人間が追加/削除 + 再現性チェックリスト（ステージ9） |
-| **論文コライター** | セクション別ドラフトで人間の編集とAIのポリッシュ（ステージ16-19） |
-| **SmartPause** | 信頼度駆動の動的一時停止 — 人間の入力が有益な場面を自動検出 |
-| **クレーム検証** | 収集した文献に対するインラインファクトチェック — 根拠のないクレームをフラグ |
-| **コストガードレール** | 50%/80%/100%閾値アラート付き予算モニタリング |
-| **介入学習** | ALHF — レビューパターンから学習して将来の一時停止判断を最適化 |
-| **ブランチ探索** | パイプラインをフォークして複数の仮説を探索、比較、最良をマージ |
-| **エスカレーションポリシー** | 無人時の段階的通知（ターミナル → Slack → メール → 自動停止） |
-| **3つのアダプター** | CLI（ターミナル）、WebSocket（Webダッシュボード）、MCP（外部エージェント） |
-
-### 設定
-
-```yaml
-# config.arc.yaml
-hitl:
-  enabled: true
-  mode: co-pilot                     # full-auto | gate-only | checkpoint | co-pilot | custom
-  cost_budget_usd: 50.0              # コストが予算を超えたら一時停止（0 = 制限なし）
-
-  notifications:
-    on_pause: true
-    on_quality_drop: true
-    channels: ["terminal"]            # terminal | slack | webhook
-
-  timeouts:
-    default_human_timeout_sec: 86400  # デフォルト24時間待機
-    auto_proceed_on_timeout: false
-
-  collaboration:
-    max_chat_turns: 50
-    save_chat_history: true
-
-  # ステージごとのカスタムポリシー（オプション、'custom'モード用）
-  stage_policies:
-    8: { require_approval: true, enable_collaboration: true }
-    9: { require_approval: true, allow_edit_output: true }
-```
-
-### 後方互換性
-
-- **デフォルト: オフ。** `hitl.enabled: true`または`--mode`なしでは、パイプラインは以前と全く同じように動作します。
-- **`--auto-approve`は引き続き動作。** HITLモードをオーバーライドします。
-- **既存の2,699テストすべてがパス**（HITLコードを含む）。
-
----
-
-## 🧠 MetaClaw統合
-
-**AutoResearchClaw + [MetaClaw](https://github.com/aiming-lab/MetaClaw) = すべての実行から学習するパイプライン。**
-
-MetaClawはAutoResearchClawに**クロスラン知識転移**を追加します。有効にすると、パイプラインは失敗や警告から自動的に教訓を抽出し、再利用可能なスキルに変換し、後続の実行で全23ステージに注入します — 同じ過ちを二度と繰り返しません。
-
-### 仕組み
-
-```
-Run N executes → failures/warnings captured as Lessons
-                      ↓
-          MetaClaw Lesson → Skill conversion
-                      ↓
-          arc-* Skill files stored in ~/.metaclaw/skills/
-                      ↓
-Run N+1 → build_overlay() injects skills into every LLM prompt
-                      ↓
-          LLM avoids known pitfalls → higher quality, fewer retries
-```
-
-### クイックセットアップ
-
-```bash
-# 1. MetaClawをインストール（未インストールの場合）
-pip install metaclaw
-
-# 2. 設定で有効化
-```
-
-```yaml
-# config.arc.yaml
-metaclaw_bridge:
-  enabled: true
-  proxy_url: "http://localhost:30000"        # MetaClawプロキシ（オプション）
-  skills_dir: "~/.metaclaw/skills"          # スキルの保存場所
-  fallback_url: "https://api.openai.com/v1" # 直接LLMフォールバック
-  fallback_api_key: ""                      # フォールバックURLのAPIキー
-  lesson_to_skill:
-    enabled: true
-    min_severity: "warning"                 # warning + errorを変換
-    max_skills_per_run: 3
-```
-
-```bash
-# 3. 通常通り実行 — MetaClawは透過的に動作
-researchclaw run --config config.arc.yaml --topic "Your idea" --auto-approve
-```
-
-各実行後、`~/.metaclaw/skills/arc-*/SKILL.md`を確認して、パイプラインが学習したスキルを確認できます。
-
-### 実験結果
-
-対照A/B実験（同じトピック、同じLLM、同じ設定）：
-
-| メトリクス | ベースライン | MetaClaw使用時 | 改善 |
-|-----------|------------|---------------|------|
-| ステージリトライ率 | 10.5% | 7.9% | **-24.8%** |
-| Refineサイクル数 | 2.0 | 1.2 | **-40.0%** |
-| パイプラインステージ完了率 | 18/19 | 19/19 | **+5.3%** |
-| 総合ロバスト性スコア（複合） | 0.714 | 0.845 | **+18.3%** |
-
-> 複合ロバスト性スコアは、ステージ完了率（40%）、リトライ削減（30%）、Refineサイクル効率（30%）の加重平均です。
-
-### 後方互換性
-
-- **デフォルト: オフ。** `metaclaw_bridge`が存在しないか`enabled: false`の場合、パイプラインは以前と全く同じように動作します。
-- **新しい依存関係なし。** MetaClawはオプションです — コアパイプラインはMetaClawなしで動作します。
-- **既存の2,699テストすべてがパス**（統合コードを含む）。
-
----
-
-## 🧩 スキルライブラリ
-
-AutoResearchClawは、研究体験をさらに向上させるために**オープンソースおよびカスタムスキル**のロードに対応しました。また、**20のプリロード組み込みスキル**（科学的ライティング、文献検索、化学、生物学など）をすぐに使えるリファレンスとして搭載しており、高い柔軟性を提供します。スキルのフロントマターに`enabled: false`を追加することで無効化できます。
-
-**組み込みスキルの例：**
-
-| カテゴリ | スキル | 説明 |
-|----------|--------|------|
-| **ライティング** | `scientific-writing` | IMRAD構造、引用フォーマット、報告ガイドライン |
-| **ドメイン** | `chemistry-rdkit` | 分子分析、SMILES、フィンガープリント、創薬 |
-| **実験** | `literature-search` | 体系的レビュー、PRISMAメソドロジー |
-
-> 全20スキルは`researchclaw skills list`で確認できます。
-
-### カスタムスキルのロード
-
-```bash
-# オプション1: スキルをインストール（プロジェクト間で永続化）
-researchclaw skills install /path/to/my-skill/
-
-# オプション2: プロジェクトにSKILL.mdを配置
-mkdir -p .claude/skills/my-custom-skill
-# YAMLフロントマター（name, description, trigger-keywords, applicable-stages）付きのSKILL.mdを作成
-
-# オプション3: config.arc.yamlで共有スキルディレクトリを設定
-# skills:
-#   custom_dirs:
-#     - /path/to/team-shared-skills
-```
-
-### スキルの使用
-
-スキルは自動的にロードされLLMプロンプトに注入されます — 手動でのアクティベーションは不要です。CLIで確認：
-
-```bash
-researchclaw skills list               # ロード済みスキルをソース付きで表示
-researchclaw skills validate ./my-skill # SKILL.mdのフォーマットをチェック
-```
-
-コミュニティスキルを閲覧: [K-Dense-AI/claude-scientific-skills](https://github.com/K-Dense-AI/claude-scientific-skills)（150以上の科学スキル、複数の分野にわたる）。
-
----
-
-## ⚙️ 設定リファレンス
-
-<details>
-<summary>クリックして設定リファレンスの全体を展開</summary>
-
-```yaml
-# === プロジェクト ===
-project:
-  name: "my-research"              # プロジェクト識別子
-  mode: "docs-first"               # docs-first | semi-auto | full-auto
-
-# === 研究 ===
-research:
-  topic: "..."                     # 研究トピック（必須）
-  domains: ["ml", "nlp"]           # 文献検索の研究ドメイン
-  daily_paper_count: 8             # 検索クエリあたりの目標論文数
-  quality_threshold: 4.0           # 論文の最小品質スコア
-
-# === ランタイム ===
-runtime:
-  timezone: "America/New_York"     # タイムスタンプ用
-  max_parallel_tasks: 3            # 同時実験数の上限
-  approval_timeout_hours: 12       # ゲートステージのタイムアウト
-  retry_limit: 2                   # ステージ失敗時のリトライ回数
-
-# === LLM ===
-llm:
-  provider: "openai-compatible"    # openai | openrouter | deepseek | minimax | acp | openai-compatible
-  base_url: "https://..."          # APIエンドポイント（openai-compatible必須）
-  api_key_env: "OPENAI_API_KEY"    # APIキーの環境変数（openai-compatible必須）
-  api_key: ""                      # またはここにキーを直接記入
-  primary_model: "gpt-4o"          # プライマリモデル
-  fallback_models: ["gpt-4o-mini"] # フォールバックチェーン
-  s2_api_key: ""                   # Semantic Scholar APIキー（オプション、レート制限緩和）
-  acp:                             # provider: "acp" の場合のみ使用
-    agent: "claude"                # ACP Agent CLIコマンド（claude, codex, gemini等）
-    cwd: "."                       # エージェントの作業ディレクトリ
-
-# === 実験 ===
-experiment:
-  mode: "sandbox"                  # simulated | sandbox | docker | ssh_remote
-  time_budget_sec: 300             # 実行あたりの最大実行時間（デフォルト: 300秒）
-  max_iterations: 10               # 最大最適化反復回数
-  metric_key: "val_loss"           # プライマリメトリクス名
-  metric_direction: "minimize"     # minimize | maximize
-  sandbox:
-    python_path: ".venv/bin/python"
-    gpu_required: false
-    allowed_imports: [math, random, json, csv, numpy, torch, sklearn]
-    max_memory_mb: 4096
-  docker:
-    image: "researchclaw/experiment:latest"
-    network_policy: "setup_only"   # none | setup_only | pip_only | full
-    gpu_enabled: true
-    memory_limit_mb: 8192
-    auto_install_deps: true        # importを自動検出 → requirements.txt
-  ssh_remote:
-    host: ""                       # GPUサーバーのホスト名
-    gpu_ids: []                    # 利用可能なGPU ID
-    remote_workdir: "/tmp/researchclaw_experiments"
-  opencode:                          # OpenCode Beast Mode（`researchclaw setup`で自動インストール）
-    enabled: true                    # マスタースイッチ（デフォルト: true）
-    auto: true                       # 確認なしで自動トリガー（デフォルト: true）
-    complexity_threshold: 0.2        # 0.0-1.0 — 高い = 複雑な実験のみトリガー
-    model: ""                        # モデルのオーバーライド（空 = llm.primary_modelを使用）
-    timeout_sec: 600                 # OpenCode生成の最大秒数
-    max_retries: 1                   # 失敗時のリトライ回数
-    workspace_cleanup: true          # 収集後に一時ワークスペースを削除
-  code_agent:                        # CodeAgent v2 — 多段階コード生成
-    enabled: true                    # レガシー単一プロンプトの代わりにCodeAgentを使用
-    architecture_planning: true      # コーディング前に詳細な実装設計図を生成
-    sequential_generation: true      # 依存関係DAGに従いファイルを1つずつ生成
-    hard_validation: true            # ASTベースのバリデーションゲート（同一アブレーション、ハードコードメトリクスをブロック）
-    hard_validation_max_repairs: 2   # バリデーション失敗時の最大修復試行回数
-    exec_fix_max_iterations: 3       # 実行ループ内修正の試行回数
-    exec_fix_timeout_sec: 60         # 実行修正1回あたりのタイムアウト
-  benchmark_agent:                   # BenchmarkAgent — 自動データセット＆ベースライン選択
-    enabled: true                    # 4エージェントベンチマークパイプラインを有効化（Surveyor→Selector→Acquirer→Validator）
-    enable_hf_search: true           # HuggingFace Datasetsを検索
-    enable_web_search: true          # Google Scholarでベンチマークを検索
-    tier_limit: 2                    # データセットティアフィルタリング（1=小/キャッシュ, 2=中, 3=大）
-    min_benchmarks: 1                # 必要最小データセット数
-    min_baselines: 2                 # 必要最小ベースライン手法数
-  figure_agent:                      # FigureAgent — 学術図表生成
-    enabled: true                    # 5エージェント図表パイプラインを有効化（Planner→CodeGen→Renderer→Critic→Integrator）
-    min_figures: 3                   # 生成する最小図表数
-    max_figures: 8                   # 最大図表数
-    max_iterations: 3                # Critic駆動の改良イテレーション数
-    dpi: 300                         # 出力解像度
-    strict_mode: false               # 図表生成失敗時にパイプラインを停止するか
-  repair:                            # 捏造防止の実験修復
-    enabled: true                    # 失敗した実験を自動診断・修復
-    max_cycles: 3                    # 修復リトライループ数
-    min_completion_rate: 0.5         # 続行するには50%以上の条件が完了する必要あり
-    min_conditions: 2                # 有効な実験に最低2条件が必要
-    use_opencode: true               # 修復をOpenCode Beast Mode経由でルーティング
-
-# === Web検索（オプション）===
-web_search:
-  enabled: true                      # Web拡張文献検索を有効化
-  tavily_api_key_env: "TAVILY_API_KEY"  # Tavily APIキーの環境変数（オプション）
-  enable_scholar: true               # Google Scholar検索
-  enable_pdf_extraction: true        # PDFからテキストを抽出
-  max_web_results: 10                # クエリあたりの最大Web検索結果数
-
-# === エクスポート ===
-export:
-  target_conference: "neurips_2025"  # neurips_2025 | iclr_2026 | icml_2026
-  authors: "Anonymous"
-  bib_file: "references"
-
-# === プロンプト ===
-prompts:
-  custom_file: ""                  # カスタムプロンプトYAMLのパス（空 = デフォルト）
-
-# === HITL コパイロット（v0.4.0新機能）===
-hitl:
-  enabled: false                     # trueに設定してHITLを有効化
-  mode: co-pilot                     # full-auto | gate-only | checkpoint | step-by-step | co-pilot | custom
-  cost_budget_usd: 0.0              # USD単位のコスト制限（0 = 制限なし）
-  notifications:
-    on_pause: true                   # パイプライン一時停止時に通知
-    on_quality_drop: true            # 品質問題時に通知
-    channels: ["terminal"]           # terminal | slack | webhook
-  timeouts:
-    default_human_timeout_sec: 86400 # 人間の入力を最大24時間待機
-    auto_proceed_on_timeout: false   # trueの場合、タイムアウト時に自動承認
-  collaboration:
-    max_chat_turns: 50               # 協調セッションあたりの最大ターン数
-    save_chat_history: true          # チャットログを永続化
-  stage_policies: {}                 # ステージごとのオーバーライド（'custom'モード用）
-
-# === セキュリティ ===
-security:
-  hitl_required_stages: [5, 9, 20] # 人間の承認が必要なステージ
-  allow_publish_without_approval: false
-  redact_sensitive_logs: true
-
-# === 知識ベース ===
-knowledge_base:
-  backend: "markdown"              # markdown | obsidian
-  root: "docs/kb"
-
-# === 通知 ===
-notifications:
-  channel: "console"               # console | discord | slack
-  target: ""
-
-# === MetaClaw Bridge（オプション）===
-metaclaw_bridge:
-  enabled: false                   # trueに設定してクロスラン学習を有効化
-  proxy_url: "http://localhost:30000"  # MetaClawプロキシURL
-  skills_dir: "~/.metaclaw/skills" # arc-*スキルの保存場所
-  fallback_url: ""                 # プロキシがダウン時の直接LLMフォールバック
-  fallback_api_key: ""             # フォールバックエンドポイントのAPIキー
-  lesson_to_skill:
-    enabled: true                  # 教訓をスキルに自動変換
-    min_severity: "warning"        # 変換する最小重大度
-    max_skills_per_run: 3          # パイプライン実行あたりの最大新規スキル数
-  prm:                             # プロセス報酬モデル品質ゲート（オプション）
-    enabled: false                 # LLM-as-judgeでステージ出力をスコアリング
-    model: "gpt-5.4"              # PRMジャッジモデル
-    votes: 3                       # 多数決投票数
-    gate_stages: [5, 9, 15, 20]   # PRMゲートを適用するステージ
-
-# === OpenClaw Bridge ===
-openclaw_bridge:
-  use_cron: false                  # スケジュール研究実行
-  use_message: false               # 進捗通知
-  use_memory: false                # セッション間の知識永続化
-  use_sessions_spawn: false        # 並列サブセッションの生成
-  use_web_fetch: false             # ライブWeb検索
-  use_browser: false               # ブラウザベースの論文収集
-```
-
-</details>
-
----
-
-## 🙏 謝辞
-
-以下のプロジェクトに着想を得ています：
-
-- 🔬 [AI Scientist](https://github.com/SakanaAI/AI-Scientist) (Sakana AI) — 自動研究のパイオニア
-- 🧠 [AutoResearch](https://github.com/karpathy/autoresearch) (Andrej Karpathy) — エンドツーエンドの研究自動化
-- 🌐 [FARS](https://analemma.ai/blog/introducing-fars/) (Analemma) — 完全自動研究システム
-
----
-
-## 📄 ライセンス
-
-MIT — 詳細は[LICENSE](../LICENSE)をご覧ください。
-
----
-
-## 📌 引用
-
-AutoResearchClawが役に立った場合は、以下を引用してください：
-
-```bibtex
-@misc{liu2026autoresearchclawselfreinforcingautonomousresearch,
-      title={AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration},
-      author={Jiaqi Liu and Shi Qiu and Mairui Li and Bingzhou Li and Haonian Ji and Siwei Han and Xinyu Ye and Peng Xia and Zihan Dong and Congyu Zhang and Letian Zhang and Guiming Chen and Haoqin Tu and Xinyu Yang and Lu Feng and Xujiang Zhao and Haifeng Chen and Jiawei Zhou and Xiao Wang and Weitong Zhang and Hongtu Zhu and Yun Li and Jieru Mei and Hongliang Fei and Jiaheng Zhang and Linjie Li and Linjun Zhang and Yuyin Zhou and Sheng Wang and Caiming Xiong and James Zou and Zeyu Zheng and Cihang Xie and Mingyu Ding and Huaxiu Yao},
-      year={2026},
-      eprint={2605.20025},
-      archivePrefix={arXiv},
-      primaryClass={cs.AI},
-      url={https://arxiv.org/abs/2605.20025},
-}
-```
-
-<p align="center">
-  <sub>Built with 🦞 by the AutoResearchClaw team</sub>
-</p>
diff --git a/docs/README_KO.md b/docs/README_KO.md
deleted file mode 100644
index 42cab85f..00000000
--- a/docs/README_KO.md
+++ /dev/null
@@ -1,754 +0,0 @@
-<p align="center">
-  <img src="../image/logo.png" width="700" alt="AutoResearchClaw Logo">
-</p>
-
-<h2 align="center"><b>아이디어를 말하다. 논문을 받다. 자율적, 협력적 & 자기 진화.</b></h2>
-
-
-
-<p align="center">
-  <b><i><font size="5"><a href="#openclaw-통합">OpenClaw</a>에 채팅하세요: "X 연구해줘" → 완료.</font></i></b>
-</p>
-
-<p align="center">
-  📄 <b>저희 논문이 arXiv에 공개되었습니다 — 꼭 읽어보세요!</b> <a href="https://arxiv.org/abs/2605.20025"><i>AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration</i></a>
-</p>
-
-<p align="center">
-  <img src="../image/framework_v2.png" width="100%" alt="AutoResearchClaw Framework">
-</p>
-
-
-<p align="center">
-  <a href="https://arxiv.org/abs/2605.20025"><img src="https://img.shields.io/badge/arXiv-2605.20025-b31b1b?logo=arxiv&logoColor=white" alt="arXiv"></a>
-  <a href="https://huggingface.co/datasets/AIMING-Lab-UNC/ARC-Bench"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-ARC--Bench-yellow" alt="ARC-Bench on Hugging Face"></a>
-  <a href="../LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="MIT License"></a>
-  <a href="https://python.org"><img src="https://img.shields.io/badge/Python-3.11%2B-3776AB?logo=python&logoColor=white" alt="Python 3.11+"></a>
-  <a href="#테스트"><img src="https://img.shields.io/badge/Tests-2699%20passed-brightgreen?logo=pytest&logoColor=white" alt="2699 Tests Passed"></a>
-  <a href="https://github.com/aiming-lab/AutoResearchClaw"><img src="https://img.shields.io/badge/GitHub-AutoResearchClaw-181717?logo=github" alt="GitHub"></a>
-  <a href="#openclaw-통합"><img src="https://img.shields.io/badge/OpenClaw-Compatible-ff4444?logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNCAyNCI+PHBhdGggZD0iTTEyIDJDNi40OCAyIDIgNi40OCAyIDEyczQuNDggMTAgMTAgMTAgMTAtNC40OCAxMC0xMFMxNy41MiAyIDEyIDJ6IiBmaWxsPSJ3aGl0ZSIvPjwvc3ZnPg==" alt="OpenClaw Compatible"></a>
-  <a href="https://discord.gg/u4ksqW5P"><img src="https://img.shields.io/badge/Discord-Join%20Community-5865F2?logo=discord&logoColor=white" alt="Discord"></a>
-</p>
-
-<p align="center">
-  <a href="../README.md">🇺🇸 English</a> ·
-  <a href="README_CN.md">🇨🇳 中文</a> ·
-  <a href="README_JA.md">🇯🇵 日本語</a> ·
-  <a href="README_KO.md">🇰🇷 한국어</a> ·
-  <a href="README_FR.md">🇫🇷 Français</a> ·
-  <a href="README_DE.md">🇩🇪 Deutsch</a> ·
-  <a href="README_ES.md">🇪🇸 Español</a> ·
-  <a href="README_PT.md">🇧🇷 Português</a> ·
-  <a href="README_RU.md">🇷🇺 Русский</a> ·
-  <a href="README_AR.md">🇸🇦 العربية</a>
-</p>
-
-<p align="center">
-  <a href="showcase/SHOWCASE.md">🏆 논문 쇼케이스</a> · <a href="HITL_GUIDE.md">🧑‍✈️ 코파일럿 가이드</a> · <a href="integration-guide.md">📖 통합 가이드</a> · <a href="https://discord.gg/u4ksqW5P">💬 Discord 커뮤니티</a>
-</p>
-
----
-
-<table>
-<tr>
-<td width="18%">
-<a href="showcase/SHOWCASE.md"><img src="showcase/thumbnails/paper_I_random_matrix-01.png" width="120" alt="Sample Paper"/></a>
-</td>
-<td valign="middle">
-<b>🏆 생성된 논문 쇼케이스</b><br><br>
-<b>8개 분야에 걸친 8편의 논문</b> — 수학, 통계, 생물학, 컴퓨팅, NLP, RL, 비전, 견고성 — 완전 자율 생성 또는 Human-in-the-Loop 코파일럿 가이던스 활용.<br><br>
-<a href="showcase/SHOWCASE.md"><img src="https://img.shields.io/badge/View_Full_Showcase_→-All_8_Papers-d73a49?style=for-the-badge" alt="View Showcase"></a>
-</td>
-</tr>
-</table>
-
----
-
-> **🧪 테스터를 모집합니다!** 여러분의 연구 아이디어로 — 어떤 분야든 — 파이프라인을 시험해 보시고 [의견을 들려주세요](TESTER_GUIDE.md). 여러분의 피드백이 다음 버전에 직접 반영됩니다. **[→ Testing Guide](TESTER_GUIDE.md)** | **[→ 中文测试指南](TESTER_GUIDE_CN.md)** | **[→ 日本語テストガイド](TESTER_GUIDE_JA.md)**
-
----
-
-## 🔥 News
-- **[05/19/2026]** **v0.5.0** — **멀티 도메인 실험 에이전트 + ARC-Bench** — 두 가지 주요 업데이트. **(1) 도메인 특화 실행 에이전트:** 실험 단계(10~13단계)가 기본 ML 샌드박스를 넘어 분야별 전문 에이전트로 라우팅됩니다 — **고에너지 물리**(ColliderAgent: FeynRules → MadGraph5 → Delphes, Magnus 클라우드 경유), **생물학**(COBRApy 게놈 규모 대사 모델링), **통계학**(시뮬레이션 연구 에이전트). 화학/재료는 범용 Docker 실행기가 담당합니다. 파이프라인은 연구 도메인에 따라 적절한 실행기를 자동 선택합니다. **(2) ARC-Bench:** **55개 주제**의 개방형 자율 연구 벤치마크로 **ML(25), 고에너지 물리(10), 양자(10), 생물(7), 통계(3)**를 포괄하며, 각 주제마다 매니페스트와 채점 루브릭이 포함됩니다 (`experiments/arc_bench/`, 그리고 [🤗 Hugging Face](https://huggingface.co/datasets/AIMING-Lab-UNC/ARC-Bench)에서도 제공). **[→ 도메인 통합 가이드](DOMAIN_INTEGRATION_GUIDE.md)**
-- **[04/01/2026]** **v0.4.0** — **Human-in-the-Loop 코파일럿 시스템** — AutoResearchClaw는 더 이상 순수 자율 시스템이 아닙니다. 새로운 HITL 시스템은 6가지 개입 모드(`full-auto`, `gate-only`, `checkpoint`, `step-by-step`, `co-pilot`, `custom`), 단계별 정책, 깊은 인간-AI 협업을 추가합니다. 포함 사항: 가설 공동 창작을 위한 아이디어 워크숍, 실험 설계 검토를 위한 베이스라인 내비게이터, 협력적 작성을 위한 논문 코라이터, SmartPause(신뢰도 기반 동적 개입), ALHF 개입 학습, 반환각 클레임 검증, 비용 예산 가드레일, 병렬 가설 탐색을 위한 파이프라인 분기, CLI 명령어(`attach`/`status`/`approve`/`reject`/`guide`). **[→ 전체 HITL 가이드](HITL_GUIDE.md)**
-- **[03/30/2026]** **유연한 스킬 로딩** — AutoResearchClaw는 이제 모든 분야의 오픈소스 및 커스텀 스킬을 로딩하여 연구 경험을 더욱 향상시킬 수 있습니다. 과학적 글쓰기, 실험 설계, 화학, 생물학 등을 포괄하는 20개의 사전 로드된 스킬이 즉시 사용 가능한 참고자료로 포함되어 있으며, 커뮤니티가 기여한 [A-Evolve](https://github.com/A-EVO-Lab/a-evolve) 에이전트 진화 스킬도 포함됩니다. `researchclaw skills install`로 직접 로드하거나 `.claude/skills/`에 `SKILL.md`를 추가하세요. [스킬 라이브러리](#-스킬-라이브러리) 참조.
-- **[03/22/2026]** [v0.3.2](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.2) — **크로스 플랫폼 지원 + 주요 안정성 개선** — ACP 호환 AI 에이전트 백엔드(Claude Code, Codex CLI, Copilot CLI, Gemini CLI, Kimi CLI) 지원 및 OpenClaw 브릿지를 통한 메시징 플랫폼(Discord, Telegram, Lark, WeChat) 지원 추가. 새로운 CLI-agent 코드 생성 백엔드가 Stage 10 및 13을 외부 CLI 에이전트에 위임하며, 예산 제어 및 타임아웃 관리를 지원. 반데이터 조작 시스템(VerifiedRegistry + 실험 진단 및 복구 루프), 100건 이상의 버그 수정, 모듈러 executor 리팩토링, `--resume` 자동 감지, LLM 재시도 강화, 커뮤니티 보고 수정 포함.
-
-<details>
-<summary>이전 릴리스</summary>
-
-- **[03/18/2026]** [v0.3.1](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.1) — **OpenCode Beast Mode + Community Contributions** — New "Beast Mode" routes complex code generation to [OpenCode](https://github.com/anomalyco/opencode) with automatic complexity scoring and graceful fallback. Added Novita AI provider support, thread-safety hardening, improved LLM output parsing robustness, and 20+ bug fixes from community PRs and internal audit.
-- **[03/17/2026]** [v0.3.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.0) — **MetaClaw Integration** — AutoResearchClaw now supports [MetaClaw](https://github.com/aiming-lab/MetaClaw) cross-run learning: pipeline failures → structured lessons → reusable skills, injected into all 23 stages. **+18.3%** robustness in controlled experiments. Opt-in (`metaclaw_bridge.enabled: true`), fully backward-compatible. See [Integration Guide](#-metaclaw-integration).
-- **[03/16/2026]** [v0.2.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.2.0) — Three multi-agent subsystems (CodeAgent, BenchmarkAgent, FigureAgent), hardened Docker sandbox with network-policy-aware execution, 4-round paper quality audit (AI-slop detection, 7-dim review scoring, NeurIPS checklist), and 15+ bug fixes from production runs.
-- **[03/15/2026]** [v0.1.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.1.0) — We release AutoResearchClaw: a fully autonomous 23-stage research pipeline that turns a single research idea into a conference-ready paper. No human intervention required.
-
-</details>
-
----
-
-## ⚡ 하나의 명령. 하나의 논문.
-
-```bash
-# 완전 자율 — 인간 개입 없음
-pip install -e . && researchclaw setup && researchclaw init && researchclaw run --topic "Your research idea here" --auto-approve
-
-# 코파일럿 모드 — 주요 의사결정 지점에서 AI와 협업
-researchclaw run --topic "Your research idea here" --mode co-pilot
-```
-
-
----
-
-## 🤔 이것은 무엇인가요?
-
-**당신이 생각하면, AutoResearchClaw가 씁니다. 당신이 핵심 결정을 안내합니다.**
-
-연구 주제를 입력하면 — OpenAlex, Semantic Scholar, arXiv의 실제 문헌, 하드웨어 인식 샌드박스 실험 (GPU/MPS/CPU 자동 감지), 통계 분석, 멀티 에이전트 피어 리뷰, NeurIPS/ICML/ICLR 대상 학회 수준 LaTeX를 포함한 완전한 학술 논문을 받을 수 있습니다. 완전 자율로 실행하거나, **코파일럿 모드**를 사용하여 중요한 의사결정 지점에서 AI를 안내하세요 — 연구 방향 선택, 실험 설계 검토, 논문 공동 작성. 환각된 참고문헌이 없습니다.
-
-<table>
-<tr><td>📄</td><td><code>paper_draft.md</code></td><td>완성된 학술 논문 (서론, 관련 연구, 방법론, 실험, 결과, 결론)</td></tr>
-<tr><td>📐</td><td><code>paper.tex</code></td><td>학회 제출용 LaTeX (NeurIPS / ICLR / ICML 템플릿)</td></tr>
-<tr><td>📚</td><td><code>references.bib</code></td><td>OpenAlex, Semantic Scholar, arXiv에서 가져온 실제 BibTeX 참고문헌 — 인라인 인용과 일치하도록 자동 정리</td></tr>
-<tr><td>🔍</td><td><code>verification_report.json</code></td><td>4계층 인용 무결성 + 관련성 검증 (arXiv, CrossRef, DataCite, LLM)</td></tr>
-<tr><td>🧪</td><td><code>experiment runs/</code></td><td>생성된 코드 + 샌드박스 결과 + 구조화된 JSON 메트릭</td></tr>
-<tr><td>📊</td><td><code>charts/</code></td><td>오차 막대와 신뢰 구간이 포함된 자동 생성 조건 비교 차트</td></tr>
-<tr><td>📝</td><td><code>reviews.md</code></td><td>방법론-증거 일관성 검사를 포함한 멀티 에이전트 피어 리뷰</td></tr>
-<tr><td>🧬</td><td><code>evolution/</code></td><td>각 실행에서 추출된 자기 학습 교훈</td></tr>
-<tr><td>📦</td><td><code>deliverables/</code></td><td>모든 최종 산출물을 하나의 폴더에 — Overleaf에 바로 컴파일 가능</td></tr>
-</table>
-
-파이프라인은 **처음부터 끝까지** 실행됩니다 — 완전 자율 또는 Human-in-the-Loop 협업. 실험이 실패하면 자가 복구합니다. 가설이 성립하지 않으면 방향을 전환합니다. 인용이 가짜면 삭제합니다. 당신이 조향하고 싶을 때, 파이프라인이 멈추고 경청합니다.
-
-🌍 **어디서든 실행 가능.** AutoResearchClaw는 특정 플랫폼에 종속되지 않습니다. CLI로 독립 실행하거나, [OpenClaw](https://github.com/openclaw/openclaw)에 연결하거나, ACP 호환 AI 에이전트(🤖 Claude Code, 💻 Codex CLI, 🐙 Copilot CLI, ♊ Gemini CLI, 🌙 Kimi CLI 등)와 연동할 수 있습니다. OpenClaw의 메시지 브릿지 덕분에 💬 Discord, ✈️ Telegram, 🐦 Lark(飞书), 💚 WeChat 등 팀이 이미 사용 중인 플랫폼에서 연구를 시작할 수 있습니다. 주제 하나를 입력하면 논문 하나가 완성됩니다. 어디서 입력하든 상관없습니다.
-
----
-
-## 🚀 빠른 시작
-
-```bash
-# 1. 클론 & 설치
-git clone https://github.com/aiming-lab/AutoResearchClaw.git
-cd AutoResearchClaw
-python3 -m venv .venv && source .venv/bin/activate
-pip install -e .
-
-# 2. 설정 (대화형 — OpenCode Beast Mode 설치, Docker/LaTeX 확인)
-researchclaw setup
-
-# 3. 구성
-researchclaw init          # 대화형: LLM 제공자 선택, config.arc.yaml 생성
-# 또는 수동: cp config.researchclaw.example.yaml config.arc.yaml
-
-# 4. 실행
-export OPENAI_API_KEY="sk-..."
-researchclaw run --config config.arc.yaml --topic "Your research idea" --auto-approve
-```
-
-출력 → `artifacts/rc-YYYYMMDD-HHMMSS-<hash>/deliverables/` — 컴파일 가능한 LaTeX, BibTeX, 실험 코드, 차트.
-
-<details>
-<summary>📝 최소 필수 설정</summary>
-
-```yaml
-project:
-  name: "my-research"
-
-research:
-  topic: "Your research topic here"
-
-llm:
-  base_url: "https://api.openai.com/v1"
-  api_key_env: "OPENAI_API_KEY"
-  primary_model: "gpt-4o"
-  fallback_models: ["gpt-4o-mini"]
-
-experiment:
-  mode: "sandbox"
-  sandbox:
-    python_path: ".venv/bin/python"
-```
-
-</details>
-
----
-
-## 🧠 차별화 요소
-
-| 기능 | 작동 방식 |
-|------|----------|
-| **🧑‍✈️ 코파일럿 모드** | 6가지 개입 모드 — 완전 자율부터 단계별까지. 중요한 결정(가설, 베이스라인, 논문 작성)에서 AI를 안내하거나 자유롭게 실행. SmartPause가 인간의 입력이 도움이 될 때를 자동 감지. |
-| **🔄 PIVOT / REFINE 루프** | 15단계에서 자율적으로 결정: PROCEED, REFINE (매개변수 조정), 또는 PIVOT (새 방향). 산출물 자동 버전 관리. |
-| **🤖 멀티 에이전트 토론** | 가설 생성, 결과 분석, 피어 리뷰 각각에서 구조화된 다관점 토론을 수행. |
-| **🧬 자기 학습** | 각 실행에서 교훈 추출 (의사결정 근거, 런타임 경고, 메트릭 이상), 30일 시간 감쇠. 향후 실행이 과거의 실수에서 학습. |
-| **📚 지식 기반** | 각 실행에서 6개 카테고리 (결정, 실험, 발견, 문헌, 질문, 리뷰)에 걸친 구조화된 지식 기반 구축. |
-| **🛡️ 센티넬 감시견** | 백그라운드 품질 모니터: NaN/Inf 감지, 논문-증거 일관성, 인용 관련성 점수, 날조 방지 가드. |
-| **🔍 클레임 검증** | 인라인 팩트 체킹: AI 생성 텍스트에서 주장을 추출하고 수집된 문헌과 교차 검증. 근거 없는 인용과 날조된 숫자를 플래그. |
-| **🌿 분기 탐색** | 파이프라인을 분기하여 여러 연구 방향을 동시에 탐색하고, 결과를 나란히 비교하고, 최적의 경로를 병합. |
-
----
-
-## 🦞 OpenClaw 통합
-
-<table>
-<tr>
-
-**AutoResearchClaw는 [OpenClaw](https://github.com/openclaw/openclaw) 호환 서비스입니다.** OpenClaw에 설치하고 단일 메시지로 자율 연구를 시작하거나 — CLI, Claude Code 또는 기타 AI 코딩 어시스턴트를 통해 독립적으로 사용하세요.
-
-</tr>
-</table>
-
-### 🚀 OpenClaw와 함께 사용 (권장)
-
-[OpenClaw](https://github.com/openclaw/openclaw)을 이미 AI 어시스턴트로 사용하고 있다면:
-
-```
-1️⃣  GitHub 저장소 URL을 OpenClaw에 공유
-2️⃣  OpenClaw이 자동으로 RESEARCHCLAW_AGENTS.md를 읽고 → 파이프라인을 이해
-3️⃣  "Research [주제]"라고 말하기
-4️⃣  완료 — OpenClaw이 클론, 설치, 설정, 실행, 결과 반환까지 자동 처리
-```
-
-**그게 전부입니다.** OpenClaw이 `git clone`, `pip install`, 설정 구성, 파이프라인 실행을 자동으로 처리합니다. 채팅만 하면 됩니다.
-
-<details>
-<summary>💡 내부 동작 과정</summary>
-
-1. OpenClaw이 `RESEARCHCLAW_AGENTS.md`를 읽고 → 연구 오케스트레이터 역할을 학습
-2. OpenClaw이 `README.md`를 읽고 → 설치 및 파이프라인 구조를 이해
-3. OpenClaw이 `config.researchclaw.example.yaml`을 → `config.yaml`로 복사
-4. LLM API 키를 요청 (또는 환경 변수를 사용)
-5. `pip install -e .` + `researchclaw run --topic "..." --auto-approve` 실행
-6. 논문, LaTeX, 실험, 인용을 반환
-
-</details>
-
-### 🔌 OpenClaw 브릿지 (고급)
-
-더 깊은 통합을 위해 AutoResearchClaw는 6가지 선택적 기능을 갖춘 **브릿지 어댑터 시스템**을 포함합니다:
-
-```yaml
-# config.arc.yaml
-openclaw_bridge:
-  use_cron: true              # ⏰ 예약된 연구 실행
-  use_message: true           # 💬 진행 상황 알림 (Discord/Slack/Telegram)
-  use_memory: true            # 🧠 세션 간 지식 영속성
-  use_sessions_spawn: true    # 🔀 동시 단계를 위한 병렬 서브세션 생성
-  use_web_fetch: true         # 🌐 문헌 검토 중 실시간 웹 검색
-  use_browser: false          # 🖥️ 브라우저 기반 논문 수집
-```
-
-각 플래그는 타입이 지정된 어댑터 프로토콜을 활성화합니다. OpenClaw이 이러한 기능을 제공하면 어댑터가 코드 변경 없이 이를 소비합니다. 전체 세부 사항은 [`integration-guide.md`](integration-guide.md)를 참조하세요.
-
-### ACP (Agent Client Protocol)
-
-AutoResearchClaw는 **모든 ACP 호환 코딩 에이전트**를 LLM 백엔드로 사용할 수 있습니다 — API 키가 필요 없습니다. 에이전트는 [acpx](https://github.com/openclaw/acpx)를 통해 통신하며, 전체 23개 파이프라인 단계에 걸쳐 단일 영구 세션을 유지합니다.
-
-| 에이전트 | 명령어 | 비고 |
-|---------|--------|------|
-| Claude Code | `claude` | Anthropic |
-| Codex CLI | `codex` | OpenAI |
-| Copilot CLI | `gh` | GitHub |
-| Gemini CLI | `gemini` | Google |
-| OpenCode | `opencode` | SST |
-| Kimi CLI | `kimi` | Moonshot |
-
-```yaml
-# config.yaml — ACP 예시
-llm:
-  provider: "acp"
-  acp:
-    agent: "claude"   # 모든 ACP 호환 에이전트 CLI 명령어
-    cwd: "."          # 에이전트의 작업 디렉토리
-  # base_url이나 api_key 불필요 — 에이전트가 자체 인증을 처리합니다.
-```
-
-```bash
-# 바로 실행 — 에이전트가 자체 자격 증명 사용
-researchclaw run --config config.yaml --topic "Your research idea" --auto-approve
-```
-
-### 🛠️ 기타 실행 방법
-
-| 방법 | 사용법 |
-|------|--------|
-| **독립형 CLI** | `researchclaw run --topic "..." --auto-approve` (자율) 또는 `--mode co-pilot` (협력) |
-| **Python API** | `from researchclaw.pipeline import Runner; Runner(config).run()` |
-| **Claude Code** | `RESEARCHCLAW_CLAUDE.md`를 읽음 — *"Run research on [주제]"*라고 말하기 |
-| **Copilot CLI** | `researchclaw run --topic "..."` 에 `llm.acp.agent: "gh"` 사용 |
-| **OpenCode** | `.claude/skills/`를 읽음 — 동일한 자연어 인터페이스 |
-| **기타 AI CLI** | `RESEARCHCLAW_AGENTS.md`를 컨텍스트로 제공 → 에이전트가 자동 부트스트랩 |
-
----
-
-## 🔬 파이프라인: 23단계, 8페이즈
-
-```
-페이즈 A: 연구 범위 설정            페이즈 E: 실험 실행
-  1. TOPIC_INIT                      12. EXPERIMENT_RUN
-  2. PROBLEM_DECOMPOSE               13. ITERATIVE_REFINE  ← 자가 복구
-
-페이즈 B: 문헌 탐색                페이즈 F: 분석 및 의사결정
-  3. SEARCH_STRATEGY                 14. RESULT_ANALYSIS    ← 멀티 에이전트
-  4. LITERATURE_COLLECT  ← 실제 API  15. RESEARCH_DECISION  ← PIVOT/REFINE
-  5. LITERATURE_SCREEN   [게이트]
-  6. KNOWLEDGE_EXTRACT               페이즈 G: 논문 작성
-                                     16. PAPER_OUTLINE
-페이즈 C: 지식 종합                   17. PAPER_DRAFT
-  7. SYNTHESIS                       18. PEER_REVIEW        ← 증거 확인
-  8. HYPOTHESIS_GEN    ← 토론        19. PAPER_REVISION
-
-페이즈 D: 실험 설계               페이즈 H: 최종화
-  9. EXPERIMENT_DESIGN   [게이트]      20. QUALITY_GATE      [게이트]
- 10. CODE_GENERATION                 21. KNOWLEDGE_ARCHIVE
- 11. RESOURCE_PLANNING               22. EXPORT_PUBLISH     ← LaTeX
-                                     23. CITATION_VERIFY    ← 관련성 확인
-```
-
-> **게이트 단계** (5, 9, 20)는 사람의 승인을 기다리거나 `--auto-approve`로 자동 승인합니다. 거부 시 파이프라인이 롤백됩니다.
-
-> **코파일럿 모드** (`--mode co-pilot`): 7-8단계(아이디어 워크숍), 9단계(베이스라인 내비게이터), 16-17단계(논문 코라이터)에서 깊은 인간-AI 협업. 나머지 단계는 SmartPause 모니터링과 함께 자동 실행.
-
-> **의사결정 루프**: 15단계에서 REFINE (→ 13단계) 또는 PIVOT (→ 8단계)을 트리거할 수 있으며, 산출물 버전 관리가 자동으로 이루어집니다.
-
-<details>
-<summary>📋 각 페이즈별 상세 설명</summary>
-
-| 페이즈 | 수행 내용 |
-|--------|----------|
-| **A: 범위 설정** | LLM이 주제를 연구 질문이 포함된 구조화된 문제 트리로 분해 |
-| **A+: 하드웨어** | GPU 자동 감지 (NVIDIA CUDA / Apple MPS / CPU 전용), 로컬 하드웨어가 제한적인 경우 경고, 이에 맞게 코드 생성 적응 |
-| **B: 문헌** | 다중 소스 검색 (OpenAlex → Semantic Scholar → arXiv)으로 실제 논문 수집, 관련성별 선별, 지식 카드 추출 |
-| **C: 종합** | 연구 결과 클러스터링, 연구 갭 식별, 멀티 에이전트 토론을 통한 검증 가능한 가설 생성 |
-| **D: 설계** | 실험 계획 설계, 하드웨어 인식 실행 가능 Python 생성 (GPU 등급 → 패키지 선택), 리소스 요구 사항 추정 |
-| **E: 실행** | 샌드박스에서 실험 실행, NaN/Inf 및 런타임 버그 감지, LLM을 통한 표적화된 코드 자가 복구 |
-| **F: 분석** | 결과에 대한 멀티 에이전트 분석; 근거가 포함된 자율 PROCEED / REFINE / PIVOT 결정 |
-| **G: 작성** | 개요 → 섹션별 작성 (5,000-6,500단어) → 피어 리뷰 (방법론-증거 일관성 포함) → 길이 제한 적용 수정 |
-| **H: 최종화** | 품질 게이트, 지식 아카이빙, 학회 템플릿 포함 LaTeX 내보내기, 인용 무결성 + 관련성 검증 |
-
-</details>
-
----
-
-## ✨ 주요 기능
-
-| 기능 | 설명 |
-|------|------|
-| **📚 다중 소스 문헌** | OpenAlex, Semantic Scholar, arXiv에서 실제 논문 — 쿼리 확장, 중복 제거, 3상태 서킷 브레이커와 단계적 성능 저하 |
-| **🔍 4계층 인용 검증** | arXiv ID 확인 → CrossRef/DataCite DOI → Semantic Scholar 제목 매칭 → LLM 관련성 점수. 환각된 참고문헌 자동 삭제. |
-| **🖥️ 하드웨어 인식 실행** | GPU (NVIDIA CUDA / Apple MPS / CPU 전용) 자동 감지, 이에 맞게 코드 생성, import, 실험 규모 적응 |
-| **🦾 OpenCode Beast Mode** | 복잡한 실험을 [OpenCode](https://github.com/anomalyco/opencode)로 자동 라우팅 — 커스텀 아키텍처, 학습 루프, 절제 연구가 포함된 다중 파일 프로젝트 생성. `researchclaw setup`으로 설치. |
-| **🧪 샌드박스 실험** | AST 검증 코드, 불변 하네스, NaN/Inf 즉시 실패, 자가 복구, 반복적 개선 (최대 10라운드), 부분 결과 캡처 |
-| **📝 학회 수준 작성** | NeurIPS/ICML/ICLR 템플릿, 섹션별 작성 (5,000-6,500단어), 날조 방지 가드, 수정 길이 제한, 면책 조항 방지 적용 |
-| **📐 템플릿 전환** | `neurips_2025`, `iclr_2026`, `icml_2026` — Markdown → LaTeX (수학, 표, 그림, 교차 참조, `\cite{}` 포함) |
-| **🛡️ 날조 방지** | VerifiedRegistry가 논문에서 실험 데이터의 진실성을 강제. 실패한 실험을 자동 진단하고 작성 전에 복구. 검증되지 않은 숫자는 제거. |
-| **🚦 품질 게이트** | 3개의 Human-in-the-loop 게이트 (단계 5, 9, 20), 롤백 지원. `--auto-approve`로 건너뛰기. |
-| **🧑‍✈️ HITL 코파일럿** | 단계별 정책이 있는 6가지 개입 모드. 아이디어 워크숍, 베이스라인 내비게이터, 논문 코라이터로 깊은 협업. SmartPause, 비용 가드레일, 에스컬레이션 정책, 개입 학습으로 프로덕션 안전성 확보. CLI/WebSocket/MCP 어댑터. |
-| **💰 비용 가드레일** | 구성 가능한 임계값 알림(50%/80%/100%)이 포함된 예산 모니터링. 비용이 예산을 초과하면 파이프라인 자동 일시 정지. |
-| **🔐 재현성** | 모든 단계 산출물에 대한 SHA256 체크섬. 검증을 위한 불변 매니페스트. 버전 관리된 스냅샷을 사용한 다단계 실행 취소. |
-
----
-
-## 🧑‍✈️ Human-in-the-Loop 코파일럿
-
-**AutoResearchClaw v0.4.0은 완전한 Human-in-the-Loop (HITL) 시스템을 도입하여** 파이프라인을 순수 자율 시스템에서 인간-AI 협력 연구 엔진으로 전환합니다. 참여 수준을 선택하세요:
-
-### 개입 모드
-
-| 모드 | 명령어 | 기능 |
-|------|--------|------|
-| **Full Auto** | `--auto-approve` | 기존 동작 — 인간 개입 없음 |
-| **Gate Only** | `--mode gate-only` | 3개 게이트 단계(5, 9, 20)에서 승인을 위해 일시 정지 |
-| **Checkpoint** | `--mode checkpoint` | 각 페이즈 경계에서 일시 정지 (8개 체크포인트) |
-| **Co-Pilot** | `--mode co-pilot` | 중요 단계에서 깊은 협업, 나머지는 자동 |
-| **Step-by-Step** | `--mode step-by-step` | 모든 단계 후 일시 정지 — 파이프라인 학습 |
-| **Express** | `--mode express` | 빠른 검토 — 가장 중요한 3개 게이트만 |
-
-### 코파일럿 워크플로우
-
-```
-You: researchclaw run --topic "양자 노이즈를 신경망 정규화로 활용" --mode co-pilot
-
-파이프라인이 1-7단계를 자동 실행...
-
-  ┌─────────────────────────────────────────────────────────────┐
-  │  HITL | Stage 08: HYPOTHESIS_GEN                            │
-  │  Post-stage review                                          │
-  │                                                             │
-  │  Hypotheses mentioned: 3                                    │
-  │  Novelty score: 0.72 (moderate)                             │
-  │                                                             │
-  │  [a] Approve  [r] Reject  [e] Edit  [c] Collaborate         │
-  │  [i] Inject guidance  [v] View output  [q] Abort            │
-  └─────────────────────────────────────────────────────────────┘
-
-You: c  (협업 채팅 시작)
-You: 가설 3이 흥미롭지만 Dropout/Label Smoothing을 베이스라인으로 추가해야 합니다
-AI:  업데이트 완료 — Dropout, Label Smoothing, MixUp, CutMix을 베이스라인으로 추가했습니다...
-You: approve
-
-파이프라인이 수정된 가설로 계속 진행...
-```
-
-### CLI 명령어
-
-```bash
-# HITL 모드로 시작
-researchclaw run --topic "..." --mode co-pilot
-
-# 일시 정지된 파이프라인에 연결 (다른 터미널에서)
-researchclaw attach artifacts/rc-2026-xxx
-
-# 파이프라인 및 HITL 상태 확인
-researchclaw status artifacts/rc-2026-xxx
-
-# 다른 터미널이나 스크립트에서 승인/거부
-researchclaw approve artifacts/rc-2026-xxx --message "LGTM"
-researchclaw reject artifacts/rc-2026-xxx --reason "핵심 베이스라인 누락"
-
-# 단계에 가이던스 주입 (실행 전에도 가능)
-researchclaw guide artifacts/rc-2026-xxx --stage 9 --message "ResNet-50을 주요 베이스라인으로 사용"
-```
-
-### 주요 기능
-
-| 기능 | 설명 |
-|------|------|
-| **아이디어 워크숍** | 가설을 협력적으로 브레인스토밍, 평가, 정제 (7-8단계) |
-| **베이스라인 내비게이터** | AI가 베이스라인 제안 + 인간이 추가/제거 + 재현성 체크리스트 (9단계) |
-| **논문 코라이터** | 인간 편집과 AI 다듬기를 통한 섹션별 작성 (16-19단계) |
-| **SmartPause** | 신뢰도 기반 동적 일시 정지 — 인간의 입력이 도움이 될 때를 자동 감지 |
-| **클레임 검증** | 수집된 문헌과 대조한 인라인 팩트 체킹 — 근거 없는 주장을 플래그 |
-| **비용 가드레일** | 50%/80%/100% 임계값 알림이 포함된 예산 모니터링 |
-| **개입 학습** | ALHF — 검토 패턴에서 학습하여 향후 일시 정지 결정을 최적화 |
-| **분기 탐색** | 파이프라인을 분기하여 여러 가설을 탐색, 비교, 최적 경로 병합 |
-| **에스컬레이션 정책** | 계층형 알림 (터미널 → Slack → 이메일 → 자동 중지) 무인 시 |
-| **3가지 어댑터** | CLI (터미널), WebSocket (웹 대시보드), MCP (외부 에이전트) |
-
-### 설정
-
-```yaml
-# config.arc.yaml
-hitl:
-  enabled: true
-  mode: co-pilot                     # full-auto | gate-only | checkpoint | step-by-step | co-pilot | custom
-  cost_budget_usd: 50.0              # 비용이 예산을 초과하면 일시 정지 (0 = 제한 없음)
-
-  notifications:
-    on_pause: true
-    on_quality_drop: true
-    channels: ["terminal"]            # terminal | slack | webhook
-
-  timeouts:
-    default_human_timeout_sec: 86400  # 24시간 기본 대기
-    auto_proceed_on_timeout: false
-
-  collaboration:
-    max_chat_turns: 50
-    save_chat_history: true
-
-  # 단계별 커스텀 정책 (선택, 'custom' 모드용)
-  stage_policies:
-    8: { require_approval: true, enable_collaboration: true }
-    9: { require_approval: true, allow_edit_output: true }
-```
-
-### 하위 호환성
-
-- **기본값: 꺼짐.** `hitl.enabled: true` 또는 `--mode` 없이는 파이프라인이 이전과 정확히 동일하게 동작합니다.
-- **`--auto-approve`는 그대로 작동.** HITL 모드를 오버라이드합니다.
-- **기존 2,699개 테스트 모두 통과** (HITL 코드 포함).
-
----
-
-## 🧠 MetaClaw 통합
-
-**AutoResearchClaw + [MetaClaw](https://github.com/aiming-lab/MetaClaw) = 모든 실행에서 학습하는 파이프라인.**
-
-MetaClaw는 AutoResearchClaw에 **교차 실행 지식 전이**를 추가합니다. 활성화되면 파이프라인이 실패와 경고에서 자동으로 교훈을 추출하고, 이를 재사용 가능한 스킬로 변환하여 후속 실행의 전체 23단계에 주입합니다 — 같은 실수를 다시 반복하지 않습니다.
-
-### 작동 방식
-
-```
-Run N executes → failures/warnings captured as Lessons
-                      ↓
-          MetaClaw Lesson → Skill conversion
-                      ↓
-          arc-* Skill files stored in ~/.metaclaw/skills/
-                      ↓
-Run N+1 → build_overlay() injects skills into every LLM prompt
-                      ↓
-          LLM avoids known pitfalls → higher quality, fewer retries
-```
-
-### 빠른 설정
-
-```bash
-# 1. MetaClaw 설치 (미설치 시)
-pip install metaclaw
-
-# 2. 설정에서 활성화
-```
-
-```yaml
-# config.arc.yaml
-metaclaw_bridge:
-  enabled: true
-  proxy_url: "http://localhost:30000"        # MetaClaw 프록시 (선택)
-  skills_dir: "~/.metaclaw/skills"          # 스킬 저장 위치
-  fallback_url: "https://api.openai.com/v1" # 직접 LLM 폴백
-  fallback_api_key: ""                      # 폴백 URL의 API 키
-  lesson_to_skill:
-    enabled: true
-    min_severity: "warning"                 # warning + error 변환
-    max_skills_per_run: 3
-```
-
-```bash
-# 3. 평소대로 실행 — MetaClaw가 투명하게 작동
-researchclaw run --config config.arc.yaml --topic "Your idea" --auto-approve
-```
-
-각 실행 후 `~/.metaclaw/skills/arc-*/SKILL.md`를 확인하여 파이프라인이 학습한 스킬을 확인하세요.
-
-### 실험 결과
-
-대조 A/B 실험 (동일 주제, 동일 LLM, 동일 설정):
-
-| 메트릭 | 기준선 | MetaClaw 사용 시 | 개선 |
-|--------|--------|-----------------|------|
-| 단계 재시도율 | 10.5% | 7.9% | **-24.8%** |
-| Refine 사이클 수 | 2.0 | 1.2 | **-40.0%** |
-| 파이프라인 단계 완료율 | 18/19 | 19/19 | **+5.3%** |
-| 전체 견고성 점수 (종합) | 0.714 | 0.845 | **+18.3%** |
-
-> 종합 견고성 점수는 단계 완료율 (40%), 재시도 감소 (30%), Refine 사이클 효율성 (30%)의 가중 평균입니다.
-
-### 하위 호환성
-
-- **기본값: 꺼짐.** `metaclaw_bridge`가 없거나 `enabled: false`이면 파이프라인은 이전과 정확히 동일하게 동작합니다.
-- **새로운 종속성 없음.** MetaClaw는 선택 사항입니다 — 핵심 파이프라인은 MetaClaw 없이도 동작합니다.
-- **기존 2,699개 테스트 모두 통과** (통합 코드 포함).
-
----
-
-## 🧩 스킬 라이브러리
-
-AutoResearchClaw는 이제 연구 경험을 더욱 향상시키기 위해 **오픈소스 및 커스텀 스킬** 로딩을 지원합니다. 과학적 글쓰기, 문헌 검색, 화학, 생물학 등을 포괄하는 **20개의 사전 로드된 내장 스킬**도 즉시 사용 가능한 참고자료로 제공되어 높은 유연성을 제공합니다. frontmatter에 `enabled: false`를 추가하여 스킬을 비활성화할 수 있습니다.
-
-**내장 스킬 예시:**
-
-| 카테고리 | 스킬 | 설명 |
-|----------|------|------|
-| **작성** | `scientific-writing` | IMRAD 구조, 인용 서식, 보고 가이드라인 |
-| **도메인** | `chemistry-rdkit` | 분자 분석, SMILES, 핑거프린트, 신약 발견 |
-| **실험** | `literature-search` | 체계적 리뷰, PRISMA 방법론 |
-
-> `researchclaw skills list`로 20개 전체 스킬을 확인하세요.
-
-### 직접 스킬 로딩
-
-```bash
-# 옵션 1: 스킬 설치 (프로젝트 간 영구 유지)
-researchclaw skills install /path/to/my-skill/
-
-# 옵션 2: 프로젝트에 SKILL.md 추가
-mkdir -p .claude/skills/my-custom-skill
-# YAML frontmatter(name, description, trigger-keywords, applicable-stages)가 포함된 SKILL.md를 생성
-
-# 옵션 3: config.arc.yaml에서 공유 스킬 디렉토리 설정
-# skills:
-#   custom_dirs:
-#     - /path/to/team-shared-skills
-```
-
-### 스킬 사용
-
-스킬은 자동으로 로드되어 LLM 프롬프트에 주입됩니다 — 수동 활성화가 필요 없습니다. CLI로 확인:
-
-```bash
-researchclaw skills list               # 소스와 함께 로드된 모든 스킬 표시
-researchclaw skills validate ./my-skill # SKILL.md 형식 확인
-```
-
-커뮤니티 스킬 찾아보기: [K-Dense-AI/claude-scientific-skills](https://github.com/K-Dense-AI/claude-scientific-skills) (여러 분야에 걸친 150개 이상의 과학 스킬).
-
----
-
-## ⚙️ 설정 참고서
-
-<details>
-<summary>전체 설정 참고서 펼치기</summary>
-
-```yaml
-# === 프로젝트 ===
-project:
-  name: "my-research"              # 프로젝트 식별자
-  mode: "docs-first"               # docs-first | semi-auto | full-auto
-
-# === 연구 ===
-research:
-  topic: "..."                     # 연구 주제 (필수)
-  domains: ["ml", "nlp"]           # 문헌 검색용 연구 분야
-  daily_paper_count: 8             # 검색 쿼리당 목표 논문 수
-  quality_threshold: 4.0           # 논문 최소 품질 점수
-
-# === 런타임 ===
-runtime:
-  timezone: "America/New_York"     # 타임스탬프용
-  max_parallel_tasks: 3            # 동시 실험 제한
-  approval_timeout_hours: 12       # 게이트 단계 타임아웃
-  retry_limit: 2                   # 단계 실패 시 재시도 횟수
-
-# === LLM ===
-llm:
-  provider: "openai-compatible"    # openai | openrouter | deepseek | minimax | acp | openai-compatible
-  base_url: "https://..."          # API 엔드포인트 (openai-compatible 필수)
-  api_key_env: "OPENAI_API_KEY"    # API 키용 환경 변수 (openai-compatible 필수)
-  api_key: ""                      # 또는 키를 직접 입력
-  primary_model: "gpt-4o"          # 기본 모델
-  fallback_models: ["gpt-4o-mini"] # 폴백 체인
-  s2_api_key: ""                   # Semantic Scholar API 키 (선택, 더 높은 속도 제한)
-  acp:                             # provider: "acp" 인 경우에만 사용
-    agent: "claude"                # ACP 에이전트 CLI 명령어 (claude, codex, gemini 등)
-    cwd: "."                       # 에이전트의 작업 디렉토리
-
-# === 실험 ===
-experiment:
-  mode: "sandbox"                  # simulated | sandbox | docker | ssh_remote
-  time_budget_sec: 300             # 실행당 최대 실행 시간 (기본값: 300초)
-  max_iterations: 10               # 최대 최적화 반복 횟수
-  metric_key: "val_loss"           # 기본 메트릭 이름
-  metric_direction: "minimize"     # minimize | maximize
-  sandbox:
-    python_path: ".venv/bin/python"
-    gpu_required: false
-    allowed_imports: [math, random, json, csv, numpy, torch, sklearn]
-    max_memory_mb: 4096
-  docker:
-    image: "researchclaw/experiment:latest"
-    network_policy: "setup_only"   # none | setup_only | pip_only | full
-    gpu_enabled: true
-    memory_limit_mb: 8192
-    auto_install_deps: true        # import 자동 감지 → requirements.txt
-  ssh_remote:
-    host: ""                       # GPU 서버 호스트명
-    gpu_ids: []                    # 사용 가능한 GPU ID
-    remote_workdir: "/tmp/researchclaw_experiments"
-  opencode:                          # OpenCode Beast Mode (`researchclaw setup`으로 자동 설치)
-    enabled: true                    # 마스터 스위치 (기본값: true)
-    auto: true                       # 확인 없이 자동 트리거 (기본값: true)
-    complexity_threshold: 0.2        # 0.0-1.0 — 높을수록 복잡한 실험에서만 트리거
-    model: ""                        # 모델 오버라이드 (비어있으면 llm.primary_model 사용)
-    timeout_sec: 600                 # OpenCode 생성 최대 초
-    max_retries: 1                   # 실패 시 재시도 횟수
-    workspace_cleanup: true          # 수집 후 임시 작업 공간 제거
-
-# === 내보내기 ===
-export:
-  target_conference: "neurips_2025"  # neurips_2025 | iclr_2026 | icml_2026
-  authors: "Anonymous"
-  bib_file: "references"
-
-# === 프롬프트 ===
-prompts:
-  custom_file: ""                  # 사용자 정의 프롬프트 YAML 경로 (비어 있으면 기본값)
-
-# === HITL 코파일럿 (v0.4.0 신규) ===
-hitl:
-  enabled: false                     # true로 설정하여 HITL 활성화
-  mode: co-pilot                     # full-auto | gate-only | checkpoint | step-by-step | co-pilot | custom
-  cost_budget_usd: 0.0              # USD 비용 한도 (0 = 제한 없음)
-  notifications:
-    on_pause: true                   # 파이프라인 일시 정지 시 알림
-    on_quality_drop: true            # 품질 문제 시 알림
-    channels: ["terminal"]           # terminal | slack | webhook
-  timeouts:
-    default_human_timeout_sec: 86400 # 인간 입력 최대 24시간 대기
-    auto_proceed_on_timeout: false   # true이면 타임아웃 시 자동 승인
-  collaboration:
-    max_chat_turns: 50               # 협업 세션당 최대 턴 수
-    save_chat_history: true          # 채팅 로그 영구 저장
-  stage_policies: {}                 # 단계별 오버라이드 ('custom' 모드용)
-
-# === 보안 ===
-security:
-  hitl_required_stages: [5, 9, 20] # 사람의 승인이 필요한 단계
-  allow_publish_without_approval: false
-  redact_sensitive_logs: true
-
-# === 지식 기반 ===
-knowledge_base:
-  backend: "markdown"              # markdown | obsidian
-  root: "docs/kb"
-
-# === 알림 ===
-notifications:
-  channel: "console"               # console | discord | slack
-  target: ""
-
-# === MetaClaw Bridge (선택) ===
-metaclaw_bridge:
-  enabled: false                   # true로 설정하여 교차 실행 학습 활성화
-  proxy_url: "http://localhost:30000"  # MetaClaw 프록시 URL
-  skills_dir: "~/.metaclaw/skills" # arc-* 스킬 저장 위치
-  fallback_url: ""                 # 프록시 장애 시 직접 LLM 폴백
-  fallback_api_key: ""             # 폴백 엔드포인트의 API 키
-  lesson_to_skill:
-    enabled: true                  # 교훈을 스킬로 자동 변환
-    min_severity: "warning"        # 변환할 최소 심각도
-    max_skills_per_run: 3          # 파이프라인 실행당 최대 새 스킬 수
-  prm:                             # Process Reward Model 품질 게이트 (선택)
-    enabled: false                 # LLM-as-judge를 사용하여 단계 출력 점수 매기기
-    model: "gpt-5.4"              # PRM 심사 모델
-    votes: 3                       # 다수결 투표 수
-    gate_stages: [5, 9, 15, 20]   # PRM 게이트를 적용할 단계
-
-# === OpenClaw 브릿지 ===
-openclaw_bridge:
-  use_cron: false                  # 예약된 연구 실행
-  use_message: false               # 진행 상황 알림
-  use_memory: false                # 세션 간 지식 영속성
-  use_sessions_spawn: false        # 병렬 서브세션 생성
-  use_web_fetch: false             # 실시간 웹 검색
-  use_browser: false               # 브라우저 기반 논문 수집
-```
-
-</details>
-
----
-
-## 🙏 감사의 말
-
-다음 프로젝트에서 영감을 받았습니다:
-
-- 🔬 [AI Scientist](https://github.com/SakanaAI/AI-Scientist) (Sakana AI) — 자동화 연구의 선구자
-- 🧠 [AutoResearch](https://github.com/karpathy/autoresearch) (Andrej Karpathy) — 엔드투엔드 연구 자동화
-- 🌐 [FARS](https://analemma.ai/blog/introducing-fars/) (Analemma) — 완전 자동 연구 시스템
-
----
-
-## 📄 라이선스
-
-MIT — 자세한 내용은 [LICENSE](../LICENSE)를 참조하세요.
-
----
-
-## 📌 인용
-
-AutoResearchClaw가 유용했다면, 아래를 인용해 주세요:
-
-```bibtex
-@misc{liu2026autoresearchclawselfreinforcingautonomousresearch,
-      title={AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration},
-      author={Jiaqi Liu and Shi Qiu and Mairui Li and Bingzhou Li and Haonian Ji and Siwei Han and Xinyu Ye and Peng Xia and Zihan Dong and Congyu Zhang and Letian Zhang and Guiming Chen and Haoqin Tu and Xinyu Yang and Lu Feng and Xujiang Zhao and Haifeng Chen and Jiawei Zhou and Xiao Wang and Weitong Zhang and Hongtu Zhu and Yun Li and Jieru Mei and Hongliang Fei and Jiaheng Zhang and Linjie Li and Linjun Zhang and Yuyin Zhou and Sheng Wang and Caiming Xiong and James Zou and Zeyu Zheng and Cihang Xie and Mingyu Ding and Huaxiu Yao},
-      year={2026},
-      eprint={2605.20025},
-      archivePrefix={arXiv},
-      primaryClass={cs.AI},
-      url={https://arxiv.org/abs/2605.20025},
-}
-```
-
-<p align="center">
-  <sub>Built with 🦞 by the AutoResearchClaw team</sub>
-</p>
diff --git a/docs/README_PT.md b/docs/README_PT.md
deleted file mode 100644
index 4abc2ec6..00000000
--- a/docs/README_PT.md
+++ /dev/null
@@ -1,790 +0,0 @@
-<p align="center">
-  <img src="../image/logo.png" width="700" alt="AutoResearchClaw Logo">
-</p>
-
-<h2 align="center"><b>Converse uma ideia. Receba um artigo. Autônomo, Colaborativo & Auto-evolutivo.</b></h2>
-
-
-
-<p align="center">
-  <b><i><font size="5">Converse com o <a href="#integração-openclaw">OpenClaw</a>: "Pesquise X" → pronto.</font></i></b>
-</p>
-
-<p align="center">
-  📄 <b>Nosso artigo está no arXiv — venha ler!</b> <a href="https://arxiv.org/abs/2605.20025"><i>AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration</i></a>
-</p>
-
-<p align="center">
-  <img src="../image/framework_v2.png" width="100%" alt="AutoResearchClaw Framework">
-</p>
-
-
-<p align="center">
-  <a href="https://arxiv.org/abs/2605.20025"><img src="https://img.shields.io/badge/arXiv-2605.20025-b31b1b?logo=arxiv&logoColor=white" alt="arXiv"></a>
-  <a href="https://huggingface.co/datasets/AIMING-Lab-UNC/ARC-Bench"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-ARC--Bench-yellow" alt="ARC-Bench on Hugging Face"></a>
-  <a href="../LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="MIT License"></a>
-  <a href="https://python.org"><img src="https://img.shields.io/badge/Python-3.11%2B-3776AB?logo=python&logoColor=white" alt="Python 3.11+"></a>
-  <a href="#testes"><img src="https://img.shields.io/badge/Tests-2699%20passed-brightgreen?logo=pytest&logoColor=white" alt="2699 Tests Passed"></a>
-  <a href="https://github.com/aiming-lab/AutoResearchClaw"><img src="https://img.shields.io/badge/GitHub-AutoResearchClaw-181717?logo=github" alt="GitHub"></a>
-  <a href="#integração-openclaw"><img src="https://img.shields.io/badge/OpenClaw-Compatible-ff4444?logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNCAyNCI+PHBhdGggZD0iTTEyIDJDNi40OCAyIDIgNi40OCAyIDEyczQuNDggMTAgMTAgMTAgMTAtNC40OCAxMC0xMFMxNy41MiAyIDEyIDJ6IiBmaWxsPSJ3aGl0ZSIvPjwvc3ZnPg==" alt="OpenClaw Compatible"></a>
-  <a href="https://discord.gg/u4ksqW5P"><img src="https://img.shields.io/badge/Discord-Join%20Community-5865F2?logo=discord&logoColor=white" alt="Discord"></a>
-</p>
-
-<p align="center">
-  <a href="../README.md">🇺🇸 English</a> ·
-  <a href="README_CN.md">🇨🇳 中文</a> ·
-  <a href="README_JA.md">🇯🇵 日本語</a> ·
-  <a href="README_KO.md">🇰🇷 한국어</a> ·
-  <a href="README_FR.md">🇫🇷 Français</a> ·
-  <a href="README_DE.md">🇩🇪 Deutsch</a> ·
-  <a href="README_ES.md">🇪🇸 Español</a> ·
-  <a href="README_PT.md">🇧🇷 Português</a> ·
-  <a href="README_RU.md">🇷🇺 Русский</a> ·
-  <a href="README_AR.md">🇸🇦 العربية</a>
-</p>
-
-<p align="center">
-  <a href="showcase/SHOWCASE.md">🏆 Galeria de Artigos</a> · <a href="HITL_GUIDE.md">🧑‍✈️ Guia do Co-Piloto</a> · <a href="integration-guide.md">📖 Guia de Integração</a> · <a href="https://discord.gg/u4ksqW5P">💬 Comunidade Discord</a>
-</p>
-
----
-
-<table>
-<tr>
-<td width="18%">
-<a href="showcase/SHOWCASE.md"><img src="showcase/thumbnails/paper_I_random_matrix-01.png" width="120" alt="Artigo Exemplo"/></a>
-</td>
-<td valign="middle">
-<b>🏆 Galeria de Artigos Gerados</b><br><br>
-<b>8 artigos em 8 domínios</b> — matemática, estatística, biologia, computação, NLP, RL, visão, robustez — gerados de forma totalmente autônoma ou com orientação de co-piloto Human-in-the-Loop.<br><br>
-<a href="showcase/SHOWCASE.md"><img src="https://img.shields.io/badge/Ver_Galeria_Completa_→-Todos_os_8_Artigos-d73a49?style=for-the-badge" alt="Ver Galeria"></a>
-</td>
-</tr>
-</table>
-
----
-
-> **🧪 Estamos procurando testadores!** Experimente o pipeline com sua própria ideia de pesquisa — de qualquer área — e [diga-nos o que achou](TESTER_GUIDE.md). Seu feedback molda diretamente a próxima versão. **[→ Testing Guide](TESTER_GUIDE.md)** | **[→ 中文测试指南](TESTER_GUIDE_CN.md)** | **[→ 日本語テストガイド](TESTER_GUIDE_JA.md)**
-
----
-
-## 🔥 News
-- **[05/19/2026]** **v0.5.0** — **Agentes de Experimento Multidomínio + ARC-Bench** — Duas atualizações principais. **(1) Agentes de execução especializados por domínio:** o estágio de experimentos (estágios 10–13) agora vai além do sandbox de ML padrão e roteia para agentes especializados por área — **física de altas energias** (ColliderAgent: FeynRules → MadGraph5 → Delphes via nuvem Magnus), **biologia** (modelagem metabólica em escala genômica com COBRApy) e **estatística** (agente de estudos de simulação), com um executor Docker genérico para química/materiais. O pipeline seleciona automaticamente o executor certo conforme o domínio. **(2) ARC-Bench:** um benchmark de pesquisa autônoma aberta com **55 tópicos** cobrindo **ML (25), física de altas energias (10), quântica (10), biologia (7) e estatística (3)**, cada um com um manifesto e uma rubrica de avaliação (`experiments/arc_bench/`, e também no [🤗 Hugging Face](https://huggingface.co/datasets/AIMING-Lab-UNC/ARC-Bench)). **[→ Guia de Integração de Domínios](DOMAIN_INTEGRATION_GUIDE.md)**
-- **[04/01/2026]** **v0.4.0** — **Sistema Co-Piloto Human-in-the-Loop** — O AutoResearchClaw não é mais puramente autônomo. O novo sistema HITL adiciona 6 modos de intervenção (`full-auto`, `gate-only`, `checkpoint`, `step-by-step`, `co-pilot`, `custom`), políticas por estágio e colaboração profunda humano-IA. Inclui: Idea Workshop para co-criação de hipóteses, Baseline Navigator para revisão de design experimental, Paper Co-Writer para redação colaborativa, SmartPause (intervenção dinâmica baseada em confiança), aprendizado de intervenção ALHF, verificação de afirmações anti-alucinação, guardrails de orçamento de custo, ramificação de pipeline para exploração paralela de hipóteses e comandos CLI (`attach`/`status`/`approve`/`reject`/`guide`). **[→ Guia Completo HITL](HITL_GUIDE.md)**
-- **[03/30/2026]** **Carregamento Flexível de Skills** — O AutoResearchClaw agora suporta o carregamento de skills open-source e customizadas de qualquer disciplina para aprimorar ainda mais sua experiência de pesquisa. 20 skills pré-carregadas estão incluídas como referências prontas para uso, cobrindo escrita científica, design experimental, química, biologia e mais — incluindo uma skill de evolução agêntica [A-Evolve](https://github.com/A-EVO-Lab/a-evolve) contribuída pela comunidade. Carregue as suas via `researchclaw skills install` ou coloque um `SKILL.md` em `.claude/skills/`. Veja [Biblioteca de Skills](#-biblioteca-de-skills).
-- **[03/22/2026]** [v0.3.2](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.2) — **Suporte multiplataforma + grande estabilidade** — O AutoResearchClaw agora funciona com qualquer agente compatível com ACP (Claude Code, Codex CLI, Copilot CLI, Gemini CLI, Kimi CLI) e suporta plataformas de mensagens (Discord, Telegram, Lark, WeChat) via ponte OpenClaw. Novo backend de geração de código CLI-agent que delega os Estágios 10 e 13 a agentes CLI externos com controle de orçamento e gerenciamento de timeout. Inclui sistema anti-fabricação (VerifiedRegistry + loop de diagnóstico e reparo), 100+ correções de bugs, refatoração modular do executor, auto-detecção de `--resume`, endurecimento de retries LLM e correções da comunidade.
-
-<details>
-<summary>Versões anteriores</summary>
-
-- **[03/18/2026]** [v0.3.1](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.1) — **OpenCode Beast Mode + Community Contributions** — New "Beast Mode" routes complex code generation to [OpenCode](https://github.com/anomalyco/opencode) with automatic complexity scoring and graceful fallback. Added Novita AI provider support, thread-safety hardening, improved LLM output parsing robustness, and 20+ bug fixes from community PRs and internal audit.
-- **[03/17/2026]** [v0.3.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.0) — **MetaClaw Integration** — AutoResearchClaw now supports [MetaClaw](https://github.com/aiming-lab/MetaClaw) cross-run learning: pipeline failures → structured lessons → reusable skills, injected into all 23 stages. **+18.3%** robustness in controlled experiments. Opt-in (`metaclaw_bridge.enabled: true`), fully backward-compatible. See [Integration Guide](#-metaclaw-integration).
-- **[03/16/2026]** [v0.2.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.2.0) — Three multi-agent subsystems (CodeAgent, BenchmarkAgent, FigureAgent), hardened Docker sandbox with network-policy-aware execution, 4-round paper quality audit (AI-slop detection, 7-dim review scoring, NeurIPS checklist), and 15+ bug fixes from production runs.
-- **[03/15/2026]** [v0.1.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.1.0) — We release AutoResearchClaw: a fully autonomous 23-stage research pipeline that turns a single research idea into a conference-ready paper. No human intervention required.
-
-</details>
-
----
-
-## ⚡ Um Comando. Um Artigo.
-
-```bash
-# Totalmente autônomo — sem intervenção humana
-pip install -e . && researchclaw setup && researchclaw init && researchclaw run --topic "Your research idea here" --auto-approve
-
-# Modo Co-Piloto — colabore com a IA em pontos de decisão chave
-researchclaw run --topic "Your research idea here" --mode co-pilot
-```
-
-
----
-
-## 🤔 O Que É Isto?
-
-**Você pensa. AutoResearchClaw escreve. Você guia as decisões-chave.**
-
-Forneça um tópico de pesquisa — receba de volta um artigo acadêmico completo com literatura real do OpenAlex, Semantic Scholar & arXiv, experimentos em sandbox com detecção automática de hardware (GPU/MPS/CPU), análise estatística, revisão por pares multi-agente, e LaTeX pronto para conferência mirando NeurIPS/ICML/ICLR. Execute de forma totalmente autônoma, ou use o **modo Co-Piloto** para guiar a IA em pontos de decisão críticos — escolha direções de pesquisa, revise designs experimentais e co-escreva o artigo. Sem referências alucinadas.
-
-<table>
-<tr><td>📄</td><td><code>paper_draft.md</code></td><td>Artigo acadêmico completo (Introdução, Trabalhos Relacionados, Método, Experimentos, Resultados, Conclusão)</td></tr>
-<tr><td>📐</td><td><code>paper.tex</code></td><td>LaTeX pronto para conferência (templates NeurIPS / ICLR / ICML)</td></tr>
-<tr><td>📚</td><td><code>references.bib</code></td><td>Referências BibTeX reais do OpenAlex, Semantic Scholar e arXiv — auto-podadas para corresponder às citações inline</td></tr>
-<tr><td>🔍</td><td><code>verification_report.json</code></td><td>Verificação de integridade + relevância de citações em 4 camadas (arXiv, CrossRef, DataCite, LLM)</td></tr>
-<tr><td>🧪</td><td><code>experiment runs/</code></td><td>Código gerado + resultados do sandbox + métricas JSON estruturadas</td></tr>
-<tr><td>📊</td><td><code>charts/</code></td><td>Gráficos de comparação de condições gerados automaticamente com barras de erro e intervalos de confiança</td></tr>
-<tr><td>📝</td><td><code>reviews.md</code></td><td>Revisão por pares multi-agente com verificações de consistência metodologia-evidência</td></tr>
-<tr><td>🧬</td><td><code>evolution/</code></td><td>Lições de autoaprendizagem extraídas de cada execução</td></tr>
-<tr><td>📦</td><td><code>deliverables/</code></td><td>Todas as saídas finais em uma pasta — pronto para compilar no Overleaf</td></tr>
-</table>
-
-O pipeline roda **de ponta a ponta** — totalmente autônomo ou com colaboração human-in-the-loop. Quando experimentos falham, ele se auto-repara. Quando hipóteses não se sustentam, ele pivota. Quando citações são falsas, ele as elimina. Quando você quer direcionar, ele pausa e escuta.
-
-🌍 **Execute em qualquer lugar.** O AutoResearchClaw não está preso a uma única plataforma. Use-o de forma independente via CLI, conecte-o ao [OpenClaw](https://github.com/openclaw/openclaw), ou integre-o com qualquer agente compatível com ACP — 🤖 Claude Code, 💻 Codex CLI, 🐙 Copilot CLI, ♊ Gemini CLI, 🌙 Kimi CLI, e muito mais. Graças à ponte de mensagens do OpenClaw, você pode iniciar uma pesquisa completa pelo 💬 Discord, ✈️ Telegram, 🐦 Lark (飞书), 💚 WeChat, ou qualquer plataforma que sua equipe já utiliza. Um tópico na entrada, um artigo na saída — não importa de onde você digita.
-
----
-
-## 🚀 Início Rápido
-
-```bash
-# 1. Clone & instale
-git clone https://github.com/aiming-lab/AutoResearchClaw.git
-cd AutoResearchClaw
-python3 -m venv .venv && source .venv/bin/activate
-pip install -e .
-
-# 2. Setup (interativo — instala OpenCode beast mode, verifica Docker/LaTeX)
-researchclaw setup
-
-# 3. Configure
-researchclaw init          # Interativo: escolha provedor LLM, cria config.arc.yaml
-# Ou manualmente: cp config.researchclaw.example.yaml config.arc.yaml
-
-# 4. Execute
-export OPENAI_API_KEY="sk-..."
-researchclaw run --config config.arc.yaml --topic "Your research idea" --auto-approve
-```
-
-Saída → `artifacts/rc-YYYYMMDD-HHMMSS-<hash>/deliverables/` — LaTeX, BibTeX, código de experimentos, gráficos prontos para compilação.
-
-<details>
-<summary>📝 Configuração mínima necessária</summary>
-
-```yaml
-project:
-  name: "my-research"
-
-research:
-  topic: "Your research topic here"
-
-llm:
-  base_url: "https://api.openai.com/v1"
-  api_key_env: "OPENAI_API_KEY"
-  primary_model: "gpt-4o"
-  fallback_models: ["gpt-4o-mini"]
-
-experiment:
-  mode: "sandbox"
-  sandbox:
-    python_path: ".venv/bin/python"
-```
-
-</details>
-
----
-
-## 🧠 O Que o Torna Diferente
-
-| Capacidade | Como Funciona |
-|-----------|-------------|
-| **🧑‍✈️ Modo Co-Piloto** | 6 modos de intervenção — de totalmente autônomo a passo a passo. Guie a IA em decisões críticas (hipóteses, baselines, escrita do artigo) ou deixe-a executar livremente. SmartPause detecta automaticamente quando a contribuição humana ajudaria. |
-| **🔄 Loop PIVOT / REFINE** | O Estágio 15 decide autonomamente: PROCEED, REFINE (ajustar parâmetros) ou PIVOT (nova direção). Artefatos versionados automaticamente. |
-| **🤖 Debate Multi-Agente** | Geração de hipóteses, análise de resultados e revisão por pares usam debate estruturado com múltiplas perspectivas. |
-| **🧬 Autoaprendizagem** | Lições extraídas por execução (justificativa de decisões, avisos de runtime, anomalias em métricas) com decaimento temporal de 30 dias. Execuções futuras aprendem com erros passados. |
-| **📚 Base de Conhecimento** | Cada execução constrói uma KB estruturada com 6 categorias (decisões, experimentos, descobertas, literatura, questões, revisões). |
-| **🛡️ Sentinel Watchdog** | Monitor de qualidade em segundo plano: detecção de NaN/Inf, consistência artigo-evidência, pontuação de relevância de citações, guarda anti-fabricação. |
-| **🔍 Verificação de Afirmações** | Verificação de fatos inline: extrai afirmações do texto gerado por IA e cruza referências com a literatura coletada. Sinaliza citações infundadas e números fabricados. |
-| **🌿 Exploração de Ramificações** | Bifurque o pipeline para explorar múltiplas direções de pesquisa simultaneamente, compare resultados lado a lado e mescle o melhor caminho. |
-
----
-
-## 🦞 Integração OpenClaw
-
-<table>
-<tr><td>
-
-**AutoResearchClaw é um serviço compatível com [OpenClaw](https://github.com/openclaw/openclaw).** Instale-o no OpenClaw e inicie pesquisa autônoma com uma única mensagem — ou use-o de forma independente via CLI, Claude Code ou qualquer assistente de codificação IA.
-
-</td></tr>
-</table>
-
-### 🚀 Usar com OpenClaw (Recomendado)
-
-Se você já usa o [OpenClaw](https://github.com/openclaw/openclaw) como seu assistente de IA:
-
-```
-1️⃣  Compartilhe a URL do repositório GitHub com o OpenClaw
-2️⃣  O OpenClaw lê automaticamente RESEARCHCLAW_AGENTS.md → entende o pipeline
-3️⃣  Diga: "Pesquise [seu tópico]"
-4️⃣  Pronto — o OpenClaw clona, instala, configura, executa e retorna os resultados
-```
-
-**É isso.** O OpenClaw gerencia `git clone`, `pip install`, configuração e execução do pipeline automaticamente. Você apenas conversa.
-
-<details>
-<summary>💡 O que acontece por baixo dos panos</summary>
-
-1. O OpenClaw lê `RESEARCHCLAW_AGENTS.md` → aprende o papel de orquestrador de pesquisa
-2. O OpenClaw lê `README.md` → entende a instalação e estrutura do pipeline
-3. O OpenClaw copia `config.researchclaw.example.yaml` → `config.yaml`
-4. Solicita sua chave de API do LLM (ou usa sua variável de ambiente)
-5. Executa `pip install -e .` + `researchclaw run --topic "..." --auto-approve`
-6. Retorna o artigo, LaTeX, experimentos e citações
-
-</details>
-
-### 🔌 Bridge OpenClaw (Avançado)
-
-Para integração mais profunda, o AutoResearchClaw inclui um **sistema de adaptadores bridge** com 6 capacidades opcionais:
-
-```yaml
-# config.arc.yaml
-openclaw_bridge:
-  use_cron: true              # ⏰ Execuções de pesquisa agendadas
-  use_message: true           # 💬 Notificações de progresso (Discord/Slack/Telegram)
-  use_memory: true            # 🧠 Persistência de conhecimento entre sessões
-  use_sessions_spawn: true    # 🔀 Criar sub-sessões paralelas para estágios concorrentes
-  use_web_fetch: true         # 🌐 Busca web ao vivo durante revisão de literatura
-  use_browser: false          # 🖥️ Coleta de artigos baseada em navegador
-```
-
-Cada flag ativa um protocolo de adaptador tipado. Quando o OpenClaw fornece essas capacidades, os adaptadores as consomem sem alterações no código. Consulte [`integration-guide.md`](integration-guide.md) para detalhes completos.
-
-### ACP (Agent Client Protocol)
-
-O AutoResearchClaw pode usar **qualquer agente de codificação compatível com ACP** como seu backend LLM — sem necessidade de chaves de API. O agente se comunica via [acpx](https://github.com/openclaw/acpx), mantendo uma única sessão persistente ao longo de todos os 23 estágios do pipeline.
-
-| Agente | Comando | Notas |
-|-------|---------|-------|
-| Claude Code | `claude` | Anthropic |
-| Codex CLI | `codex` | OpenAI |
-| Copilot CLI | `gh` | GitHub |
-| Gemini CLI | `gemini` | Google |
-| OpenCode | `opencode` | SST |
-| Kimi CLI | `kimi` | Moonshot |
-
-```yaml
-# config.yaml — exemplo ACP
-llm:
-  provider: "acp"
-  acp:
-    agent: "claude"   # Qualquer comando CLI de agente compatível com ACP
-    cwd: "."          # Diretório de trabalho para o agente
-  # Sem base_url ou api_key necessários — o agente gerencia sua própria autenticação.
-```
-
-```bash
-# Basta executar — o agente usa suas próprias credenciais
-researchclaw run --config config.yaml --topic "Your research idea" --auto-approve
-```
-
-### 🛠️ Outras Formas de Executar
-
-| Método | Como |
-|--------|------|
-| **CLI Independente** | `researchclaw run --topic "..." --auto-approve` (autônomo) ou `--mode co-pilot` (colaborativo) |
-| **API Python** | `from researchclaw.pipeline import Runner; Runner(config).run()` |
-| **Claude Code** | Lê `RESEARCHCLAW_CLAUDE.md` — basta dizer *"Execute pesquisa sobre [tópico]"* |
-| **Copilot CLI** | `researchclaw run --topic "..."` com `llm.acp.agent: "gh"` |
-| **OpenCode** | Lê `.claude/skills/` — mesma interface em linguagem natural |
-| **Qualquer CLI de IA** | Forneça `RESEARCHCLAW_AGENTS.md` como contexto → o agente faz bootstrap automaticamente |
-
----
-
-## 🔬 Pipeline: 23 Estágios, 8 Fases
-
-```
-Fase A: Escopo da Pesquisa           Fase E: Execução de Experimentos
-  1. TOPIC_INIT                        12. EXPERIMENT_RUN
-  2. PROBLEM_DECOMPOSE                 13. ITERATIVE_REFINE  ← auto-reparo
-
-Fase B: Descoberta de Literatura     Fase F: Análise & Decisão
-  3. SEARCH_STRATEGY                   14. RESULT_ANALYSIS    ← multi-agente
-  4. LITERATURE_COLLECT  ← API real    15. RESEARCH_DECISION  ← PIVOT/REFINE
-  5. LITERATURE_SCREEN   [gate]
-  6. KNOWLEDGE_EXTRACT                 Fase G: Escrita do Artigo
-                                       16. PAPER_OUTLINE
-Fase C: Síntese de Conhecimento       17. PAPER_DRAFT
-  7. SYNTHESIS                         18. PEER_REVIEW        ← verif. evidência
-  8. HYPOTHESIS_GEN    ← debate        19. PAPER_REVISION
-
-Fase D: Design de Experimentos      Fase H: Finalização
-  9. EXPERIMENT_DESIGN   [gate]        20. QUALITY_GATE      [gate]
- 10. CODE_GENERATION                   21. KNOWLEDGE_ARCHIVE
- 11. RESOURCE_PLANNING                 22. EXPORT_PUBLISH     ← LaTeX
-                                       23. CITATION_VERIFY    ← verif. relevância
-```
-
-> **Estágios gate** (5, 9, 20) pausam para aprovação humana ou aprovam automaticamente com `--auto-approve`. Em caso de rejeição, o pipeline faz rollback.
-
-> **Modo Co-Piloto** (`--mode co-pilot`): Colaboração profunda humano-IA nos Estágios 7-8 (Idea Workshop), Estágio 9 (Baseline Navigator) e Estágios 16-17 (Paper Co-Writer). Os outros estágios executam automaticamente com monitoramento SmartPause.
-
-> **Loops de decisão**: O Estágio 15 pode acionar REFINE (→ Estágio 13) ou PIVOT (→ Estágio 8), com versionamento automático de artefatos.
-
-<details>
-<summary>📋 O Que Cada Fase Faz</summary>
-
-| Fase | O Que Acontece |
-|------|----------------|
-| **A: Escopo** | O LLM decompõe o tópico em uma árvore de problemas estruturada com questões de pesquisa |
-| **A+: Hardware** | Detecta automaticamente GPU (NVIDIA CUDA / Apple MPS / apenas CPU), avisa se o hardware local é limitado, adapta a geração de código adequadamente |
-| **B: Literatura** | Busca multi-fonte (OpenAlex → Semantic Scholar → arXiv) por artigos reais, triagem por relevância, extração de fichas de conhecimento |
-| **C: Síntese** | Agrupa descobertas, identifica lacunas de pesquisa, gera hipóteses testáveis via debate multi-agente |
-| **D: Design** | Projeta plano de experimento, gera Python executável com consciência de hardware (tier de GPU → seleção de pacotes), estima necessidades de recursos |
-| **E: Execução** | Executa experimentos em sandbox, detecta NaN/Inf e bugs de runtime, auto-repara código via reparo direcionado por LLM |
-| **F: Análise** | Análise multi-agente dos resultados; decisão autônoma PROCEED / REFINE / PIVOT com justificativa |
-| **G: Escrita** | Outline → redação seção por seção (5.000-6.500 palavras) → revisão por pares (com consistência metodologia-evidência) → revisão com guarda de tamanho |
-| **H: Finalização** | Quality gate, arquivamento de conhecimento, exportação LaTeX com template de conferência, verificação de integridade + relevância de citações |
-
-</details>
-
----
-
-## ✨ Funcionalidades Principais
-
-| Funcionalidade | Descrição |
-|---------|------------|
-| **📚 Literatura Multi-Fonte** | Artigos reais do OpenAlex, Semantic Scholar & arXiv — expansão de consultas, deduplicação, circuit breaker com degradação graciosa |
-| **🔍 Verificação de Citações em 4 Camadas** | Verificação de arXiv ID → CrossRef/DataCite DOI → correspondência de título no Semantic Scholar → pontuação de relevância por LLM. Referências alucinadas removidas automaticamente. |
-| **🖥️ Execução com Consciência de Hardware** | Detecta automaticamente GPU (NVIDIA CUDA / Apple MPS / apenas CPU) e adapta geração de código, imports e escala de experimentos |
-| **🦾 OpenCode Beast Mode** | Experimentos complexos roteados automaticamente para o [OpenCode](https://github.com/anomalyco/opencode) — gera projetos multi-arquivo com arquiteturas customizadas, loops de treinamento e estudos de ablação. Instale via `researchclaw setup`. |
-| **🧪 Experimentos em Sandbox** | Código validado por AST, harness imutável, fast-fail para NaN/Inf, auto-reparo de código, refinamento iterativo (até 10 rodadas), captura de resultados parciais |
-| **📝 Escrita com Qualidade de Conferência** | Templates NeurIPS/ICML/ICLR, redação seção por seção (5.000-6.500 palavras), guarda anti-fabricação, guarda de tamanho na revisão, imposição anti-disclaimer |
-| **📐 Troca de Template** | `neurips_2025`, `iclr_2026`, `icml_2026` — Markdown → LaTeX com matemática, tabelas, figuras, referências cruzadas, `\cite{}` |
-| **🛡️ Anti-Fabricação** | VerifiedRegistry impõe dados experimentais reais nos artigos. Diagnostica automaticamente experimentos falhados e os repara antes da escrita. Números não verificados são sanitizados. |
-| **🚦 Quality Gates** | 3 gates com human-in-the-loop (Estágios 5, 9, 20) com rollback. Pule com `--auto-approve`. |
-| **🧑‍✈️ Co-Piloto HITL** | 6 modos de intervenção com políticas por estágio. Idea Workshop, Baseline Navigator, Paper Co-Writer para colaboração profunda. SmartPause, guardrails de custo, políticas de escalação e aprendizado de intervenção para segurança em produção. Adaptadores CLI/WebSocket/MCP. |
-| **💰 Guardrails de Custo** | Monitoramento de orçamento com alertas de limite configuráveis (50%/80%/100%). O pipeline pausa automaticamente quando o custo excede o orçamento. |
-| **🔐 Reprodutibilidade** | Checksums SHA256 para todos os artefatos de estágio. Manifestos imutáveis para verificação. Undo multi-nível com snapshots versionados. |
-
----
-
-## 🧑‍✈️ Co-Piloto Human-in-the-Loop
-
-**O AutoResearchClaw v0.4.0 introduz um sistema Human-in-the-Loop (HITL) completo** que transforma o pipeline de puramente autônomo para um motor de pesquisa colaborativa humano-IA. Escolha seu nível de envolvimento:
-
-### Modos de Intervenção
-
-| Modo | Comando | O Que Faz |
-|------|---------|-----------|
-| **Full Auto** | `--auto-approve` | Comportamento original — sem intervenção humana |
-| **Gate Only** | `--mode gate-only` | Pausa nos 3 estágios gate (5, 9, 20) para aprovação |
-| **Checkpoint** | `--mode checkpoint` | Pausa em cada fronteira de fase (8 checkpoints) |
-| **Co-Pilot** | `--mode co-pilot` | Colaboração profunda em estágios críticos, automático nos demais |
-| **Step-by-Step** | `--mode step-by-step` | Pausa após cada estágio — aprenda o pipeline |
-| **Express** | `--mode express` | Revisão rápida — apenas os 3 gates mais críticos |
-
-### Fluxo de Trabalho do Co-Piloto
-
-```
-Você: researchclaw run --topic "Ruído quântico como regularização de redes neurais" --mode co-pilot
-
-Pipeline executa Estágios 1-7 automaticamente...
-
-  ┌─────────────────────────────────────────────────────────────┐
-  │  HITL | Estágio 08: HYPOTHESIS_GEN                          │
-  │  Revisão pós-estágio                                        │
-  │                                                             │
-  │  Hipóteses mencionadas: 3                                   │
-  │  Pontuação de novidade: 0.72 (moderada)                     │
-  │                                                             │
-  │  [a] Aprovar  [r] Rejeitar  [e] Editar  [c] Colaborar      │
-  │  [i] Injetar orientação  [v] Ver saída  [q] Abortar         │
-  └─────────────────────────────────────────────────────────────┘
-
-Você: c  (iniciar chat colaborativo)
-Você: Hipótese 3 é interessante mas precisa de Dropout/Label Smoothing como baselines
-IA:   Atualizado — adicionei Dropout, Label Smoothing, MixUp, CutMix como baselines...
-Você: approve
-
-Pipeline continua com sua hipótese refinada...
-```
-
-### Comandos CLI
-
-```bash
-# Iniciar com modo HITL
-researchclaw run --topic "..." --mode co-pilot
-
-# Anexar a um pipeline pausado (de outro terminal)
-researchclaw attach artifacts/rc-2026-xxx
-
-# Verificar status do pipeline e HITL
-researchclaw status artifacts/rc-2026-xxx
-
-# Aprovar/rejeitar de outro terminal ou script
-researchclaw approve artifacts/rc-2026-xxx --message "LGTM"
-researchclaw reject artifacts/rc-2026-xxx --reason "Baseline chave faltando"
-
-# Injetar orientação para um estágio (mesmo antes de ele executar)
-researchclaw guide artifacts/rc-2026-xxx --stage 9 --message "Usar ResNet-50 como baseline principal"
-```
-
-### Capacidades Principais
-
-| Funcionalidade | Descrição |
-|---------|------------|
-| **Idea Workshop** | Brainstorm, avalie e refine hipóteses de forma colaborativa (Estágios 7-8) |
-| **Baseline Navigator** | IA sugere baselines + humano adiciona/remove + checklist de reprodutibilidade (Estágio 9) |
-| **Paper Co-Writer** | Redação seção por seção com edição humana e polimento por IA (Estágios 16-19) |
-| **SmartPause** | Pausa dinâmica baseada em confiança — detecta automaticamente quando a contribuição humana ajudaria |
-| **Verificação de Afirmações** | Verificação de fatos inline contra a literatura coletada — sinaliza afirmações infundadas |
-| **Guardrails de Custo** | Monitoramento de orçamento com alertas de limite 50%/80%/100% |
-| **Aprendizado de Intervenção** | ALHF — aprende com seus padrões de revisão para otimizar futuras decisões de pausa |
-| **Exploração de Ramificações** | Bifurque o pipeline para explorar múltiplas hipóteses, compare e mescle a melhor |
-| **Política de Escalação** | Notificação em camadas (terminal → Slack → email → auto-halt) quando desacompanhado |
-| **3 Adaptadores** | CLI (terminal), WebSocket (dashboard web), MCP (agentes externos) |
-
-### Configuração
-
-```yaml
-# config.arc.yaml
-hitl:
-  enabled: true
-  mode: co-pilot                     # full-auto | gate-only | checkpoint | step-by-step | co-pilot | custom
-  cost_budget_usd: 50.0              # Pausar quando custo exceder orçamento (0 = sem limite)
-
-  notifications:
-    on_pause: true
-    on_quality_drop: true
-    channels: ["terminal"]            # terminal | slack | webhook
-
-  timeouts:
-    default_human_timeout_sec: 86400  # Espera padrão de 24h
-    auto_proceed_on_timeout: false
-
-  collaboration:
-    max_chat_turns: 50
-    save_chat_history: true
-
-  # Políticas customizadas por estágio (opcional, para modo 'custom')
-  stage_policies:
-    8: { require_approval: true, enable_collaboration: true }
-    9: { require_approval: true, allow_edit_output: true }
-```
-
-### Compatibilidade Retroativa
-
-- **Padrão: DESATIVADO.** Sem `hitl.enabled: true` ou `--mode`, o pipeline funciona exatamente como antes.
-- **`--auto-approve` ainda funciona.** Ele sobrescreve o modo HITL.
-- **Todos os 2.699 testes existentes passam** com o código HITL presente.
-
----
-
-## 🧠 Integração MetaClaw
-
-**AutoResearchClaw + [MetaClaw](https://github.com/aiming-lab/MetaClaw) = Um pipeline que aprende com cada execução.**
-
-MetaClaw adiciona **transferência de conhecimento entre execuções** ao AutoResearchClaw. Quando ativado, o pipeline captura automaticamente lições de falhas e avisos, converte-as em habilidades reutilizáveis e injeta essas habilidades em todos os 23 estágios do pipeline em execuções subsequentes — para que os mesmos erros nunca se repitam.
-
-### Como Funciona
-
-```
-Run N executa → falhas/avisos capturados como Lessons
-                      ↓
-          MetaClaw Lesson → conversão em Skill
-                      ↓
-          Arquivos arc-* Skill armazenados em ~/.metaclaw/skills/
-                      ↓
-Run N+1 → build_overlay() injeta skills em cada prompt LLM
-                      ↓
-          LLM evita armadilhas conhecidas → maior qualidade, menos retentativas
-```
-
-### Configuração Rápida
-
-```bash
-# 1. Instale o MetaClaw (se ainda não tiver)
-pip install metaclaw
-
-# 2. Ative na sua configuração
-```
-
-```yaml
-# config.arc.yaml
-metaclaw_bridge:
-  enabled: true
-  proxy_url: "http://localhost:30000"        # Proxy MetaClaw (opcional)
-  skills_dir: "~/.metaclaw/skills"          # Onde as skills são armazenadas
-  fallback_url: "https://api.openai.com/v1" # Fallback direto para LLM
-  fallback_api_key: ""                      # Chave de API para URL de fallback
-  lesson_to_skill:
-    enabled: true
-    min_severity: "warning"                 # Converte warnings + errors
-    max_skills_per_run: 3
-```
-
-```bash
-# 3. Execute normalmente — MetaClaw funciona de forma transparente
-researchclaw run --config config.arc.yaml --topic "Your idea" --auto-approve
-```
-
-Após cada execução, verifique `~/.metaclaw/skills/arc-*/SKILL.md` para ver as skills que seu pipeline aprendeu.
-
-### Resultados dos Experimentos
-
-Em experimentos A/B controlados (mesmo tópico, mesmo LLM, mesma configuração):
-
-| Métrica | Baseline | Com MetaClaw | Melhoria |
-|---------|----------|---------------|----------|
-| Taxa de retentativa por estágio | 10.5% | 7.9% | **-24.8%** |
-| Contagem de ciclos REFINE | 2.0 | 1.2 | **-40.0%** |
-| Conclusão de estágios do pipeline | 18/19 | 19/19 | **+5.3%** |
-| Pontuação de robustez geral (composta) | 0.714 | 0.845 | **+18.3%** |
-
-> A pontuação composta de robustez é uma média ponderada da taxa de conclusão de estágios (40%), redução de retentativas (30%) e eficiência de ciclos REFINE (30%).
-
-### Compatibilidade Retroativa
-
-- **Padrão: DESATIVADO.** Se `metaclaw_bridge` estiver ausente ou `enabled: false`, o pipeline funciona exatamente como antes.
-- **Sem novas dependências.** MetaClaw é opcional — o pipeline principal funciona sem ele.
-- **Todos os 2.699 testes existentes passam** com o código de integração presente.
-
----
-
-## 🧩 Biblioteca de Skills
-
-O AutoResearchClaw agora suporta o carregamento de **skills open-source e customizadas** para aprimorar ainda mais sua experiência de pesquisa. Também incluímos **20 skills integradas pré-carregadas** (escrita científica, busca de literatura, química, biologia e mais) como referências prontas para uso, oferecendo um alto grau de flexibilidade desde o início. Desabilite qualquer skill adicionando `enabled: false` ao seu frontmatter.
-
-**Exemplos de skills integradas:**
-
-| Categoria | Skill | Descrição |
-|-----------|-------|-----------|
-| **Escrita** | `scientific-writing` | Estrutura IMRAD, formatação de citações, diretrizes de relatórios |
-| **Domínio** | `chemistry-rdkit` | Análise molecular, SMILES, fingerprints, descoberta de fármacos |
-| **Experimento** | `literature-search` | Revisão sistemática, metodologia PRISMA |
-
-> Veja todas as 20 skills com `researchclaw skills list`.
-
-### Carregue Suas Próprias Skills
-
-```bash
-# Opção 1: Instalar uma skill (persiste entre projetos)
-researchclaw skills install /path/to/my-skill/
-
-# Opção 2: Coloque um SKILL.md no projeto
-mkdir -p .claude/skills/my-custom-skill
-# Depois crie um SKILL.md com frontmatter YAML (name, description, trigger-keywords, applicable-stages)
-
-# Opção 3: Configure diretórios compartilhados de skills no config.arc.yaml
-# skills:
-#   custom_dirs:
-#     - /path/to/team-shared-skills
-```
-
-### Usando Skills
-
-Skills são carregadas e injetadas nos prompts LLM automaticamente — sem ativação manual necessária. Use o CLI para inspecionar:
-
-```bash
-researchclaw skills list               # Mostra todas as skills carregadas com fontes
-researchclaw skills validate ./my-skill # Verifica formato do SKILL.md
-```
-
-Explore skills da comunidade: [K-Dense-AI/claude-scientific-skills](https://github.com/K-Dense-AI/claude-scientific-skills) (150+ skills científicas em múltiplas disciplinas).
-
----
-
-## ⚙️ Referência de Configuração
-
-<details>
-<summary>Clique para expandir a referência completa de configuração</summary>
-
-```yaml
-# === Projeto ===
-project:
-  name: "my-research"              # Identificador do projeto
-  mode: "docs-first"               # docs-first | semi-auto | full-auto
-
-# === Pesquisa ===
-research:
-  topic: "..."                     # Tópico de pesquisa (obrigatório)
-  domains: ["ml", "nlp"]           # Domínios de pesquisa para busca de literatura
-  daily_paper_count: 8             # Artigos alvo por consulta de busca
-  quality_threshold: 4.0           # Pontuação mínima de qualidade para artigos
-
-# === Runtime ===
-runtime:
-  timezone: "America/New_York"     # Para timestamps
-  max_parallel_tasks: 3            # Limite de experimentos concorrentes
-  approval_timeout_hours: 12       # Timeout de estágios gate
-  retry_limit: 2                   # Contagem de retentativas em falha de estágio
-
-# === LLM ===
-llm:
-  provider: "openai-compatible"    # openai | openrouter | deepseek | minimax | acp | openai-compatible
-  base_url: "https://..."          # Endpoint da API (obrigatório para openai-compatible)
-  api_key_env: "OPENAI_API_KEY"    # Variável de ambiente para chave da API (obrigatório para openai-compatible)
-  api_key: ""                      # Ou insira a chave diretamente aqui
-  primary_model: "gpt-4o"          # Modelo primário
-  fallback_models: ["gpt-4o-mini"] # Cadeia de fallback
-  s2_api_key: ""                   # Chave API do Semantic Scholar (opcional, limites de taxa maiores)
-  acp:                             # Usado apenas quando provider: "acp"
-    agent: "claude"                # Comando CLI do agente ACP (claude, codex, gemini, etc.)
-    cwd: "."                       # Diretório de trabalho para o agente
-
-# === Experimento ===
-experiment:
-  mode: "sandbox"                  # simulated | sandbox | docker | ssh_remote
-  time_budget_sec: 300             # Tempo máximo de execução por run (padrão: 300s)
-  max_iterations: 10               # Máximo de iterações de otimização
-  metric_key: "val_loss"           # Nome da métrica primária
-  metric_direction: "minimize"     # minimize | maximize
-  sandbox:
-    python_path: ".venv/bin/python"
-    gpu_required: false
-    allowed_imports: [math, random, json, csv, numpy, torch, sklearn]
-    max_memory_mb: 4096
-  docker:
-    image: "researchclaw/experiment:latest"
-    network_policy: "setup_only"   # none | setup_only | pip_only | full
-    gpu_enabled: true
-    memory_limit_mb: 8192
-    auto_install_deps: true        # Detecção automática de imports → requirements.txt
-  ssh_remote:
-    host: ""                       # Hostname do servidor GPU
-    gpu_ids: []                    # IDs de GPU disponíveis
-    remote_workdir: "/tmp/researchclaw_experiments"
-  opencode:                          # OpenCode Beast Mode (auto-instalado via `researchclaw setup`)
-    enabled: true                    # Interruptor principal (padrão: true)
-    auto: true                       # Acionamento automático sem confirmação (padrão: true)
-    complexity_threshold: 0.2        # 0.0-1.0 — maior = só aciona em experimentos complexos
-    model: ""                        # Modelo override (vazio = usa llm.primary_model)
-    timeout_sec: 600                 # Máximo de segundos para geração OpenCode
-    max_retries: 1                   # Contagem de retentativas em falha
-    workspace_cleanup: true          # Remove workspace temporário após coleta
-  code_agent:                        # CodeAgent v2 — geração de código multi-fase
-    enabled: true                    # Usar CodeAgent em vez da geração legada de prompt único
-    architecture_planning: true      # Gerar blueprint detalhado de implementação antes de codificar
-    sequential_generation: true      # Gerar arquivos um a um seguindo o DAG de dependências
-    hard_validation: true            # Validação baseada em AST (bloqueia ablações idênticas, métricas hardcoded)
-    hard_validation_max_repairs: 2   # Máximo de tentativas de reparo quando validação falha
-    exec_fix_max_iterations: 3       # Tentativas de correção na execução
-    exec_fix_timeout_sec: 60         # Timeout por tentativa de correção
-  benchmark_agent:                   # BenchmarkAgent — seleção automatizada de datasets e baselines
-    enabled: true                    # Ativar pipeline de 4 agentes (Surveyor→Selector→Acquirer→Validator)
-    enable_hf_search: true           # Buscar em HuggingFace Datasets
-    enable_web_search: true          # Buscar benchmarks no Google Scholar
-    tier_limit: 2                    # Filtragem de tier de datasets (1=pequeno/cache, 2=médio, 3=grande)
-    min_benchmarks: 1                # Mínimo de datasets necessários
-    min_baselines: 2                 # Mínimo de métodos baseline necessários
-  figure_agent:                      # FigureAgent — geração de figuras acadêmicas
-    enabled: true                    # Ativar pipeline de 5 agentes (Planner→CodeGen→Renderer→Critic→Integrator)
-    min_figures: 3                   # Mínimo de figuras a gerar
-    max_figures: 8                   # Máximo de figuras
-    max_iterations: 3                # Iterações de refinamento via Critic
-    dpi: 300                         # Resolução de saída
-    strict_mode: false               # Falhar pipeline se geração de figuras falhar
-  repair:                            # Anti-fabricação — reparo de experimentos
-    enabled: true                    # Auto-diagnosticar e reparar experimentos falhados
-    max_cycles: 3                    # Ciclos de reparo
-    min_completion_rate: 0.5         # >=50% das condições devem completar para prosseguir
-    min_conditions: 2                # Mínimo de 2 condições para experimento válido
-    use_opencode: true               # Rotear reparos pelo OpenCode Beast Mode
-
-# === Busca Web (Opcional) ===
-web_search:
-  enabled: true                      # Ativar busca de literatura com web
-  tavily_api_key_env: "TAVILY_API_KEY"  # Variável de ambiente para chave Tavily API (opcional)
-  enable_scholar: true               # Busca no Google Scholar
-  enable_pdf_extraction: true        # Extrair texto de PDFs
-  max_web_results: 10                # Máximo de resultados web por consulta
-
-# === Exportação ===
-export:
-  target_conference: "neurips_2025"  # neurips_2025 | iclr_2026 | icml_2026
-  authors: "Anonymous"
-  bib_file: "references"
-
-# === Prompts ===
-prompts:
-  custom_file: ""                  # Caminho para YAML de prompts customizados (vazio = padrões)
-
-# === Co-Piloto HITL (NOVO no v0.4.0) ===
-hitl:
-  enabled: false                     # Defina como true para ativar HITL
-  mode: co-pilot                     # full-auto | gate-only | checkpoint | step-by-step | co-pilot | custom
-  cost_budget_usd: 0.0              # Limite de custo em USD (0 = sem limite)
-  notifications:
-    on_pause: true                   # Notificar quando o pipeline pausar
-    on_quality_drop: true            # Notificar em problemas de qualidade
-    channels: ["terminal"]           # terminal | slack | webhook
-  timeouts:
-    default_human_timeout_sec: 86400 # Esperar até 24h por entrada humana
-    auto_proceed_on_timeout: false   # Se true, auto-aprovar no timeout
-  collaboration:
-    max_chat_turns: 50               # Máximo de turnos por sessão de colaboração
-    save_chat_history: true          # Persistir logs de chat
-  stage_policies: {}                 # Overrides por estágio (para modo 'custom')
-
-# === Segurança ===
-security:
-  hitl_required_stages: [5, 9, 20] # Estágios que requerem aprovação humana
-  allow_publish_without_approval: false
-  redact_sensitive_logs: true
-
-# === Base de Conhecimento ===
-knowledge_base:
-  backend: "markdown"              # markdown | obsidian
-  root: "docs/kb"
-
-# === Notificações ===
-notifications:
-  channel: "console"               # console | discord | slack
-  target: ""
-
-# === MetaClaw Bridge (Opcional) ===
-metaclaw_bridge:
-  enabled: false                   # Defina como true para ativar aprendizado entre execuções
-  proxy_url: "http://localhost:30000"  # URL do proxy MetaClaw
-  skills_dir: "~/.metaclaw/skills" # Onde as skills arc-* são armazenadas
-  fallback_url: ""                 # Fallback direto para LLM quando o proxy está fora
-  fallback_api_key: ""             # Chave de API para endpoint de fallback
-  lesson_to_skill:
-    enabled: true                  # Auto-converter lições em skills
-    min_severity: "warning"        # Severidade mínima para converter
-    max_skills_per_run: 3          # Máximo de novas skills por execução do pipeline
-  prm:                             # Process Reward Model quality gate (opcional)
-    enabled: false                 # Usar LLM-as-judge para pontuar saídas de estágio
-    model: "gpt-5.4"              # Modelo juiz PRM
-    votes: 3                       # Contagem de votos por maioria
-    gate_stages: [5, 9, 15, 20]   # Estágios onde aplicar gates PRM
-
-# === Bridge OpenClaw ===
-openclaw_bridge:
-  use_cron: false                  # Execuções de pesquisa agendadas
-  use_message: false               # Notificações de progresso
-  use_memory: false                # Persistência de conhecimento entre sessões
-  use_sessions_spawn: false        # Criar sub-sessões paralelas
-  use_web_fetch: false             # Busca web ao vivo
-  use_browser: false               # Coleta de artigos baseada em navegador
-```
-
-</details>
-
----
-
-## 🙏 Agradecimentos
-
-Inspirado por:
-
-- 🔬 [AI Scientist](https://github.com/SakanaAI/AI-Scientist) (Sakana AI) — Pioneiro em pesquisa automatizada
-- 🧠 [AutoResearch](https://github.com/karpathy/autoresearch) (Andrej Karpathy) — Automação de pesquisa de ponta a ponta
-- 🌐 [FARS](https://analemma.ai/blog/introducing-fars/) (Analemma) — Fully Automated Research System
-
----
-
-## 📄 Licença
-
-MIT — veja [LICENSE](../LICENSE) para detalhes.
-
----
-
-## 📌 Citação
-
-Se você achar o AutoResearchClaw útil, por favor cite:
-
-```bibtex
-@misc{liu2026autoresearchclawselfreinforcingautonomousresearch,
-      title={AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration},
-      author={Jiaqi Liu and Shi Qiu and Mairui Li and Bingzhou Li and Haonian Ji and Siwei Han and Xinyu Ye and Peng Xia and Zihan Dong and Congyu Zhang and Letian Zhang and Guiming Chen and Haoqin Tu and Xinyu Yang and Lu Feng and Xujiang Zhao and Haifeng Chen and Jiawei Zhou and Xiao Wang and Weitong Zhang and Hongtu Zhu and Yun Li and Jieru Mei and Hongliang Fei and Jiaheng Zhang and Linjie Li and Linjun Zhang and Yuyin Zhou and Sheng Wang and Caiming Xiong and James Zou and Zeyu Zheng and Cihang Xie and Mingyu Ding and Huaxiu Yao},
-      year={2026},
-      eprint={2605.20025},
-      archivePrefix={arXiv},
-      primaryClass={cs.AI},
-      url={https://arxiv.org/abs/2605.20025},
-}
-```
-
-<p align="center">
-  <sub>Construído com 🦞 pela equipe AutoResearchClaw</sub>
-</p>
diff --git a/docs/README_RU.md b/docs/README_RU.md
deleted file mode 100644
index edcf1759..00000000
--- a/docs/README_RU.md
+++ /dev/null
@@ -1,786 +0,0 @@
-<p align="center">
-  <img src="../image/logo.png" width="700" alt="AutoResearchClaw Logo">
-</p>
-
-<h2 align="center"><b>Напишите идею. Получите статью. Автономный, Совместный & Самоэволюционирующий.</b></h2>
-
-<p align="center">
-  <b><i><font size="5">Просто напишите <a href="#-интеграция-с-openclaw">OpenClaw</a>: «Исследуй X» → готово.</font></i></b>
-</p>
-
-<p align="center">
-  📄 <b>Наша статья доступна на arXiv — обязательно почитайте!</b> <a href="https://arxiv.org/abs/2605.20025"><i>AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration</i></a>
-</p>
-
-<p align="center">
-  <img src="../image/framework_v2.png" width="100%" alt="AutoResearchClaw Framework">
-</p>
-
-<p align="center">
-  <a href="https://arxiv.org/abs/2605.20025"><img src="https://img.shields.io/badge/arXiv-2605.20025-b31b1b?logo=arxiv&logoColor=white" alt="arXiv"></a>
-  <a href="https://huggingface.co/datasets/AIMING-Lab-UNC/ARC-Bench"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-ARC--Bench-yellow" alt="ARC-Bench on Hugging Face"></a>
-  <a href="../LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="MIT License"></a>
-  <a href="https://python.org"><img src="https://img.shields.io/badge/Python-3.11%2B-3776AB?logo=python&logoColor=white" alt="Python 3.11+"></a>
-  <a href="#тестирование"><img src="https://img.shields.io/badge/Tests-2699%20passed-brightgreen?logo=pytest&logoColor=white" alt="2699 Tests Passed"></a>
-  <a href="https://github.com/aiming-lab/AutoResearchClaw"><img src="https://img.shields.io/badge/GitHub-AutoResearchClaw-181717?logo=github" alt="GitHub"></a>
-  <a href="#-интеграция-с-openclaw"><img src="https://img.shields.io/badge/OpenClaw-Compatible-ff4444?logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNCAyNCI+PHBhdGggZD0iTTEyIDJDNi40OCAyIDIgNi40OCAyIDEyczQuNDggMTAgMTAgMTAgMTAtNC40OCAxMC0xMFMxNy41MiAyIDEyIDJ6IiBmaWxsPSJ3aGl0ZSIvPjwvc3ZnPg==" alt="OpenClaw Compatible"></a>
-  <a href="https://discord.gg/u4ksqW5P"><img src="https://img.shields.io/badge/Discord-Join%20Community-5865F2?logo=discord&logoColor=white" alt="Discord"></a>
-</p>
-
-<p align="center">
-  <a href="../README.md">🇺🇸 English</a> ·
-  <a href="README_CN.md">🇨🇳 中文</a> ·
-  <a href="README_JA.md">🇯🇵 日本語</a> ·
-  <a href="README_KO.md">🇰🇷 한국어</a> ·
-  <a href="README_FR.md">🇫🇷 Français</a> ·
-  <a href="README_DE.md">🇩🇪 Deutsch</a> ·
-  <a href="README_ES.md">🇪🇸 Español</a> ·
-  <a href="README_PT.md">🇧🇷 Português</a> ·
-  <a href="README_RU.md">🇷🇺 Русский</a> ·
-  <a href="README_AR.md">🇸🇦 العربية</a>
-</p>
-
-<p align="center">
-  <a href="showcase/SHOWCASE.md">🏆 Галерея статей</a> · <a href="HITL_GUIDE.md">🧑‍✈️ Руководство Co-Pilot</a> · <a href="integration-guide.md">📖 Руководство по интеграции</a> · <a href="https://discord.gg/u4ksqW5P">💬 Сообщество в Discord</a>
-</p>
-
----
-
-<table>
-<tr>
-<td width="18%">
-<a href="showcase/SHOWCASE.md"><img src="showcase/thumbnails/paper_I_random_matrix-01.png" width="120" alt="Пример статьи"/></a>
-</td>
-<td valign="middle">
-<b>🏆 Галерея сгенерированных статей</b><br><br>
-<b>8 статей в 8 областях</b> — математика, статистика, биология, информатика, NLP, RL, компьютерное зрение, робастность — сгенерированы полностью автономно или с направляющим участием Human-in-the-Loop Co-Pilot.<br><br>
-<a href="showcase/SHOWCASE.md"><img src="https://img.shields.io/badge/Посмотреть_галерею_→-Все_8_статей-d73a49?style=for-the-badge" alt="Посмотреть галерею"></a>
-</td>
-</tr>
-</table>
-
----
-
-> **🧪 Мы ищем тестировщиков!** Попробуйте запустить пайплайн со своей исследовательской идеей из любой области и [расскажите нам о результатах](TESTER_GUIDE.md). Ваш фидбек напрямую влияет на развитие проекта. **[→ Руководство по тестированию](TESTER_GUIDE.md)** | **[→ 中文测试指南](TESTER_GUIDE_CN.md)** | **[→ 日本語テストガイド](TESTER_GUIDE_JA.md)**
-
----
-
-## 🔥 Новости
-- **[19.05.2026]** **v0.5.0** — **Мультидоменные экспериментальные агенты + ARC-Bench** — Два ключевых обновления. **(1) Специализированные агенты выполнения по доменам:** этап экспериментов (этапы 10–13) теперь выходит за рамки стандартной ML-песочницы и направляет задачи профильным агентам — **физика высоких энергий** (ColliderAgent: FeynRules → MadGraph5 → Delphes через облако Magnus), **биология** (полногеномное метаболическое моделирование на COBRApy) и **статистика** (агент имитационных исследований), а химию/материалы покрывает универсальный Docker-исполнитель. Конвейер автоматически выбирает нужный исполнитель по домену исследования. **(2) ARC-Bench:** открытый бенчмарк автономных исследований из **55 тем**, охватывающий **ML (25), физику высоких энергий (10), квантовые вычисления (10), биологию (7) и статистику (3)**; к каждой теме прилагаются манифест и оценочная рубрика (`experiments/arc_bench/`, а также на [🤗 Hugging Face](https://huggingface.co/datasets/AIMING-Lab-UNC/ARC-Bench)). **[→ Руководство по интеграции доменов](DOMAIN_INTEGRATION_GUIDE.md)**
-- **[01.04.2026]** **v0.4.0** — **Система Human-in-the-Loop Co-Pilot** — AutoResearchClaw больше не является чисто автономным. Новая HITL-система добавляет 6 режимов вмешательства (`full-auto`, `gate-only`, `checkpoint`, `step-by-step`, `co-pilot`, `custom`), настраиваемые политики для каждого этапа и глубокое взаимодействие человека и ИИ. Включает: Мастерскую идей для совместного формирования гипотез, Навигатор бейзлайнов для обзора дизайна экспериментов, Совместное написание статьи с Paper Co-Writer, SmartPause (динамическое вмешательство по уровню уверенности), обучение на интервенциях (ALHF), проверку утверждений на антигаллюцинацию, контроль бюджета затрат, ветвление пайплайна для параллельного исследования гипотез и CLI-команды (`attach`/`status`/`approve`/`reject`/`guide`). **[→ Полное руководство HITL](HITL_GUIDE.md)**
-- **[30.03.2026]** **Гибкая загрузка навыков** — AutoResearchClaw теперь поддерживает загрузку открытых и пользовательских навыков из любой дисциплины для расширения исследовательских возможностей. 20 предустановленных навыков включены в качестве готовых примеров — научное письмо, дизайн экспериментов, химия, биология и др., включая навык агентной эволюции [A-Evolve](https://github.com/A-EVO-Lab/a-evolve), предоставленный сообществом. Загружайте свои навыки через `researchclaw skills install` или поместите `SKILL.md` в `.claude/skills/`. См. [Библиотека навыков](#-библиотека-навыков).
-- **[22.03.2026]** [v0.3.2](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.2) — **Кроссплатформенная поддержка + крупное обновление стабильности** — AutoResearchClaw теперь работает с любым ACP-совместимым агентом (Claude Code, Codex CLI, Copilot CLI, Gemini CLI, Kimi CLI) и поддерживает мессенджеры (Discord, Telegram, Lark, WeChat) через мост OpenClaw. Новый CLI-agent бэкенд генерации кода делегирует Stage 10 и 13 внешним CLI-агентам с контролем бюджета и управлением таймаутами. Включает систему защиты от фабрикации (VerifiedRegistry + цикл диагностики и ремонта экспериментов), 100+ исправлений багов, модульный рефакторинг executor, автоопределение `--resume`, усиление повторов LLM и исправления от сообщества.
-
-<details>
-<summary>Предыдущие версии</summary>
-
-- **[18.03.2026]** [v0.3.1](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.1) — **OpenCode Beast Mode + Контрибьюты сообщества** — Новый режим "Beast Mode" перенаправляет сложную генерацию кода в [OpenCode](https://github.com/anomalyco/opencode) с автоматической оценкой сложности и безопасным фоллбэком. Добавлена поддержка провайдера Novita AI, улучшена потокобезопасность, повышена надежность парсинга ответов LLM, а также исправлено более 20 багов благодаря PR от сообщества и внутреннему аудиту.
-- **[17.03.2026]** [v0.3.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.0) — **Интеграция с MetaClaw** — AutoResearchClaw теперь поддерживает кросс-сессионное обучение через [MetaClaw](https://github.com/aiming-lab/MetaClaw): ошибки пайплайна → структурированные уроки → переиспользуемые навыки, которые внедряются во все 23 этапа. Робастность в контролируемых экспериментах выросла на **+18.3%**. Фича опциональна (`metaclaw_bridge.enabled: true`) и полностью обратно совместима. См. [Руководство по интеграции](#-интеграция-с-metaclaw).
-- **[16.03.2026]** [v0.2.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.2.0) — Три мультиагентные подсистемы (CodeAgent, BenchmarkAgent, FigureAgent), защищенная Docker-песочница с поддержкой сетевых политик, 4-этапный аудит качества статьи (поиск ИИ-галлюцинаций, оценка по 7 критериям, чек-лист NeurIPS) и более 15 исправлений багов с продакшена.
-- **[15.03.2026]** [v0.1.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.1.0) — Релиз AutoResearchClaw: полностью автономный исследовательский пайплайн из 23 этапов, который превращает одну идею в готовую для конференции статью. Без вмешательства человека.
-
-</details>
-
----
-
-## ⚡ Одна команда. Одна статья.
-
-```bash
-# Полностью автономный режим — без участия человека
-pip install -e . && researchclaw setup && researchclaw init && researchclaw run --topic "Ваша исследовательская идея" --auto-approve
-
-# Режим Co-Pilot — совместная работа с ИИ в ключевых точках принятия решений
-researchclaw run --topic "Ваша исследовательская идея" --mode co-pilot
-```
-
----
-
-## 🤔 Что это такое?
-
-**Вы придумываете. AutoResearchClaw пишет. Вы направляете ключевые решения.**
-
-Задайте тему исследования — и получите полноценную академическую статью с реальным обзором литературы из OpenAlex, Semantic Scholar и arXiv, экспериментами в песочнице с учетом вашего железа (автоопределение GPU/MPS/CPU), статистическим анализом, мультиагентным рецензированием и готовым LaTeX-кодом для конференций NeurIPS/ICML/ICLR. Запускайте полностью автономно или используйте **режим Co-Pilot**, чтобы направлять ИИ в критических точках — выбирайте направления исследований, проверяйте дизайн экспериментов и совместно пишите статью. Никаких выдуманных ссылок.
-
-<table>
-<tr><td>📄</td><td><code>paper_draft.md</code></td><td>Полная академическая статья (Введение, Обзор литературы, Метод, Эксперименты, Результаты, Заключение)</td></tr>
-<tr><td>📐</td><td><code>paper.tex</code></td><td>Готовый LaTeX-код (шаблоны NeurIPS / ICLR / ICML)</td></tr>
-<tr><td>📚</td><td><code>references.bib</code></td><td>Реальные BibTeX-ссылки из OpenAlex, Semantic Scholar и arXiv — автоматически отфильтрованные под цитаты в тексте</td></tr>
-<tr><td>🔍</td><td><code>verification_report.json</code></td><td>4-уровневая проверка целостности и релевантности цитирования (arXiv, CrossRef, DataCite, LLM)</td></tr>
-<tr><td>🧪</td><td><code>experiment runs/</code></td><td>Сгенерированный код + результаты из песочницы + структурированные JSON-метрики</td></tr>
-<tr><td>📊</td><td><code>charts/</code></td><td>Автоматически сгенерированные графики сравнения с планками погрешностей и доверительными интервалами</td></tr>
-<tr><td>📝</td><td><code>reviews.md</code></td><td>Мультиагентное рецензирование с проверкой согласованности методологии и результатов</td></tr>
-<tr><td>🧬</td><td><code>evolution/</code></td><td>Уроки для самообучения, извлеченные из каждого запуска</td></tr>
-<tr><td>📦</td><td><code>deliverables/</code></td><td>Все итоговые материалы в одной папке — готовы к загрузке в Overleaf</td></tr>
-</table>
-
-The pipeline runs **end to end**, either fully autonomously or with a human in the loop. If experiments fail, it fixes the code. If hypotheses don't hold up, it changes direction. If citations turn out to be fake, it removes them. If you want to steer the process, it pauses and listens.
-
-🌍 **Run it anywhere.** AutoResearchClaw isn't tied to a single platform. Use it standalone via the CLI, plug it into [OpenClaw](https://github.com/openclaw/openclaw), or integrate it with any ACP-compatible agent: 🤖 Claude Code, 💻 Codex CLI, 🐙 Copilot CLI, ♊ Gemini CLI, 🌙 Kimi CLI. Thanks to OpenClaw's message bridge, you can kick off a full research run from 💬 Discord, ✈️ Telegram, 🐦 Lark (飞书), 💚 WeChat, or whatever platform your team already uses. One topic in, one paper out, wherever you type from.
-
----
-
-## 🚀 Quick Start
-
-```bash
-# 1. Clone and install
-git clone https://github.com/aiming-lab/AutoResearchClaw.git
-cd AutoResearchClaw
-python3 -m venv .venv && source .venv/bin/activate
-pip install -e .
-
-# 2. Setup (interactive: installs OpenCode beast mode, checks Docker/LaTeX)
-researchclaw setup
-
-# 3. Configure
-researchclaw init          # Interactive: choose an LLM provider, create config.arc.yaml
-# Or manually: cp config.researchclaw.example.yaml config.arc.yaml
-
-# 4. Run
-export OPENAI_API_KEY="sk-..."
-researchclaw run --config config.arc.yaml --topic "Your research idea" --auto-approve
-```
-
-Output → `artifacts/rc-YYYYMMDD-HHMMSS-<hash>/deliverables/`: compile-ready LaTeX, BibTeX, experiment code, and charts.
-
-<details>
-<summary>📝 Minimal configuration</summary>
-
-```yaml
-project:
-  name: "my-research"
-
-research:
-  topic: "Your research topic"
-
-llm:
-  base_url: "https://api.openai.com/v1"
-  api_key_env: "OPENAI_API_KEY"
-  primary_model: "gpt-4o"
-  fallback_models: ["gpt-4o-mini"]
-
-experiment:
-  mode: "sandbox"
-  sandbox:
-    python_path: ".venv/bin/python"
-```
-
-</details>
-
----
-
-## 🧠 What Makes It Different
-
-| Feature | How it works |
-|-----------|-------------|
-| **🧑‍✈️ Co-Pilot Mode** | 6 intervention modes, from fully autonomous to step-by-step. Steer the AI on critical decisions (hypotheses, baselines, paper writing) or let it run on its own. SmartPause automatically detects when human input would help. |
-| **🔄 PIVOT / REFINE Loop** | At Stage 15 the system autonomously decides to PROCEED, REFINE (tune parameters), or PIVOT (change direction). Artifacts are versioned automatically. |
-| **🤖 Multi-Agent Debate** | Hypothesis generation, result analysis, and peer review each run as a structured debate across multiple perspectives. |
-| **🧬 Self-Learning** | Lessons are extracted from every run (decision rationale, code errors, metric anomalies) with a 30-day half-life. Future runs learn from past mistakes. |
-| **📚 Knowledge Base** | Every run adds to a structured knowledge base across 6 categories (decisions, experiments, findings, literature, questions, reviews). |
-| **🛡️ Sentinel Watchdog** | Background quality monitoring: NaN/Inf detection, checks that the paper text matches the actual data, citation relevance scoring, and anti-fabrication guards. |
-| **🔍 Claim Verification** | Inline fact-checking: extracts claims from AI-generated text and cross-checks them against the collected literature. Flags unsupported citations and fabricated numbers. |
-| **🌿 Research Branching** | Fork the pipeline to explore several directions at once, compare results side by side, and merge the best path. |
-
----
-
-## 🦞 OpenClaw Integration
-
-<table>
-<tr>
-
-**AutoResearchClaw is fully compatible with [OpenClaw](https://github.com/openclaw/openclaw).** Install it in OpenClaw and launch autonomous research with a single message, or use it standalone via the CLI, Claude Code, or any other AI assistant.
-
-</tr>
-</table>
-
-### 🚀 Using with OpenClaw (Recommended)
-
-If you already use [OpenClaw](https://github.com/openclaw/openclaw) as your AI assistant:
-
-```
-1️⃣  Send the repository URL to OpenClaw
-2️⃣  OpenClaw automatically reads RESEARCHCLAW_AGENTS.md → understands the pipeline structure
-3️⃣  Say: "Run research on [your topic]"
-4️⃣  Done: OpenClaw clones, installs, configures, runs, and returns the results on its own
-```
-
-**That's it.** OpenClaw handles `git clone`, `pip install`, config setup, and the pipeline run. You just chat.
-
-<details>
-<summary>💡 What happens under the hood</summary>
-
-1. OpenClaw reads `RESEARCHCLAW_AGENTS.md` → takes on the research orchestrator role
-2. OpenClaw reads `README.md` → understands the installation steps and pipeline structure
-3. OpenClaw copies `config.researchclaw.example.yaml` → `config.yaml`
-4. Asks for your API key (or uses the environment variable)
-5. Runs `pip install -e .` + `researchclaw run --topic "..." --auto-approve`
-6. Returns the finished paper, LaTeX, experiment code, and reference list
-
-</details>
-
-### 🔌 OpenClaw Bridge (Advanced)
-
-For deeper integration, AutoResearchClaw ships with a built-in **adapter system** offering 6 optional capabilities:
-
-```yaml
-# config.arc.yaml
-openclaw_bridge:
-  use_cron: true              # ⏰ Scheduled research runs
-  use_message: true           # 💬 Progress notifications (Discord/Slack/Telegram)
-  use_memory: true            # 🧠 Knowledge persistence across sessions
-  use_sessions_spawn: true    # 🔀 Spawn parallel sub-sessions for independent stages
-  use_web_fetch: true         # 🌐 Live web search during literature review
-  use_browser: false          # 🖥️ Browser-based paper collection
-```
-
-Each flag activates a typed adapter protocol. When OpenClaw supports these capabilities, the adapters use them with no code changes. See [`integration-guide.md`](integration-guide.md) for details.
-
-### ACP (Agent Client Protocol)
-
-AutoResearchClaw can use **any ACP-compatible agent** as its LLM backend, with no API keys required. The agent communicates through [acpx](https://github.com/openclaw/acpx) and keeps a single session alive across all 23 stages.
-
-| Agent | Command | Notes |
-|-------|---------|-------|
-| Claude Code | `claude` | Anthropic |
-| Codex CLI | `codex` | OpenAI |
-| Copilot CLI | `gh` | GitHub |
-| Gemini CLI | `gemini` | Google |
-| OpenCode | `opencode` | SST |
-| Kimi CLI | `kimi` | Moonshot |
-
-```yaml
-# config.yaml: ACP example
-llm:
-  provider: "acp"
-  acp:
-    agent: "claude"   # CLI command of any ACP-compatible agent
-    cwd: "."          # Working directory for the agent
-  # base_url and api_key are not needed; the agent manages its own authentication.
-```
-
-```bash
-# Just run it: the agent uses its own credentials
-researchclaw run --config config.yaml --topic "Your idea" --auto-approve
-```
-
-### 🛠️ Other Ways to Run
-
-| Method | How to run |
-|--------|-----|
-| **CLI** | `researchclaw run --topic "..." --auto-approve` (autonomous) or `--mode co-pilot` (collaborative) |
-| **Python API** | `from researchclaw.pipeline import Runner; Runner(config).run()` |
-| **Claude Code** | Reads `RESEARCHCLAW_CLAUDE.md`; just say *"Run research on [topic]"* |
-| **Copilot CLI** | `researchclaw run --topic "..."` with `llm.acp.agent: "gh"` |
-| **OpenCode** | Reads `.claude/skills/`; same natural-language interface |
-| **Any AI CLI** | Feed `RESEARCHCLAW_AGENTS.md` into the context → the agent figures out what to do |
-
----
-
-## 🔬 Pipeline: 23 Stages, 8 Phases
-
-```
-Phase A: Research Scoping            Phase E: Experiment Execution
-  1. TOPIC_INIT                         12. EXPERIMENT_RUN
-  2. PROBLEM_DECOMPOSE                  13. ITERATIVE_REFINE  ← self-healing
-
-Phase B: Literature Discovery        Phase F: Analysis & Decision
-  3. SEARCH_STRATEGY                    14. RESULT_ANALYSIS    ← multi-agent analysis
-  4. LITERATURE_COLLECT  ← API          15. RESEARCH_DECISION  ← PIVOT/REFINE
-  5. LITERATURE_SCREEN   [gate]
-  6. KNOWLEDGE_EXTRACT                  Phase G: Paper Writing
-                                        16. PAPER_OUTLINE
-Phase C: Knowledge Synthesis            17. PAPER_DRAFT
-  7. SYNTHESIS                          18. PEER_REVIEW        ← evidence check
-  8. HYPOTHESIS_GEN    ← debate         19. PAPER_REVISION
-
-Phase D: Experiment Design           Phase H: Finalization
-  9. EXPERIMENT_DESIGN   [gate]         20. QUALITY_GATE      [gate]
- 10. CODE_GENERATION                    21. KNOWLEDGE_ARCHIVE
- 11. RESOURCE_PLANNING                  22. EXPORT_PUBLISH     ← LaTeX
-                                        23. CITATION_VERIFY    ← relevance check
-```
-
-> **Gates** (5, 9, 20) pause the pipeline for human approval (or are skipped with `--auto-approve`). On rejection, the pipeline rolls back.
-
-> **Co-Pilot mode** (`--mode co-pilot`): deep human-AI collaboration at Stages 7-8 (Idea Workshop), Stage 9 (Baseline Navigator), and Stages 16-17 (Paper Co-Writer). The remaining stages run automatically under SmartPause monitoring.
-
-> **Decision loops**: at Stage 15 the system can loop back to refine (REFINE → Stage 13) or change course (PIVOT → Stage 8), versioning artifacts automatically.
-
-<details>
-<summary>📋 What happens in each phase</summary>
-
-| Phase | Description |
-|-------|-------------|
-| **A: Scoping** | The LLM decomposes the topic into a structured problem tree with research questions. |
-| **A+: Hardware** | Auto-detects the GPU (NVIDIA CUDA / Apple MPS / CPU), warns when resources are insufficient, and adapts code generation to the available hardware. |
-| **B: Literature** | Multi-source search (OpenAlex → Semantic Scholar → arXiv) for real papers, relevance screening, and knowledge card extraction. |
-| **C: Synthesis** | Clusters findings, identifies research gaps, and generates testable hypotheses through multi-agent debate. |
-| **D: Design** | Designs the experiment plan, generates hardware-aware Python code (GPU-appropriate package selection), and estimates resource needs. |
-| **E: Execution** | Runs experiments in the sandbox, catches NaN/Inf and runtime bugs, and self-heals code via the LLM. |
-| **F: Analysis** | Multi-agent result analysis; autonomous PROCEED / REFINE / PIVOT decision with a detailed rationale. |
-| **G: Writing** | Outline → section-by-section drafting (5,000-6,500 words) → peer review (with methodology-results consistency checks) → revision with length control. |
-| **H: Finalization** | Quality gate, knowledge archival, LaTeX export with conference templates, and citation integrity and relevance verification. |
-
-</details>
-
----
-
-## ✨ Key Features
-
-| Feature | Description |
-|---------|------------|
-| **📚 Multi-Source Literature** | Real papers from OpenAlex, Semantic Scholar, and arXiv, with query expansion, deduplication, and protection against API outages with graceful degradation. |
-| **🔍 4-Layer Citation Verification** | arXiv ID check → CrossRef/DataCite DOI → Semantic Scholar title match → LLM relevance scoring. Fabricated references are removed automatically. |
-| **🖥️ Hardware-Aware Execution** | Auto-detects the GPU (NVIDIA CUDA / Apple MPS / CPU) and adapts code generation, imports, and experiment scale. |
-| **🦾 OpenCode Beast Mode** | Complex experiments are routed automatically to [OpenCode](https://github.com/anomalyco/opencode), which generates multi-file projects with custom architectures, training loops, and ablation studies. Installed via `researchclaw setup`. |
-| **🧪 Sandbox Experiments** | AST-based code validation, an immutable harness, fail-fast on NaN/Inf, self-healing, iterative refinement (up to 10 rounds), and partial-result capture. |
-| **📝 Conference-Grade Writing** | NeurIPS/ICML/ICLR templates, section-by-section drafting (5,000-6,500 words), anti-fabrication guards, length control during revision, and removal of typical AI disclaimers. |
-| **📐 Template Switching** | `neurips_2025`, `iclr_2026`, `icml_2026`: Markdown → LaTeX with formulas, tables, figures, cross-references, and `\cite{}`. |
-| **🛡️ Anti-Fabrication** | VerifiedRegistry ensures papers use only verified experimental data. Failed experiments are diagnosed and repaired automatically before writing. Unverified numbers are scrubbed. |
-| **🚦 Quality Gates** | 3 human approval points (Stages 5, 9, 20) with rollback. Skip them with `--auto-approve`. |
-| **🧑‍✈️ HITL Co-Pilot** | 6 intervention modes with configurable per-stage policies. Idea Workshop, Baseline Navigator, and Paper Co-Writer for deep collaboration. SmartPause, budget control, escalation policies, and intervention learning for production safety. CLI/WebSocket/MCP adapters. |
-| **💰 Budget Control** | Cost monitoring with configurable alert thresholds (50%/80%/100%). The pipeline pauses automatically when the budget is exceeded. |
-| **🔐 Reproducibility** | SHA256 checksums for all stage artifacts. Immutable manifests for verification. Multi-level rollback with versioned snapshots. |
-
----
-
-## 🧑‍✈️ Human-in-the-Loop Co-Pilot
-
-**AutoResearchClaw v0.4.0 introduces a full Human-in-the-Loop (HITL) system** that turns the pipeline from a purely autonomous one into a collaborative human-AI research engine. Choose your level of involvement:
-
-### Intervention Modes
-
-| Mode | Command | What it does |
-|------|---------|-------------|
-| **Full Auto** | `--auto-approve` | Original behavior, no human intervention |
-| **Gate Only** | `--mode gate-only` | Pauses at the 3 gates (5, 9, 20) for approval |
-| **Checkpoint** | `--mode checkpoint` | Pauses at every phase boundary (8 checkpoints) |
-| **Co-Pilot** | `--mode co-pilot` | Deep collaboration at critical stages, automatic elsewhere |
-| **Step-by-Step** | `--mode step-by-step` | Pauses after every stage, for learning the pipeline |
-| **Express** | `--mode express` | Quick review, only the 3 most critical gates |
-
-### Co-Pilot Workflow
-
-```
-You: researchclaw run --topic "Quantum noise as neural network regularization" --mode co-pilot
-
-Pipeline runs Stages 1-7 automatically...
-
-  ┌─────────────────────────────────────────────────────────────┐
-  │  HITL | Stage 08: HYPOTHESIS_GEN                            │
-  │  Post-stage review                                          │
-  │                                                             │
-  │  Hypotheses mentioned: 3                                    │
-  │  Novelty score: 0.72 (moderate)                             │
-  │                                                             │
-  │  [a] Approve  [r] Reject  [e] Edit  [c] Chat                │
-  │  [i] Give guidance  [v] View output  [q] Abort              │
-  └─────────────────────────────────────────────────────────────┘
-
-You: c  (start a collaborative chat)
-You: Hypothesis 3 is interesting, but it needs Dropout/Label Smoothing baselines
-AI:  Updated: added Dropout, Label Smoothing, MixUp, and CutMix as baselines...
-You: approve
-
-Pipeline continues with your refined hypothesis...
-```
-
-### CLI Commands
-
-```bash
-# Run in HITL mode
-researchclaw run --topic "..." --mode co-pilot
-
-# Attach to a paused pipeline (from another terminal)
-researchclaw attach artifacts/rc-2026-xxx
-
-# Check pipeline and HITL status
-researchclaw status artifacts/rc-2026-xxx
-
-# Approve/reject from another terminal or a script
-researchclaw approve artifacts/rc-2026-xxx --message "LGTM"
-researchclaw reject artifacts/rc-2026-xxx --reason "Missing key baseline"
-
-# Provide guidance for a stage (even before it runs)
-researchclaw guide artifacts/rc-2026-xxx --stage 9 --message "Use ResNet-50 as the primary baseline"
-```
-
-### Key Capabilities
-
-| Capability | Description |
-|---------|------------|
-| **Idea Workshop** | Collaboratively brainstorm, evaluate, and refine hypotheses (Stages 7-8) |
-| **Baseline Navigator** | AI suggests baselines + human adds/removes + reproducibility checklist (Stage 9) |
-| **Paper Co-Writer** | Section-by-section drafting with human editing and AI polishing (Stages 16-19) |
-| **SmartPause** | Confidence-driven dynamic pausing; automatically detects when human input would help |
-| **Claim Verification** | Inline fact-checking against the collected literature; flags unsupported claims |
-| **Budget Control** | Cost monitoring with 50%/80%/100% alert thresholds |
-| **Intervention Learning** | ALHF: learns from your review patterns to optimize future pauses |
-| **Research Branching** | Fork the pipeline to explore several hypotheses, compare, and merge the best |
-| **Escalation Policy** | Tiered notifications (terminal → Slack → email → auto-halt) when the operator is away |
-| **3 Adapters** | CLI (terminal), WebSocket (web dashboard), MCP (external agents) |
-
-### Configuration
-
-```yaml
-# config.arc.yaml
-hitl:
-  enabled: true
-  mode: co-pilot                     # full-auto | gate-only | checkpoint | step-by-step | co-pilot | custom
-  cost_budget_usd: 50.0              # Pause when the budget is exceeded (0 = no limit)
-
-  notifications:
-    on_pause: true
-    on_quality_drop: true
-    channels: ["terminal"]            # terminal | slack | webhook
-
-  timeouts:
-    default_human_timeout_sec: 86400  # Wait up to 24h
-    auto_proceed_on_timeout: false
-
-  collaboration:
-    max_chat_turns: 50
-    save_chat_history: true
-
-  # Custom per-stage policies (optional, for 'custom' mode)
-  stage_policies:
-    8: { require_approval: true, enable_collaboration: true }
-    9: { require_approval: true, allow_edit_output: true }
-```
-
-### Backward Compatibility
-
-- **Off by default.** Without `hitl.enabled: true` or `--mode`, the pipeline behaves exactly as before.
-- **`--auto-approve` still works.** It overrides HITL mode.
-- **All 2,699 tests pass** with the HITL code in place.
-
----
-
-## 🧠 MetaClaw Integration
-
-**AutoResearchClaw + [MetaClaw](https://github.com/aiming-lab/MetaClaw) = a pipeline that learns from every run.**
-
-MetaClaw adds **cross-run knowledge transfer**. When enabled, the pipeline automatically extracts lessons from failures and warnings, turns them into reusable skills, and injects them into all 23 stages on subsequent runs, so the same mistakes are not repeated.
-
-### How It Works
-
-```
-Run N executes → failures/warnings are captured as Lessons
-                      ↓
-          MetaClaw converts Lesson → Skill
-                      ↓
-          arc-* skill files are stored in ~/.metaclaw/skills/
-                      ↓
-Run N+1 → build_overlay() injects skills into every LLM prompt
-                      ↓
-          LLM avoids known pitfalls → higher quality, fewer retries
-```
-
-### Quick Setup
-
-```bash
-# 1. Install MetaClaw (if not already installed)
-pip install metaclaw
-
-# 2. Enable it in the config
-```
-
-```yaml
-# config.arc.yaml
-metaclaw_bridge:
-  enabled: true
-  proxy_url: "http://localhost:30000"        # MetaClaw proxy (optional)
-  skills_dir: "~/.metaclaw/skills"          # Directory where skills are stored
-  fallback_url: "https://api.openai.com/v1" # Direct LLM fallback
-  fallback_api_key: ""                      # API key for the fallback
-  lesson_to_skill:
-    enabled: true
-    min_severity: "warning"                 # Convert warnings and errors
-    max_skills_per_run: 3
-```
-
-```bash
-# 3. Run as usual: MetaClaw works transparently
-researchclaw run --config config.arc.yaml --topic "Your idea" --auto-approve
-```
-
-After each run, check `~/.metaclaw/skills/arc-*/SKILL.md` to see what your pipeline has learned.
-
-### Experiment Results
-
-In controlled A/B tests (same topic, same LLM, same config):
-
-| Metric | Baseline | With MetaClaw | Improvement |
-|--------|----------|---------------|-------------|
-| Stage retry rate | 10.5% | 7.9% | **-24.8%** |
-| Refine cycle count | 2.0 | 1.2 | **-40.0%** |
-| Pipeline completion | 18/19 | 19/19 | **+5.3%** |
-| Overall robustness score (composite) | 0.714 | 0.845 | **+18.3%** |
-
-> The composite robustness score is a weighted average of completion rate (40%), retry reduction (30%), and refine-cycle efficiency (30%).
-
-### Backward Compatibility
-
-- **Off by default.** If the `metaclaw_bridge` block is absent or `enabled: false`, the pipeline behaves exactly as before.
-- **No new dependencies.** MetaClaw is optional; the core works without it.
-- **All 2,699 tests pass**, even with the integration code in place.
-
----
-
-## 🧩 Skills Library
-
-AutoResearchClaw now supports loading **open-source and custom skills** to extend its research capabilities. It also ships with **20 pre-installed skills** (scientific writing, literature search, chemistry, biology, and more) as ready-made examples, giving you flexibility out of the box. Disable any skill by adding `enabled: false` to its metadata.
-
-**Built-in skill examples:**
-
-| Category | Skill | Description |
-|----------|-------|-------------|
-| **Writing** | `scientific-writing` | IMRAD structure, citation formatting, reporting standards |
-| **Domain** | `chemistry-rdkit` | Molecular analysis, SMILES, fingerprints, drug discovery |
-| **Experiment** | `literature-search` | Systematic review, PRISMA methodology |
-
-> List all 20 skills with `researchclaw skills list`.
-
-### Loading Your Own Skills
-
-```bash
-# Option 1: Install a skill (persists across projects)
-researchclaw skills install /path/to/my-skill/
-
-# Option 2: Drop a SKILL.md into the project
-mkdir -p .claude/skills/my-custom-skill
-# Then create SKILL.md with YAML metadata (name, description, trigger-keywords, applicable-stages)
-
-# Option 3: Configure shared skill directories in config.arc.yaml
-# skills:
-#   custom_dirs:
-#     - /path/to/team-shared-skills
-```
-
-### Using Skills
-
-Skills are loaded and injected into LLM prompts automatically; no manual activation is needed. Use the CLI to inspect them:
-
-```bash
-researchclaw skills list               # Show all loaded skills with their sources
-researchclaw skills validate ./my-skill # Check the SKILL.md format
-```
-
-Community skills: [K-Dense-AI/claude-scientific-skills](https://github.com/K-Dense-AI/claude-scientific-skills) (150+ scientific skills across many disciplines).
-
----
-
-## ⚙️ Configuration Reference
-
-<details>
-<summary>Click to expand the full configuration</summary>
-
-```yaml
-# === Project ===
-project:
-  name: "my-research"              # Project identifier
-  mode: "docs-first"               # docs-first | semi-auto | full-auto
-
-# === Research ===
-research:
-  topic: "..."                     # Research topic (required)
-  domains: ["ml", "nlp"]           # Domains for the literature search
-  daily_paper_count: 8             # Target number of papers per search query
-  quality_threshold: 4.0           # Minimum quality score for papers
-
-# === Runtime ===
-runtime:
-  timezone: "Europe/Moscow"        # For timestamps
-  max_parallel_tasks: 3            # Limit on concurrent experiments
-  approval_timeout_hours: 12       # Timeout while waiting at gates
-  retry_limit: 2                   # Number of retries when a stage fails
-
-# === LLM ===
-llm:
-  provider: "openai-compatible"    # openai | openrouter | deepseek | minimax | acp | openai-compatible
-  base_url: "https://..."          # API endpoint (required for openai-compatible)
-  api_key_env: "OPENAI_API_KEY"    # Environment variable holding the key (required for openai-compatible)
-  api_key: ""                      # Or hardcode the key here
-  primary_model: "gpt-4o"          # Primary model
-  fallback_models: ["gpt-4o-mini"] # Fallback chain
-  s2_api_key: ""                   # Semantic Scholar API key (optional, raises rate limits)
-  acp:                             # Used only when provider: "acp"
-    agent: "claude"                # ACP agent CLI command (claude, codex, gemini, etc.)
-    cwd: "."                       # Agent working directory
-
-# === Experiment ===
-experiment:
-  mode: "sandbox"                  # simulated | sandbox | docker | ssh_remote
-  time_budget_sec: 300             # Max time per run (default: 300s)
-  max_iterations: 10               # Max optimization iterations
-  metric_key: "val_loss"           # Name of the primary metric
-  metric_direction: "minimize"     # minimize | maximize
-  sandbox:
-    python_path: ".venv/bin/python"
-    gpu_required: false
-    allowed_imports: [math, random, json, csv, numpy, torch, sklearn]
-    max_memory_mb: 4096
-  docker:
-    image: "researchclaw/experiment:latest"
-    network_policy: "setup_only"   # none | setup_only | pip_only | full
-    gpu_enabled: true
-    memory_limit_mb: 8192
-    auto_install_deps: true        # Auto-detect imports → requirements.txt
-  ssh_remote:
-    host: ""                       # GPU server hostname
-    gpu_ids: []                    # Available GPU IDs
-    remote_workdir: "/tmp/researchclaw_experiments"
-  opencode:                          # OpenCode Beast Mode (installed via `researchclaw setup`)
-    enabled: true                    # Master switch (default: true)
-    auto: true                       # Auto-trigger without confirmation (default: true)
-    complexity_threshold: 0.2        # 0.0-1.0; higher = triggers less often (only on complex tasks)
-    model: ""                        # Model override (empty = use llm.primary_model)
-    timeout_sec: 600                 # Max time for OpenCode generation
-    max_retries: 1                   # Number of retries on failure
-    workspace_cleanup: true          # Delete the temporary workspace after collecting results
-  code_agent:                        # CodeAgent v2: multi-phase code generation
-    enabled: true                    # Use CodeAgent instead of the legacy single-prompt generation
-    architecture_planning: true      # Generate a detailed implementation plan before coding
-    sequential_generation: true      # Generate files one at a time following the dependency DAG
-    hard_validation: true            # AST validation (blocks identical ablations, hardcoded metrics)
-    hard_validation_max_repairs: 2   # Max repair attempts when validation fails
-    exec_fix_max_iterations: 3       # Repair attempts at execution time
-    exec_fix_timeout_sec: 60         # Timeout per repair attempt
-  benchmark_agent:                   # BenchmarkAgent: automatic dataset and baseline selection
-    enabled: true                    # Enable the 4-agent pipeline (Surveyor→Selector→Acquirer→Validator)
-    enable_hf_search: true           # Search HuggingFace Datasets
-    enable_web_search: true          # Search Google Scholar for benchmarks
-    tier_limit: 2                    # Filter datasets by tier (1=small/cached, 2=medium, 3=large)
-    min_benchmarks: 1                # Minimum number of datasets required
-    min_baselines: 2                 # Minimum number of baselines
-  figure_agent:                      # FigureAgent: academic figure generation
-    enabled: true                    # Enable the 5-agent pipeline (Planner→CodeGen→Renderer→Critic→Integrator)
-    min_figures: 3                   # Minimum number of figures to generate
-    max_figures: 8                   # Maximum number of figures
-    max_iterations: 3                # Critic-driven refinement iterations
-    dpi: 300                         # Output resolution
-    strict_mode: false               # Fail the pipeline if figure generation errors out
-  repair:                            # Anti-fabrication: experiment repair
-    enabled: true                    # Auto-diagnose and repair failed experiments
-    max_cycles: 3                    # Number of repair cycles
-    min_completion_rate: 0.5         # >=50% of conditions must complete to continue
-    min_conditions: 2                # At least 2 conditions for a valid experiment
-    use_opencode: true               # Route repairs through OpenCode Beast Mode
-
-# === Web Search (Optional) ===
-web_search:
-  enabled: true                      # Enable web-augmented literature search
-  tavily_api_key_env: "TAVILY_API_KEY"  # Environment variable for the Tavily API key (optional)
-  enable_scholar: true               # Search Google Scholar
-  enable_pdf_extraction: true        # Extract text from PDFs
-  max_web_results: 10                # Max web results per query
-
-# === Export ===
-export:
-  target_conference: "neurips_2025"  # neurips_2025 | iclr_2026 | icml_2026
-  authors: "Anonymous"
-  bib_file: "references"
-
-# === Prompts ===
-prompts:
-  custom_file: ""                  # Path to a custom prompts YAML (empty = defaults)
-
-# === HITL Co-Pilot (NEW in v0.4.0) ===
-hitl:
-  enabled: false                     # Set to true to enable HITL
-  mode: co-pilot                     # full-auto | gate-only | checkpoint | step-by-step | co-pilot | custom
-  cost_budget_usd: 0.0              # Cost limit in USD (0 = no limit)
-  notifications:
-    on_pause: true                   # Notify when the pipeline pauses
-    on_quality_drop: true            # Notify on quality problems
-    channels: ["terminal"]           # terminal | slack | webhook
-  timeouts:
-    default_human_timeout_sec: 86400 # Wait up to 24h
-    auto_proceed_on_timeout: false   # If true, auto-approve on timeout
-  collaboration:
-    max_chat_turns: 50               # Max turns per collaboration session
-    save_chat_history: true          # Save chat logs
-  stage_policies: {}                 # Per-stage overrides (for 'custom' mode)
-
-# === Security ===
-security:
-  hitl_required_stages: [5, 9, 20] # Stages that require human approval (human-in-the-loop)
-  allow_publish_without_approval: false
-  redact_sensitive_logs: true
-
-# === Knowledge Base ===
-knowledge_base:
-  backend: "markdown"              # markdown | obsidian
-  root: "docs/kb"
-
-# === Notifications ===
-notifications:
-  channel: "console"               # console | discord | slack
-  target: ""
-
-# === MetaClaw Bridge (Optional) ===
-metaclaw_bridge:
-  enabled: false                   # Enable cross-session learning
-  proxy_url: "http://localhost:30000"  # MetaClaw proxy URL
-  skills_dir: "~/.metaclaw/skills" # Directory where arc-* skills are stored
-  fallback_url: ""                 # Direct LLM fallback if the proxy is down
-  fallback_api_key: ""             # API key for the fallback
-  lesson_to_skill:
-    enabled: true                  # Automatically convert lessons into skills
-    min_severity: "warning"        # Minimum severity to convert
-    max_skills_per_run: 3          # Max new skills per run
-  prm:                             # Process Reward Model quality gate (optional)
-    enabled: false                 # Use LLM-as-judge to score stage outputs
-    model: "gpt-5.4"              # PRM judge model
-    votes: 3                       # Number of votes (majority voting)
-    gate_stages: [5, 9, 15, 20]   # Stages where PRM gates apply
-
-# === OpenClaw Bridge ===
-openclaw_bridge:
-  use_cron: false                  # Scheduled research runs
-  use_message: false               # Progress notifications
-  use_memory: false                # Knowledge persistence across sessions
-  use_sessions_spawn: false        # Spawn parallel sub-sessions
-  use_web_fetch: false             # Live web search
-  use_browser: false               # Browser-based paper collection
-```
-
-</details>
-
----
-
-## 🙏 Благодарности
-
-Вдохновлено проектами:
-
-- 🔬 [AI Scientist](https://github.com/SakanaAI/AI-Scientist) (Sakana AI) — Пионер автоматизированных исследований
-- 🧠 [AutoResearch](https://github.com/karpathy/autoresearch) (Andrej Karpathy) — Сквозная автоматизация исследований
-- 🌐 [FARS](https://analemma.ai/blog/introducing-fars/) (Analemma) — Полностью автоматизированная исследовательская система
-
----
-
-## 📄 Лицензия
-
-MIT — подробности см. в [LICENSE](../LICENSE).
-
----
-
-## 📌 Цитирование
-
-Если AutoResearchClaw оказался вам полезен, пожалуйста, процитируйте:
-
-```bibtex
-@misc{liu2026autoresearchclawselfreinforcingautonomousresearch,
-      title={AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration},
-      author={Jiaqi Liu and Shi Qiu and Mairui Li and Bingzhou Li and Haonian Ji and Siwei Han and Xinyu Ye and Peng Xia and Zihan Dong and Congyu Zhang and Letian Zhang and Guiming Chen and Haoqin Tu and Xinyu Yang and Lu Feng and Xujiang Zhao and Haifeng Chen and Jiawei Zhou and Xiao Wang and Weitong Zhang and Hongtu Zhu and Yun Li and Jieru Mei and Hongliang Fei and Jiaheng Zhang and Linjie Li and Linjun Zhang and Yuyin Zhou and Sheng Wang and Caiming Xiong and James Zou and Zeyu Zheng and Cihang Xie and Mingyu Ding and Huaxiu Yao},
-      year={2026},
-      eprint={2605.20025},
-      archivePrefix={arXiv},
-      primaryClass={cs.AI},
-      url={https://arxiv.org/abs/2605.20025},
-}
-```
-
-<p align="center">
-  <sub>Создано с 🦞 командой AutoResearchClaw</sub>
-</p>
\ No newline at end of file
diff --git a/docs/TESTER_GUIDE.md b/docs/TESTER_GUIDE.md
deleted file mode 100644
index a3997d87..00000000
--- a/docs/TESTER_GUIDE.md
+++ /dev/null
@@ -1,587 +0,0 @@
-<p align="center">
-  <img src="../image/logo.png" width="500" alt="AutoResearchClaw Logo">
-</p>
-
-<h2 align="center">🧪 Community Testing Guide</h2>
-
-<p align="center">
-  <b>Help us stress-test the world's first fully autonomous research pipeline — across every domain.</b>
-</p>
-
-<p align="center">
-  <a href="https://github.com/aiming-lab/AutoResearchClaw">⭐ Star the Repo</a> ·
-  <a href="#-quick-start">🚀 Quick Start</a> ·
-  <a href="#-feedback-template">📋 Feedback Template</a> ·
-  <a href="TESTER_GUIDE_CN.md">🇨🇳 中文测试指南</a> ·
-  <a href="TESTER_GUIDE_JA.md">🇯🇵 日本語テストガイド</a>
-</p>
-
----
-
-## 👋 Welcome, Tester!
-
-**AutoResearchClaw** is a fully autonomous academic paper generation pipeline. You give it a research idea — it handles everything else: literature search, experiment design, code generation, experiment execution, paper writing, peer review, and final delivery. **23 stages, zero human intervention.**
-
-We're looking for testers from **all disciplines and backgrounds** — machine learning, NLP, computer vision, reinforcement learning, bioinformatics, physics, social sciences, and beyond. The more diverse the testing, the better the pipeline becomes.
-
-**Your mission:** Run the pipeline with your own research idea, inspect the output, and submit a detailed feedback report. That's it. Every piece of feedback directly shapes the next version.
-
----
-
-## 📋 Table of Contents
-
-1. [Prerequisites](#-prerequisites)
-2. [Installation & Setup](#-installation--setup)
-3. [Running the Pipeline](#-running-the-pipeline)
-4. [Inspecting the Output](#-inspecting-the-output)
-5. [Feedback Report Requirements](#-feedback-report-requirements)
-6. [Feedback Template](#-feedback-template)
-7. [FAQ](#-faq)
-
----
-
-## 📦 Prerequisites
-
-| Item | Minimum | Recommended |
-|------|---------|-------------|
-| OS | macOS / Linux / WSL2 | Linux (Ubuntu 22.04+) |
-| Python | 3.11+ | 3.11 or 3.12 |
-| Disk | 500 MB | 2 GB+ |
-| RAM | 8 GB | 16 GB+ |
-| GPU | Not required (sandbox mode) | NVIDIA GPU + CUDA 12.x (docker mode) |
-| Network | Required (LLM API + literature search) | Stable connection |
-| LLM API Key | **Required** | OpenAI or Anthropic |
-
-### 🔑 About API Keys
-
-The pipeline calls a large language model (LLM) at every stage — writing, coding, reviewing, and more. You'll need an API key from **OpenAI** or **Anthropic**.
-
-> **We strongly recommend using the most capable models available for the best results:**
->
-> | Provider | Recommended Model | Fallback |
-> |----------|------------------|----------|
-> | **OpenAI** | **GPT-5.4** (best) | GPT-5.1 or GPT-4.1 |
-> | **Anthropic** | **Claude Opus 4.6** (best) | Claude Sonnet 4.6 |
->
-> Using a top-tier model significantly improves paper quality, code correctness, and experiment design. Older models (e.g., GPT-4o) may produce noticeably weaker output.
-
----
-
-## 🛠 Installation & Setup
-
-### ⚠️ Always Use the Latest Version
-
-> **This project is under active development.** The codebase is updated frequently, and different versions can produce significantly different results.
->
-> **Before every test run, always pull the latest code:**
->
-> ```bash
-> cd AutoResearchClaw
-> git pull origin main
-> pip install -e .    # Re-install to pick up changes
-> ```
->
-> Record your version for the feedback report:
-> ```bash
-> git log --oneline -1
-> ```
-
----
-
-### Option A: Claude Code (Fastest — Recommended ⚡)
-
-If you have [Claude Code](https://claude.ai/claude-code) (Anthropic's CLI tool), just paste this:
-
-```
-Please clone and install AutoResearchClaw:
-https://github.com/aiming-lab/AutoResearchClaw.git
-
-If already cloned, run git pull origin main to update to the latest version first.
-
-Then create a config file with:
-- LLM: OpenAI with gpt-5.4 (or Anthropic Claude Opus 4.6)
-- Experiment mode: sandbox (local execution)
-- Research topic: "<YOUR RESEARCH IDEA HERE>"
-- Auto-approve all gate stages
-
-My API key is: sk-xxxx (set it as an environment variable, don't hardcode it)
-```
-
-Claude Code will handle cloning, dependencies, configuration, and execution automatically.
-
-### Option B: Manual Installation
-
-```bash
-# 1. Clone the repo
-git clone https://github.com/aiming-lab/AutoResearchClaw.git
-cd AutoResearchClaw
-
-# 2. Create a virtual environment
-python3 -m venv .venv
-source .venv/bin/activate       # macOS / Linux
-# .venv\Scripts\activate        # Windows (prefer WSL2)
-
-# 3. Install
-pip install -e .
-
-# 4. Verify
-researchclaw --help
-```
-
-### ⚙️ Configuration
-
-```bash
-cp config.researchclaw.example.yaml config.arc.yaml
-```
-
-Edit `config.arc.yaml` — here are the key fields:
-
-```yaml
-# === Project ===
-project:
-  name: "my-test"
-  mode: "full-auto"
-
-# === Research Topic — describe your idea in English ===
-research:
-  topic: "Your research idea in 1-2 sentences"
-  domains:
-    - "machine-learning"     # Options: nlp, cv, rl, graph-learning, etc.
-
-# === LLM — use the strongest model you have access to! ===
-#
-# Option 1: OpenAI (GPT-5.4 recommended)
-llm:
-  provider: "openai-compatible"
-  base_url: "https://api.openai.com/v1"
-  api_key_env: "OPENAI_API_KEY"
-  primary_model: "gpt-5.4"              # Best available
-  fallback_models:
-    - "gpt-5.1"
-    - "gpt-4.1"
-
-# Option 2: Anthropic Claude (Claude Opus 4.6 recommended)
-# llm:
-#   provider: "openai-compatible"
-#   base_url: "https://api.anthropic.com/v1"
-#   api_key_env: "ANTHROPIC_API_KEY"
-#   primary_model: "claude-opus-4-6"
-#   fallback_models:
-#     - "claude-sonnet-4-6"
-
-# === Experiment ===
-experiment:
-  mode: "sandbox"                # sandbox = local execution (recommended)
-  time_budget_sec: 600           # Max seconds per experiment run
-  max_iterations: 10
-  metric_key: "primary_metric"
-  metric_direction: "minimize"   # or "maximize"
-```
-
-### 🔐 Set Your API Key
-
-```bash
-# OpenAI users:
-export OPENAI_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxx"
-
-# Anthropic users:
-export ANTHROPIC_API_KEY="sk-ant-xxxxxxxxxxxxxxxxxxxxxxxx"
-
-# Optional: Semantic Scholar API key (speeds up literature search)
-export S2_API_KEY="your-s2-key"
-```
-
-> **🔒 Security:** Never hardcode API keys in files. Use `api_key_env` in the config to reference an environment variable.
-
----
-
-## 🚀 Running the Pipeline
-
-### Quick Start
-
-```bash
-source .venv/bin/activate
-export OPENAI_API_KEY="sk-xxxx"       # or ANTHROPIC_API_KEY
-
-researchclaw run --config config.arc.yaml --auto-approve
-```
-
-### With a Specific Topic
-
-```bash
-researchclaw run \
-  --config config.arc.yaml \
-  --topic "Investigating the effect of curriculum learning on image classification with adaptive difficulty scheduling" \
-  --auto-approve
-```
-
-### ⏱ Expected Runtime
-
-| Mode | Estimated Time | Notes |
-|------|---------------|-------|
-| sandbox | 30 min – 2 hours | Depends on experiment complexity & API speed |
-| docker (GPU) | 1 – 4 hours | For heavier deep learning experiments |
-
-The terminal shows real-time progress. With `--auto-approve`, **no manual intervention is needed**; sit back and let it run.
-
-### ✅ How to Know It's Done
-
-You'll see output like:
-
-```
-[Stage 23/23] ✓ Deliverables packaged
-Pipeline complete — deliverables at: artifacts/rc-20260315-XXXXXX-YYYY/deliverables/
-```
-
-### 🔄 If It Gets Interrupted
-
-The pipeline supports checkpointing — just resume:
-
-```bash
-researchclaw run --config config.arc.yaml --resume
-```
-
----
-
-## 🔍 Inspecting the Output
-
-After completion, find your results in `artifacts/rc-YYYYMMDD-HHMMSS-<hash>/deliverables/`.
-
-### 📂 Deliverables
-
-| File / Directory | Description |
-|-----------------|-------------|
-| `paper_final.md` | Final paper in Markdown (5,000–6,500 words) |
-| `paper.tex` | Conference-ready LaTeX source (directly compilable) |
-| `references.bib` | BibTeX bibliography (verified citations) |
-| `code/main.py` | Auto-generated experiment code |
-| `code/requirements.txt` | Python dependencies for experiments |
-| `charts/` | Result visualization charts (PNG) |
-| `verification_report.json` | Citation integrity verification report |
-| `manifest.json` | Deliverable manifest with metadata |
-
-### 🔎 What to Check
-
-1. **Paper Content** (`paper_final.md` or `paper.tex`)
-   - Is the title relevant to the topic?
-   - Does the abstract clearly state problem, method, and results?
-   - Does Related Work cite key papers in the field?
-   - Is the method description technically correct?
-   - Is the experiment design sound (datasets, baselines, metrics)?
-   - Are results meaningful (not all zeros, not NaN)?
-   - Are conclusions consistent with experimental findings?
-
-2. **Experiment Code** (`code/main.py`)
-   - Can it run independently?
-   - Does it use real datasets (not randomly generated fake data)?
-   - Does it implement what the paper describes?
-   - Are hyperparameters reasonable?
-
-3. **Charts** (`charts/`)
-   - Are they readable and clean?
-   - Are axis labels correct?
-   - Does the data match the paper's claims?
-
-4. **References** (`references.bib`)
-   - Do the cited papers actually exist?
-   - Are citations relevant to the discussion?
-
-### 📊 Auto-Generated Quality Report
-
-The pipeline produces a quality assessment at `stage-20/quality_report.json` containing:
-
-- `score_1_to_10` — automated quality score
-- `verdict` — accept / reject recommendation
-- `strengths` — what went well
-- `weaknesses` — identified issues
-- `required_actions` — suggested improvements
-
-Please reference this in your feedback, and add your own expert judgment.
-
----
-
-## 📝 Feedback Report Requirements
-
-**Your feedback is the single most important input for improving this project.** Please be thorough and honest — critical feedback is just as valuable as praise.
-
-### What to Submit
-
-| # | Item | Details |
-|---|------|---------|
-| F1 | **Feedback Report** (use template below) | Markdown format, named `feedback_<your-name>.md` |
-| F2 | **Full Output Directory** | Zip the entire `artifacts/rc-XXXXXX/` directory |
-| F3 | **Config File** | Your `config.arc.yaml` (**remove API keys first!**) |
-| F4 | **Terminal Log** (optional but helpful) | Copy of the terminal output during the run |
-
-### The Four Dimensions of Feedback
-
-#### 🎯 (a) Quality Assessment
-
-From your domain expertise:
-
-- If this were a paper in your field, what level would it reach? (top venue / mid-tier / workshop / unpublishable)
-- How does the writing compare to papers you normally read?
-- Is the method technically correct? Any obvious errors?
-- Is the experiment design reasonable?
-
-#### 💡 (b) Improvement Suggestions
-
-- Which stage produced the weakest output? (literature search / experiment design / code generation / paper writing)
-- Any obvious code errors or poor design choices?
-- Specific suggestions for improving the paper structure or writing?
-
-#### ⚖️ (c) Pipeline Design Assessment
-
-- Are the 23 stages well-designed? Any redundant or missing steps?
-- Is the iterative experiment refinement effective?
-- Is the LLM guidance at each stage appropriate?
-
-#### 🐛 (d) Bug Reports
-
-Please report any issues you find, as specifically as possible:
-
-- **Writing bugs:** grammar errors, repeated paragraphs, contradictions, references to non-existent figures
-- **Code bugs:** runtime errors, logic errors, data handling issues
-- **Result bugs:** all-zero results, NaN values, unreasonable metrics
-- **Pipeline bugs:** stages getting stuck, unexpected crashes, resource exhaustion
-
----
-
-## 📋 Feedback Template
-
-Copy the template below, fill it out, and save as `feedback_<your-name>.md`:
-
-````markdown
-# AutoResearchClaw — Test Feedback Report
-
-## Basic Information
-
-- **Tester Name:**
-- **Domain / Field:** (e.g., Computer Vision / NLP / Reinforcement Learning / Bioinformatics / ...)
-- **Test Date:**
-- **Code Version:** (output of `git log --oneline -1`, e.g., `44151b1 fix: Phase 3 regression test findings`)
-- **Research Topic (English):**
-- **LLM Model Used:** (e.g., gpt-5.4 / gpt-5.1 / claude-opus-4-6 / claude-sonnet-4-6)
-- **Experiment Mode:** (sandbox / docker)
-- **Total Runtime:** (~X minutes)
-- **Completed All 23 Stages?:** Yes / No (if No, which stage failed?)
-
----
-
-## 1. Quality Assessment (Score: 1–10)
-
-**My Score:** X / 10
-
-### 1.1 Overall Paper Quality
-- What level of paper does this correspond to? (top venue / mid-tier / workshop / unpublishable)
-- Reason for score:
-
-### 1.2 Section-by-Section Assessment
-
-| Section | Score (1-10) | Comments |
-|---------|-------------|----------|
-| Title | | |
-| Abstract | | |
-| Introduction | | |
-| Related Work | | |
-| Method | | |
-| Experiment Design | | |
-| Results & Analysis | | |
-| Conclusion | | |
-| References | | |
-| Charts / Figures | | |
-| Code Quality | | |
-
-### 1.3 Comparison with Human-Written Papers
-- Compared to papers you normally read/write, where are the gaps?
-- Anything surprisingly good?
-
----
-
-## 2. Improvement Suggestions
-
-### 2.1 Top Issues (list 3-5, in priority order)
-
-1.
-2.
-3.
-
-### 2.2 Code Issues
-- Can the code run independently?
-- Does it use real datasets and baselines?
-- Specific code issues (if any):
-
-### 2.3 Writing Issues
-- Is the paper structure reasonable?
-- Is the technical description accurate?
-- Specific writing issues (if any):
-
----
-
-## 3. Pipeline Design Assessment
-
-### 3.1 Pipeline Flow
-- Is the 23-stage design reasonable?
-- Any redundant or missing steps?
-
-### 3.2 Experiment Execution
-- Is the experiment design sound? (dataset choices, comparison methods, metrics)
-- Is the iterative refinement effective?
-
-### 3.3 LLM Usage
-- How well did the LLM perform at each stage?
-- Any obvious "hallucinations" or unreasonable outputs?
-
----
-
-## 4. Bug Reports
-
-### 4.1 Writing Bugs
-| # | Location (section/paragraph) | Description | Severity (High/Med/Low) |
-|---|------------------------------|-------------|------------------------|
-| W1 | | | |
-| W2 | | | |
-
-### 4.2 Code Bugs
-| # | File / Line | Description | Severity (High/Med/Low) |
-|---|-------------|-------------|------------------------|
-| C1 | | | |
-| C2 | | | |
-
-### 4.3 Result Bugs
-| # | Description | Affected Metrics/Charts | Severity (High/Med/Low) |
-|---|-------------|------------------------|------------------------|
-| R1 | | | |
-| R2 | | | |
-
-### 4.4 Pipeline Bugs
-| # | Stage | Description | Severity (High/Med/Low) |
-|---|-------|-------------|------------------------|
-| P1 | | | |
-| P2 | | | |
-
----
-
-## 5. Additional Comments
-
-(Free-form: any observations, ideas, or suggestions you think would be valuable)
-
----
-
-## Attachments Checklist
-
-- [ ] Feedback report (`feedback_<name>.md`)
-- [ ] Full output directory (`artifacts/rc-XXXXXX.zip`)
-- [ ] Config file (`config.arc.yaml`, API keys removed)
-- [ ] Terminal log (optional)
-````
-
----
-
-## ❓ FAQ
-
-### Q1: Can I test without a GPU?
-
-**Yes!** Use `experiment.mode: "sandbox"` and the pipeline runs experiments on your CPU. The experiments will be simpler, but they are still enough for a full end-to-end test.
-
-### Q2: How much does a full pipeline run cost?
-
-A full pipeline run costs roughly **$5–15** in API fees, depending on the model, number of revision iterations, and experiment complexity. Top-tier models (GPT-5.4, Claude Opus 4.6) cost a bit more but produce significantly better results.
-
-### Q3: What if the pipeline crashes mid-run?
-
-Resume from the checkpoint:
-
-```bash
-researchclaw run --config config.arc.yaml --resume
-```
-
-### Q4: Can I use a non-English research topic?
-
-We recommend describing your topic in **English**. The pipeline's prompts, literature search, and paper generation are all English-based. If your idea is originally in another language, please translate it first.
-
-### Q5: What kind of research topic should I pick?
-
-Choose a **specific research question in a field you know well** — that way you can meaningfully assess whether the output is technically correct. Tips:
-
-- ✅ Pick topics with clear experimental validation (classification, regression, RL tasks, etc.)
-- ❌ Avoid overly broad or abstract topics (e.g., "AGI", "general intelligence")
-- ✅ Be specific: *"Investigating the effect of data augmentation strategies on few-shot learning for medical image classification"*
-
-### Q6: How do I use Docker mode? (Advanced)
-
-If you have an NVIDIA GPU with Docker + NVIDIA Container Toolkit:
-
-```bash
-# 1. Build the experiment image
-docker build -t researchclaw/experiment:latest researchclaw/docker/
-
-# 2. Update config.arc.yaml:
-#   experiment:
-#     mode: "docker"
-#     docker:
-#       gpu_enabled: true
-#       memory_limit_mb: 8192
-#       network_policy: "setup_only"  # recommended default
-
-# 3. Run
-researchclaw run --config config.arc.yaml --auto-approve
-```
-
-Docker mode uses a three-phase execution model: pip install (network on) → setup.py (network on) → experiment (network off). The image includes pre-cached datasets (CIFAR-10/100, MNIST, FashionMNIST, STL-10, SVHN) so standard benchmarks work without network access.
-
-### Q7: I tested before — what should I do for a re-test?
-
-**Always pull the latest code** before each test:
-
-```bash
-cd AutoResearchClaw
-git pull origin main
-pip install -e .
-```
-
-Then verify your version:
-
-```bash
-git log --oneline -1
-```
-
-Different versions can produce very different results. Always note the commit hash in your feedback report.
-
-### Q8: Where do I submit my feedback?
-
-Submit your feedback report and attachments through one of these channels:
-
-- **GitHub Issues:** [Open an issue](https://github.com/aiming-lab/AutoResearchClaw/issues) with the label `feedback`
-- **Pull Request:** Submit your `feedback_<name>.md` to the `community-feedback/` directory
-- **Email:** Contact the project maintainers (see repo for details)
-
----
-
-## 🌍 We Need Testers from Every Field
-
-The pipeline has been tested primarily on ML topics so far. We especially welcome testers from:
-
-- 🧬 **Bioinformatics & Computational Biology**
-- 🧪 **Chemistry & Materials Science**
-- 📊 **Statistics & Applied Mathematics**
-- 🤖 **Robotics & Control Systems**
-- 🗣️ **NLP & Computational Linguistics**
-- 👁️ **Computer Vision & Graphics**
-- 🎮 **Reinforcement Learning & Game Theory**
-- 🏥 **Medical AI & Healthcare**
-- 🌐 **Graph Learning & Network Science**
-- 💹 **Financial ML & Econometrics**
-- 🛰️ **Remote Sensing & Geospatial AI**
-
-...and any other field where computational experiments are involved!
-
----
-
-## 🙏 Thank You
-
-Every piece of feedback — big or small — directly improves AutoResearchClaw. Thank you for being part of this journey.
-
-<p align="center">
-  <b>⭐ If you find this project interesting, please give us a star on <a href="https://github.com/aiming-lab/AutoResearchClaw">GitHub</a>!</b>
-</p>
diff --git a/docs/TESTER_GUIDE_CN.md b/docs/TESTER_GUIDE_CN.md
deleted file mode 100644
index 5b707b7d..00000000
--- a/docs/TESTER_GUIDE_CN.md
+++ /dev/null
@@ -1,595 +0,0 @@
-<p align="center">
-  <img src="../image/logo.png" width="500" alt="AutoResearchClaw Logo">
-</p>
-
-<h2 align="center">🧪 社区测试指南</h2>
-
-<p align="center">
-  <b>欢迎来自各个领域的你，一起测试全球首个全自动学术论文生成 Pipeline。</b>
-</p>
-
-<p align="center">
-  <a href="https://github.com/aiming-lab/AutoResearchClaw">⭐ Star 项目</a> ·
-  <a href="#-快速开始">🚀 快速开始</a> ·
-  <a href="#-反馈报告模板">📋 反馈模板</a> ·
-  <a href="TESTER_GUIDE.md">🇬🇧 English</a> ·
-  <a href="TESTER_GUIDE_JA.md">🇯🇵 日本語テストガイド</a>
-</p>
-
----
-
-## 👋 你好，测试者！
-
-**AutoResearchClaw** 是一个全自动学术论文生成 Pipeline。你只需提供一个研究 idea，系统就会自动完成文献检索、实验设计、代码生成、实验执行、论文撰写、同行评审到最终交付的全部 **23 个阶段**——无需任何人工干预。
-
-我们正在寻找来自**各个学科和领域**的测试者——机器学习、NLP、计算机视觉、强化学习、生物信息学、物理学、社会科学……领域越多样，Pipeline 就能变得越好。
-
-**你的任务：** 用你自己的研究 idea 运行一次完整的 Pipeline，检查输出质量，然后向我们提交一份详细的反馈报告。就这么简单——你的每一条反馈都会直接推动下一个版本的改进。
-
----
-
-## 📋 目录
-
-1. [环境要求](#-环境要求)
-2. [安装与配置](#-安装与配置)
-3. [运行测试](#-运行测试)
-4. [查看交付结果](#-查看交付结果)
-5. [反馈报告要求](#-反馈报告要求)
-6. [反馈报告模板](#-反馈报告模板)
-7. [常见问题](#-常见问题)
-
----
-
-## 📦 环境要求
-
-| 项目 | 最低要求 | 推荐配置 |
-|------|---------|---------|
-| 操作系统 | macOS / Linux / WSL2 | Linux (Ubuntu 22.04+) |
-| Python | 3.11+ | 3.11 或 3.12 |
-| 磁盘空间 | 500 MB | 2 GB+ |
-| 内存 | 8 GB | 16 GB+ |
-| GPU | 非必须（sandbox 模式） | NVIDIA GPU + CUDA 12.x（docker 模式） |
-| 网络 | 需要（调用 LLM API + 文献检索） | 稳定的网络连接 |
-| LLM API Key | **必须** | OpenAI 或 Anthropic |
-
-### 🔑 关于 API Key
-
-Pipeline 在每个阶段都会调用大语言模型（LLM）来完成写作、编码、评审等任务。你需要准备一个 **OpenAI** 或 **Anthropic** 的 API Key。
-
-> **强烈建议使用最新、最强的模型以获得最佳效果：**
->
-> | 提供商 | 推荐模型 | 备选 |
-> |--------|---------|------|
-> | **OpenAI** | **GPT-5.4**（首选） | GPT-5.1 或 GPT-4.1 |
-> | **Anthropic** | **Claude Opus 4.6**（首选） | Claude Sonnet 4.6 |
->
-> 使用顶级模型会显著提升论文写作质量、代码生成准确性和实验设计合理性。较低版本的模型（如 gpt-4o）可能导致输出质量明显下降。
-
----
-
-## 🛠 安装与配置
-
-### ⚠️ 请务必使用最新版本
-
-> **本项目处于快速迭代阶段，** 代码更新频繁，不同版本之间的生成效果可能存在较大差异。
->
-> **每次测试前，请务必拉取最新代码：**
->
-> ```bash
-> cd AutoResearchClaw
-> git pull origin main
-> pip install -e .    # 重新安装以确保更新生效
-> ```
->
-> 记录你的版本号，方便填写反馈报告：
-> ```bash
-> git log --oneline -1
-> ```
-
----
-
-### 方式 A：使用 Claude Code（最快 ⚡ 推荐）
-
-如果你正在使用 [Claude Code](https://claude.ai/claude-code)（Anthropic 的 CLI 工具），直接粘贴以下内容即可：
-
-```
-请帮我克隆并安装 AutoResearchClaw 项目：
-https://github.com/aiming-lab/AutoResearchClaw.git
-
-如果已经克隆过，请先 git pull origin main 更新到最新版本。
-
-安装完成后，帮我创建一个配置文件，使用以下参数：
-- LLM: OpenAI，模型选择 gpt-5.4（或 Anthropic Claude Opus 4.6）
-- 实验模式: sandbox（本地沙盒执行）
-- 研究主题: "<在这里填入你的研究 idea>"
-- 自动审批所有 gate stage
-
-我的 API Key 是: sk-xxxx（请设为环境变量，不要写在配置文件里）
-```
-
-Claude Code 会自动完成克隆、安装依赖、创建配置文件、运行 Pipeline 的全部步骤。
-
-### 方式 B：手动安装
-
-```bash
-# 1. 克隆项目
-git clone https://github.com/aiming-lab/AutoResearchClaw.git
-cd AutoResearchClaw
-
-# ⚠️ 如果已经克隆过，务必先更新！
-# git pull origin main
-
-# 2. 创建 Python 虚拟环境
-python3 -m venv .venv
-source .venv/bin/activate   # macOS / Linux
-# .venv\Scripts\activate    # Windows（推荐使用 WSL2）
-
-# 3. 安装项目
-pip install -e .
-
-# 4. 验证安装成功
-researchclaw --help
-```
-
-### ⚙️ 配置文件
-
-```bash
-cp config.researchclaw.example.yaml config.yaml
-```
-
-编辑 `config.yaml`，修改以下关键字段：
-
-```yaml
-# === 项目设置 ===
-project:
-  name: "my-test"
-  mode: "full-auto"
-
-# === 研究主题——用英文描述你的 idea ===
-research:
-  topic: "你的研究 idea，用英文描述，一两句话即可"
-  domains:
-    - "machine-learning"    # 可选: nlp, cv, rl, graph-learning, etc.
-
-# === LLM 配置——请使用最强模型！ ===
-#
-# 方案一：OpenAI（推荐 GPT-5.4）
-llm:
-  provider: "openai-compatible"
-  base_url: "https://api.openai.com/v1"
-  api_key_env: "OPENAI_API_KEY"
-  primary_model: "gpt-5.4"              # 首选最强模型
-  fallback_models:
-    - "gpt-5.1"
-    - "gpt-4.1"
-
-# 方案二：Anthropic Claude（推荐 Claude Opus 4.6）
-# llm:
-#   provider: "openai-compatible"
-#   base_url: "https://api.anthropic.com/v1"
-#   api_key_env: "ANTHROPIC_API_KEY"
-#   primary_model: "claude-opus-4-6"
-#   fallback_models:
-#     - "claude-sonnet-4-6"
-
-# === 实验模式 ===
-experiment:
-  mode: "sandbox"                # sandbox = 本地执行（推荐）
-  time_budget_sec: 600           # 每次实验最长运行时间（秒）
-  max_iterations: 10
-  metric_key: "primary_metric"
-  metric_direction: "minimize"   # 或 "maximize"
-```
-
-### 🔐 设置 API Key
-
-```bash
-# OpenAI 用户：
-export OPENAI_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxx"
-
-# Anthropic 用户：
-export ANTHROPIC_API_KEY="sk-ant-xxxxxxxxxxxxxxxxxxxxxxxx"
-
-# 可选：Semantic Scholar API Key（可加快文献检索）
-export S2_API_KEY="your-s2-key"
-```
-
-> **🔒 安全提醒：** 请勿将 API Key 硬编码在任何文件中。使用 `api_key_env` 指定环境变量名即可。
-
----
-
-## 🚀 运行测试
-
-### 快速开始
-
-```bash
-source .venv/bin/activate
-export OPENAI_API_KEY="sk-xxxx"       # 或 ANTHROPIC_API_KEY
-
-researchclaw run --config config.yaml --auto-approve
-```
-
-### 指定研究主题运行
-
-```bash
-researchclaw run \
-  --config config.yaml \
-  --topic "Investigating the effect of curriculum learning on image classification with adaptive difficulty scheduling" \
-  --auto-approve
-```
-
-### ⏱ 预估运行时间
-
-| 实验模式 | 预估时间 | 说明 |
-|---------|---------|------|
-| sandbox | 30 分钟 ~ 2 小时 | 取决于实验复杂度和 API 响应速度 |
-| docker (GPU) | 1 ~ 4 小时 | 可运行更复杂的深度学习实验 |
-
-运行过程中终端会实时显示当前阶段和进度。**无需任何手动操作**，安心等待即可。
-
-### ✅ 如何知道运行结束
-
-当看到类似以下输出时，表示 Pipeline 已成功完成：
-
-```
-[Stage 23/23] ✓ Deliverables packaged
-Pipeline complete — deliverables at: artifacts/rc-20260315-XXXXXX-YYYY/deliverables/
-```
-
-### 🔄 如果运行中断
-
-Pipeline 支持断点续跑：
-
-```bash
-researchclaw run --config config.yaml --resume
-```
-
----
-
-## 🔍 查看交付结果
-
-运行结束后，输出文件位于 `artifacts/rc-YYYYMMDD-HHMMSS-<hash>/deliverables/` 目录下。
-
-### 📂 交付物清单
-
-| 文件/目录 | 内容 |
-|----------|------|
-| `paper_final.md` | 最终论文（Markdown 格式，5,000–6,500 词） |
-| `paper.tex` | 会议格式 LaTeX 源文件（可直接编译为 PDF） |
-| `references.bib` | BibTeX 参考文献（经过引用验证） |
-| `code/main.py` | 自动生成的实验代码 |
-| `code/requirements.txt` | 实验代码的 Python 依赖 |
-| `charts/` | 实验结果可视化图表（PNG 格式） |
-| `verification_report.json` | 引用完整性验证报告 |
-| `manifest.json` | 交付物清单及元信息 |
-
-### 🔎 重点检查项
-
-1. **论文内容**（`paper_final.md` 或 `paper.tex`）
-   - 标题是否合理、与主题相关
-   - 摘要是否清晰概述了问题、方法、结果
-   - 相关工作是否引用了该领域的关键文献
-   - 方法描述是否清晰、技术上正确
-   - 实验设计是否合理（数据集、baselines、评估指标）
-   - 结果是否有意义（不是全零、不是 NaN）
-   - 结论是否与实验结果一致
-
-2. **实验代码**（`code/main.py`）
-   - 代码是否能独立运行
-   - 是否使用了真实数据集（而非随机生成的假数据）
-   - 是否实现了论文中描述的方法
-   - 是否包含合理的超参数设置
-
-3. **图表**（`charts/`）
-   - 图表是否清晰可读
-   - 坐标轴标签是否正确
-   - 数据是否与论文描述一致
-
-4. **引用**（`references.bib`）
-   - 引用的论文是否真实存在
-   - 引用是否与论文讨论的内容相关
-
-### 📊 自动质量评估报告
-
-Pipeline 会自动生成一份质量评估报告，位于 `stage-20/quality_report.json`，其中包含：
-
-- `score_1_to_10` — 自动评分
-- `verdict` — 接收/拒绝建议
-- `strengths` — 优点列表
-- `weaknesses` — 缺点列表
-- `required_actions` — 建议的改进事项
-
-请在你的反馈报告中参考此评估，并补充你自己的专业判断。
-
----
-
-## 📝 反馈报告要求
-
-**你的反馈是本项目改进的核心依据。** 无论是批评还是肯定，对我们都同样重要——请务必认真、详细地填写。
-
-### 需要提交的内容
-
-| # | 提交内容 | 说明 |
-|---|---------|------|
-| F1 | **反馈报告**（按下方模板填写） | Markdown 格式，命名为 `feedback_<你的名字>.md` |
-| F2 | **完整输出目录** | 将整个 `artifacts/rc-XXXXXX/` 目录打包提交（`.zip` 或 `.tar.gz`） |
-| F3 | **配置文件** | 你使用的 `config.yaml`（**删除 API Key 后**提交） |
-| F4 | **终端日志**（可选但推荐） | 运行时的终端输出，便于我们排查问题 |
-
-### 反馈的四个维度
-
-#### 🎯 (a) 质量评价
-
-请从你的专业领域角度评价产出论文的质量：
-
-- 如果这是你所在领域的论文，它能达到什么水平？（顶会 / 一般会议 / workshop / 无法发表）
-- 与你读过的该领域论文相比，写作质量如何？
-- 方法的技术正确性如何？有无明显错误？
-- 实验设计的合理性如何？
-
-#### 💡 (b) 优化建议
-
-请指出你认为可以改进的地方：
-
-- 哪个阶段的输出质量最差？（文献检索 / 实验设计 / 代码生成 / 论文撰写）
-- 代码中有没有明显写错或不合理的地方？
-- 论文结构或表述有什么具体的改进建议？
-
-#### ⚖️ (c) 合理性评估
-
-请评估 Pipeline 流程的合理性：
-
-- 23 个阶段的设计是否合理？有没有多余或缺失的步骤？
-- 实验迭代优化的过程是否有效？
-- LLM 生成内容的引导方式是否合理？
-
-#### 🐛 (d) Bug 报告
-
-请尽可能详细地报告你发现的任何问题：
-
-- **写作 Bug**：语法错误、重复段落、前后矛盾、引用不存在的图表
-- **代码 Bug**：运行报错、逻辑错误、数据处理问题
-- **结果 Bug**：全零结果、NaN 值、指标不合理
-- **流程 Bug**：阶段卡住、异常中断、资源耗尽
-
----
-
-## 📋 反馈报告模板
-
-请复制以下模板，填写后保存为 `feedback_<你的名字>.md`：
-
-````markdown
-# AutoResearchClaw 测试反馈报告
-
-## 基本信息
-
-- **测试人员**：
-- **所属领域**：（例如：计算机视觉 / 自然语言处理 / 强化学习 / 生物信息 / ...）
-- **测试日期**：
-- **代码版本**：（运行 `git log --oneline -1` 的输出，例如：`44151b1 fix: Phase 3 regression test findings`）
-- **研究主题（英文）**：
-- **使用的 LLM 模型**：（例如：gpt-5.4 / gpt-5.1 / claude-opus-4-6 / claude-sonnet-4-6）
-- **实验模式**：（sandbox / docker）
-- **运行总时长**：（约 X 分钟）
-- **是否成功完成 23 个阶段**：是 / 否（如否，请说明卡在哪个阶段）
-
----
-
-## 一、质量评价（总分 1-10）
-
-**我的评分**：X / 10
-
-### 1.1 论文整体质量
-- 相当于什么级别的论文？（顶会 / 一般会议 / workshop / 无法发表）
-- 简要说明评分理由：
-
-### 1.2 各部分质量评价
-
-| 部分 | 评分 (1-10) | 评价说明 |
-|------|-----------|---------|
-| 标题 | | |
-| 摘要 | | |
-| 引言 | | |
-| 相关工作 | | |
-| 方法 | | |
-| 实验设计 | | |
-| 结果与分析 | | |
-| 结论 | | |
-| 参考文献 | | |
-| 图表质量 | | |
-| 代码质量 | | |
-
-### 1.3 与人工撰写论文的对比
-- 与你平时阅读/撰写的论文相比，差距在哪里？
-- 有哪些方面出乎意料地好？
-
----
-
-## 二、优化建议
-
-### 2.1 最需要改进的环节
-（请列出 3-5 个最需要改进的具体问题，按优先级排序）
-
-1.
-2.
-3.
-
-### 2.2 代码问题
-- 代码是否能独立运行？
-- 是否使用了真实数据集和基线方法？
-- 具体代码问题（如有）：
-
-### 2.3 写作问题
-- 论文结构是否合理？
-- 技术描述是否准确？
-- 具体写作问题（如有）：
-
----
-
-## 三、合理性评估
-
-### 3.1 Pipeline 流程评价
-- 23 个阶段的流程设计是否合理？
-- 有没有你认为多余或缺失的步骤？
-
-### 3.2 实验执行评价
-- 实验设计是否合理？（数据集选择、对比方法、评估指标）
-- 迭代优化过程是否有效？
-
-### 3.3 LLM 使用评价
-- LLM 在各阶段的表现如何？
-- 有没有明显的"幻觉"或不合理的生成内容？
-
----
-
-## 四、Bug 报告
-
-### 4.1 写作 Bug
-| 编号 | 位置（章节/段落） | 描述 | 严重程度 (高/中/低) |
-|------|-----------------|------|-------------------|
-| W1 | | | |
-| W2 | | | |
-
-### 4.2 代码 Bug
-| 编号 | 文件/行号 | 描述 | 严重程度 (高/中/低) |
-|------|----------|------|-------------------|
-| C1 | | | |
-| C2 | | | |
-
-### 4.3 结果 Bug
-| 编号 | 描述 | 涉及指标/图表 | 严重程度 (高/中/低) |
-|------|------|-------------|-------------------|
-| R1 | | | |
-| R2 | | | |
-
-### 4.4 流程 Bug
-| 编号 | 阶段 | 描述 | 严重程度 (高/中/低) |
-|------|------|------|-------------------|
-| P1 | | | |
-| P2 | | | |
-
----
-
-## 五、其他建议
-
-（自由发挥：任何你觉得有价值的观察、建议或想法）
-
----
-
-## 附件清单
-
-- [ ] 反馈报告 (`feedback_<名字>.md`)
-- [ ] 完整输出目录 (`artifacts/rc-XXXXXX.zip`)
-- [ ] 配置文件 (`config.yaml`，已删除 API Key)
-- [ ] 终端日志（可选）
-````
-
----
-
-## ❓ 常见问题
-
-### Q1: 没有 GPU 能测试吗？
-
-**当然可以！** 使用 `experiment.mode: "sandbox"` 模式，Pipeline 会在本地 CPU 上运行实验。虽然实验规模会受限，但足以完成一次完整的端到端测试。
-
-### Q2: API 调用大概要花多少钱？
-
-一次完整的 Pipeline 运行约消耗 **$5–15** 的 API 费用，取决于所选模型、论文修订次数和实验复杂度。顶级模型（GPT-5.4、Claude Opus 4.6）费用稍高，但产出质量显著更好，推荐优先使用。
-
-### Q3: Pipeline 运行中断了怎么办？
-
-从断点继续即可：
-
-```bash
-researchclaw run --config config.yaml --resume
-```
-
-### Q4: 可以用中文主题吗？
-
-建议使用 **英文** 描述你的研究主题。Pipeline 的提示词、文献检索和论文生成均以英文为主。如果你的 idea 原始语言是中文，请先翻译成英文。
-
-### Q5: 我应该选什么样的研究主题？
-
-选择你**熟悉的领域内的一个具体研究问题**——这样你才能有效评估论文的技术正确性。建议：
-
-- ✅ 选择有明确实验验证方法的主题（分类、回归、强化学习任务等）
-- ❌ 避免过于宏大或抽象的主题（如 "AGI" 或 "通用人工智能"）
-- ✅ 描述要具体，例如：*"Investigating the effect of data augmentation strategies on few-shot learning for medical image classification"*
-
-### Q6: 如何使用 Docker 模式？（进阶）
-
-如果你有 NVIDIA GPU 并安装了 Docker + NVIDIA Container Toolkit：
-
-```bash
-# 1. Build the experiment image
-docker build -t researchclaw/experiment:latest researchclaw/docker/
-
-# 2. Update config.yaml:
-#   experiment:
-#     mode: "docker"
-#     docker:
-#       gpu_enabled: true
-#       memory_limit_mb: 8192
-#       network_policy: "setup_only"  # recommended default
-
-# 3. Run
-researchclaw run --config config.yaml --auto-approve
-```
-
-Docker mode runs in three phases: pip install (network on) → setup.py (network on) → experiment code (network off). The image ships with pre-cached common datasets (CIFAR-10/100, MNIST, FashionMNIST, STL-10, SVHN), so standard benchmarks need no network access.
-
-### Q7: I have tested before. What should I keep in mind when testing again?
-
-**Always pull the latest code before every test:**
-
-```bash
-cd AutoResearchClaw
-git pull origin main
-pip install -e .
-```
-
-Then confirm the version:
-
-```bash
-git log --oneline -1
-```
-
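-Output quality can differ greatly between versions, so note the commit hash you used in your feedback report.
-
-### Q8: Where do I submit feedback?
-
-You can submit feedback through any of the following channels:
-
-- **GitHub Issues:** [Open an issue](https://github.com/aiming-lab/AutoResearchClaw/issues) and add the `feedback` label
-- **Pull Request:** Submit `feedback_<name>.md` to the `community-feedback/` directory
-- **Email:** Contact the project maintainers (see the repository home page for details)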
-
----
-
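-## 🌍 We Need Testers from Every Field
-
-So far the pipeline has been tested mainly in machine learning. We especially welcome testers from the following fields:
-
-- 🧬 **Bioinformatics and Computational Biology**
-- 🧪 **Chemistry and Materials Science**
-- 📊 **Statistics and Applied Mathematics**
-- 🤖 **Robotics and Control Systems**
-- 🗣️ **NLP and Computational Linguistics**
-- 👁️ **Computer Vision and Graphics**
-- 🎮 **Reinforcement Learning and Game Theory**
-- 🏥 **Medical AI and Healthcare**
-- 🌐 **Graph Learning and Network Science**
-- 💹 **Financial ML and Econometrics**
-- 🛰️ **Remote Sensing and Geospatial AI**
-
-...and any other field that involves computational experiments!
-
----
-
-## 🙏 Thank You for Participating
-
-Every piece of feedback, big or small, directly helps make AutoResearchClaw better. Thank you for being part of this journey.
-
-<p align="center">
-  <b>⭐ If you find this project interesting, please give us a star on <a href="https://github.com/aiming-lab/AutoResearchClaw">GitHub</a>!</b>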
-</p>
diff --git a/docs/TESTER_GUIDE_JA.md b/docs/TESTER_GUIDE_JA.md
deleted file mode 100644
index 51a8acf7..00000000
--- a/docs/TESTER_GUIDE_JA.md
+++ /dev/null
@@ -1,587 +0,0 @@
-<p align="center">
-  <img src="../image/logo.png" width="500" alt="AutoResearchClaw Logo">
-</p>
-
-<h2 align="center">🧪 Community Testing Guide</h2>
-
-<p align="center">
-  <b>Help us stress-test the world's first fully autonomous research pipeline across every field.</b>
-</p>
-
-<p align="center">
-  <a href="https://github.com/aiming-lab/AutoResearchClaw">⭐ Star the Repo</a> ·
-  <a href="#quick-start">🚀 Quick Start</a> ·
-  <a href="#-feedback-template">📋 Feedback Template</a> ·
-  <a href="TESTER_GUIDE.md">🇺🇸 English Testing Guide</a> ·
-  <a href="TESTER_GUIDE_CN.md">🇨🇳 中文测试指南</a>
-</p>
-
----
-
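-## 👋 Welcome, Testers
-
-**AutoResearchClaw** is a fully autonomous academic paper generation pipeline. Give it a research idea and it handles everything automatically: literature search, experiment design, code generation, experiment execution, paper writing, peer review, and final deliverables. **23 stages, zero human intervention.**
-
-We are looking for testers from **every field and background**: machine learning, NLP, computer vision, reinforcement learning, bioinformatics, physics, social sciences, and more. The more diverse the testing, the better the pipeline becomes.
-
-**Your mission:** Run the pipeline with your own research idea, inspect the output, and submit a detailed feedback report. That's all. Every piece of feedback feeds directly into the next version.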
-
----
-
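-## 📋 Table of Contents
-
-1. [Prerequisites](#-prerequisites)
-2. [Installation and Setup](#-installation-and-setup)
-3. [Running the Pipeline](#-running-the-pipeline)
-4. [Inspecting the Output](#-inspecting-the-output)
-5. [Feedback Report Requirements](#-feedback-report-requirements)
-6. [Feedback Template](#-feedback-template)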
-7. [FAQ](#-faq)
-
----
-
-## 📦 Prerequisites
-
-| Item | Minimum | Recommended |
-|------|---------|------|
-| OS | macOS / Linux / WSL2 | Linux (Ubuntu 22.04+) |
-| Python | 3.11+ | 3.11 or 3.12 |
-| Disk | 500 MB | 2 GB+ |
-| RAM | 8 GB | 16 GB+ |
-| GPU | Not required (sandbox mode) | NVIDIA GPU + CUDA 12.x (docker mode) |
-| Network | Required (LLM API + literature search) | Stable connection |
-| LLM API key | **Required** | OpenAI or Anthropic |
-
-### 🔑 About API Keys
-
-The pipeline calls a large language model (LLM) at every stage: writing, coding, reviewing, and more. You need an API key from **OpenAI** or **Anthropic**.
-
-> **For the best results, we strongly recommend using the most capable model available to you:**
->
-> | Provider | Recommended Model | Fallback |
-> |-------------|-----------|--------------|
-> | **OpenAI** | **GPT-5.4** (best) | GPT-5.1 or GPT-4.1 |
-> | **Anthropic** | **Claude Opus 4.6** (best) | Claude Sonnet 4.6 |
->
-> Using a top-tier model substantially improves paper quality, code correctness, and experiment design. Older models (e.g., GPT-4o) may produce noticeably lower-quality output.
-
----
-
-## 🛠 Installation and Setup
-
-### ⚠️ Always Use the Latest Version
-
-> **This project is under active development.** The codebase is updated frequently, and results can differ significantly between versions.
->
-> **Always pull the latest code before each test run:**
->
-> ```bash
-> cd AutoResearchClaw
-> git pull origin main
-> pip install -e .    # Reinstall to pick up changes
-> ```
->
-> Record the version for your feedback report:
-> ```bash
-> git log --oneline -1
-> ```
-
----
-
-### Option A: Claude Code (Fastest, Recommended ⚡)
-
-If you have [Claude Code](https://claude.ai/claude-code) (Anthropic's CLI tool), just paste the following:
-
-```
-Please clone and install AutoResearchClaw:
-https://github.com/aiming-lab/AutoResearchClaw.git
-
-If already cloned, run git pull origin main to update to the latest version first.
-
-Then create a config file with:
-- LLM: OpenAI with gpt-5.4 (or Anthropic Claude Opus 4.6)
-- Experiment mode: sandbox (local execution)
-- Research topic: "<your research idea here>"
-- Auto-approve all gate stages
-
-My API key is: sk-xxxx (set it as an environment variable, don't hardcode it)
-```
-
-Claude Code handles cloning, dependencies, configuration, and execution automatically.
-
-### Option B: Manual Installation
-
-```bash
-# 1. Clone the repository
-git clone https://github.com/aiming-lab/AutoResearchClaw.git
-cd AutoResearchClaw
-
-# 2. Create a virtual environment
-python3 -m venv .venv
-source .venv/bin/activate       # macOS / Linux
-# .venv\Scripts\activate        # Windows (WSL2 recommended)
-
-# 3. Install
-pip install -e .
-
-# 4. Verify the installation
-researchclaw --help
-```
-
-### ⚙️ Configuration
-
-```bash
-cp config.researchclaw.example.yaml config.yaml
-```
-
-Edit `config.yaml`. The key fields are:
-
-```yaml
-# === Project ===
-project:
-  name: "my-test"
-  mode: "full-auto"
-
-# === Research topic: describe your idea in English ===
-research:
-  topic: "Your research idea in 1-2 sentences"
-  domains:
-    - "machine-learning"     # Options: nlp, cv, rl, graph-learning, etc.
-
-# === LLM: use the most capable model available to you! ===
-#
-# Option 1: OpenAI (GPT-5.4 recommended)
-llm:
-  provider: "openai-compatible"
-  base_url: "https://api.openai.com/v1"
-  api_key_env: "OPENAI_API_KEY"
-  primary_model: "gpt-5.4"              # Best model
-  fallback_models:
-    - "gpt-5.1"
-    - "gpt-4.1"
-
-# Option 2: Anthropic Claude (Claude Opus 4.6 recommended)
-# llm:
-#   provider: "openai-compatible"
-#   base_url: "https://api.anthropic.com/v1"
-#   api_key_env: "ANTHROPIC_API_KEY"
-#   primary_model: "claude-opus-4-6"
-#   fallback_models:
-#     - "claude-sonnet-4-6"
-
-# === Experiment ===
-experiment:
-  mode: "sandbox"                # sandbox = local execution (recommended)
-  time_budget_sec: 600           # Max seconds per experiment run
-  max_iterations: 10
-  metric_key: "primary_metric"
-  metric_direction: "minimize"   # or "maximize"
-```
-
-### 🔐 Setting the API Key
-
-```bash
-# OpenAI users:
-export OPENAI_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxx"
-
-# Anthropic users:
-export ANTHROPIC_API_KEY="sk-ant-xxxxxxxxxxxxxxxxxxxxxxxx"
-
-# Optional: Semantic Scholar API key (speeds up literature search)
-export S2_API_KEY="your-s2-key"
-```
-
-> **🔒 Security:** Do not hardcode API keys in files. Use `api_key_env` in the config file to reference an environment variable.
-
----
-
-## 🚀 Running the Pipeline
-
-### Quick Start
-
-```bash
-source .venv/bin/activate
-export OPENAI_API_KEY="sk-xxxx"       # or ANTHROPIC_API_KEY
-
-researchclaw run --config config.yaml --auto-approve
-```
-
-### Specifying a Topic
-
-```bash
-researchclaw run \
-  --config config.yaml \
-  --topic "Investigating the effect of curriculum learning on image classification with adaptive difficulty scheduling" \
-  --auto-approve
-```
-
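-### ⏱ Expected Runtime
-
-| Mode | Estimated Time | Notes |
-|--------|---------|------|
-| sandbox | 30 min to 2 hours | Depends on experiment complexity and API speed |
-| docker (GPU) | 1 to 4 hours | For larger deep learning experiments |
-
-Progress is shown in the terminal in real time. **No manual intervention is needed**; just wait for the run to finish.
-
-### ✅ How to Tell It Finished
-
-You will see output like this: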
-
-```
-[Stage 23/23] ✓ Deliverables packaged
-Pipeline complete — deliverables at: artifacts/rc-20260315-XXXXXX-YYYY/deliverables/
-```
-
-### 🔄 If the Run Is Interrupted
-
-The pipeline supports checkpointing, so you can simply resume:
-
-```bash
-researchclaw run --config config.yaml --resume
-```
-
----
-
-## 🔍 Inspecting the Output
-
-When the run completes, results are stored in `artifacts/rc-YYYYMMDD-HHMMSS-<hash>/deliverables/`.
-
-### 📂 Deliverables
-
-| File / Directory | Description |
-|------------------------|------|
-| `paper_final.md` | Final paper in Markdown (5,000 to 6,500 words) |
-| `paper.tex` | Conference-ready LaTeX source (compiles directly) |
-| `references.bib` | BibTeX references (verified citations) |
-| `code/main.py` | Auto-generated experiment code |
-| `code/requirements.txt` | Python dependencies for the experiment |
-| `charts/` | Result visualization charts (PNG) |
-| `verification_report.json` | Citation integrity verification report |
-| `manifest.json` | Deliverables manifest with metadata |
-
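-### 🔎 What to Check
-
-1. **Paper content** (`paper_final.md` or `paper.tex`)
-   - Is the title relevant to the topic?
-   - Does the abstract clearly state the problem, method, and results?
-   - Does the related work cite the key papers in the field?
-   - Is the method description technically correct?
-   - Is the experiment design sound (datasets, baselines, metrics)?
-   - Are the results meaningful (not all zeros or NaN)?
-   - Are the conclusions consistent with the experimental results?
-
-2. **Experiment code** (`code/main.py`)
-   - Does it run standalone?
-   - Does it use real datasets (not randomly generated fake data)?
-   - Does it implement what the paper describes?
-   - Are the hyperparameters reasonable?
-
-3. **Charts** (`charts/`)
-   - Are they readable and well organized?
-   - Are the axis labels correct?
-   - Does the data match the paper's claims?
-
-4. **References** (`references.bib`)
-   - Do the cited papers actually exist?
-   - Are the citations relevant to the discussion?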
-
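-### 📊 Auto-Generated Quality Report
-
-The pipeline writes a quality assessment to `stage-20/quality_report.json`, containing:
-
-- `score_1_to_10`: automated quality score
-- `verdict`: accept / reject recommendation
-- `strengths`: what went well
-- `weaknesses`: identified problems
-- `required_actions`: suggested improvements
-
-Refer to this report in your feedback, and add your own expert judgment.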
-
----
-
-## 📝 Feedback Report Requirements
-
-**Your feedback is the single most important input for improving this project.** Please be thorough and honest; critical feedback is just as valuable as praise.
-
-### What to Submit
-
-| # | Item | Details |
-|---|------|------|
-| F1 | **Feedback report** (use the template below) | Markdown, named `feedback_<your-name>.md` |
-| F2 | **Complete output directory** | Zip the entire `artifacts/rc-XXXXXX/` directory |
-| F3 | **Config file** | `config.yaml` (**remove API keys first!**) |
-| F4 | **Terminal log** (optional but recommended) | Copy of the terminal output during the run |
-
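-### The Four Dimensions of Feedback
-
-#### 🎯 (a) Quality Assessment
-
-From your domain expertise:
-
-- If this paper were published in your field, what level would it reach? (top venue / mid-tier / workshop / unpublishable)
-- How does the writing quality compare with the papers you usually read?
-- Is the method technically correct? Are there any obvious errors?
-- Is the experiment design sound?
-
-#### 💡 (b) Improvement Suggestions
-
-- Which stage produces the weakest output? (literature search / experiment design / code generation / paper writing)
-- Are there any obvious code errors or design problems?
-- Do you have specific suggestions for improving the paper's structure or writing?
-
-#### ⚖️ (c) Pipeline Design Assessment
-
-- Is the 23-stage design appropriate? Are any steps redundant or missing?
-- Is the iterative experiment refinement effective?
-- Are the LLM instructions at each stage appropriate?
-
-#### 🐛 (d) Bug Reports
-
-Report any problems you find as specifically as possible:
-
-- **Writing bugs:** grammar errors, repeated paragraphs, contradictions, references to nonexistent figures
-- **Code bugs:** runtime errors, logic errors, data handling problems
-- **Result bugs:** all-zero results, NaN values, unreasonable metrics
-- **Pipeline bugs:** stalled stages, unexpected crashes, resource exhaustion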
-
----
-
-## 📋 Feedback Template
-
-Copy the template below, fill it in, and save it as `feedback_<your-name>.md`:
-
-````markdown
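-# AutoResearchClaw: Test Feedback Report
-
-## Basic Information
-
-- **Tester name:**
-- **Field of expertise:** (e.g., Computer Vision / NLP / Reinforcement Learning / Bioinformatics / ...)
-- **Test date:**
-- **Code version:** (output of `git log --oneline -1`, e.g., `44151b1 fix: Phase 3 regression test findings`)
-- **Research topic (English):**
-- **LLM model used:** (e.g., gpt-5.4 / gpt-5.1 / claude-opus-4-6 / claude-sonnet-4-6)
-- **Experiment mode:** (sandbox / docker)
-- **Total runtime:** (about X minutes)
-- **All 23 stages completed?:** Yes / No (if No, which stage failed?)
-
----
-
-## 1. Quality Assessment (Score: 1-10)
-
-**My score:** X / 10
-
-### 1.1 Overall Paper Quality
-- What level does this paper correspond to? (top venue / mid-tier / workshop / unpublishable)
-- Reason for the score:
-
-### 1.2 Section-by-Section Assessment
-
-| Section | Score (1-10) | Comments |
-|-----------|-------------|---------|
-| Title | | |
-| Abstract | | |
-| Introduction | | |
-| Related Work | | |
-| Method | | |
-| Experiment Design | | |
-| Results and Analysis | | |
-| Conclusion | | |
-| References | | |
-| Charts / Figures | | |
-| Code Quality | | |
-
-### 1.3 Comparison with Human-Written Papers
-- Compared with the papers you usually read and write, where are the gaps?
-- Was anything surprisingly good?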
-
----
-
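-## 2. Improvement Suggestions
-
-### 2.1 Main Problems (3-5, in priority order)
-
-1.
-2.
-3.
-
-### 2.2 Code Issues
-- Does the code run standalone?
-- Does it use real datasets and baselines?
-- Specific code issues (if any):
-
-### 2.3 Writing Issues
-- Is the paper's structure reasonable?
-- Is the technical description accurate?
-- Specific writing issues (if any):
-
----
-
-## 3. Pipeline Design Assessment
-
-### 3.1 Pipeline Flow
-- Is the 23-stage design reasonable?
-- Are any steps redundant or missing?
-
-### 3.2 Experiment Execution
-- Is the experiment design sound? (dataset choice, comparison methods, metrics)
-- Is the iterative refinement effective?
-
-### 3.3 LLM Usage
-- How did the LLM perform at each stage?
-- Were there any obvious "hallucinations" or unreasonable outputs?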
-
----
-
-## 4. Bug Reports
-
-### 4.1 Writing Bugs
-| # | Location (section/paragraph) | Description | Severity (High/Medium/Low) |
-|---|------------------------|------|-------------------|
-| W1 | | | |
-| W2 | | | |
-
-### 4.2 Code Bugs
-| # | File / Line | Description | Severity (High/Medium/Low) |
-|---|--------------|------|-------------------|
-| C1 | | | |
-| C2 | | | |
-
-### 4.3 Result Bugs
-| # | Description | Affected metrics/charts | Severity (High/Medium/Low) |
-|---|------|--------------------------|-------------------|
-| R1 | | | |
-| R2 | | | |
-
-### 4.4 Pipeline Bugs
-| # | Stage | Description | Severity (High/Medium/Low) |
-|---|---------|------|-------------------|
-| P1 | | | |
-| P2 | | | |
-
----
-
-## 5. Other Comments
-
-(Free-form: any observations, ideas, or suggestions you think are useful)
-
----
-
-## Attachment Checklist
-
-- [ ] Feedback report (`feedback_<name>.md`)
-- [ ] Complete output directory (`artifacts/rc-XXXXXX.zip`)
-- [ ] Config file (`config.yaml`, API keys removed)
-- [ ] Terminal log (optional)
-````
-
----
-
-## ❓ FAQ
-
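-### Q1: Can I test without a GPU?
-
-**Yes!** Use `experiment.mode: "sandbox"` and the pipeline runs experiments on the CPU. The experiments will be simpler, but that is enough for a complete end-to-end test.
-
-### Q2: How much do the API calls cost?
-
-A full pipeline run costs roughly **$5–15** in API fees, depending on the model, the number of revision iterations, and experiment complexity. Top-tier models (GPT-5.4, Claude Opus 4.6) are somewhat more expensive but produce significantly better results.
-
-### Q3: What if the pipeline crashes mid-run?
-
-Resume from the checkpoint: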
-
-```bash
-researchclaw run --config config.yaml --resume
-```
-
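-### Q4: Can I use a research topic that is not in English?
-
-We recommend writing the topic in **English**. The pipeline's prompts, literature search, and paper generation are all English-based. If your idea is in another language, translate it beforehand.
-
-### Q5: What kind of research topic should I choose?
-
-Choose **a specific research question in a field you know well**, so that you can meaningfully judge whether the output is technically correct. Tips:
-
-- ✅ Choose a topic with clear experimental validation (classification, regression, reinforcement learning tasks, etc.)
-- ❌ Avoid overly broad or abstract topics (e.g., "AGI", "general intelligence")
-- ✅ Be specific: *"Investigating the effect of data augmentation strategies on few-shot learning for medical image classification"*
-
-### Q6: How do I use Docker mode? (Advanced)
-
-If you have an NVIDIA GPU and Docker + NVIDIA Container Toolkit: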
-
-```bash
-# 1. Build the experiment image
-docker build -t researchclaw/experiment:latest researchclaw/docker/
-
-# 2. Update config.yaml:
-#   experiment:
-#     mode: "docker"
-#     docker:
-#       gpu_enabled: true
-#       memory_limit_mb: 8192
-#       network_policy: "setup_only"  # recommended default
-
-# 3. Run
-researchclaw run --config config.yaml --auto-approve
-```
-
-Docker mode uses a three-phase execution model: pip install (network enabled) → setup.py (network enabled) → experiment (network disabled). The image includes pre-cached datasets (CIFAR-10/100, MNIST, FashionMNIST, STL-10, SVHN), so standard benchmarks work without network access.
-
-### Q7: I have tested before. What should I do when retesting?
-
-**Always pull the latest code** before testing:
-
-```bash
-cd AutoResearchClaw
-git pull origin main
-pip install -e .
-```
-
-Check the version:
-
-```bash
-git log --oneline -1
-```
-
-Results can change significantly between versions. Always include the commit hash in your feedback report.
-
-### Q8: Where do I submit feedback?
-
-Submit your feedback report and attachments through any of the following:
-
-- **GitHub Issues:** [Open an issue](https://github.com/aiming-lab/AutoResearchClaw/issues) and add the `feedback` label
-- **Pull Request:** Submit `feedback_<name>.md` to the `community-feedback/` directory
-- **Email:** Contact the project maintainers (see the repository for details)
-
----
-
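-## 🌍 We Need Testers from Every Field
-
-So far the pipeline has been tested mainly on ML-related topics. We especially welcome testers from the following fields:
-
-- 🧬 **Bioinformatics and Computational Biology**
-- 🧪 **Chemistry and Materials Science**
-- 📊 **Statistics and Applied Mathematics**
-- 🤖 **Robotics and Control Systems**
-- 🗣️ **NLP and Computational Linguistics**
-- 👁️ **Computer Vision and Graphics**
-- 🎮 **Reinforcement Learning and Game Theory**
-- 🏥 **Medical AI and Healthcare**
-- 🌐 **Graph Learning and Network Science**
-- 💹 **Financial ML and Econometrics**
-- 🛰️ **Remote Sensing and Geospatial AI**
-
-...and any other field that involves computational experiments!
-
----
-
-## 🙏 Thank You
-
-Every piece of feedback, big or small, directly improves AutoResearchClaw. Thank you for taking part in this effort.
-
-<p align="center">
-  <b>⭐ If you find this project interesting, please give us a star on <a href="https://github.com/aiming-lab/AutoResearchClaw">GitHub</a>!</b>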
-</p>
diff --git a/docs/integration-guide.md b/docs/integration-guide.md
deleted file mode 100644
index 9f834400..00000000
--- a/docs/integration-guide.md
+++ /dev/null
@@ -1,882 +0,0 @@
-# AutoResearchClaw Integration Guide
-
-> **The simplest way to use AutoResearchClaw**: give the repo URL to [OpenClaw](https://github.com/openclaw/openclaw) and say *"Research [your topic]."* That's it — OpenClaw handles cloning, installing, configuring, and running the entire 23-stage pipeline for you.
-
-This guide is for humans who want to understand what's happening under the hood, or who prefer to set things up manually.
-
----
-
-## Table of Contents
-
-1. [The Easy Way: OpenClaw](#1-the-easy-way-openclaw)
-2. [Manual Setup](#2-manual-setup)
-3. [Configuration Walkthrough](#3-configuration-walkthrough)
-4. [Running the Pipeline](#4-running-the-pipeline)
-5. [Understanding the 23 Stages](#5-understanding-the-23-stages)
-6. [Output Artifacts](#6-output-artifacts)
-7. [Experiment Modes](#7-experiment-modes)
-8. [Conference Templates](#8-conference-templates)
-9. [OpenClaw Bridge (Advanced)](#9-openclaw-bridge-advanced)
-10. [MetaClaw Integration (Cross-Run Learning)](#10-metaclaw-integration-cross-run-learning)
-11. [Other AI Platforms](#11-other-ai-platforms)
-12. [Python API](#12-python-api)
-13. [Troubleshooting](#13-troubleshooting)
-14. [FAQ](#14-faq)
-
----
-
-## 1. The Easy Way: OpenClaw
-
-If you use [OpenClaw](https://github.com/openclaw/openclaw) as your AI assistant, you don't need to read the rest of this guide.
-
-### Steps
-
-1. Share the GitHub repo URL with OpenClaw:
-   ```
-   https://github.com/aiming-lab/AutoResearchClaw
-   ```
-2. OpenClaw reads `RESEARCHCLAW_AGENTS.md` and `README.md` — it now understands the entire system.
-   > **Note:** `RESEARCHCLAW_AGENTS.md` is generated locally and listed in `.gitignore`. If it doesn't exist, OpenClaw can bootstrap from `README.md` and the project structure.
-3. Say something like:
-   ```
-   Research the application of graph neural networks in drug discovery
-   ```
-4. OpenClaw will:
-   - Clone the repo
-   - Create a virtual environment and install dependencies (`pip install -e .`)
-   - Copy `config.researchclaw.example.yaml` → `config.yaml`
-   - Ask you for an OpenAI API key (or use your environment variable)
-   - Run the full 23-stage pipeline
-   - Return the paper, experiment code, charts, and citations
-
-**That's the whole process.** OpenClaw is designed to read agent definition files and bootstrap itself. AutoResearchClaw ships with these files specifically so that any OpenClaw-compatible AI assistant can pick it up and run.
-
-### What if I want to tweak settings?
-
-Tell OpenClaw in natural language:
-
-- *"Use GPT-5.2 instead of GPT-4o"*
-- *"Run experiments in sandbox mode, not simulated"*
-- *"Target ICLR 2025 format instead of NeurIPS"*
-- *"Skip the quality gate, just auto-approve everything"*
-
-OpenClaw will modify `config.yaml` accordingly before running the pipeline.
-
----
-
-## 2. Manual Setup
-
-### Prerequisites
-
-| Requirement | Details |
-|-------------|---------|
-| **Python** | 3.11 or newer |
-| **LLM API** | Any OpenAI-compatible endpoint (OpenAI, Azure, local proxy, etc.) |
-| **Disk space** | ~100 MB for the repo + artifacts per run |
-| **Network** | Required for LLM API calls and literature search (Semantic Scholar, arXiv) |
-
-### Installation
-
-```bash
-# Clone the repository
-git clone https://github.com/aiming-lab/AutoResearchClaw.git
-cd AutoResearchClaw
-
-# Create a virtual environment (recommended)
-python3 -m venv .venv
-source .venv/bin/activate    # macOS/Linux
-# .venv\Scripts\activate     # Windows
-
-# Install
-pip install -e .
-```
-
-### Verify Installation
-
-```bash
-# Check the CLI is available
-researchclaw --help
-
-# Validate your configuration
-researchclaw validate --config config.yaml
-```
-
----
-
-## 3. Configuration Walkthrough
-
-Start from the provided template:
-
-```bash
-cp config.researchclaw.example.yaml config.yaml
-```
-
-Open `config.yaml` in your editor. Here's what each section does:
-
-### LLM Settings (Required)
-
-This is the only section you **must** configure. Everything else has sensible defaults.
-
-```yaml
-llm:
-  base_url: "https://api.openai.com/v1"     # Your LLM API endpoint
-  api_key_env: "OPENAI_API_KEY"              # Environment variable name...
-  api_key: ""                                # ...or paste the key directly here
-  primary_model: "gpt-4o"                    # Model to use (gpt-4o, gpt-5.2, etc.)
-  fallback_models:                           # Tried in order if primary fails
-    - "gpt-4.1"
-    - "gpt-4o-mini"
-  s2_api_key: ""                             # Optional: Semantic Scholar API key for higher rate limits
-```
-
-**Using an environment variable** (recommended for security):
-```bash
-export OPENAI_API_KEY="sk-..."
-```
-
-**Using a direct key** (simpler, less secure):
-```yaml
-llm:
-  api_key: "sk-your-key-here"
-```
-
-**Using a proxy or alternative provider**:
-```yaml
-llm:
-  base_url: "https://your-proxy.example.com/v1"
-  api_key: "your-proxy-key"
-  primary_model: "gpt-4o"    # Must be supported by your endpoint
-```
-
-### Research Settings
-
-```yaml
-research:
-  topic: "Your research topic here"    # Can also be set via CLI --topic flag
-  domains:
-    - "machine-learning"               # Guides literature search scope
-  daily_paper_count: 10                # Target papers to collect
-  quality_threshold: 4.0               # Minimum paper quality score (1-5)
-```
-
-### Experiment Settings
-
-```yaml
-experiment:
-  mode: "sandbox"              # How experiments run (see Section 7)
-  time_budget_sec: 300         # Max seconds per experiment run
-  max_iterations: 10           # Max refinement loops in Stage 13
-  metric_key: "primary_metric" # What metric to optimize
-  metric_direction: "minimize" # "minimize" or "maximize"
-  sandbox:
-    python_path: ".venv/bin/python3"   # Python binary for sandbox execution
-    gpu_required: false
-    max_memory_mb: 4096
-  code_agent:                        # CodeAgent v2 (multi-phase code generation)
-    enabled: true                    # Architecture planning + sequential file gen + hard validation
-  benchmark_agent:                   # Automated dataset & baseline selection
-    enabled: true                    # 4-agent pipeline: Surveyor→Selector→Acquirer→Validator
-  figure_agent:                      # Academic figure generation
-    enabled: true                    # 5-agent pipeline: Planner→CodeGen→Renderer→Critic→Integrator
-  repair:                            # Anti-fabrication experiment repair
-    enabled: true                    # Diagnose and fix failed experiments before paper writing
-    max_cycles: 3                    # Repair retry loops
-  opencode:                          # OpenCode Beast Mode (see README for details)
-    enabled: true
-```
-
-### Export Settings
-
-```yaml
-export:
-  target_conference: "neurips_2025"   # See Section 8 for all available templates
-  authors: "Anonymous"                 # Author line in the paper
-  bib_file: "references"              # BibTeX file name (without .bib)
-```
-
-### Everything Else (Optional)
-
-These have reasonable defaults. Change them only if you need to:
-
-```yaml
-project:
-  name: "my-research"      # Just an identifier for your run
-  mode: "full-auto"         # "docs-first", "semi-auto", or "full-auto"
-
-runtime:
-  timezone: "America/New_York"
-  max_parallel_tasks: 3
-  approval_timeout_hours: 12
-  retry_limit: 2
-
-security:
-  hitl_required_stages: [5, 9, 20]     # Stages that pause for human approval
-  allow_publish_without_approval: false
-
-notifications:
-  channel: "console"        # "console", "discord", or "slack"
-
-knowledge_base:
-  backend: "markdown"
-  root: "docs/kb"
-```
-
----
-
-## 4. Running the Pipeline
-
-### Basic Run
-
-```bash
-# Run with topic from config.yaml
-researchclaw run --config config.yaml --auto-approve
-
-# Override topic from command line
-researchclaw run --config config.yaml --topic "Transformer attention for time series" --auto-approve
-```
-
-### CLI Commands
-
-| Command | What It Does |
-|---------|-------------|
-| `researchclaw setup` | Interactive first-time setup (installs OpenCode Beast Mode, checks Docker/LaTeX) |
-| `researchclaw init` | Interactive config creation (choose LLM provider, creates `config.arc.yaml`) |
-| `researchclaw run` | Run the full 23-stage pipeline |
-| `researchclaw validate` | Check your config file for errors |
-| `researchclaw doctor` | Diagnose environment issues (Python, dependencies, API connectivity) |
-| `researchclaw report --run-dir <path>` | Generate a human-readable summary of a completed run |
-
-### Run Flags
-
-| Flag | Effect |
-|------|--------|
-| `--topic "..."` | Override the topic in config.yaml |
-| `--config path` | Config file path (default: `config.yaml`) |
-| `--output path` | Output directory (default: `artifacts/<run-id>/`) |
-| `--auto-approve` | Skip manual approval at gate stages (5, 9, 20) |
-| `--from-stage STAGE_NAME` | Start from a specific stage (e.g., `PAPER_OUTLINE`) |
-| `--resume` | Resume from the last checkpoint (auto-detects the most recent run matching your topic) |
-| `--skip-preflight` | Skip LLM connectivity check before starting |
-| `--skip-noncritical-stage` | Skip non-critical stages on failure instead of aborting |
-| `--no-graceful-degradation` | Fail pipeline on quality gate failure instead of degrading gracefully |
-
-### Examples
-
-```bash
-# Full autonomous run — no human intervention
-researchclaw run -c config.yaml -t "Graph neural networks for protein folding" --auto-approve
-
-# Resume a failed run from where it stopped
-researchclaw run -c config.yaml --resume --auto-approve
-
-# Re-run just the paper writing stages
-researchclaw run -c config.yaml --from-stage PAPER_OUTLINE --auto-approve
-
-# Check your setup before running
-researchclaw doctor -c config.yaml
-```
-
----
-
-## 5. Understanding the 23 Stages
-
-The pipeline runs in 8 phases. Each stage reads artifacts from previous stages and produces new ones.
-
-### Phase A: Research Scoping
-
-| # | Stage | What Happens | Produces |
-|---|-------|-------------|----------|
-| 1 | TOPIC_INIT | LLM formulates a SMART research goal; auto-detects GPU hardware (NVIDIA/MPS/CPU) | `goal.md`, `hardware_profile.json` |
-| 2 | PROBLEM_DECOMPOSE | Breaks the goal into prioritized sub-questions | `problem_tree.md` |
-
-### Phase B: Literature Discovery
-
-| # | Stage | What Happens | Produces |
-|---|-------|-------------|----------|
-| 3 | SEARCH_STRATEGY | Plans search queries and data sources | `search_plan.yaml`, `sources.json` |
-| 4 | LITERATURE_COLLECT | Queries **real APIs** (arXiv-first, then Semantic Scholar) with expanded queries for broad coverage | `candidates.jsonl` |
-| 5 | LITERATURE_SCREEN | **[Gate]** Filters by relevance and quality | `shortlist.jsonl` |
-| 6 | KNOWLEDGE_EXTRACT | Extracts structured knowledge cards from each paper | `cards/` |
-
-### Phase C: Knowledge Synthesis
-
-| # | Stage | What Happens | Produces |
-|---|-------|-------------|----------|
-| 7 | SYNTHESIS | Clusters findings, identifies research gaps | `synthesis.md` |
-| 8 | HYPOTHESIS_GEN | Generates falsifiable hypotheses | `hypotheses.md` |
-
-### Phase D: Experiment Design
-
-| # | Stage | What Happens | Produces |
-|---|-------|-------------|----------|
-| 9 | EXPERIMENT_DESIGN | **[Gate]** Designs experiment plan with baselines and metrics | `exp_plan.yaml` |
-| 10 | CODE_GENERATION | LLM writes hardware-aware experiment code (adapts packages/constraints to GPU tier) | `experiment.py`, `experiment_spec.md` |
-| 11 | RESOURCE_PLANNING | Estimates GPU/time requirements | `schedule.json` |
-
-### Phase E: Experiment Execution
-
-| # | Stage | What Happens | Produces |
-|---|-------|-------------|----------|
-| 12 | EXPERIMENT_RUN | Runs the experiment code (sandbox or simulated); immutable harness injected for time guard and metric validation; partial results captured on timeout | `runs/` |
-| 13 | ITERATIVE_REFINE | LLM analyzes results, improves code, re-runs (up to 10 iterations); timeout-aware prompts; NaN/divergence fast-fail; stdout truncated for context efficiency | `refinement_log.json`, `experiment_final.py` |
-
-### Phase F: Analysis & Decision
-
-| # | Stage | What Happens | Produces |
-|---|-------|-------------|----------|
-| 14 | RESULT_ANALYSIS | Statistical analysis of experiment results | `analysis.md` |
-| 15 | RESEARCH_DECISION | PROCEED / PIVOT decision with evidence | `decision.md` |
-
-### Phase G: Paper Writing
-
-| # | Stage | What Happens | Produces |
-|---|-------|-------------|----------|
-| 16 | PAPER_OUTLINE | Creates section-level paper outline | `outline.md` |
-| 17 | PAPER_DRAFT | Writes paper section-by-section (3 LLM calls, 5,000-6,500 words); **hard-blocked when no experiment metrics** (anti-fabrication); conference-grade title guidelines and abstract structure injected | `paper_draft.md` |
-| 18 | PEER_REVIEW | Simulates 2+ reviewer perspectives with NeurIPS/ICML rubric (1-10 scoring); checks baselines, ablations, claims vs evidence | `reviews.md` |
-| 19 | PAPER_REVISION | Addresses review comments with length guard (auto-retries if revised paper is shorter than draft) | `paper_revised.md` |
-
-### Phase H: Finalization
-
-| # | Stage | What Happens | Produces |
-|---|-------|-------------|----------|
-| 20 | QUALITY_GATE | **[Gate]** Checks paper quality score | `quality_report.json` |
-| 21 | KNOWLEDGE_ARCHIVE | Saves retrospective + reproducibility bundle | `archive.md`, `bundle_index.json` |
-| 22 | EXPORT_PUBLISH | Generates LaTeX, charts, and code package | `paper_final.md`, `paper.tex`, `code/` |
-| 23 | CITATION_VERIFY | Fact-checks all references against real APIs | `verification_report.json`, `references_verified.bib` |
-
-### Gate Stages
-
-Three stages pause for human review (unless `--auto-approve` is set):
-
-| Gate | What's Being Reviewed | On Reject, Rolls Back To |
-|------|-----------------------|--------------------------|
-| Stage 5 | Are the collected papers relevant and sufficient? | Stage 4 (re-collect literature) |
-| Stage 9 | Is the experiment design sound? | Stage 8 (re-generate hypotheses) |
-| Stage 20 | Does the paper meet quality standards? | Stage 16 (re-write from outline) |
-
-For fully autonomous operation, always use `--auto-approve`.
-
----
-
-## 6. Output Artifacts
-
-Each run creates a timestamped directory under `artifacts/`:
-
-```
-artifacts/rc-20260310-143200-a1b2c3/
-├── stage-1/goal.md                        # Research goal
-├── stage-2/problem_tree.md                # Problem decomposition
-├── stage-3/search_plan.yaml               # Search strategy
-├── stage-4/candidates.jsonl               # Raw literature results
-├── stage-5/shortlist.jsonl                # Screened papers
-├── stage-6/cards/                         # Knowledge cards (one per paper)
-├── stage-7/synthesis.md                   # Research gap analysis
-├── stage-8/hypotheses.md                  # Research hypotheses
-├── stage-9/exp_plan.yaml                  # Experiment plan
-├── stage-10/experiment.py                 # Generated experiment code
-├── stage-10/experiment_spec.md            # Experiment specification
-├── stage-11/schedule.json                 # Resource schedule
-├── stage-12/runs/run-1.json               # Experiment results
-├── stage-13/experiment_final.py           # Refined experiment code
-├── stage-13/experiment_v1.py              # Iteration 1 snapshot
-├── stage-13/refinement_log.json           # Refinement history
-├── stage-14/analysis.md                   # Statistical analysis
-├── stage-14/experiment_summary.json       # Metrics summary
-├── stage-15/decision.md                   # Proceed/Pivot decision
-├── stage-16/outline.md                    # Paper outline
-├── stage-17/paper_draft.md                # Full paper draft
-├── stage-18/reviews.md                    # Simulated peer reviews
-├── stage-19/paper_revised.md              # Revised paper
-├── stage-20/quality_report.json           # Quality assessment
-├── stage-21/archive.md                    # Knowledge retrospective
-├── stage-22/
-│   ├── paper_final.md                     # Final paper (Markdown)
-│   ├── paper.tex                          # Conference-ready LaTeX
-│   ├── references.bib                     # BibTeX references
-│   ├── charts/                            # Result visualizations
-│   └── code/                              # Open-source code package
-│       ├── experiment.py
-│       ├── requirements.txt
-│       └── README.md
-├── stage-23/
-│   ├── verification_report.json           # Citation fact-check results
-│   └── references_verified.bib            # Cleaned bibliography
-└── pipeline_summary.json                  # Overall execution summary
-```
-
-### Key Output Files
-
-| File | What You'll Use It For |
-|------|----------------------|
-| `stage-22/paper.tex` | Submit to a conference (compile with `pdflatex` or `tectonic`) |
-| `stage-22/paper_final.md` | Read or further edit the paper |
-| `stage-22/references.bib` | Bibliography for LaTeX compilation |
-| `stage-22/code/` | Share experiment code alongside the paper |
-| `stage-23/verification_report.json` | Check which citations are real vs. hallucinated |
-| `stage-13/experiment_final.py` | The best-performing experiment code |
-| `stage-22/charts/` | Figures for the paper |
-
----
-
-## 7. Experiment Modes
-
-AutoResearchClaw supports four modes for running experiments:
-
-### Simulated (Default)
-
-```yaml
-experiment:
-  mode: "simulated"
-```
-
-The LLM **generates synthetic experiment results** without executing any code. This is fast and requires no special setup, but the results are not real.
-
-**Best for**: Quick prototyping, testing the pipeline end-to-end, environments without Python scientific packages.
-
-### Sandbox
-
-```yaml
-experiment:
-  mode: "sandbox"
-  sandbox:
-    python_path: ".venv/bin/python3"
-    gpu_required: false
-    max_memory_mb: 4096
-```
-
-The pipeline **generates Python code and actually runs it** in a subprocess. The code is validated before execution (AST parsing, import whitelist, no file I/O outside sandbox). **Hardware-aware**: Stage 1 auto-detects your GPU (NVIDIA CUDA / Apple MPS / CPU-only) and adapts the generated code accordingly — high-tier GPUs get full PyTorch code, limited GPUs get lightweight experiments, CPU-only gets NumPy/sklearn only.
-
-**Best for**: Real experiments on your local machine. Supports numpy and the standard library; deep learning frameworks (torch, tensorflow) are available if they are installed in your environment and a GPU is detected.
-
-**Safety features**:
-- Code validation blocks dangerous operations (subprocess, eval, exec, network calls)
-- Configurable memory limit and execution timeout
-- Auto-repair: if generated code has validation errors, the LLM fixes them (up to 3 attempts)
-
-### Docker
-
-```yaml
-experiment:
-  mode: "docker"
-  docker:
-    image: "researchclaw/experiment:latest"
-    gpu_enabled: true
-    memory_limit_mb: 8192
-    network_policy: "setup_only"   # none | setup_only | pip_only | full
-    auto_install_deps: true
-    shm_size_mb: 2048
-```
-
-The pipeline runs generated code inside a **Docker container** with GPU passthrough, dependency auto-installation, and network isolation. Execution follows a **three-phase model** within a single container:
-
-1. **Phase 0 (pip install)**: Installs auto-detected dependencies from `requirements.txt` (network enabled)
-2. **Phase 1 (setup.py)**: Runs `setup.py` for dataset downloads and environment preparation (network enabled)
-3. **Phase 2 (experiment)**: Executes the experiment code (network disabled by default via iptables)
-
-**Network policies**:
-- `none` — No network at all (all phases offline). Requires all deps pre-installed in image.
-- `setup_only` (default) — Network during Phase 0+1, disabled before Phase 2 via iptables (`--cap-add=NET_ADMIN`).
-- `pip_only` — Network only during Phase 0 (pip install), disabled for Phase 1+2.
-- `full` — Network available throughout all phases.
-
-**Pre-cached datasets**: The Docker image includes CIFAR-10/100, MNIST, FashionMNIST, STL-10, and SVHN at `/opt/datasets`, mounted read-only as `/workspace/data`. No download needed for these standard benchmarks.
-
-**Best for**: Reproducible experiments with full dependency isolation. Supports GPU passthrough (NVIDIA) and configurable network policies.
-
-**Setup**: Build the image first:
-```bash
-docker build -t researchclaw/experiment:latest researchclaw/docker/
-```
-
-### SSH Remote
-
-```yaml
-experiment:
-  mode: "ssh_remote"
-  ssh_remote:
-    host: "gpu-server.example.com"
-    gpu_ids: [0, 1]
-    remote_workdir: "/tmp/researchclaw_experiments"
-```
-
-The pipeline sends generated code to a remote GPU server for execution.
-
-**Best for**: Experiments that require GPU hardware you don't have locally.
-
----
-
-## 8. Conference Templates
-
-AutoResearchClaw generates LaTeX files formatted for specific conferences:
-
-```yaml
-export:
-  target_conference: "neurips_2025"
-```
-
-| Conference | Config Value | Layout |
-|------------|-------------|--------|
-| NeurIPS 2025 | `neurips_2025` (default) | Single-column, `neurips_2025` style |
-| NeurIPS 2024 | `neurips_2024` | Single-column, `neurips_2024` style |
-| ICLR 2026 | `iclr_2026` | Single-column, `iclr2026_conference` style |
-| ICLR 2025 | `iclr_2025` | Single-column, `iclr2025_conference` style |
-| ICML 2026 | `icml_2026` | Double-column, `icml2026` style |
-| ICML 2025 | `icml_2025` | Double-column, `icml2025` style |
-
-Short aliases are also accepted: `neurips` (→ 2025), `iclr` (→ 2026), `icml` (→ 2026).
-
-The Markdown-to-LaTeX converter handles:
-- Section headings (`#`, `##`, `###`)
-- Inline and display math (`$...$`, `$$...$$`)
-- Bold and italic text
-- Ordered and unordered lists
-- Tables
-- Code blocks
-- Citation references (`[cite_key]` → `\cite{cite_key}`)
-
-### Compiling the LaTeX
-
-```bash
-# Using tectonic (recommended)
-tectonic artifacts/<run-id>/stage-22/paper.tex
-
-# Using pdflatex
-cd artifacts/<run-id>/stage-22/
-pdflatex paper.tex
-bibtex paper
-pdflatex paper.tex
-pdflatex paper.tex
-```
-
----
-
-## 9. OpenClaw Bridge (Advanced)
-
-For deeper integration with OpenClaw, AutoResearchClaw includes a bridge adapter system. Each flag in the config activates a typed protocol interface:
-
-```yaml
-openclaw_bridge:
-  use_cron: true              # Scheduled research runs
-  use_message: true           # Progress notifications (Discord/Slack/Telegram)
-  use_memory: true            # Cross-session knowledge persistence
-  use_sessions_spawn: true    # Spawn parallel sub-sessions for concurrent stages
-  use_web_fetch: true         # Live web search during literature review
-  use_browser: false          # Browser-based paper collection
-```
-
-### What Each Adapter Does
-
-| Adapter | Protocol | Use Case |
-|---------|----------|----------|
-| **Cron** | `CronAdapter.schedule_resume(run_id, stage_id, reason)` | Schedule pipeline resumption (e.g., daily re-runs) |
-| **Message** | `MessageAdapter.notify(channel, subject, body)` | Send progress updates to chat platforms |
-| **Memory** | `MemoryAdapter.append(namespace, content)` | Persist knowledge across sessions |
-| **Sessions** | `SessionsAdapter.spawn(name, command)` | Run pipeline stages in parallel sub-sessions |
-| **WebFetch** | `WebFetchAdapter.fetch(url)` | Fetch web pages during literature search |
-| **Browser** | `BrowserAdapter.open(url)` | Open and interact with web pages |
-
-When OpenClaw provides a capability (e.g., message sending), the adapter consumes it automatically. When running standalone, recording stubs capture all calls for debugging without side effects.
-
-This is an **extension point** — you don't need to configure it for basic usage.
-
----
-
-## 10. MetaClaw Integration (Cross-Run Learning)
-
-[MetaClaw](https://github.com/aiming-lab/MetaClaw) adds **cross-run knowledge transfer** to AutoResearchClaw. When enabled, the pipeline automatically captures lessons from failures and converts them into reusable skills that improve subsequent runs.
-
-### Architecture
-
-```
-┌──────────────────────────────────────────────────────┐
-│              AutoResearchClaw Pipeline                │
-│  Stage 1 → 2 → ... → 23                             │
-│                                                      │
-│  ┌─────────────┐    ┌──────────────────────────────┐ │
-│  │ LLMClient   │───▶│ MetaClaw Integration Layer   │ │
-│  │             │    │ (metaclaw_bridge module)      │ │
-│  └─────────────┘    └──────────┬───────────────────┘ │
-│                                │                     │
-│  ┌─────────────┐    ┌──────────▼───────────────────┐ │
-│  │ Evolution   │◀──▶│ Lesson ↔ Skill Bridge        │ │
-│  │ Store       │    └─────────────────────────────┘ │
-│  └─────────────┘                                     │
-└──────────────────────────┬───────────────────────────┘
-                           │
-            ┌──────────────▼──────────────┐
-            │     MetaClaw Proxy Server    │
-            │     (optional, :30000)       │
-            │  ┌────────────────────────┐  │
-            │  │ SkillManager (40+ skills)│ │
-            │  │ + arc-* learned skills   │ │
-            │  └────────────────────────┘  │
-            └─────────────────────────────┘
-```
-
-### How It Works
-
-1. **Lesson Capture**: During each pipeline run, the `EvolutionStore` automatically records failures, warnings, and anomalies as structured lessons in `evolution/lessons.jsonl`.
-
-2. **Lesson → Skill Conversion**: After a run completes, lessons above a configurable severity threshold are converted into `arc-*` skill files stored in `~/.metaclaw/skills/`. Each skill contains: trigger conditions, failure root cause, and actionable guidance.
-
-3. **Skill Injection**: On the next run, `build_overlay()` reads all `arc-*` skills and injects them into the LLM prompt for every stage via the `evolution_overlay` parameter. The LLM receives explicit instructions to avoid previously encountered pitfalls.
-
-4. **Proxy Routing (Optional)**: When the MetaClaw proxy is running, LLM requests are routed through it for additional skill matching and session tracking. If the proxy is unavailable, requests automatically fall back to the direct LLM endpoint.
-
-### Setup
-
-#### Step 1: Install MetaClaw
-
-```bash
-pip install metaclaw
-# Or clone from source:
-git clone https://github.com/aiming-lab/MetaClaw.git
-cd MetaClaw && pip install -e .
-```
-
-#### Step 2: Configure
-
-Add the `metaclaw_bridge` section to your `config.arc.yaml`:
-
-```yaml
-metaclaw_bridge:
-  enabled: true
-  proxy_url: "http://localhost:30000/v1"    # MetaClaw proxy (optional)
-  skills_dir: "~/.metaclaw/skills"          # Skill storage directory
-  fallback_url: "https://api.openai.com/v1" # Direct LLM fallback
-  fallback_api_key_env: "OPENAI_API_KEY"
-  lesson_to_skill:
-    enabled: true
-    min_severity: "warning"                 # Convert warnings + errors
-    max_skills_per_run: 5                   # Max new skills per run
-```
-
-#### Step 3: Run
-
-```bash
-# First run — captures lessons, generates initial skills
-researchclaw run --config config.arc.yaml --topic "Your idea" --auto-approve
-
-# Check generated skills
-ls ~/.metaclaw/skills/arc-*/SKILL.md
-
-# Second run — skills from Run 1 are automatically injected
-researchclaw run --config config.arc.yaml --topic "Your idea" --auto-approve
-```
-
-#### Optional: Start MetaClaw Proxy
-
-For full skill matching and session tracking:
-
-```bash
-metaclaw start --mode skills_only --port 30000
-# Or use the provided script:
-bash scripts/metaclaw_start.sh
-```
-
-The proxy is optional — without it, the pipeline still benefits from skill injection via `build_overlay()` and falls back to your configured LLM endpoint.
-
-### Experiment Results
-
-In controlled A/B experiments (same topic, same LLM, same configuration):
-
-| Metric | Baseline | With MetaClaw | Improvement |
-|--------|----------|---------------|-------------|
-| Stage retry rate | 10.5% | 7.9% | **-24.8%** |
-| Refine cycle count | 2.0 | 1.2 | **-40.0%** |
-| Pipeline stage completion | 18/19 | 19/19 | **+5.3 pp** |
-| Overall robustness score (composite) | 0.714 | 0.845 | **+18.3%** |
-
-> Composite robustness score is a weighted average of stage completion rate (40%), retry reduction (30%), and refine cycle efficiency (30%).
-
-### Key Files
-
-| File | Purpose |
-|------|---------|
-| `researchclaw/metaclaw_bridge/` | Integration module (config, session, lesson_to_skill, prm_gate, skill_feedback) |
-| `researchclaw/evolution.py` | `build_overlay()` — reads intra-run lessons + cross-run arc-* skills |
-| `researchclaw/llm/client.py` | Proxy routing with automatic fallback |
-| `~/.metaclaw/skills/arc-*/SKILL.md` | Learned skill files (auto-generated) |
-| `scripts/metaclaw_start.sh` | Helper script to launch MetaClaw proxy |
-
-### Backward Compatibility
-
-- **Default: OFF.** Without `metaclaw_bridge.enabled: true`, the pipeline is completely unchanged.
-- **No new required dependencies.** MetaClaw is optional.
-- **All 1,823 existing tests pass** with the integration code.
-
----
-
-## 11. Other AI Platforms
-
-AutoResearchClaw works with any AI coding assistant that can read project context files.
-
-### Claude Code
-
-Claude Code automatically reads `RESEARCHCLAW_CLAUDE.md` (if present) when you open the project. It also loads the skill definition from `.claude/skills/researchclaw/SKILL.md`.
-
-> **Note:** `RESEARCHCLAW_CLAUDE.md` is generated locally and listed in `.gitignore`. The `.claude/skills/researchclaw/SKILL.md` file is always available in the repo.
-
-```
-You: Research the impact of attention mechanisms on speech recognition
-Claude: [Reads project context, runs the pipeline, returns results]
-```
-
-### Copilot CLI (GitHub)
-
-GitHub Copilot can be used as an ACP agent via the `gh` CLI command (GitHub CLI with Copilot extension). Set the ACP agent to `gh` in your config:
-
-```yaml
-llm:
-  provider: "acp"
-  acp:
-    agent: "gh"
-    cwd: "."
-```
-
-Prerequisites:
-1. Install [GitHub CLI](https://cli.github.com/) (`gh`)
-2. Install the Copilot extension: `gh extension install github/gh-copilot`
-3. Authenticate: `gh auth login`
-
-### OpenCode
-
-OpenCode loads skills from `.claude/skills/`. The `researchclaw` skill activates on research-related queries and guides the agent through the pipeline.
-
-### Any AI CLI
-
-Provide `RESEARCHCLAW_AGENTS.md` (if generated locally) or `README.md` as context to any AI assistant. `RESEARCHCLAW_AGENTS.md` contains:
-- The agent role definition (research orchestrator)
-- Quick setup instructions
-- Pipeline stage reference
-- Decision guide for common scenarios
-
-The agent reads this file and knows how to install, configure, and run the pipeline. If the file is not present, the `README.md` and `.claude/skills/researchclaw/SKILL.md` provide sufficient context for any AI assistant to operate the pipeline.
-
----
-
-## 12. Python API
-
-For programmatic use or custom integrations:
-
-```python
-from researchclaw.pipeline.runner import execute_pipeline
-from researchclaw.config import RCConfig
-from researchclaw.adapters import AdapterBundle
-from pathlib import Path
-
-# Load configuration
-config = RCConfig.load("config.yaml", check_paths=False)
-
-# Run the full pipeline
-results = execute_pipeline(
-    run_dir=Path("artifacts/my-run"),
-    run_id="run-001",
-    config=config,
-    adapters=AdapterBundle(),
-    auto_approve_gates=True,
-)
-
-# Check results
-for result in results:
-    print(f"Stage {result.stage.name}: {result.status.value}")
-```
-
-### Iterative Pipeline (Multiple Paper Revisions)
-
-```python
-from researchclaw.pipeline.runner import execute_iterative_pipeline
-
-results = execute_iterative_pipeline(
-    run_dir=Path("artifacts/my-run"),
-    run_id="run-001",
-    config=config,
-    adapters=AdapterBundle(),
-    max_iterations=3,       # Re-run paper writing up to 3 times
-    convergence_rounds=2,   # Stop if quality stabilizes for 2 rounds
-)
-```
-
-### Literature Search Only
-
-```python
-from researchclaw.literature.search import search_papers
-
-papers = search_papers("transformer attention mechanisms", limit=20)
-for p in papers:
-    print(f"{p.title} ({p.year}) — cited {p.citation_count}x")
-    print(p.to_bibtex())
-```
-
----
-
-## 13. Troubleshooting
-
-### Pre-Run Diagnostics
-
-```bash
-# Check everything: Python version, dependencies, API connectivity, config validity
-researchclaw doctor --config config.yaml
-```
-
-### Common Issues
-
-| Problem | Cause | Solution |
-|---------|-------|----------|
-| `Missing required field: llm.base_url` | Config incomplete | Set `llm.base_url` and `llm.api_key` (or `api_key_env`) |
-| `Config validation FAILED` | Invalid YAML or missing fields | Run `researchclaw validate -c config.yaml` for details |
-| `Preflight check... FAILED` | LLM API unreachable | Check `base_url`, API key, and network connectivity |
-| Sandbox execution fails | Python path wrong or missing packages | Verify `experiment.sandbox.python_path` exists; ensure numpy is installed |
-| Code validation rejects all attempts | LLM generates unsafe code | Switch to `simulated` mode, or try a more capable model |
-| Gate stage blocks pipeline | Manual approval required | Use `--auto-approve` for autonomous mode |
-| Pipeline fails mid-run | Transient API error | Run with `--resume` to continue from the last checkpoint |
-| Citations marked HALLUCINATED | LLM invented fake references | This is expected — Stage 23 catches these. Use `references_verified.bib` instead |
-| LaTeX won't compile | Missing style packages | Install the conference style files, or use `tectonic` which auto-downloads them |
-
-### Resuming a Failed Run
-
-```bash
-# Resume from the last checkpoint (the pipeline checkpoints after every stage)
-researchclaw run -c config.yaml --resume --auto-approve
-
-# Or restart from a specific stage
-researchclaw run -c config.yaml --from-stage EXPERIMENT_RUN --auto-approve --output artifacts/<run-id>
-```
-
-### Reading a Run Report
-
-```bash
-researchclaw report --run-dir artifacts/rc-20260310-143200-a1b2c3
-```
-
-This prints a human-readable summary: which stages passed, which failed, key metrics, and paper quality scores.
-
----
-
-## 14. FAQ
-
-**Q: How much does a full pipeline run cost in API credits?**
-A: Depends on your model and topic complexity. A typical run with GPT-4o makes ~35-60 API calls across all 23 stages (paper drafting now uses 3 sequential calls for section-by-section writing). Expect roughly $3-12 per run. Simulated mode uses slightly fewer tokens since it doesn't generate real experiment code.
-
-**Q: Can I use a local LLM (Ollama, vLLM, etc.)?**
-A: Yes — any OpenAI-compatible endpoint works. Set `llm.base_url` to your local server (e.g., `http://localhost:11434/v1` for Ollama). Quality depends heavily on the model's capabilities.
-
-**Q: Can I run only part of the pipeline?**
-A: Yes. Use `--from-stage STAGE_NAME` to start from any stage. The stage reads its inputs from previously generated artifacts, so the earlier stages must have completed at least once.
-
-**Q: Are the literature references real?**
-A: Yes. Stage 4 uses a multi-source strategy (arXiv-first, then Semantic Scholar) with query expansion to find real papers with real titles, DOIs, and citation counts. The pipeline typically collects 100-200 candidates and aims for 30-60 references in the final paper. Stage 23 then verifies every reference to catch any that the LLM might have hallucinated during paper writing.
-
-**Q: Can I use this for a real paper submission?**
-A: AutoResearchClaw is a research tool, not a paper mill. The output is a strong first draft that should be reviewed, improved, and validated by a human researcher before submission. Think of it as an extremely thorough research assistant.
-
-**Q: What happens if the LLM API goes down mid-run?**
-A: The pipeline checkpoints after every stage. Use `--resume` to pick up where it left off. Failed stages are retried according to the `max_retries` setting in each stage's contract.
-
-**Q: Can I change the research topic mid-run?**
-A: Not recommended — the pipeline builds on prior stages' outputs. Start a new run with the new topic instead.
-
----
-
-*Last updated: May 2026 · AutoResearchClaw v0.5.0*
diff --git a/docs/showcase/SHOWCASE.md b/docs/showcase/SHOWCASE.md
deleted file mode 100644
index 0a1c0bbd..00000000
--- a/docs/showcase/SHOWCASE.md
+++ /dev/null
@@ -1,583 +0,0 @@
-<h1 align="center">🏆 Generated Paper Showcase</h1>
-
-<p align="center">
-  <i>From a one-line idea to a conference-ready paper — fully autonomous, zero human intervention.</i>
-</p>
-
-<p align="center">
-  <img src="https://img.shields.io/badge/🔬_Pipeline-23_Stages-0969DA?style=for-the-badge&labelColor=1a1a2e" alt="23 Stages">&nbsp;
-  <img src="https://img.shields.io/badge/📄_Papers-8_Completed-2ea043?style=for-the-badge&labelColor=1a1a2e" alt="8 Papers">&nbsp;
-  <img src="https://img.shields.io/badge/💻_Code-54%2C348_Lines-f97316?style=for-the-badge&labelColor=1a1a2e" alt="54k LOC">&nbsp;
-  <img src="https://img.shields.io/badge/⏱️_Runtime-~27_Hours-a855f7?style=for-the-badge&labelColor=1a1a2e" alt="~27h Runtime">
-</p>
-
-<p align="center">
-  <img src="https://img.shields.io/badge/📚_Literature-1%2C547%2B_Papers_Surveyed-3b82f6?style=flat-square" alt="1547+ papers">&nbsp;
-  <img src="https://img.shields.io/badge/📊_Figures-50_Auto--Generated-10b981?style=flat-square" alt="50 figures">&nbsp;
-  <img src="https://img.shields.io/badge/📑_Output-121_Pages_Total-ef4444?style=flat-square" alt="121 pages">&nbsp;
-  <img src="https://img.shields.io/badge/🔗_References-291_Cited_(99.7%25_Verified)-8b5cf6?style=flat-square" alt="291 refs">
-</p>
-
----
-
-Below are **eight papers** generated **entirely by AutoResearchClaw** — each starting from nothing more than a topic sentence. The pipeline autonomously searched literature, designed experiments, wrote and executed code, generated figures, and produced NeurIPS-formatted LaTeX papers with verified references.
-
-> 📌 **Two batches, eight domains** — Batch A covers mathematics, statistics, biology, and numerical computing; Batch B covers NLP, reinforcement learning, computer vision, and knowledge distillation — demonstrating the pipeline's cross-domain generality.
-
----
-
-## 🔄 How It Works
-
-<table>
-<tr>
-<td align="center" width="12%">
-
-**💡**<br>**Idea**
-</td>
-<td align="center" width="3%">➜</td>
-<td align="center" width="12%">
-
-**📚**<br>**Literature**<br><sub>300–470 papers</sub>
-</td>
-<td align="center" width="3%">➜</td>
-<td align="center" width="12%">
-
-**🧪**<br>**Hypothesis**<br><sub>experiment design</sub>
-</td>
-<td align="center" width="3%">➜</td>
-<td align="center" width="12%">
-
-**💻**<br>**Code**<br><sub>2K–15K lines</sub>
-</td>
-<td align="center" width="3%">➜</td>
-<td align="center" width="12%">
-
-**🔬**<br>**Execute**<br><sub>sandbox + refine</sub>
-</td>
-<td align="center" width="3%">➜</td>
-<td align="center" width="12%">
-
-**📝**<br>**Write**<br><sub>review & audit</sub>
-</td>
-<td align="center" width="3%">➜</td>
-<td align="center" width="12%">
-
-**📄**<br>**Paper**<br><sub>NeurIPS PDF</sub>
-</td>
-</tr>
-</table>
-
-<p align="center"><sub>Each run traverses <b>23 autonomous stages</b> with iterative self-healing, multi-agent peer review, and citation verification — no human in the loop.</sub></p>
-
----
-
-<h2 align="center">📘 Batch A &nbsp;·&nbsp; Mathematics, Statistics & Sciences</h2>
-
-<p align="center"><sub>Generated on Machine A &nbsp;·&nbsp; 4 papers across 4 non-ML domains</sub></p>
-
----
-
-### 📄 Paper I &nbsp;·&nbsp; Random Matrix Theory &ensp; <img src="https://img.shields.io/badge/domain-mathematics-blue?style=flat-square" alt="math">
-
-> **Finite-Dimensional Corrections to the Marchenko–Pastur Distribution in Random Wishart Matrices**
-
-<table>
-<tr>
-<td width="340">
-<a href="papers/paper_I_random_matrix.pdf">
-<img src="thumbnails/paper_I_random_matrix-01.png" width="320" alt="Paper I First Page" style="border: 1px solid #e1e4e8; border-radius: 6px;">
-</a>
-<p align="center"><sub>👆 Click to read the full paper</sub></p>
-</td>
-<td>
-
-#### 💡 Idea
-Systematically quantify pre-asymptotic, finite-*N* deviations of empirical eigenvalue densities from the Marchenko–Pastur law across *N* = 50 to 5,000, decomposing error into bulk vs. edge components and testing lightweight correction models.
-
-#### ⚙️ Pipeline Journey
-
-| | |
-|:---|:---|
-| 🔗 **Stages** | 23 stages + 2 refinement iterations |
-| 📚 **Literature** | 473 papers collected → 26 cited |
-| 💻 **Code** | 10,290 lines of Python |
-| ⏱️ **Runtime** | ~2 h 25 min |
-| 📊 **Figures** | 5 auto-generated charts |
-| 📑 **Pages** | 16 pages (NeurIPS format) |
-
-#### 🎯 Key Result
-Produced a finite-*N* correction atlas showing convergence rates of spectral densities, with edge deviations persisting significantly longer than bulk errors — providing practical guidance for when the MP law is "close enough."
-
-<a href="papers/paper_I_random_matrix.pdf"><img src="https://img.shields.io/badge/📄_Read_Full_Paper-PDF-d73a49?style=for-the-badge" alt="Read PDF"></a>
-
-</td>
-</tr>
-</table>
-
-<details>
-<summary>🖼️ <b>Auto-Generated Framework Diagram</b> — MPCX Architecture</summary>
-<br>
-<p align="center">
-<img src="thumbnails/framework_I_random_matrix.png" width="90%" alt="MPCX Framework Diagram">
-</p>
-<p align="center"><sub>Finite-dimensional correction pipeline: Wishart matrix generation → empirical spectral density estimation → MP baseline comparison → bulk/edge error decomposition → correction model fitting. Entirely auto-generated by the FigureAgent subsystem.</sub></p>
-</details>
-
----
-
-### 📄 Paper II &nbsp;·&nbsp; Econometrics &ensp; <img src="https://img.shields.io/badge/domain-statistics-green?style=flat-square" alt="stats">
-
-> **Monte Carlo Evaluation of Instrumental Variable Estimators Under Weak Instruments**
-
-<table>
-<tr>
-<td width="340">
-<a href="papers/paper_II_weak_iv_estimators.pdf">
-<img src="thumbnails/paper_II_weak_iv_estimators-01.png" width="320" alt="Paper II First Page" style="border: 1px solid #e1e4e8; border-radius: 6px;">
-</a>
-<p align="center"><sub>👆 Click to read the full paper</sub></p>
-</td>
-<td>
-
-#### 💡 Idea
-Reframe the classical 2SLS / LIML / Fuller-*k* / JIVE comparison around decision-relevant *risk surfaces*, mapping finite-sample phase diagrams that show where each estimator is preferred under realistic weak-IV conditions.
-
-#### ⚙️ Pipeline Journey
-
-| | |
-|:---|:---|
-| 🔗 **Stages** | 23 stages + 2 refinement iterations |
-| 📚 **Literature** | 366 papers collected → 41 cited |
-| 💻 **Code** | 10,062 lines of Python |
-| ⏱️ **Runtime** | ~2 h 56 min |
-| 📊 **Figures** | 6 auto-generated charts |
-| 📑 **Pages** | 14 pages (NeurIPS format) |
-
-#### 🎯 Key Result
-Generated estimator-switching phase diagrams revealing that Fuller-*k* dominates in specific small-*n*, many-instrument regions, while JIVE's bias reduction is systematically offset by variance inflation — providing actionable guidance for empirical researchers.
-
-<a href="papers/paper_II_weak_iv_estimators.pdf"><img src="https://img.shields.io/badge/📄_Read_Full_Paper-PDF-d73a49?style=for-the-badge" alt="Read PDF"></a>
-
-</td>
-</tr>
-</table>
-
-<details>
-<summary>🖼️ <b>Auto-Generated Framework Diagram</b> — IVX Architecture</summary>
-<br>
-<p align="center">
-<img src="thumbnails/framework_II_weak_iv_estimators.png" width="90%" alt="IVX Framework Diagram">
-</p>
-<p align="center"><sub>Monte Carlo IV evaluation pipeline: DGP specification → estimator suite (2SLS, LIML, Fuller-k, JIVE) → finite-sample risk surfaces → phase diagram construction. Entirely auto-generated by the FigureAgent subsystem.</sub></p>
-</details>
-
----
-
-### 📄 Paper III &nbsp;·&nbsp; Epidemiological Modeling &ensp; <img src="https://img.shields.io/badge/domain-biology-orange?style=flat-square" alt="bio">
-
-> **Structural Identifiability and Parameter Estimation in Compartmental Epidemic Models (SIR / SEIR)**
-
-<table>
-<tr>
-<td width="340">
-<a href="papers/paper_III_sir_seir_identifiability.pdf">
-<img src="thumbnails/paper_III_sir_seir_identifiability-01.png" width="320" alt="Paper III First Page" style="border: 1px solid #e1e4e8; border-radius: 6px;">
-</a>
-<p align="center"><sub>👆 Click to read the full paper</sub></p>
-</td>
-<td>
-
-#### 💡 Idea
-Map the boundary between structural and practical identifiability in SIR vs. SEIR models across realistic observation regimes, and quantify when the Fisher Information Matrix (FIM) gives false confidence relative to profile likelihood.
-
-#### ⚙️ Pipeline Journey
-
-| | |
-|:---|:---|
-| 🔗 **Stages** | 23 stages + 2 refinement iterations |
-| 📚 **Literature** | 388 papers collected → 29 cited |
-| 💻 **Code** | 9,374 lines of Python |
-| ⏱️ **Runtime** | ~2 h 23 min |
-| 📊 **Figures** | 6 auto-generated charts |
-| 📑 **Pages** | 18 pages (NeurIPS format) |
-
-#### 🎯 Key Result
-Demonstrated that parameterization and observer design choices affect identifiability diagnostics more strongly than the choice between SIR and SEIR structure — with FIM producing overconfident estimates in specific observation-limited regimes where profile likelihood correctly flags non-identifiability.
-
-<a href="papers/paper_III_sir_seir_identifiability.pdf"><img src="https://img.shields.io/badge/📄_Read_Full_Paper-PDF-d73a49?style=for-the-badge" alt="Read PDF"></a>
-
-</td>
-</tr>
-</table>
-
-<details>
-<summary>🖼️ <b>Auto-Generated Framework Diagram</b> — PRIM Architecture</summary>
-<br>
-<p align="center">
-<img src="thumbnails/framework_III_sir_seir_identifiability.png" width="90%" alt="PRIM Framework Diagram">
-</p>
-<p align="center"><sub>PRIM benchmark workflow: synthetic outbreak generation (SIR/SEIR) → parameter estimation → profile likelihood vs. FIM diagnostics → identifiability regime mapping. Entirely auto-generated by the FigureAgent subsystem.</sub></p>
-</details>
-
----
-
-### 📄 Paper IV &nbsp;·&nbsp; Numerical Linear Algebra &ensp; <img src="https://img.shields.io/badge/domain-computing-purple?style=flat-square" alt="computing">
-
-> **Comparative Analysis of Preconditioning Strategies for Krylov Subspace Methods on Sparse Linear Systems**
-
-<table>
-<tr>
-<td width="340">
-<a href="papers/paper_IV_krylov_preconditioners.pdf">
-<img src="thumbnails/paper_IV_krylov_preconditioners-01.png" width="320" alt="Paper IV First Page" style="border: 1px solid #e1e4e8; border-radius: 6px;">
-</a>
-<p align="center"><sub>👆 Click to read the full paper</sub></p>
-</td>
-<td>
-
-#### 💡 Idea
-Go beyond "which preconditioner wins" — build a feature-conditioned decision map for ILU / Jacobi / SSOR / AMG with CG / GMRES / BiCGSTAB, stratified by sparsity-graph structure and matrix pathology under realistic setup-vs-solve cost budgets.
-
-#### ⚙️ Pipeline Journey
-
-| | |
-|:---|:---|
-| 🔗 **Stages** | 23 stages + 2 refinement iterations |
-| 📚 **Literature** | 320 papers collected → 33 cited |
-| 💻 **Code** | 14,557 lines of Python |
-| ⏱️ **Runtime** | ~2 h 30 min |
-| 📊 **Figures** | 4 auto-generated charts |
-| 📑 **Pages** | 16 pages (NeurIPS format) |
-
-#### 🎯 Key Result
-Produced a setup-vs-solve tradeoff analysis showing that methods considered "best" under solve-time alone are often suboptimal under realistic memory and setup budgets — with AMG dominance limited to specific elliptic SPD matrix families.
-
-<a href="papers/paper_IV_krylov_preconditioners.pdf"><img src="https://img.shields.io/badge/📄_Read_Full_Paper-PDF-d73a49?style=for-the-badge" alt="Read PDF"></a>
-
-</td>
-</tr>
-</table>
-
-<details>
-<summary>🖼️ <b>Auto-Generated Framework Diagram</b> — Krylov Preconditioner Architecture</summary>
-<br>
-<p align="center">
-<img src="thumbnails/framework_IV_krylov_preconditioners.png" width="90%" alt="Krylov Preconditioner Framework Diagram">
-</p>
-<p align="center"><sub>Feature-conditioned preconditioner evaluation: sparse matrix collection → structural descriptor extraction → solver–preconditioner grid (CG/GMRES/BiCGSTAB × ILU/Jacobi/SSOR/AMG) → setup-vs-solve tradeoff analysis → decision map. Entirely auto-generated by the FigureAgent subsystem.</sub></p>
-</details>
-
----
-
-<h2 align="center">📙 Batch B &nbsp;·&nbsp; Machine Learning & AI</h2>
-
-<p align="center"><sub>Generated on Machine B &nbsp;·&nbsp; NVIDIA RTX 6000 Ada (48 GB) &nbsp;·&nbsp; 4 papers across 4 ML sub-fields</sub></p>
-
----
-
-### 📄 Paper V &nbsp;·&nbsp; Parameter-Efficient Fine-Tuning &ensp; <img src="https://img.shields.io/badge/domain-NLP%20/%20PEFT-blue?style=flat-square" alt="NLP">
-
-> **GARD: Gradient-Spectral Rank Allocation for LoRA Fine-Tuning**
-
-<table>
-<tr>
-<td width="340">
-<a href="papers/paper_V_gard_lora.pdf">
-<img src="thumbnails/paper_V_gard_lora-01.png" width="320" alt="Paper V First Page" style="border: 1px solid #e1e4e8; border-radius: 6px;">
-</a>
-<p align="center"><sub>👆 Click to read the full paper</sub></p>
-</td>
-<td>
-
-#### 💡 Idea
-Most LoRA configurations use a fixed, uniform rank across all layers. GARD proposes using the *spectrum of layer-wise gradients* — eigenvalues of gradient covariance — to dynamically allocate rank where it matters most, under a strict parameter budget.
-
-#### ⚙️ Pipeline Journey
-
-| | |
-|:---|:---|
-| 🔗 **Stages** | 23 stages + 2 refinement iterations |
-| 📚 **Literature** | 60 references cited (100% verified) |
-| 💻 **Code** | 2,894 lines of Python (5 files) |
-| ⏱️ **Runtime** | ~50 min |
-| 📊 **Figures** | 7 auto-generated charts |
-| 📑 **Pages** | 17 pages (NeurIPS format) |
-
-#### 🎯 Key Contribution
-A principled alternative to uniform rank allocation: GARD links intrinsic gradient dimensionality to low-rank adapter capacity, periodically updating ranks during training using smoothed spectra.
-
-<a href="papers/paper_V_gard_lora.pdf"><img src="https://img.shields.io/badge/📄_Read_Full_Paper-PDF-d73a49?style=for-the-badge" alt="Read PDF"></a>
-
-</td>
-</tr>
-</table>
-
-<details>
-<summary>🖼️ <b>Auto-Generated Framework Diagram</b> — GARD Architecture</summary>
-<br>
-<p align="center">
-<img src="thumbnails/framework_V_gard_lora.png" width="90%" alt="GARD Framework Diagram">
-</p>
-<p align="center"><sub>Gradient spectral analysis → layer-wise rank scoring → dynamic rank allocation under budget constraint. Entirely auto-generated by the FigureAgent subsystem.</sub></p>
-</details>
-
----
-
-### 📄 Paper VI &nbsp;·&nbsp; Reinforcement Learning &ensp; <img src="https://img.shields.io/badge/domain-RL%20/%20Exploration-green?style=flat-square" alt="RL">
-
-> **LACE: Learned Abstractions for Count-Based Exploration in Sparse-Reward RL**
-
-<table>
-<tr>
-<td width="340">
-<a href="papers/paper_VI_lace_exploration.pdf">
-<img src="thumbnails/paper_VI_lace_exploration-01.png" width="320" alt="Paper VI First Page" style="border: 1px solid #e1e4e8; border-radius: 6px;">
-</a>
-<p align="center"><sub>👆 Click to read the full paper</sub></p>
-</td>
-<td>
-
-#### 💡 Idea
-Count-based exploration in RL relies on state visitation counts, but raw state spaces are too large for effective counting. LACE designs *online-learned, task-aware state abstractions* optimized specifically for count-based exploration in sparse-reward environments.
-
-#### ⚙️ Pipeline Journey
-
-| | |
-|:---|:---|
-| 🔗 **Stages** | 23 stages + 2 refinement iterations |
-| 📚 **Literature** | 25 references cited (100% verified) |
-| 💻 **Code** | 2,067 lines of Python (4 files) |
-| 🐳 **Experiment** | 32 min GPU sandbox execution |
-| ⏱️ **Runtime** | ~6.8 hrs total |
-| 📊 **Figures** | 6 auto-generated charts |
-| 📑 **Pages** | 11 pages (NeurIPS format) |
-
-#### 🎯 Key Result
-DQN baseline achieves **356.7 mean reward** in sparse-reward gridworld tasks. The paper analyzes the trade-off between abstraction compactness for counting and information sufficiency for downstream control.
-
-<a href="papers/paper_VI_lace_exploration.pdf"><img src="https://img.shields.io/badge/📄_Read_Full_Paper-PDF-d73a49?style=for-the-badge" alt="Read PDF"></a>
-
-</td>
-</tr>
-</table>
-
-<details>
-<summary>🖼️ <b>Auto-Generated Framework Diagram</b> — LACE Architecture</summary>
-<br>
-<p align="center">
-<img src="thumbnails/framework_VI_lace_exploration.png" width="90%" alt="LACE Framework Diagram">
-</p>
-<p align="center"><sub>Learned state abstraction module integrated with count-based exploration in the DQN agent loop. Entirely auto-generated by the FigureAgent subsystem.</sub></p>
-</details>
-
----
-
-### 📄 Paper VII &nbsp;·&nbsp; Efficient Vision Transformers &ensp; <img src="https://img.shields.io/badge/domain-Computer_Vision-orange?style=flat-square" alt="CV">
-
-> **FAME: Frequency-Aware Progressive Token Merging for Efficient ViT Inference**
-
-<table>
-<tr>
-<td width="340">
-<a href="papers/paper_VII_fame_token_merging.pdf">
-<img src="thumbnails/paper_VII_fame_token_merging-01.png" width="320" alt="Paper VII First Page" style="border: 1px solid #e1e4e8; border-radius: 6px;">
-</a>
-<p align="center"><sub>👆 Click to read the full paper</sub></p>
-</td>
-<td>
-
-#### 💡 Idea
-Existing ViT token pruning methods reduce tokens based on attention or saliency without considering *frequency content*. FAME uses DCT/FFT-based spectral filters to distinguish high-frequency detail tokens from low-frequency background tokens, merging progressively across layers.
-
-#### ⚙️ Pipeline Journey
-
-| | |
-|:---|:---|
-| 🔗 **Stages** | 23 stages + 2 refinement iterations |
-| 📚 **Literature** | 40 references cited (100% verified) |
-| 💻 **Code** | 2,873 lines of Python (5 files) |
-| 🐳 **Experiment** | 32 min GPU sandbox execution |
-| ⏱️ **Runtime** | ~3.3 hrs total |
-| 📊 **Figures** | 7 auto-generated charts |
-| 📑 **Pages** | 10 pages (NeurIPS format) |
-
-#### 🎯 Key Result
-ViT-B/16 baseline: **56.54% accuracy** (3 seeds). Detailed analysis of the accuracy-efficiency tradeoff and per-layer metric breakdowns for frequency-aware vs. similarity-based merging.
-
-<a href="papers/paper_VII_fame_token_merging.pdf"><img src="https://img.shields.io/badge/📄_Read_Full_Paper-PDF-d73a49?style=for-the-badge" alt="Read PDF"></a>
-
-</td>
-</tr>
-</table>
-
-<details>
-<summary>🖼️ <b>Auto-Generated Framework Diagram</b> — FAME Architecture</summary>
-<br>
-<p align="center">
-<img src="thumbnails/framework_VII_fame_token_merging.png" width="90%" alt="FAME Framework Diagram">
-</p>
-<p align="center"><sub>Frequency-aware token merging applied progressively across ViT layers with DCT-based spectral filtering. Entirely auto-generated by the FigureAgent subsystem.</sub></p>
-</details>
-
----
-
-### 📄 Paper VIII &nbsp;·&nbsp; Knowledge Distillation &ensp; <img src="https://img.shields.io/badge/domain-Robustness%20/%20KD-purple?style=flat-square" alt="KD">
-
-> **CRAFT: Contrastive Feature Alignment for Robust Distillation Under Distribution Shift**
-
-<table>
-<tr>
-<td width="340">
-<a href="papers/paper_VIII_craft_distillation.pdf">
-<img src="thumbnails/paper_VIII_craft_distillation-01.png" width="320" alt="Paper VIII First Page" style="border: 1px solid #e1e4e8; border-radius: 6px;">
-</a>
-<p align="center"><sub>👆 Click to read the full paper</sub></p>
-</td>
-<td>
-
-#### 💡 Idea
-Standard knowledge distillation transfers teacher knowledge assuming train/test distributions match. CRAFT introduces *reliability-aware contrastive feature alignment* that aligns teacher-student features across clean and corrupted views, while suppressing fragile teacher directions via a de-alignment loss.
-
-#### ⚙️ Pipeline Journey
-
-| | |
-|:---|:---|
-| 🔗 **Stages** | 23 stages + 2 refinement iterations |
-| 📚 **Literature** | 37 references cited (97% verified) |
-| 💻 **Code** | 2,231 lines of Python (4 files) |
-| 🐳 **Experiment** | 33 min GPU sandbox execution |
-| ⏱️ **Runtime** | ~5.8 hrs total |
-| 📊 **Figures** | 9 auto-generated charts |
-| 📑 **Pages** | 19 pages (NeurIPS format) |
-
-#### 🎯 Key Result
-
-| Method | Clean Acc | Robust Acc |
-|:---|:---:|:---:|
-| ERM (baseline) | 81.22% | 62.96% |
-| LogitKD | 82.33% | 64.68% |
-| **AttentionKD** | **82.08%** | **65.95%** |
-| CRD | 68.03% | 50.57% |
-
-Attention-based feature KD improves robustness by **+3 pts** over ERM, while naive CRD degrades it by **-12 pts** — motivating CRAFT's reliability-aware design.
-
-<a href="papers/paper_VIII_craft_distillation.pdf"><img src="https://img.shields.io/badge/📄_Read_Full_Paper-PDF-d73a49?style=for-the-badge" alt="Read PDF"></a>
-
-</td>
-</tr>
-</table>
-
-<details>
-<summary>🖼️ <b>Auto-Generated Framework Diagram</b> — CRAFT Architecture</summary>
-<br>
-<p align="center">
-<img src="thumbnails/framework_VIII_craft_distillation.png" width="90%" alt="CRAFT Framework Diagram">
-</p>
-<p align="center"><sub>Reliability-aware contrastive feature alignment between teacher and student across clean and corrupted views, with de-alignment on fragile teacher directions. Entirely auto-generated by the FigureAgent subsystem.</sub></p>
-</details>
-
----
-
-## 📊 Aggregate Statistics
-
-<table>
-<tr>
-<th align="left">📋 Metric</th>
-<th align="center">I</th>
-<th align="center">II</th>
-<th align="center">III</th>
-<th align="center">IV</th>
-<th align="center">V</th>
-<th align="center">VI</th>
-<th align="center">VII</th>
-<th align="center">VIII</th>
-<th align="center">🏆 Total</th>
-</tr>
-<tr>
-<td>🏷️ <b>Domain</b></td>
-<td align="center"><sub>Math</sub></td>
-<td align="center"><sub>Stats</sub></td>
-<td align="center"><sub>Bio</sub></td>
-<td align="center"><sub>NumLA</sub></td>
-<td align="center"><sub>NLP</sub></td>
-<td align="center"><sub>RL</sub></td>
-<td align="center"><sub>CV</sub></td>
-<td align="center"><sub>KD</sub></td>
-<td align="center"><b>8 fields</b></td>
-</tr>
-<tr>
-<td>💻 <b>Code (LOC)</b></td>
-<td align="center">10,290</td>
-<td align="center">10,062</td>
-<td align="center">9,374</td>
-<td align="center">14,557</td>
-<td align="center">2,894</td>
-<td align="center">2,067</td>
-<td align="center">2,873</td>
-<td align="center">2,231</td>
-<td align="center"><b>54,348</b></td>
-</tr>
-<tr>
-<td>⏱️ <b>Pipeline Time</b></td>
-<td align="center">2h25m</td>
-<td align="center">2h56m</td>
-<td align="center">2h23m</td>
-<td align="center">2h30m</td>
-<td align="center">50m</td>
-<td align="center">6h48m</td>
-<td align="center">3h18m</td>
-<td align="center">5h48m</td>
-<td align="center"><b>~27 hrs</b></td>
-</tr>
-<tr>
-<td>🔗 <b>References</b></td>
-<td align="center">26</td>
-<td align="center">41</td>
-<td align="center">29</td>
-<td align="center">33</td>
-<td align="center">60</td>
-<td align="center">25</td>
-<td align="center">40</td>
-<td align="center">37</td>
-<td align="center"><b>291 cited</b></td>
-</tr>
-<tr>
-<td>📊 <b>Figures</b></td>
-<td align="center">5</td>
-<td align="center">6</td>
-<td align="center">6</td>
-<td align="center">4</td>
-<td align="center">7</td>
-<td align="center">6</td>
-<td align="center">7</td>
-<td align="center">9</td>
-<td align="center"><b>50 figs</b></td>
-</tr>
-<tr>
-<td>📑 <b>Pages</b></td>
-<td align="center">16</td>
-<td align="center">14</td>
-<td align="center">18</td>
-<td align="center">16</td>
-<td align="center">17</td>
-<td align="center">11</td>
-<td align="center">10</td>
-<td align="center">19</td>
-<td align="center"><b>121 pages</b></td>
-</tr>
-</table>
-
----
-
-<h3 align="center">🚀 Try It Yourself</h3>
-
-<p align="center">Every paper above was generated by a single command:</p>
-
-```bash
-researchclaw run --topic "Your research idea here" --auto-approve
-```
-
-<p align="center">
-  <a href="../../README.md"><img src="https://img.shields.io/badge/←_Back_to_README-Main-gray?style=for-the-badge" alt="Back"></a>&nbsp;
-  <a href="https://github.com/aiming-lab/AutoResearchClaw"><img src="https://img.shields.io/badge/⭐_Star_on_GitHub-AutoResearchClaw-181717?style=for-the-badge&logo=github" alt="GitHub"></a>
-</p>
diff --git a/docs/showcase/papers/paper_III_sir_seir_identifiability.pdf b/docs/showcase/papers/paper_III_sir_seir_identifiability.pdf
deleted file mode 100644
index 62d071cf..00000000
Binary files a/docs/showcase/papers/paper_III_sir_seir_identifiability.pdf and /dev/null differ
diff --git a/docs/showcase/papers/paper_II_weak_iv_estimators.pdf b/docs/showcase/papers/paper_II_weak_iv_estimators.pdf
deleted file mode 100644
index f736d34c..00000000
Binary files a/docs/showcase/papers/paper_II_weak_iv_estimators.pdf and /dev/null differ
diff --git a/docs/showcase/papers/paper_IV_krylov_preconditioners.pdf b/docs/showcase/papers/paper_IV_krylov_preconditioners.pdf
deleted file mode 100644
index 5029567d..00000000
Binary files a/docs/showcase/papers/paper_IV_krylov_preconditioners.pdf and /dev/null differ
diff --git a/docs/showcase/papers/paper_I_random_matrix.pdf b/docs/showcase/papers/paper_I_random_matrix.pdf
deleted file mode 100644
index f07d568e..00000000
Binary files a/docs/showcase/papers/paper_I_random_matrix.pdf and /dev/null differ
diff --git a/docs/showcase/papers/paper_VIII_craft_distillation.pdf b/docs/showcase/papers/paper_VIII_craft_distillation.pdf
deleted file mode 100644
index c8a6e7e8..00000000
Binary files a/docs/showcase/papers/paper_VIII_craft_distillation.pdf and /dev/null differ
diff --git a/docs/showcase/papers/paper_VII_fame_token_merging.pdf b/docs/showcase/papers/paper_VII_fame_token_merging.pdf
deleted file mode 100644
index 52fa4c0c..00000000
Binary files a/docs/showcase/papers/paper_VII_fame_token_merging.pdf and /dev/null differ
diff --git a/docs/showcase/papers/paper_VI_lace_exploration.pdf b/docs/showcase/papers/paper_VI_lace_exploration.pdf
deleted file mode 100644
index 3ec79f22..00000000
Binary files a/docs/showcase/papers/paper_VI_lace_exploration.pdf and /dev/null differ
diff --git a/docs/showcase/papers/paper_V_gard_lora.pdf b/docs/showcase/papers/paper_V_gard_lora.pdf
deleted file mode 100644
index 4726a5e9..00000000
Binary files a/docs/showcase/papers/paper_V_gard_lora.pdf and /dev/null differ
diff --git a/docs/showcase/thumbnails/framework_III_sir_seir_identifiability.png b/docs/showcase/thumbnails/framework_III_sir_seir_identifiability.png
deleted file mode 100644
index 79b83105..00000000
Binary files a/docs/showcase/thumbnails/framework_III_sir_seir_identifiability.png and /dev/null differ
diff --git a/docs/showcase/thumbnails/framework_II_weak_iv_estimators.png b/docs/showcase/thumbnails/framework_II_weak_iv_estimators.png
deleted file mode 100644
index a7c0e1bb..00000000
Binary files a/docs/showcase/thumbnails/framework_II_weak_iv_estimators.png and /dev/null differ
diff --git a/docs/showcase/thumbnails/framework_IV_krylov_preconditioners.png b/docs/showcase/thumbnails/framework_IV_krylov_preconditioners.png
deleted file mode 100644
index fe373474..00000000
Binary files a/docs/showcase/thumbnails/framework_IV_krylov_preconditioners.png and /dev/null differ
diff --git a/docs/showcase/thumbnails/framework_I_random_matrix.png b/docs/showcase/thumbnails/framework_I_random_matrix.png
deleted file mode 100644
index 533269f5..00000000
Binary files a/docs/showcase/thumbnails/framework_I_random_matrix.png and /dev/null differ
diff --git a/docs/showcase/thumbnails/framework_VIII_craft_distillation.png b/docs/showcase/thumbnails/framework_VIII_craft_distillation.png
deleted file mode 100644
index 0e2e9457..00000000
Binary files a/docs/showcase/thumbnails/framework_VIII_craft_distillation.png and /dev/null differ
diff --git a/docs/showcase/thumbnails/framework_VII_fame_token_merging.png b/docs/showcase/thumbnails/framework_VII_fame_token_merging.png
deleted file mode 100644
index f8a673ac..00000000
Binary files a/docs/showcase/thumbnails/framework_VII_fame_token_merging.png and /dev/null differ
diff --git a/docs/showcase/thumbnails/framework_VI_lace_exploration.png b/docs/showcase/thumbnails/framework_VI_lace_exploration.png
deleted file mode 100644
index cfdcc400..00000000
Binary files a/docs/showcase/thumbnails/framework_VI_lace_exploration.png and /dev/null differ
diff --git a/docs/showcase/thumbnails/framework_V_gard_lora.png b/docs/showcase/thumbnails/framework_V_gard_lora.png
deleted file mode 100644
index 384d30bb..00000000
Binary files a/docs/showcase/thumbnails/framework_V_gard_lora.png and /dev/null differ
diff --git a/docs/showcase/thumbnails/paper_III_sir_seir_identifiability-01.png b/docs/showcase/thumbnails/paper_III_sir_seir_identifiability-01.png
deleted file mode 100644
index 6928f0dd..00000000
Binary files a/docs/showcase/thumbnails/paper_III_sir_seir_identifiability-01.png and /dev/null differ
diff --git a/docs/showcase/thumbnails/paper_II_weak_iv_estimators-01.png b/docs/showcase/thumbnails/paper_II_weak_iv_estimators-01.png
deleted file mode 100644
index f0eab72f..00000000
Binary files a/docs/showcase/thumbnails/paper_II_weak_iv_estimators-01.png and /dev/null differ
diff --git a/docs/showcase/thumbnails/paper_IV_krylov_preconditioners-01.png b/docs/showcase/thumbnails/paper_IV_krylov_preconditioners-01.png
deleted file mode 100644
index 9f2bb89f..00000000
Binary files a/docs/showcase/thumbnails/paper_IV_krylov_preconditioners-01.png and /dev/null differ
diff --git a/docs/showcase/thumbnails/paper_I_random_matrix-01.png b/docs/showcase/thumbnails/paper_I_random_matrix-01.png
deleted file mode 100644
index a9840715..00000000
Binary files a/docs/showcase/thumbnails/paper_I_random_matrix-01.png and /dev/null differ
diff --git a/docs/showcase/thumbnails/paper_VIII_craft_distillation-01.png b/docs/showcase/thumbnails/paper_VIII_craft_distillation-01.png
deleted file mode 100644
index 5469d3da..00000000
Binary files a/docs/showcase/thumbnails/paper_VIII_craft_distillation-01.png and /dev/null differ
diff --git a/docs/showcase/thumbnails/paper_VII_fame_token_merging-01.png b/docs/showcase/thumbnails/paper_VII_fame_token_merging-01.png
deleted file mode 100644
index 93440a41..00000000
Binary files a/docs/showcase/thumbnails/paper_VII_fame_token_merging-01.png and /dev/null differ
diff --git a/docs/showcase/thumbnails/paper_VI_lace_exploration-01.png b/docs/showcase/thumbnails/paper_VI_lace_exploration-01.png
deleted file mode 100644
index be807f6d..00000000
Binary files a/docs/showcase/thumbnails/paper_VI_lace_exploration-01.png and /dev/null differ
diff --git a/docs/showcase/thumbnails/paper_V_gard_lora-01.png b/docs/showcase/thumbnails/paper_V_gard_lora-01.png
deleted file mode 100644
index e5ae8e5a..00000000
Binary files a/docs/showcase/thumbnails/paper_V_gard_lora-01.png and /dev/null differ
diff --git a/researchclaw/config.py b/researchclaw/config.py
index ce7f76e1..ef79949b 100644
--- a/researchclaw/config.py
+++ b/researchclaw/config.py
@@ -188,6 +188,7 @@ class AcpConfig:
     acpx_command: str = ""
     session_name: str = "researchclaw"
     timeout_sec: int = 1800
+    max_turns: int = 10
 
 
 @dataclass(frozen=True)
@@ -1153,6 +1154,7 @@ def _parse_llm_config(data: dict[str, Any]) -> LlmConfig:
             acpx_command=acp_data.get("acpx_command", ""),
             session_name=acp_data.get("session_name", "researchclaw"),
             timeout_sec=int(acp_data.get("timeout_sec", 1800)),
+            max_turns=int(acp_data.get("max_turns", 10)),
         ),
     )
 
diff --git a/researchclaw/llm/acp_client.py b/researchclaw/llm/acp_client.py
index 9577c7c5..9404fd39 100644
--- a/researchclaw/llm/acp_client.py
+++ b/researchclaw/llm/acp_client.py
@@ -42,6 +42,7 @@ class ACPConfig:
     acpx_command: str = ""  # auto-detect if empty
     session_name: str = "researchclaw"
     timeout_sec: int = 1800  # per-prompt timeout
+    max_turns: int = 10
 
 
 def _find_acpx() -> str | None:
@@ -91,6 +92,7 @@ def from_rc_config(cls, rc_config: Any) -> ACPClient:
             acpx_command=getattr(acp, "acpx_command", ""),
             session_name=getattr(acp, "session_name", "researchclaw"),
             timeout_sec=getattr(acp, "timeout_sec", 1800),
+            max_turns=getattr(acp, "max_turns", 10),
         ))
 
     # ------------------------------------------------------------------
@@ -245,7 +247,7 @@ def _ensure_session(self) -> None:
         )
         try:
             subprocess.run(
-                [acpx, "--approve-all", "--max-turns", "1",
+                [acpx, "--approve-all", "--max-turns", str(self.config.max_turns),
                  "--ttl", "0", "--cwd", self._abs_cwd(),
                  self.config.agent, "-s", self.config.session_name,
                  _warmup],
@@ -466,10 +468,11 @@ def _reader(stream: Any, buf: list[str]) -> None:
     def _send_prompt_cli(self, acpx: str, prompt: str) -> str:
         """Send prompt as a CLI argument (original path)."""
         cmd = [
-            acpx, "--approve-all", "--max-turns", "1",
+            acpx, "--approve-all", "--max-turns", str(self.config.max_turns),
             "--ttl", "0", "--cwd", self._abs_cwd(),
             self.config.agent, "-s", self.config.session_name, prompt,
         ]
+        logger.info("ACP CLI cmd max-turns=%s", self.config.max_turns)
         try:
             result = self._run_acp_with_heartbeat(cmd, label="ACP prompt (cli)")
         except subprocess.TimeoutExpired as exc:
@@ -479,18 +482,23 @@ def _send_prompt_cli(self, acpx: str, prompt: str) -> str:
 
         if result.returncode != 0:
             stderr = (result.stderr or "").strip()
-            raise RuntimeError(f"ACP prompt failed (exit {result.returncode}): {stderr}")
+            stdout = (result.stdout or "").strip()[-2000:]
+            raise RuntimeError(
+                f"ACP prompt failed (exit {result.returncode}): {stderr}"
+                + (f"\nstdout tail: {stdout}" if stdout else "")
+            )
 
         return self._extract_response(result.stdout)
 
     def _send_prompt_via_file(self, acpx: str, prompt: str) -> str:
         """Send prompt via stdin pipe (``-f -``) to avoid CLI arg limits."""
         cmd = [
-            acpx, "--approve-all", "--max-turns", "1",
+            acpx, "--approve-all", "--max-turns", str(self.config.max_turns),
             "--ttl", "0", "--cwd", self._abs_cwd(),
             self.config.agent, "-s", self.config.session_name,
             "-f", "-",
         ]
+        logger.info("ACP file cmd max-turns=%s", self.config.max_turns)
         try:
             result = self._run_acp_with_heartbeat(
                 cmd, label="ACP prompt (stdin)", input_data=prompt,
@@ -502,8 +510,10 @@ def _send_prompt_via_file(self, acpx: str, prompt: str) -> str:
 
         if result.returncode != 0:
             stderr = (result.stderr or "").strip()
+            stdout = (result.stdout or "").strip()[-2000:]
             raise RuntimeError(
                 f"ACP prompt failed (exit {result.returncode}): {stderr}"
+                + (f"\nstdout tail: {stdout}" if stdout else "")
             )
 
         return self._extract_response(result.stdout)
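
Reviewer sketch (not part of the patch): the `acp_client.py` hunks above make two changes, passing the configured turn limit to `acpx` as a string and appending the last 2000 characters of stdout to the failure message. The standalone version below shows both behaviours in isolation. `SketchConfig`, `build_cmd`, `format_failure`, and the default agent name are hypothetical names used only for illustration; the real code keeps this logic inline in `ACPClient`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in for ACPConfig; only the fields used below."""
    agent: str = "example-agent"  # placeholder value, not a real default
    session_name: str = "researchclaw"
    max_turns: int = 10


def build_cmd(acpx: str, cfg: SketchConfig, cwd: str, prompt: str) -> list[str]:
    """Build the argument list in the same order as the patched CLI path."""
    # subprocess argument lists must contain only strings, hence str().
    return [
        acpx, "--approve-all", "--max-turns", str(cfg.max_turns),
        "--ttl", "0", "--cwd", cwd,
        cfg.agent, "-s", cfg.session_name, prompt,
    ]


def format_failure(returncode: int, stderr: str | None, stdout: str | None) -> str:
    """Mirror the patched error message: stderr first, then a bounded stdout tail."""
    err = (stderr or "").strip()
    tail = (stdout or "").strip()[-2000:]  # keep at most the last 2000 chars
    return (
        f"ACP prompt failed (exit {returncode}): {err}"
        + (f"\nstdout tail: {tail}" if tail else "")
    )
```

Bounding the stdout tail keeps the exception message readable when the agent prints a long transcript before failing, while still surfacing the final lines, which usually carry the actual error.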
diff --git a/researchclaw/pipeline/runner.py b/researchclaw/pipeline/runner.py
index 67a48db8..0e5000cf 100644
--- a/researchclaw/pipeline/runner.py
+++ b/researchclaw/pipeline/runner.py
@@ -1,5 +1,6 @@
 from __future__ import annotations
 
+import dataclasses
 import json
 import importlib
 import logging
@@ -548,6 +549,21 @@ def execute_pipeline(
             except Exception:
                 pass
 
+        # ── Topic auto-refinement: if PROBLEM_DECOMPOSE chose a specific sub-topic, patch config ──
+        if stage == Stage.PROBLEM_DECOMPOSE and result.status == StageStatus.DONE:
+            try:
+                _eval_path = run_dir / "stage-02" / "topic_evaluation.json"
+                if _eval_path.exists():
+                    _eval_json = json.loads(_eval_path.read_text(encoding="utf-8"))
+                    _refined = _eval_json.get("refined_topic")
+                    if _refined and _refined != config.research.topic:
+                        _new_research = dataclasses.replace(config.research, topic=_refined)
+                        config = dataclasses.replace(config, research=_new_research)
+                        logger.info("Topic refined for pipeline: %s", _refined)
+                        print(f"[{run_id}] Topic refined → {_refined}")
+            except Exception:
+                logger.debug("Topic refinement patch skipped (non-blocking)", exc_info=True)
+
         # ── ExperimentSpec: generate after design, validate after analysis ──
         if stage == Stage.EXPERIMENT_DESIGN and result.status == StageStatus.DONE:
             try:
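
Reviewer sketch (not part of the patch): the `runner.py` hunk above swaps the topic by rebuilding the config with `dataclasses.replace` instead of assigning to it. That is the usual approach when the config dataclasses are frozen, as the `@dataclass(frozen=True)` decorator visible in `config.py` suggests. The example below reproduces the pattern on two hypothetical stand-in classes, `Research` and `Config`; the helper name `apply_refined_topic` is also an assumption.

```python
import dataclasses
import json
from pathlib import Path


@dataclasses.dataclass(frozen=True)
class Research:
    """Hypothetical stand-in for the research section of the config."""
    topic: str


@dataclasses.dataclass(frozen=True)
class Config:
    """Hypothetical stand-in for the top-level run config."""
    research: Research


def apply_refined_topic(config: Config, eval_path: Path) -> Config:
    """Return a config carrying the refined topic, or the original unchanged."""
    if not eval_path.exists():
        return config
    data = json.loads(eval_path.read_text(encoding="utf-8"))
    refined = data.get("refined_topic")
    if refined and refined != config.research.topic:
        # Frozen dataclasses reject attribute assignment, so build new
        # objects from the inside out: first the section, then the root.
        new_research = dataclasses.replace(config.research, topic=refined)
        return dataclasses.replace(config, research=new_research)
    return config
```

Because the original object is never mutated, any code that captured the config before the PROBLEM_DECOMPOSE stage keeps seeing the unrefined topic; only the rebound local variable carries the new one.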
diff --git a/researchclaw/pipeline/stage_impls/_topic.py b/researchclaw/pipeline/stage_impls/_topic.py
index ae89c75f..6c105f2d 100644
--- a/researchclaw/pipeline/stage_impls/_topic.py
+++ b/researchclaw/pipeline/stage_impls/_topic.py
@@ -166,10 +166,11 @@ def _execute_problem_decompose(
 """
     (stage_dir / "problem_tree.md").write_text(body, encoding="utf-8")
 
-    # IMP-35: Topic/title quality pre-evaluation
-    # Quick LLM check: is the topic well-scoped for a conference paper?
+    # IMP-35: Topic/title quality pre-evaluation + auto-refinement
+    # If the topic is too broad (score < 6), generate specific sub-topics and pick the best.
     if llm is not None:
         try:
+            _domain_label = _detect_domain(config.research.topic, config.research.domains)[1]
             _eval_resp = llm.chat(
                 [
                     {
@@ -177,7 +178,7 @@ def _execute_problem_decompose(
                         "content": (
                             "Evaluate this research topic for a top ML conference paper. "
                             "Score 1-10 on: (a) novelty, (b) specificity, (c) feasibility. "
-                            "If overall score < 5, suggest a refined topic.\n\n"
+                            "If overall score < 6, suggest a refined topic.\n\n"
                             f"Topic: {config.research.topic}\n\n"
                             "Reply as JSON: {\"novelty\": N, \"specificity\": N, "
                             "\"feasibility\": N, \"overall\": N, \"suggestion\": \"...\"}"
@@ -185,19 +186,55 @@ def _execute_problem_decompose(
                     }
                 ],
                 system=(
-                    f"You are a senior {_detect_domain(config.research.topic, config.research.domains)[1]} "
+                    f"You are a senior {_domain_label} "
                     f"researcher evaluating research topic quality."
                 ),
             )
             _eval_data = _safe_json_loads(_eval_resp.content, {})
             if isinstance(_eval_data, dict):
                 overall = _eval_data.get("overall", 10)
-                if isinstance(overall, (int, float)) and overall < 5:
+                if isinstance(overall, (int, float)) and overall < 6:
+                    # Topic is too broad — treat it as a direction and generate specific candidates
                     logger.warning(
-                        "IMP-35: Topic quality score %s/10 — consider refining: %s",
+                        "IMP-35: Topic too broad (score %s/10). Generating specific sub-topics...",
                         overall,
-                        _eval_data.get("suggestion", ""),
                     )
+                    try:
+                        _refine_resp = llm.chat(
+                            [
+                                {
+                                    "role": "user",
+                                    "content": (
+                                        f"The research direction '{config.research.topic}' is too broad "
+                                        "for a single conference paper. Generate 5 specific, publishable "
+                                        "research topics derived from this direction. Each must be concrete "
+                                        "enough for a single paper at a top ML/systems venue — specify the "
+                                        "mechanism, target system, and approach.\n\n"
+                                        "Reply as JSON: {\"candidates\": ["
+                                        "{\"topic\": \"...\", \"novelty\": N, \"specificity\": N, "
+                                        "\"feasibility\": N, \"overall\": N, \"rationale\": \"...\"}]}"
+                                    ),
+                                }
+                            ],
+                            system=(
+                                f"You are a senior {_domain_label} researcher helping scope a "
+                                "vague research direction into a publishable conference paper topic."
+                            ),
+                        )
+                        _refine_data = _safe_json_loads(_refine_resp.content, {})
+                        candidates = _refine_data.get("candidates", [])
+                        if candidates:
+                            best = max(candidates, key=lambda c: c.get("overall", 0))
+                            _eval_data["original_topic"] = config.research.topic
+                            _eval_data["refined_topic"] = best["topic"]
+                            _eval_data["candidates"] = candidates
+                            logger.warning(
+                                "IMP-35: Refined topic selected (score %s/10): %s",
+                                best.get("overall", "?"),
+                                best["topic"],
+                            )
+                    except Exception:  # noqa: BLE001
+                        logger.debug("IMP-35: Sub-topic generation failed (non-blocking)", exc_info=True)
                 else:
                     logger.info("IMP-35: Topic quality score %s/10", overall)
                 (stage_dir / "topic_evaluation.json").write_text(
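
Reviewer sketch (not part of the patch): the `_topic.py` hunk above picks the refined topic with `max(candidates, key=lambda c: c.get("overall", 0))` and relies on the surrounding `try`/`except` to absorb malformed LLM replies. The standalone variant below shows one way the same selection could be made explicit about bad input. `pick_refined_topic` is a hypothetical helper, and plain `json.loads` stands in for the project's `_safe_json_loads`.

```python
from __future__ import annotations

import json
from typing import Any, Optional


def _score(candidate: dict[str, Any]) -> float:
    """Numeric 'overall' score; anything non-numeric counts as 0."""
    value = candidate.get("overall", 0)
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return 0.0
    return float(value)


def pick_refined_topic(raw_reply: str) -> Optional[dict[str, Any]]:
    """Return the highest-scoring candidate with a usable topic, else None."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    candidates = data.get("candidates")
    if not isinstance(candidates, list):
        return None
    # Keep only dict entries that actually carry a non-empty topic string.
    valid = [
        c for c in candidates
        if isinstance(c, dict)
        and isinstance(c.get("topic"), str)
        and c["topic"].strip()
    ]
    if not valid:
        return None
    return max(valid, key=_score)
```

Filtering before `max` means a candidate without a `topic` key can never win on score alone, which in the patched code would surface as a `KeyError` and silently skip refinement.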

📄	`paper_draft.md`	ورقة أكاديمية كاملة (مقدمة، أعمال سابقة، المنهجية، التجارب، النتائج، الخاتمة)
📐	`paper.tex`	LaTeX جاهز للمؤتمرات (قوالب NeurIPS / ICLR / ICML)
📚	`references.bib`	مراجع BibTeX حقيقية من OpenAlex و Semantic Scholar و arXiv — مُنقّحة تلقائياً لمطابقة الاستشهادات المضمّنة
🔍	`verification_report.json`	تحقق من سلامة الاستشهادات على 4 طبقات + التحقق من الصلة (arXiv، CrossRef، DataCite، LLM)
🧪	`experiment runs/`	كود مُولّد + نتائج البيئة المعزولة + مقاييس JSON منظمة
📊	`charts/`	رسوم بيانية مُولّدة تلقائياً لمقارنة الظروف مع أشرطة الخطأ وفترات الثقة
📝	`reviews.md`	مراجعة أقران متعددة الوكلاء مع فحص اتساق المنهجية والأدلة
🧬	`evolution/`	دروس تعلّم ذاتي مستخلصة من كل تشغيل
📦	`deliverables/`	جميع المخرجات النهائية في مجلد واحد — جاهزة للترجمة على Overleaf
📄	`paper_draft.md`	完整学术论文（引言、相关工作、方法、实验、结果、结论）
📐	`paper.tex`	适配顶会模板的 LaTeX 文件（NeurIPS / ICLR / ICML）
📚	`references.bib`	来自 OpenAlex、Semantic Scholar 和 arXiv 的真实 BibTeX 引用——自动精简至与正文引用一致
🔍	`verification_report.json`	四层引用完整性 + 相关性核查（arXiv、CrossRef、DataCite、LLM）
🧪	`experiment runs/`	生成的代码 + 沙箱结果 + 结构化 JSON 指标
📊	`charts/`	自动生成的条件对比图（含误差线和置信区间）
📝	`reviews.md`	多 Agent 同行评审（含方法论-证据一致性检查）
🧬	`evolution/`	从每次运行中提取的自学习教训
📦	`deliverables/`	所有最终产出集中在一个文件夹——可直接上传 Overleaf 编译
📄	`paper_draft.md`	Vollstaendiges wissenschaftliches Paper (Einleitung, Verwandte Arbeiten, Methode, Experimente, Ergebnisse, Fazit)
📐	`paper.tex`	Konferenzfertiges LaTeX (NeurIPS / ICLR / ICML Templates)
📚	`references.bib`	Echte BibTeX-Referenzen von OpenAlex, Semantic Scholar und arXiv — automatisch bereinigt, um Inline-Zitationen zu entsprechen
🔍	`verification_report.json`	4-Schicht-Zitationsintegritaets- und Relevanzpruefung (arXiv, CrossRef, DataCite, LLM)
🧪	`experiment runs/`	Generierter Code + Sandbox-Ergebnisse + strukturierte JSON-Metriken
📊	`charts/`	Automatisch generierte Vergleichsdiagramme mit Fehlerbalken und Konfidenzintervallen
📝	`reviews.md`	Multi-Agenten-Peer-Review mit Methodik-Evidenz-Konsistenzpruefungen
🧬	`evolution/`	Selbstlernende Erkenntnisse aus jedem Durchlauf
📦	`deliverables/`	Alle finalen Ergebnisse in einem Ordner — kompilierbereit fuer Overleaf
📄	`paper_draft.md`	Articulo academico completo (Introduccion, Trabajo relacionado, Metodo, Experimentos, Resultados, Conclusion)
📐	`paper.tex`	LaTeX listo para conferencia (plantillas NeurIPS / ICLR / ICML)
📚	`references.bib`	Referencias BibTeX reales de OpenAlex, Semantic Scholar y arXiv — auto-depuradas para coincidir con las citas en linea
🔍	`verification_report.json`	Verificacion de integridad + relevancia de citas en 4 capas (arXiv, CrossRef, DataCite, LLM)
🧪	`experiment runs/`	Codigo generado + resultados en sandbox + metricas JSON estructuradas
📊	`charts/`	Graficos de comparacion de condiciones auto-generados con barras de error e intervalos de confianza
📝	`reviews.md`	Revision por pares multi-agente con verificacion de consistencia metodologia-evidencia
🧬	`evolution/`	Lecciones de auto-aprendizaje extraidas de cada ejecucion
📦	`deliverables/`	Todos los entregables finales en una sola carpeta — listos para compilar en Overleaf
📄	`paper_draft.md`	Complete academic paper (Introduction, Related Work, Method, Experiments, Results, Conclusion)
📐	`paper.tex`	Conference-ready LaTeX (NeurIPS / ICLR / ICML templates)
📚	`references.bib`	Real BibTeX references from OpenAlex, Semantic Scholar, and arXiv, auto-pruned to match the in-text citations
🔍	`verification_report.json`	4-layer citation integrity and relevance verification (arXiv, CrossRef, DataCite, LLM)
🧪	`experiment runs/`	Generated code + sandbox results + structured JSON metrics
📊	`charts/`	Auto-generated condition comparison charts with error bars and confidence intervals
📝	`reviews.md`	Multi-agent review with methodology-evidence consistency verification
🧬	`evolution/`	Self-learning lessons extracted from each run
📦	`deliverables/`	All final deliverables in a single folder, ready to compile for Overleaf
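📄	`paper_draft.md`	Complete academic paper (Introduction, Related Work, Method, Experiments, Results, Conclusion)
📐	`paper.tex`	Conference-ready LaTeX (NeurIPS / ICLR / ICML templates)
📚	`references.bib`	Real BibTeX references from OpenAlex, Semantic Scholar, and arXiv, automatically pruned to match the citations in the text
🔍	`verification_report.json`	4-layer citation integrity + relevance verification (arXiv, CrossRef, DataCite, LLM)
🧪	`experiment runs/`	Generated code + sandbox execution results + structured JSON metrics
📊	`charts/`	Automatically generated condition comparison charts with error bars and confidence intervals
📝	`reviews.md`	Multi-agent peer review with methodology-evidence consistency checks
🧬	`evolution/`	Self-learning lessons extracted from each run
📦	`deliverables/`	All final deliverables collected in one folder, ready to compile on Overleaf right away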
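📄	`paper_draft.md`	Complete academic paper (Introduction, Related Work, Methodology, Experiments, Results, Conclusion)
📐	`paper.tex`	LaTeX for conference submission (NeurIPS / ICLR / ICML templates)
📚	`references.bib`	Real BibTeX references retrieved from OpenAlex, Semantic Scholar, and arXiv, automatically pruned to match the inline citations
🔍	`verification_report.json`	4-layer citation integrity + relevance verification (arXiv, CrossRef, DataCite, LLM)
🧪	`experiment runs/`	Generated code + sandbox results + structured JSON metrics
📊	`charts/`	Auto-generated condition comparison charts with error bars and confidence intervals
📝	`reviews.md`	Multi-agent peer review with methodology-evidence consistency checks
🧬	`evolution/`	Self-learning lessons extracted from each run
📦	`deliverables/`	All final deliverables in one folder, ready to compile on Overleaf right away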
📄	`paper_draft.md`	Complete academic paper (Introduction, Related Work, Method, Experiments, Results, Conclusion)
📐	`paper.tex`	Conference-ready LaTeX (NeurIPS / ICLR / ICML templates)
📚	`references.bib`	Real BibTeX references from OpenAlex, Semantic Scholar, and arXiv, auto-pruned to match the inline citations
🔍	`verification_report.json`	4-layer citation integrity + relevance verification (arXiv, CrossRef, DataCite, LLM)
🧪	`experiment runs/`	Generated code + sandbox results + structured JSON metrics
📊	`charts/`	Automatically generated condition comparison charts with error bars and confidence intervals
📝	`reviews.md`	Multi-agent peer review with methodology-evidence consistency checks
🧬	`evolution/`	Self-learning lessons extracted from each run
📦	`deliverables/`	All final outputs in one folder, ready to compile on Overleaf
📄	`paper_draft.md`	Complete academic paper (Introduction, Literature Review, Method, Experiments, Results, Conclusion)
📐	`paper.tex`	Ready-to-use LaTeX source (NeurIPS / ICLR / ICML templates)
📚	`references.bib`	Real BibTeX references from OpenAlex, Semantic Scholar, and arXiv, automatically filtered to match the citations in the text
🔍	`verification_report.json`	4-layer citation integrity and relevance verification (arXiv, CrossRef, DataCite, LLM)
🧪	`experiment runs/`	Generated code + sandbox results + structured JSON metrics
📊	`charts/`	Automatically generated comparison charts with error bars and confidence intervals
📝	`reviews.md`	Multi-agent peer review that checks consistency between methodology and results
🧬	`evolution/`	Self-learning lessons extracted from each run
📦	`deliverables/`	All final materials in one folder, ready to upload to Overleaf
📋 Metric	I	II	III	IV	V	VI	VII	VIII	🏆 Total
🏷️ Domain	_Math	_Stats	_Bio	_NumLA	_NLP	_RL	_CV	_KD	8 fields
💻 Code (LOC)	10,290	10,062	9,374	14,557	2,894	2,067	2,873	2,231	54,348
⏱️ Pipeline Time	2h25m	2h56m	2h23m	2h30m	50m	6h48m	3h18m	5h48m	~27 hrs
🔗 References	26	41	29	33	60	25	40	37	291 cited
📊 Figures	5	6	6	4	7	6	7	9	50 figs
📑 Pages	16	14	18	16	17	11	10	19	121 pages