- **[2026.03.06]** 📖 **Claw-R1 Documentation Released.** Project page and documentation are now available at [Claw-R1 Project Page](https://agentr1.github.io/) and [Claw-R1 docs](https://agentr1.github.io/Claw-R1/).
- **[2026.03.03]** 🚧 **Claw-R1 Project Init.** We are actively developing the framework. Stay tuned for more features and documentation.
## Overview
The **Agentic RL** ecosystem is thriving: frameworks like [verl](https://github.com/volcengine/verl), [Agent-R1](https://github.com/0russwest0/Agent-R1), and [MiniMax Forge](https://www.minimax.io/news/forge-scalable-agent-rl-framework-and-algorithm) have made remarkable progress in RL runtimes and training algorithms. Meanwhile, **General Agents** (e.g., [OpenClaw](https://github.com/openclaw/openclaw), Claude Code, Open Code) are producing interaction data that is far richer and more complex than traditional ReAct trajectories.
As agents grow more capable, a critical question emerges: **How do we systematically collect, evaluate, and curate high-quality training data from diverse agent interactions?** This direction remains relatively under-explored yet important, especially when human feedback is available as a natural quality signal.
**Claw-R1** provides the **data foundation** for Agentic RL. It introduces a Middleware Layer (Gateway + DataPool) between the Agent Side and the Training Side, and focuses on data collection, evaluation, and curation rather than on training algorithms themselves.
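In heavily simplified form, the Middleware Layer above can be pictured as a buffer that the Gateway writes into and the Trainer reads from. The class and method names below are illustrative assumptions, not Claw-R1's actual API:

```python
from collections import deque

class DataPool:
    """Buffers interaction data between the Agent Side and the Training Side.

    Hypothetical sketch: the method names and record shape are assumptions,
    not Claw-R1's documented interface.
    """

    def __init__(self):
        self._buffer = deque()

    def put(self, step: dict) -> None:
        # The Gateway calls this when an agent produces a step over HTTP.
        self._buffer.append(step)

    def fetch_batch(self, batch_size: int) -> list:
        # The Trainer calls this to pull up to batch_size buffered steps.
        batch = []
        while self._buffer and len(batch) < batch_size:
            batch.append(self._buffer.popleft())
        return batch

pool = DataPool()
pool.put({"prompt": "fix the bug", "reward": 1.0})
pool.put({"prompt": "write tests", "reward": 0.5})
assert len(pool.fetch_batch(8)) == 2  # both buffered steps are drained
```

Because the pool decouples the two sides, agents can keep producing data while the Trainer fetches batches at its own pace.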
- **Universal Data Collection**: White-box agents submit Steps via API; black-box agents integrate by simply pointing `base_url` to the Gateway (zero code changes); online services collect data from live user interactions in real time.
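As a concrete illustration of the two integration styles, the sketch below builds the kind of JSON Step a white-box agent might submit. The Gateway address, endpoint comment, and field names are all hypothetical, not Claw-R1's documented schema:

```python
import json

# All names below (Gateway URL, Step fields) are illustrative assumptions.
GATEWAY = "http://localhost:8000"  # hypothetical Gateway address

# Black-box style: an OpenAI-compatible client would only change its
# base_url to point at the Gateway (e.g. f"{GATEWAY}/v1"); the Gateway
# then records every request/response pair transparently.

# White-box style: the agent submits a Step explicitly, e.g. as JSON
# that would be POSTed to the Gateway:
step = {
    "channel": "train",       # channel-based partitioning (train vs. val)
    "policy_version": 3,      # enables freshness-aware curation
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4"},
    ],
    "reward": 1.0,
}
payload = json.dumps(step)
assert json.loads(payload)["channel"] == "train"
```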
- **Data Evaluation & Curation**: Multi-dimensional reward system (rule-based / discriminative RM / generative RM), human feedback signal integration, policy version tracking for freshness-aware curation, and channel-based data partitioning.
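To make these curation signals concrete, here is a minimal sketch of a rule-based reward combined with a policy-version freshness filter. The field names and thresholds are illustrative assumptions:

```python
def rule_based_reward(step: dict) -> float:
    """Toy rule-based scorer; real setups may swap in a discriminative
    or generative reward model. Field names are assumptions."""
    score = 0.0
    if step.get("tests_passed"):     # task-level success signal
        score += 1.0
    if step.get("human_thumbs_up"):  # explicit human feedback
        score += 0.5
    return score

def curate(steps: list, current_version: int, max_lag: int = 2) -> list:
    """Freshness-aware curation: drop steps from policies that are too stale."""
    return [s for s in steps if current_version - s["policy_version"] <= max_lag]

steps = [
    {"policy_version": 5, "tests_passed": True, "human_thumbs_up": True},
    {"policy_version": 1, "tests_passed": True},  # too stale at version 5
]
fresh = curate(steps, current_version=5)
assert len(fresh) == 1 and rule_based_reward(fresh[0]) == 1.5
```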
- **Flexible Data Serving**: Pluggable `TrainingBackend` to convert curated data into any training engine's native format, with GRPO-aware grouping, train/val channel isolation, and real-time monitoring.
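The serving side could be sketched as follows. The `TrainingBackend` name comes from the bullet above, but its method and the GRPO grouping logic are illustrative assumptions:

```python
from collections import defaultdict

class TrainingBackend:
    """Pluggable converter from curated records to a trainer's native format.
    The method name is an assumption for illustration."""

    def convert(self, steps: list) -> list:
        raise NotImplementedError

def group_for_grpo(steps: list, group_size: int) -> list:
    """GRPO-aware grouping: rollouts sharing a prompt are served together so
    group-relative advantages can be computed; incomplete groups are held back."""
    by_prompt = defaultdict(list)
    for s in steps:
        by_prompt[s["prompt_id"]].append(s)
    return [g[:group_size] for g in by_prompt.values() if len(g) >= group_size]

steps = [
    {"prompt_id": "p1", "reward": 1.0},
    {"prompt_id": "p1", "reward": 0.0},
    {"prompt_id": "p2", "reward": 1.0},  # p2 has no complete group yet
]
assert len(group_for_grpo(steps, group_size=2)) == 1
```

Holding back incomplete groups matters for GRPO-style objectives, since a single rollout per prompt gives no within-group baseline to compare against.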
## Get Started
Explore our comprehensive documentation for setup, configuration, and advanced usage:
- [ ] **Data Quality Dashboard**: Visual monitoring of data quality metrics, reward distributions, and collection statistics.
- [ ] **Human Feedback Pipeline**: Structured pipeline for capturing and integrating explicit and implicit human feedback signals from online agent services.
- [ ] **Dataset Export & Versioning**: Export curated datasets with full provenance tracking for reproducibility and sharing.
**Affiliation**: State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
## Acknowledgements
We extend our gratitude to [Agent-R1](https://github.com/0russwest0/Agent-R1), [MiniMax Forge](https://www.minimax.io/news/forge-scalable-agent-rl-framework-and-algorithm), [verl](https://github.com/volcengine/verl), and [rLLM](https://github.com/rllm-org/rllm) for their pioneering work on Agentic RL training infrastructure. We also thank [OpenClaw](https://github.com/openclaw/openclaw) for their remarkable work on personal AI assistants. We are grateful to the broader Agentic RL community and all contributors for their support.
## Citation
```bibtex
@misc{clawr1-2026,
title={Claw-R1: The Data Foundation for Agentic Reinforcement Learning},
author={Wang, Daoyu and Ouyang, Jie and Yu, Shuo and Cheng, Mingyue and Liu, Qi},
year={2026}
}
```