MrlX

Multi-Agent Reinforcement Learning Framework

In MrlX, Agent A and Agent B run as independent agents that communicate through a message queue. The queue enables cross-agent API calls, abstracts each agent's internal logic behind external requests, and supports multi-turn interactions, inference-result sharing, and collaborative decision-making.
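The cross-agent pattern above can be sketched with Python's standard-library queues. This is a minimal illustration, not MrlX's actual API: the `MessageQueue` class, per-agent inboxes, and the reply-queue convention are all assumptions made for the example.

```python
import queue
import threading

class MessageQueue:
    """One inbox per agent; each request carries its own reply queue,
    so an agent's internal logic is reached only via external requests."""
    def __init__(self):
        self.inboxes = {"agent_a": queue.Queue(), "agent_b": queue.Queue()}

    def call(self, target, payload):
        """Cross-agent API call: enqueue a request, block on the reply."""
        reply = queue.Queue(maxsize=1)
        self.inboxes[target].put({"payload": payload, "reply": reply})
        return reply.get(timeout=5)

def agent_b_worker(mq):
    # Agent B serves one request from its inbox and replies through
    # the queue; Agent A never calls B's internals directly.
    msg = mq.inboxes["agent_b"].get()
    msg["reply"].put(f"B's answer to: {msg['payload']}")

mq = MessageQueue()
threading.Thread(target=agent_b_worker, args=(mq,), daemon=True).start()
print(mq.call("agent_b", "turn 1"))  # prints: B's answer to: turn 1
```

In a real deployment the queue would be a networked broker rather than in-process `queue.Queue` objects, but the request/reply shape is the same.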

At runtime, Agent A initiates multi-turn dialogue generation and Agent B responds across those turns. The coordination module evaluates both agents' dialogues, computes bilateral rewards, and drives iteration through the message queue. Each agent maintains a complete train–infer loop: the Data Buffer manages training samples, the SGLang Router schedules inference tasks, and Megatron executes model training, forming a "Generate → Train → Sync" flywheel.
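The per-agent flywheel can be sketched as a simple loop. `DataBuffer`, `rollout`, and `train_step` below are illustrative stand-ins for the Data Buffer, SGLang-served generation, and a Megatron optimizer step; the weight `version` counter models the sync hand-off back to inference.

```python
from collections import deque

class DataBuffer:
    """Stand-in for the Data Buffer bridging inference and training."""
    def __init__(self):
        self.samples = deque()
    def put(self, batch):
        self.samples.extend(batch)
    def get(self, n):
        return [self.samples.popleft() for _ in range(min(n, len(self.samples)))]

def rollout(weights_version):
    # Stand-in for multi-turn generation plus reward calculation.
    return [{"dialogue": f"turns@v{weights_version}", "reward": 1.0}]

def train_step(samples, weights_version):
    # Stand-in for one Megatron training step over buffered samples;
    # a successful step yields a new weight version.
    return weights_version + 1 if samples else weights_version

buffer, version = DataBuffer(), 0
for _ in range(3):                                 # the flywheel
    buffer.put(rollout(version))                   # Generate: fill the buffer
    version = train_step(buffer.get(8), version)   # Train: consume the buffer
    # Sync: updated weights (here, the bumped version) flow back to inference
print(version)  # prints: 3
```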

Training data flows from the Data Buffer into Megatron, where updated weights are synchronized back to the inference service. This enables efficient knowledge transfer and continuous co-evolution between agents, transcending single-task limitations and allowing multi-agent systems to improve decision-making capabilities in dynamic environments.

Table of Contents

  • Architecture Overview
  • Module Description
  • Use Cases
  • Acknowledgements

Architecture Overview

(framework architecture diagram)

Module Description

  • training (Megatron): Responsible for the main training process; reads data from the Data Buffer, trains the model, and synchronizes updated parameters to the Rollout module
  • rollout (SGLang + router): Generates new data (including reward calculation and verification) and stores it in the data buffer
  • data buffer: Serves as a bridge between training and inference, managing prompt initialization, custom data loading, and rollout-generated content
  • custom rollout generation: Implements custom data generation logic, tailoring multi-turn interaction strategies and output formats
  • message queue: Transfers multi-turn interaction information between Agent A and Agent B, supports cross-agent API communication, task distribution, and state synchronization, and drives the iterative loop
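The custom rollout generation module described above can be pictured as a user-supplied hook that drives the multi-turn exchange and decides the format stored in the data buffer. The hook signature below is an assumption for illustration only, not MrlX's real interface.

```python
def custom_rollout(prompt, generate, max_turns=2):
    """Drive a multi-turn exchange with an inference backend and
    package the result in a training-ready format."""
    turns, text = [], prompt
    for turn in range(max_turns):
        text = generate(text)                 # one inference call per turn
        turns.append({"turn": turn, "output": text})
    return {"prompt": prompt, "turns": turns}  # shape stored in the data buffer

# Usage with a toy generator standing in for an SGLang-served model:
sample = custom_rollout("hi", lambda t: t + "!", max_turns=2)
print(sample["turns"][-1]["output"])  # prints: hi!!
```

Swapping in a different `generate` callable or turn strategy is how a user would tailor interaction strategies and output formats without touching the training loop.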

Use Cases

MrlX-TakesTwo

See MrlX-TakesTwo

MrlX-DeepResearch

See MrlX-DeepResearch

Acknowledgements

  • Special thanks to the following projects & communities: slime, Miles, SGLang, Megatron-LM, and others.
