Releases: inclusionAI/AReaL
v1.0.3
What's Changed
- chore(docker): add openclaw, ironclaw, zeroclaw, and nanobot-ai to runtime image by @garrett4wade in #1051
- feat(agent-service): add Agent Service microservice infrastructure by @CormickKneey in #1048
- feat(gateway): Add rollout gateway infrastructure with controller, router, and data proxy by @nuzant in #1043
- feat: estimators for kl divergence by @NicolasArias in #1060
- test(infra): speed up inference service integration tests by @nuzant in #1068
- fix(infra): simplify RTensor serialization in data proxy by @garrett4wade in #1067
- fix(rpc): resolve connection reset during RTensor fetch with large payloads by @pratyush618 in #1075
- docs: add gitcgr code graph badge by @vitali87 in #1073
- fix(openai): handle streaming responses in chat/completions endpoint by @Zijun9 in #1053
- fix: add PIL image and processor serialization for VLM RPC by @Adiactive in #1070
- refactor(api): migrate allocation_mode to per-engine backend fields by @garrett4wade in #1044
- chore(agents): add Codex harness and align AI workflows by @rchardx in #1082
- feat(platform): add NUMA CPU affinity binding for training engines by @HT-Yuan in #1083
- feat(commands): add fork workflow support to create-pr skill by @guozhihao-224 in #1092
- fix(rpc): batch HTTP RTensor fetches for large multimodal batches by @Wangxiaoxiaoa in #1077
- fix(fsdp): stabilize Qwen-VL rope-index argument binding and dtype by @Adiactive in #1094
- Refactor(vllm): use pause_generation from vllm instead of abort_all_req in areal_vllm_server by @HwVanICI in #1091
- feat: add BailingMoeV2.5 support with Lightning Attention + MLA + MoE + CP by @dingzhiqiang in #1079
- feat: support model training in IPv6-only environments by @TaoZex in #1072
- fix: fix `pad_packed_tensor_dict` by @HKAB in #1104
- feat: megatron bridge adaptation by @gursimar in #1056
- fix(engine): remove duplicate trust_remote_code kwarg in MegatronBridge init by @rchardx in #1107
- fix(dataloader): prevent data drop and padding during validation for accurate metrics by @Anguo-star in #1100
- fix(archon): add missing POST /data/batch endpoint to data proxy by @rchardx in #1105
- refactor(engine): abstract CUDA calls via current_platform in PerLayerOptimWrapper by @guozhihao-224 in #1108
- perf(fsdp): pipeline distributed weight sync with a single pending bucket by @HT-Yuan in #1074
- fix(engine): restore SGLang VLM training by @garrett4wade in #1098
- feat(archon): add FP8 blockwise training support by @rchardx in #1087
- chore(ci): update GCP CI image by @garrett4wade in #1115
- feat(inference-service): complete vLLM backend support in inference service by @garrett4wade in #1112
- fix(archon): harden FP8 blockwise training for TP and MoE scenarios by @rchardx in #1118
- feat(inference_service): add VLM image input support to OpenAI-compatible API by @garrett4wade in #1119
- feat(utils): add Trackio experiment tracking backend by @guozhihao-224 in #1113
- refactor(infra): decompose rpc_server into shared guard + blueprints by @garrett4wade in #1126
- refactor(agents): redesign review-pr taxonomy and sync flow by @rchardx in #1124
- feat(infra): add client-side fetch buffer for RTensor by @guozhihao-224 in #1122
- chore: fix gcp image to latest by @nuzant in #1130
- docs: add Trackio configuration to CLI reference by @rchardx in #1131
- feat(service): support online inference service by @nuzant in #1121
- fix(engine): fix broken tree training due to bad indent in PR #1056 by @gursimar in #1135
- feat(service): add vllm backend support for inference service demo by @nuzant in #1136
- fix(api): add mode validation for WandBConfig and SwanlabConfig by @guozhihao-224 in #1134
- feat: enable LoRA RL-training in Megatron via megatron-bridge by @gursimar in #1123
- fix: harden padded distributed eval across training engines by @rchardx in #1109
- feat(ci): separate vllm and sglang pyproject.toml by @garrett4wade in #1141
- fix(vllm_ext): clear multimodal caches after generation pause by @Adiactive in #1144
- fix(ci): sync uv.vllm.lock with the current pyproject.vllm.toml by @garrett4wade in #1146
- fix(vllm_ext): XCCL lora weights update when PP>1 by buffering and merging PP shards by @gursimar in #1145
- chore: fix pre-commit by @garrett4wade in #1148
- Fix #1040: [Feature] Fixed bugs in Archon LoRA Backend by @JiwaniZakir in #1139
- feat(infra): add distributed data loading service by @garrett4wade in #1120
- refactor(infra): standardize list-first trajectory batch dispatch by @garrett4wade in #1150
- feat(infra): allow colocation with offloading and disk weight updates by @garrett4wade in #1157
- refactor: replace manual JSON parsing with Pydantic models by @koladefaj in #1154
- fix(engine): FSDP compute_logp fails for Qwen3.5 with dict attention_mask by @pratyush618 in #1153
- chore: update readme and enforce license by @garrett4wade in #1170
- chore: ensure SPDX license header in python source files by @garrett4wade in #1171
- fix: add missing pre-commit check file by @garrett4wade in #1173
- chore: add project governance for PyTorch ecosystem by @garrett4wade in #1174
- feat(infra): add microservice-based training service (controller v2) by @garrett4wade in #1169
- chore: renew qrcode by @garrett4wade in #1184
- feat(archon): support multi-node inference in gateway controller by @guozhihao-224 in #1178
- feat(agent-service): add Controller, Guard, and Claude Agent SDK example by @CormickKneey in #1177
- fix _update_weights_from_disk function to prevent training from getting stuck by @asif07hossain in #1181
- refactor: mount data blueprint via WSGI and adopt Pydantic in engine blueprint by @koladefaj in #1179
- fix(engine): use meta device for non-rank-0 in FSDP memory_efficient_load by @yulangz in #1182
- ci: parallelize unit and integration tests across 4 GPU instances by @nuzant in #1185
- chore: bump v1.0.3 by @garrett4wade in #1191
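Entry #1060 above adds estimators for KL divergence. The release note does not say which ones, but the standard k1/k2/k3 Monte-Carlo estimators (popularized in John Schulman's note on approximating KL) are a plausible reference; the sketch below uses illustrative names of my own choosing, not AReaL's API:

```python
import math

def kl_estimators(logp: float, logq: float):
    """Per-sample Monte-Carlo estimators of KL(p || q), given one draw
    x ~ p and its log-probabilities under both distributions.

    k1 is unbiased but high-variance; k2 is biased but low-variance;
    k3 is unbiased and typically lower-variance than k1.
    """
    log_ratio = logq - logp                      # log q(x)/p(x)
    k1 = -log_ratio                              # log p(x)/q(x)
    k2 = 0.5 * log_ratio ** 2
    k3 = math.exp(log_ratio) - 1.0 - log_ratio   # (r - 1) - log r, with r = q/p
    return k1, k2, k3
```

In expectation over x ~ p each estimator approximates KL(p || q), and all three vanish when the two distributions agree.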
New Contributors
- @pratyush618 made their first contribution in #1075
- @vitali87 made their first contribution in #1073
- @Adiactive made their first contribution in #1070
- @guozhihao-224 made their first contribution in #1092
- @TaoZex made their first contribution in #1072
- @HKAB made their first contribution in #1104
- @Anguo-star made their first contribution in #1100
- @JiwaniZakir made their first contribution in #1139
- @koladefaj made their first contribution in #1154
- @asif07hossain made their first contribution in #1181
Full Changelog: v1.0.2...v1.0.3
v1.0.2
Release Note
A massive thank you to our newest contributors who joined us for this release! The strength of this project lies in the collective expertise of the open-source community, and your work is what moves us forward.
🚀 Model & Architecture Updates
- Qwen3.5 Support: Added support for both dense and MoE (Mixture-of-Experts) variants of Qwen3.5 (archon backend, DP-only).
- On-Policy Distillation: Introduced native support for on-policy distillation.
- Added opt-in support for Hugging Face kernels and per-layer optimizer steps with a streaming H2D/D2H pipeline for FSDP.
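The per-layer optimizer step mentioned above can be illustrated with a toy, pure-Python sketch: stepping one layer at a time bounds how much gradient and optimizer state must be live at once, which is what a streaming H2D/D2H pipeline exploits. This is my own simplification (the `per_layer_sgd` helper is hypothetical), not AReaL's implementation:

```python
def per_layer_sgd(layers, lr=0.5):
    """Toy sketch of a per-layer optimizer step (hypothetical helper).

    Instead of one global optimizer.step() over all parameters, each
    layer is updated and its gradients released before moving to the
    next, so only one layer's state needs to be resident at a time.
    Layers are modeled as dicts {"w": [...], "grad": [...]}.
    """
    for layer in layers:
        layer["w"] = [w - lr * g for w, g in zip(layer["w"], layer["grad"])]
        layer["grad"] = [0.0] * len(layer["grad"])  # free grads before the next layer
    return layers
```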
🛠 Infrastructure & Scalability
- Docker & Runtime: Split Docker images into specialized sglang and vllm variants to allow different torch versions and faster Docker image updates.
📖 Documentation & Localization
- Bilingual Support: Launched comprehensive bilingual (EN/ZH) documentation, including a new translate-doc-zh command and fixed LaTeX rendering.
- New Guides: Added an online proxy mode training guide and revised existing online RL training tutorials.
What's Changed
- feat(models): shard vision encoder across Ulysses SP ranks by @aoshen524 in #929
- docs: fix broken documentation links by adding /en/ language prefix by @garrett4wade in #986
- [Feat] Add on-policy distillation support by @HwVanICI in #964
- fix: keep PPO token stats consistent under context parallelism by @yash27-lab in #990
- Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in #993
- Upgrade GitHub Actions to latest versions by @salmanmkc in #994
- chore: add Python 3.11 support (requires-python >=3.11,<3.13) by @garrett4wade in #991
- chore(ci): replace format-check with pre-commit CI and add commit-msg hooks by @rchardx in #998
- feat(docs): add bilingual documentation with translate-doc-zh command by @ZiyiTsang in #995
- refactor: flatten sub-module imports to use parent package re-exports by @NJX-njx in #996
- feat(infra): split Docker image into sglang and vllm variants by @garrett4wade in #985
- docs: fix broken LaTeX rendering in bilingual docs by @rchardx in #1004
- fix: CPU-only support on macOS by @zhanghaotong in #1003
- feat(archon): add moe_router_dtype config for FP32 router gate GEMM by @rchardx in #1009
- refactor: replace string literals with enums and fix logging issues- … by @HT-Yuan in #1008
- feat(fsdp): add per-layer optimizer step with streaming H2D/D2H pipeline by @aoshen524 in #983
- Update NPU doc for v1.0.1 release by @HwVanICI in #1022
- feat(archon): add Qwen3.5 dense and MoE model support (DP-only) by @rchardx in #1012
- fix: LoRA and XCCL openai_serving_models LoRA versioning bug by @TinLongYu in #1021
- fix: unify RPC error response JSON key to "error" across server and s… by @HT-Yuan in #1019
- fix: close sockets on bind failure, fix exit traceback in trainers by @mango766 in #1032
- fix(vllm): harden runtime LoRA alias handling for XCCL updates by @Wangxiaoxiaoa in #1039
- fix(ci): minor fixing ruff format by @garrett4wade in #1041
- docs(github): improve PR template checklist and type-of-change section by @garrett4wade in #1042
- docs: add online proxy mode training guide by @Zijun9 in #1006
- Add opt-in support for Hugging Face kernels by @lewtun in #1033
- fix(archon): Wrap router gate in nn.Module for DTensor hook compatibility by @fishcrap in #1029
- chore(deps): bump sglang, vllm, megatron-core and restructure Dockerfile by @garrett4wade in #1010
- ci(infra): add GCP image baking workflow for CI acceleration by @garrett4wade in #1045
- refactor(infra): simplify RTensor to single-shard and adopt per-trajectory list pipeline by @fishcrap in #1017
- ci(infra): add GCP image baking workflow and update CI image by @garrett4wade in #1047
- docs: revise online RL training tutorial for EN and ZH by @garrett4wade in #1049
- chore: prepare v1.0.2 release by @garrett4wade in #1050
New Contributors
- @aoshen524 made their first contribution in #929
- @yash27-lab made their first contribution in #990
- @salmanmkc made their first contribution in #993
- @NJX-njx made their first contribution in #996
- @zhanghaotong made their first contribution in #1003
- @HT-Yuan made their first contribution in #1008
- @mango766 made their first contribution in #1032
- @Wangxiaoxiaoa made their first contribution in #1039
- @Zijun9 made their first contribution in #1006
- @lewtun made their first contribution in #1033
Full Changelog: v1.0.1...v1.0.2
v1.0.1
Release Note
A patch release that fixes a dependency issue in the docker image and enriches the documentation and testing of the OpenClaw example.
What's Changed
- fix(config): Fix openclaw config typo and increase max_tokens_per_mb by @fishcrap in #959
- docs(openclaw): Replace hardcoded admin key with placeholder in README by @fishcrap in #967
- feat: Fully support MIS/TIS to stabilize rollout-training mismatch by @ZiyiTsang in #930
- refactor(api): move validation into config post_init methods by @rchardx in #970
- fix(openai-proxy): return None for empty trajectory in online mode by @fishcrap in #971
- Ray placement group refactor and preliminary architecture for multinode inference instances by @hlyli in #966
- feat: Add Chinese doc by @ZiyiTsang in #969
- fix(api): replace Literal type with str for SchedulingSpec.ray_placement_strategy by @garrett4wade in #976
- update readme by @xssstory in #974
- test(examples): Add OpenClaw online RL integration test by @fishcrap in #977
- fix: pinning torchao version to 0.15.0 by @garrett4wade in #981
- bump v1.0.1 by @garrett4wade in #982
Full Changelog: v1.0.0...v1.0.1
v1.0.0
🚀 Key Highlights
Release Notes
Online RL Training
- Seamlessly train any agents by configuring a `base_url` and `api_key`; no code changes required and no heavy dependencies.
- Check out the OpenClaw RL training example for more details.
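As a hedged illustration of the zero-code-change claim: official OpenAI SDKs honor the `OPENAI_BASE_URL` and `OPENAI_API_KEY` environment variables, so an existing agent can be redirected to a training-time proxy without edits. The address and key below are placeholders, not real AReaL endpoints:

```python
import os

# Redirect any OpenAI-SDK-based agent to a training-time proxy.
# Both values are illustrative placeholders, not real AReaL endpoints.
os.environ["OPENAI_BASE_URL"] = "http://localhost:8000/v1"  # proxy address (placeholder)
os.environ["OPENAI_API_KEY"] = "train-session-token"        # key issued by the trainer (placeholder)

# The agent's own code stays untouched; the SDK picks these up at client init.
print(os.environ["OPENAI_BASE_URL"])
```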
Archon Engine
- A fully working, PyTorch-native 5D parallel training engine.
- Includes features like:
- Automatic HF format conversion
- Zero-bubble pipelining
- torch.compile
- FSDP (Fully Sharded Data Parallel)
- Selective activation support
AI-Assisted Coding
- Official commands and skills to streamline development and enable easy customization.
Infrastructure Upgrade
- Transition from the previous SPMD architecture to a more efficient single-controller architecture.
uv Installation Support
- Easily set up training environments by running a single command: `uv sync`.
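A hedged sketch of the setup flow; the repository URL is inferred from this release page, and any steps beyond `uv sync` itself are assumptions:

```shell
# Clone the repository and let uv resolve and install the environment
# from the project's pyproject.toml / lockfile.
git clone https://github.com/inclusionAI/AReaL.git
cd AReaL
uv sync
```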
What's Changed
- feat: replace legacy math parsing with math-verify by @rchardx in #739
- Add installation instructions for Ascend NPU by @HwVanICI in #748
- [Bug Fix] Fix Tools compatibility, max_token restrictions, and EOS token issues in Proxy mode by @yulangz in #736
- refactor: modify engine and controllers to support the single-controller mode with the same trainer by @garrett4wade in #753
- VLM Training on NPU by @HwVanICI in #746
- refactor: move device utilities to platform classes and io_struct by @garrett4wade in #757
- Fix: Implement get_device_stats() for train_controller by @HwVanICI in #762
- refactor: single-source task_id generation in submit methods by @garrett4wade in #759
- [Bug Fix] Camel example with wrong and missing agent arguments by @HwVanICI in #766
- feat: use `name_resolve` for worker discovery and fix perf_tracer in the single-controller mode by @garrett4wade in #764
- [Feature] Implement Single-Controller XCCL Weight Update by @HwVanICI in #754
- feat: Implement slurm scheduler by @garrett4wade in #767
- Ray Scheduler Implementation for Single Controller by @HwVanICI in #741
- refactor: use callbacks to implement xccl weight transfer and avoid busy waiting during rollout by @garrett4wade in #769
- [Testing] Update GCP image to accelerate CI testing by @nuzant in #772
- refactor: unifying launcher, scheduling spec, yaml configs, and training scripts by @garrett4wade in #770
- feat: improve logging by @garrett4wade in #771
- refactor: separate megatron imports and installation from FSDP by @garrett4wade in #773
- minor fix: vLLM LoRA request cleanup for issue #751 by @TinLongYu in #765
- fix: refactoring proximal logp recompute condition by @garrett4wade in #780
- [Feat] Add FP8 training support by @fishcrap in #758
- Enhance host IP detection in areal.utils.network by @HwVanICI in #778
- chore: remove the ad-hoc should_broadcast parameter in rpc servers by @garrett4wade in #774
- feat: support colocated engines in the single-controller mode by @garrett4wade in #779
- chore: update readme by @garrett4wade in #782
- docs: restructure AGENTS.md and add CLAUDE.md symlink by @rchardx in #783
- chore: Expose error when launching sglang server by @ZiyiTsang in #781
- refactor: simplifying the implementation of customized workflow with context management by @garrett4wade in #785
- chore: Expose error when launching vllm server by @garrett4wade in #790
- refactor: allow dynamic batch size without the `dynamic_filtering` function by @garrett4wade in #786
- fix: fix ray scheduler in the single-controller mode by @garrett4wade in #791
- fix inference engine addr resolving logic by @garrett4wade in #792
- chore: minor fix doc formula by @ZiyiTsang in #793
- doc: update docs for grpo and related algorithms by @garrett4wade in #794
- refactor: migrate grouped rollout from customized workflows to inference engines by @garrett4wade in #789
- Single-controller LoRA RL fine-tuning with vLLM by @gursimar in #735
- [Feature] Group-level data redistribution by @nuzant in #800
- critical fix: passing `is_eval` and `group_size` from rollout controller to engines by @garrett4wade in #801
- Update NPU doc by @HwVanICI in #803
- feat: add Archon Engine - PyTorch native FSDP2 training backend by @rchardx in #799
- [Feature] Tree training support (Megatron Engine) for agentic RL training by @nuzant in #804
- Add NPU RLVR example by @HwVanICI in #798
- [Bug Fix] XCCL weight synchronization fix for the single controller lora by @gursimar in #796
- [Bug Fix] Fix import error introduced by tree training PR by @nuzant in #808
- chore: remove legacy code, config, and documentation by @garrett4wade in #806
- fix: update tree_attn function name to patch_bridge_for_tree_training by @rchardx in #809
- fix: prevent fake PID killing in LocalScheduler tests by @rchardx in #810
- feat(archon): add torch.compile support and profiling tools by @rchardx in #807
- Add RayScheduler to sft.py by @HwVanICI in #814
- fix: add lm_head.weight into index when index file exists by @jwhj in #816
- feat: use subprocess to fork colocated workers by @garrett4wade in #815
- feat(archon): add Context Parallelism (Ulysses SP) support by @rchardx in #817
- refactor(data): simplify pad_mb_list alignment parameters by @rchardx in #820
- refactor: unify HTTP client management in workflow_context by @garrett4wade in #819
- feat(archon): enable TP + AC + compile compatibility with _WaitAsyncWrapper by @rchardx in #821
- [FEAT] Add direct TE FP8-PyTorch FP8 conversion by @fishcrap in #802
- refactor(core): simplify HTTP client lifecycle with event loop cleanup by @garrett4wade in #823
- feat: Add AgentWorkflow API and migrate workflow resolution to RemoteInfEngine by @garrett4wade in #825
- feat(scheduler): refactor fork_workers to public API with custom command support by @garrett4wade in #826
- [FIX] correct vLLM config defaults for chunked prefill and prefix caching by @fishcrap in #827
- refactor(openai): modularize proxy architecture and add inline mode by @garrett4wade in #829
- chore(doc): update readme by @garrett4wade in #830
- fix(test): Fix math-verify tests by @garrett4wade in #831
- feat(archon): add Expert Parallelism (EP) support for MoE models by @rchardx in #833
- fix(moe): correct histc max param by @rchardx in #835
- feat(archon): add explicit FSDP prefetching for EP by @rchardx in #834
- feat(archon): add EP-aware padding wrapper for MoE grouped_mm by @rchardx in #836
- testing: fix CI, skip tests that cannot run on A100 GPUs by @nuzant in #838
- feat(archon): add Expert Tensor Parallelism (ETP) support for MoE models by @rchardx in #839
- feat: support tree training for FSDP engine by @nuzant in #837
- fix: remove duplicate setup in gsm8k_rl by @v3nividiv1ci in #842
- refactor(archon): cleanup parallel dims and FSDP config for pipeline parallelism by @rchardx in #841
- refactor(tree_attn): decouple FSDP and Megatron implementations by @rchardx in https://github.com/inclusionAI/AReaL/pu...
v1.0.0.rc1
Pre-release for 1.0.0.
v0.5.3
Highlights
This is a patch release primarily for delivering the latest docker image for testing.
We will include well-documented features in the next major release.
v0.5.2
Highlights
This is a patch release primarily for delivering the latest docker image with torch 2.9.1, vllm 0.14.0, and sglang 0.5.7 support.
We will include well-documented features in the next major release.
v0.5.1
Highlights
This is a patch release on top of v0.5.0.
- A new docker image with `math-verify` and the latest `ruff`.
- PPO critic model support with the Megatron engine.
- Refactored FSDP/Megatron engine implementations.
- Efficient RPC tensor transfer with `RTensor` (aka the original `DistributedBatch`).
- Beam search support for vLLM.
What's Changed
- fix: change checkpoint cleanup flag to fix update_weights_from_disk in single-controller mode by @HwVanICI in #711
- fix: prevent port overflow in vLLM server with high data parallelism (fixes #652) by @HsiaoTsan in #653
- refactor: refactor train engine high level APIs by @aaaandychen in #658
- [Fix] Fix the bug that experiments cannot properly exit in the TIR example by @nuzant in #712
- chore: print more information in concat mode and handle empty tool calls for easy debugging by @nuzant in #713
- chore: trim tests in CI by @garrett4wade in #714
- refactor: enforce task_id creation, access, and manipulation in inference engines by @garrett4wade in #715
- refactor: redesign TrainEngine API with cleaner abstractions by @rchardx in #719
- [Testing] Add SFT/GRPO integration test for Megatron train engine. by @nuzant in #726
- [FEAT] VLLM support for VLM training by @HwVanICI in #698
- feat: Support beam_search in vllm backend by @ZiyiTsang in #721
- fix: update multi-turn math test configuration by @rchardx in #727
- fix: fix logic error in beam search support check by @rchardx in #728
- feat: add PPO Critic model support for MegatronEngine by @rchardx in #729
- feat: implement RTensor for metadata transfer in the single-controller mode by @garrett4wade in #731
- fix: fix multi-turn proxy example by @dhh1995 in #733
- minor fix: fix openai cache test, add it in CI test suite, and remove OOD todos/fixmes in Megatron engine by @garrett4wade in #732
- [Feat] XCCL-updates for single LoRA functionality for ascend-vLLM by @gursimar in #679
- fix: use group_size=1 for eval in proxy examples by @dhh1995 in #737
- feat: add ignore_eos and skip_special_tokens generation params by @rchardx in #738
- chore: update datasets to version 3.0.0 or higher for inner API compatibility by @ZiyiTsang in #720
- feat: build the docker image with math-verify and the latest ruff by @garrett4wade in #744
- bump v0.5.1 by @garrett4wade in #745
New Contributors
- @HsiaoTsan made their first contribution in #653
- @aaaandychen made their first contribution in #658
- @gursimar made their first contribution in #679
Full Changelog: v0.5.0...v0.5.1
v0.5.0
Highlights
The newly released v0.5.0 of AReaL introduces two core innovations: Seamless Agentic RL and the Single Controller architecture:
- Seamless Agentic RL: AReaL provides an intelligent agent training service via OpenAI-compatible APIs. This facilitates seamless collaboration among environment providers, algorithm developers, and system engineers, forming a zero-friction pipeline in complex engineering workflows and significantly boosting development efficiency and system maintainability.
- Single Controller Architecture: Eliminates long-tail latency and data imbalance issues inherent in SPMD (Single Program, Multiple Data) models. This layered design enhances inference scalability, enables fine-grained system-level control, and preserves algorithmic flexibility while minimizing code migration costs for algorithm developers.
Other changes include:
- Performance & Scalability: Major refactoring to streamline step detection, assignment logic, and workflow batching. Improved distributed training with fixes for NCCL timeouts, Gloo group barriers, and vocab-parallel logprobs for FSDP.
- Model & Hardware Support: Added single LoRA functionality for Ascend-vLLM and improved handling for Vision-Language Models (VLMs).
- Fixes & Refinements: Resolved numerous bugs related to data loading, reward timeouts, interaction caching, process cleanup, and tool call parsing. Significant code refactoring to merge duplicate logic, improve type hints, and centralize asset management. Project-wide code formatting switch to ruff.
Future Work
AReaL currently supports the basic Single Controller mode and Agentic RL training pipeline. Future enhancements include:
- Optimized data flow and distributed launch capabilities under Single Controller mode;
- Automatic scaling, fault recovery, and high-availability training;
- Improved training-inference performance in agent-centric scenarios.
What's Changed
- update readme for qwen3-vl by @garrett4wade in #578
- [FIX] add recipe directory to pre-commit checks by @fishcrap in #580
- [FIX] reduce reward timeout warning by @fishcrap in #579
- [FIX] fix compute logp temperature by @fishcrap in #581
- feat: rebuild step detection around global batches by @rchardx in #583
- chore: extend wait timeout and hardens config checks by @rchardx in #585
- feat: streamline step assignment logic by @rchardx in #584
- fix: Use background threads to commit tasks and fetch results in workflow executor by @garrett4wade in #587
- fix: reuse `aiohttp.ClientSession` in `agenerate` by @garrett4wade in #589
- chore: automates session tracing context by @rchardx in #591
- [feat] add Serializer for rpc server by @CormickKneey in #566
- doc: improve tracer documentation with custom phase support and improved plotting by @rchardx in #594
- [feature] Support concat export completions in proxy mode by @yulangz in #582
- Fix trainer to use backend information from allocation mode by @dhh1995 in #596
- fix: fix the hanging issue of `rollout_batch` by @garrett4wade in #595
- fix: extends NCCL group timeout coverage by @rchardx in #598
- chore: use typevar to type hint loaded config by @dhh1995 in #603
- fix: safely close all ClientSessions with ContextVar by @garrett4wade in #605
- chore: remove requirements.txt by @garrett4wade in #604
- [Feat] Add train/rollout offload support by @fishcrap in #590
- [FIx] Use gloo group barriers for distributed synchronization by @fishcrap in #607
- feat: adds scheduled profiler tracing by @rchardx in #608
- refactor: let WorkflowExecutor.wait return a list with `None` by @garrett4wade in #612
- [feat] add local scheduler for single controller mode by @daihaowz in #610
- refactor: separate `BatchTaskDispatcher` from `WorkflowExecutor` by @garrett4wade in #613
- chore: upload paper to the repo by @garrett4wade in #616
- chore: clarifies agent onboarding guide by @rchardx in #617
- refactor: improves async coordination by @rchardx in #618
- [FIX] fix `enable_offload` breaking change and add offload/onload API by @fishcrap in #625
- refact: update gconfig to update stop token ids in workflows instead of in example scripts by @dhh1995 in #626
- chore: improve workflow batching safeguards by @rchardx in #624
- chore: ensures worker threads exit cleanly by @rchardx in #630
- bug fix: correctly shuffling data with distributed sampler by @garrett4wade in #632
- rename CompletionCache to InteractionCache by @dhh1995 in #631
- refactor: merge base_hf_engine with fsdp_engine for code cleanup by @garrett4wade in #629
- chore: format all files under areal/utils with ruff by @garrett4wade in #635
- chore: format all tests with ruff by @garrett4wade in #636
- chore: format remaining files under `areal/` with ruff by @garrett4wade in #637
- ci: update ci formatter to ruff by @garrett4wade in #638
- chore: tunes NCCL IB settings by @rchardx in #640
- [feat] implement train controller for single controller by @daihaowz in #614
- fix: modify the default value of "shuffle" and "drop_last" for validation datasets by @garrett4wade in #633
- [Feat] Single LoRA functionality for ascend-vLLM by @HwVanICI in #621
- fix: prevent zombie vLLM processes when Ray launcher kills tasks by @zhshgmail in #623
- refactor: add `export_stats` as engine's method by @garrett4wade in #643
- [feat] impl rollout controller for single controller by @dingzhiqiang in #611
- feat: implement proximal log-probability approximation for decoupled PPO by @zhshgmail in #600
- fix: fixes CLI docs import order by @rchardx in #646
- refactor: refines PPO/GRPO loss by @rchardx in #650
- refactor: merge duplicate process termination functions into unified kill_process_tree by @garrett4wade in #648
- feat: simplify openAI agent integration and allow training with any customized agent by @garrett4wade in #657
- fix: tear down local inference servers when calling `destroy` by @garrett4wade in #659
- fix: vlm input slicing by @HwVanICI in #651
- refactor: move logprob and value computation into TrainEngine by @rchardx in #663
- fix: fix drop last for data loader with distributed sampler by @dhh1995 in #665
- [FIX] Initialize llm_addrs in Slurm launcher for SFT jobs by @fishcrap in #662
- refactor: apply PPOTrainer and SFTTrainer in example scripts by @garrett4wade in #660
- feature: implement vocab-parallel logprobs for FSDP by @rchardx in #667
- refact: expose workflow executor in inference engine by @dhh1995 in #676
- fix: raise AttributeError instead of returning None in Platform.getattr by @rchardx in #672
- fix: add missing device_control_env_var to CpuPlatform by @rchardx in #681
- fix: override workflow_executor property in MockInferenceEngine by @rchardx in #682
- refactor: make processing multi_modal_input generic by @HwVanICI in #678
- refactor: refactor attention mask generation logic for clarity by @rchardx in #685
- [Feat] Implement GRPO trainer and weight exchange for single-controller mode by @dingzhiqiang in #666
- refact: rename set_final_reward to set_last_reward, also fix openai gen args by excluding lora_name by @dhh1995 in #675
- fix: fix CPU offloading in FSDP grad clipping and weight updates by @rchardx in https://github.com/inclus...
v0.4.1
What's Changed
- feat: add `raise_timeout` parameter to allow quiet waiting for inference results by @garrett4wade in #547
- Fix batch size in example `examples/vlm/clevr_count_70k_grpo.yaml` by @wangruohui in #549
- chore: format dataset and reward folders with ruff by @garrett4wade in #551
- refactor: rename the `should_accept` argument in `rollout/prepare_batch` to `should_accept_fn` by @garrett4wade in #555
- chore: delete not-planned experimental features by @garrett4wade in #554
- feat: add grpo trainer and simplify gsm8k grpo example by @dhh1995 in #552
- feat: add launch_server and teardown_server in inference engine api by @garrett4wade in #550
- [Refactor] refactor `stats_tracker` usage in engines and examples by @nuzant in #556
- refactor: allow passing string paths and init kwargs as rollout workflows by @garrett4wade in #525
- feat: introduces session-centric tracing APIs by @rchardx in #539
- doc: Add notes about asynchronous RL training by @garrett4wade in #558
- format: ruff format examples directory by @fishcrap in #559
- feat: support proxy server and client for training openai-compatible agents by @dhh1995 in #500
- chore: change type annotations and minor fixes for single-controller mode by @garrett4wade in #560
- docs: add "Performance Profiling" guide to best practices by @rchardx in #538
- add README for proxy_agent by @yulangz in #561
- chore: extends engine perf instrumentation by @rchardx in #562
- [FEAT] add pause/resume generation for vLLM server by @fishcrap in #563
- doc: update AReaL design doc with the current dev status by @garrett4wade in #568
- doc: update documentation to align the current dev status by @garrett4wade in #570
- refactor: extend allocation mode to support allocation naming and composition by @garrett4wade in #565
- feat: align perf_tracer with task hierarchy by @rchardx in #569
- chore: add hint for the breaking change of allocation mode by @garrett4wade in #572
- [FIX] fix atrace_session_phase in workflow by @fishcrap in #573
- chore: Quick fix for GSPO missing in doc by @ZiyiTsang in #576
- ci: build docker images with GCP by @garrett4wade in #564
- refactor: restrict the usage scope of the `rollout_batch` method by @garrett4wade in #567
- chore: add issue template for questions by @garrett4wade in #571
- ci: automatically tag the dev image upon new releases by @garrett4wade in #574
- chore: remove the old script used for validating installation by @garrett4wade in #575
- [FEAT] Add Qwen3-VL model support for fsdp by @fishcrap in #557
- bump v0.4.1 by @garrett4wade in #577
New Contributors
- @wangruohui made their first contribution in #549
Full Changelog: v0.4.0...v0.4.1