@irenedea commented on Aug 8, 2025

Dummy example showing how torch-native async RL would work without Ray. We use native torch async distributed operations to communicate between the rollout and trainer processes.
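
For reference, here is a minimal sketch (not the actual `test_no_ray.py`) of the communication pattern: the trainer broadcasts its weights and the rollout process streams experience back with non-blocking sends. The gloo backend, tensor shapes, and rank assignment below are illustrative assumptions, not the PR's implementation.

```python
# Minimal sketch of the weight-broadcast / experience-send pattern using only
# native torch.distributed ops. Backend, shapes, and ranks are assumptions.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

TRAINER, ROLLOUT = 0, 1  # hypothetical rank assignment

def run(rank: int, world_size: int) -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    weights = torch.tensor([10.0]) if rank == TRAINER else torch.zeros(1)
    experience = torch.zeros(1)

    for step in range(2):
        # Trainer broadcasts the latest weights; the rollout process receives them.
        dist.broadcast(weights, src=TRAINER)

        if rank == ROLLOUT:
            # "Generate" a rollout and send it to the trainer without blocking.
            experience = weights + 10.0
            req = dist.isend(experience, dst=TRAINER)
            req.wait()
        else:
            # Trainer asynchronously receives the experience buffer, then "trains".
            req = dist.irecv(experience, src=ROLLOUT)
            req.wait()
            weights += 1  # stand-in for a training update (10 -> 11)

    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(run, args=(2,), nprocs=2, join=True)
```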

On a 4-GPU interactive node:

python test_no_ray.py

Synchronous output (MAX_ASYNC_STEPS=0): the rollout process blocks on a fresh weight broadcast before generating each batch, so rollout and training strictly alternate.

root@eede4788-aeb0-4d28-a785-5033baf7646e-0:/compose-rl# python test_no_ray.py 
[TRAIN] 2025-08-08 06:07:03,647: rank0[181235][MainThread]: INFO: __main__: Initializing model update process group
[TRAIN] 2025-08-08 06:07:05,820: rank0[181235][MainThread]: INFO: __main__: Starting iteration 1/2
[ROLLOUT] 2025-08-08 06:07:06,156: rank0[180919][MainThread]: INFO: __main__: Starting iteration 1/2
[ROLLOUT] 2025-08-08 06:07:06,474: rank0[180919][MainThread]: INFO: __main__: Weights are ready to update
[ROLLOUT] 2025-08-08 06:07:06,474: rank0[180919][MainThread]: INFO: __main__: Updating the model weights
[TRAIN] 2025-08-08 06:07:06,475: rank0[181235][MainThread]: INFO: __main__: Broadcasted is_ready_to_update tensor([1], device='cuda:0')
[TRAIN] 2025-08-08 06:07:06,475: rank0[181235][MainThread]: INFO: __main__: Broadcasted model weights tensor([10], device='cuda:0')
[ROLLOUT] 2025-08-08 06:07:06,476: rank0[180919][MainThread]: INFO: __main__: Updating the weights to tensor([10], device='cuda:0')
[ROLLOUT] 2025-08-08 06:07:06,476: rank0[180919][MainThread]: INFO: __main__: Generating rollouts!
[ROLLOUT] 2025-08-08 06:07:06,476: rank0[180919][MainThread]: INFO: __main__: Sent experience buffer tensor([20])
[ROLLOUT] 2025-08-08 06:07:06,476: rank0[180919][MainThread]: INFO: __main__: Completed iteration 1/2
[ROLLOUT] 2025-08-08 06:07:06,476: rank0[180919][MainThread]: INFO: __main__: Starting iteration 2/2
[TRAIN] 2025-08-08 06:07:06,476: rank0[181235][MainThread]: INFO: __main__: Got experience buffer tensor([20])
[rank0]:[W808 06:07:06.693400115 ProcessGroupNCCL.cpp:4715] [PG ID 0 PG GUID 0 Rank 0]  using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
[TRAIN] 2025-08-08 06:07:06,750: rank0[181235][MainThread]: INFO: __main__: Training!
[TRAIN] 2025-08-08 06:07:26,750: rank0[181235][MainThread]: INFO: __main__: Completed iteration 1/2
[TRAIN] 2025-08-08 06:07:26,750: rank0[181235][MainThread]: INFO: __main__: Starting iteration 2/2
[TRAIN] 2025-08-08 06:07:26,751: rank0[181235][MainThread]: INFO: __main__: Broadcasted is_ready_to_update tensor([1], device='cuda:0')
[ROLLOUT] 2025-08-08 06:07:26,751: rank0[180919][MainThread]: INFO: __main__: Weights are ready to update
[ROLLOUT] 2025-08-08 06:07:26,751: rank0[180919][MainThread]: INFO: __main__: Updating the model weights
[TRAIN] 2025-08-08 06:07:26,751: rank0[181235][MainThread]: INFO: __main__: Broadcasted model weights tensor([11], device='cuda:0')
[ROLLOUT] 2025-08-08 06:07:26,753: rank0[180919][MainThread]: INFO: __main__: Updating the weights to tensor([11], device='cuda:0')
[ROLLOUT] 2025-08-08 06:07:26,753: rank0[180919][MainThread]: INFO: __main__: Generating rollouts!
[ROLLOUT] 2025-08-08 06:07:26,753: rank0[180919][MainThread]: INFO: __main__: Sent experience buffer tensor([21])
[ROLLOUT] 2025-08-08 06:07:26,753: rank0[180919][MainThread]: INFO: __main__: Completed iteration 2/2
[ROLLOUT] 2025-08-08 06:07:26,753: rank0[180919][MainThread]: INFO: __main__: Waiting for the last experience buffer to be received
[TRAIN] 2025-08-08 06:07:26,753: rank0[181235][MainThread]: INFO: __main__: Got experience buffer tensor([21])
[TRAIN] 2025-08-08 06:07:26,753: rank0[181235][MainThread]: INFO: __main__: Training!
[TRAIN] 2025-08-08 06:07:46,754: rank0[181235][MainThread]: INFO: __main__: Completed iteration 2/2
[rank0]:[W808 06:07:47.533397252 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Waiting up to 30 seconds for all training processes to terminate. Press Ctrl-C to exit immediately.
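
The TRAIN lines above correspond to a trainer loop that, each iteration, signals readiness, broadcasts weights over a dedicated model-update process group, and then waits on the experience buffer. A hedged sketch of that loop; the model-update group, readiness flag, and `train_step` helper are assumptions, and the rollout side would mirror the matching broadcast/recv calls:

```python
# Hedged sketch of the trainer-side loop implied by the TRAIN log lines.
# The model-update group, readiness flag, and train_step helper are assumptions.
import torch
import torch.distributed as dist

def trainer_loop(num_iterations, weights, experience, model_update_group,
                 rollout_rank, train_step):
    for it in range(num_iterations):
        # Tell the rollout process that fresh weights are coming, then send them
        # over the dedicated model-update process group.
        is_ready_to_update = torch.ones(1, device=weights.device)
        dist.broadcast(is_ready_to_update, src=dist.get_rank(), group=model_update_group)
        dist.broadcast(weights, src=dist.get_rank(), group=model_update_group)

        # Receive the next experience buffer; with MAX_ASYNC_STEPS=1 this buffer
        # may already have been generated with the previous weights.
        req = dist.irecv(experience, src=rollout_rank)
        req.wait()

        train_step(experience)  # "Training!"
        weights += 1            # stand-in for the updated weights (10 -> 11)
```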

Asynchronous output (MAX_ASYNC_STEPS=1): the rollout process runs up to one iteration ahead, generating and sending the experience for iteration 2 before the trainer has finished training on iteration 1.

root@eede4788-aeb0-4d28-a785-5033baf7646e-0:/compose-rl# python test_no_ray.py 
[TRAIN] 2025-08-08 06:08:37,387: rank0[182499][MainThread]: INFO: __main__: Initializing model update process group
[TRAIN] 2025-08-08 06:08:38,658: rank0[182499][MainThread]: INFO: __main__: Starting iteration 1/2
[ROLLOUT] 2025-08-08 06:08:38,991: rank0[182128][MainThread]: INFO: __main__: Starting iteration 1/2
[TRAIN] 2025-08-08 06:08:39,349: rank0[182499][MainThread]: INFO: __main__: Broadcasted is_ready_to_update tensor([1], device='cuda:0')
[TRAIN] 2025-08-08 06:08:39,349: rank0[182499][MainThread]: INFO: __main__: Broadcasted model weights tensor([10], device='cuda:0')
[ROLLOUT] 2025-08-08 06:08:39,349: rank0[182128][MainThread]: INFO: __main__: Weights are ready to update
[ROLLOUT] 2025-08-08 06:08:39,349: rank0[182128][MainThread]: INFO: __main__: Updating the model weights
[ROLLOUT] 2025-08-08 06:08:39,351: rank0[182128][MainThread]: INFO: __main__: Updating the weights to tensor([10], device='cuda:0')
[ROLLOUT] 2025-08-08 06:08:39,351: rank0[182128][MainThread]: INFO: __main__: Generating rollouts!
[ROLLOUT] 2025-08-08 06:08:39,351: rank0[182128][MainThread]: INFO: __main__: Sent experience buffer tensor([20])
[ROLLOUT] 2025-08-08 06:08:39,351: rank0[182128][MainThread]: INFO: __main__: Completed iteration 1/2
[ROLLOUT] 2025-08-08 06:08:39,351: rank0[182128][MainThread]: INFO: __main__: Starting iteration 2/2
[ROLLOUT] 2025-08-08 06:08:39,352: rank0[182128][MainThread]: INFO: __main__: Generating rollouts!
[ROLLOUT] 2025-08-08 06:08:39,352: rank0[182128][MainThread]: INFO: __main__: Sent experience buffer tensor([21])
[ROLLOUT] 2025-08-08 06:08:39,352: rank0[182128][MainThread]: INFO: __main__: Completed iteration 2/2
[TRAIN] 2025-08-08 06:08:39,352: rank0[182499][MainThread]: INFO: __main__: Got experience buffer tensor([20])
[ROLLOUT] 2025-08-08 06:08:39,352: rank0[182128][MainThread]: INFO: __main__: Waiting for the last experience buffer to be received
[rank0]:[W808 06:08:39.568937243 ProcessGroupNCCL.cpp:4715] [PG ID 0 PG GUID 0 Rank 0]  using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
[TRAIN] 2025-08-08 06:08:39,616: rank0[182499][MainThread]: INFO: __main__: Training!
[TRAIN] 2025-08-08 06:08:59,617: rank0[182499][MainThread]: INFO: __main__: Completed iteration 1/2
[TRAIN] 2025-08-08 06:08:59,617: rank0[182499][MainThread]: INFO: __main__: Starting iteration 2/2
[TRAIN] 2025-08-08 06:08:59,618: rank0[182499][MainThread]: INFO: __main__: Broadcasted is_ready_to_update tensor([1], device='cuda:0')
[TRAIN] 2025-08-08 06:08:59,618: rank0[182499][MainThread]: INFO: __main__: Broadcasted model weights tensor([11], device='cuda:0')
[TRAIN] 2025-08-08 06:08:59,618: rank0[182499][MainThread]: INFO: __main__: Got experience buffer tensor([21])
[TRAIN] 2025-08-08 06:08:59,619: rank0[182499][MainThread]: INFO: __main__: Training!
[TRAIN] 2025-08-08 06:09:19,619: rank0[182499][MainThread]: INFO: __main__: Completed iteration 2/2
[rank0]:[W808 06:09:20.501714194 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Waiting up to 30 seconds for all training processes to terminate. Press Ctrl-C to exit immediately.
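
Comparing the two runs, one gating scheme that reproduces both orderings is: the rollout process always waits for the initial weights, and afterwards only blocks on a new broadcast when it would otherwise run more than MAX_ASYNC_STEPS iterations on stale weights. A sketch of that scheme follows; the helper callables are hypothetical stand-ins for the broadcast/isend calls shown earlier, not the PR's API:

```python
# Hedged sketch of rollout-side gating consistent with both logs above.
# recv_weights, generate_rollouts, and send_experience are hypothetical helpers.
def rollout_loop(num_iterations, max_async_steps,
                 recv_weights, generate_rollouts, send_experience):
    steps_since_update = 0
    pending_sends = []
    for it in range(num_iterations):
        # Block on a weight update at the first iteration, or whenever the
        # rollout would otherwise get more than max_async_steps ahead.
        # max_async_steps=0 reproduces the synchronous ordering above;
        # max_async_steps=1 lets iteration 2 reuse iteration 1's weights.
        if it == 0 or steps_since_update > max_async_steps:
            recv_weights()
            steps_since_update = 0
        experience = generate_rollouts()
        pending_sends.append(send_experience(experience))  # non-blocking, e.g. dist.isend
        steps_since_update += 1
    # "Waiting for the last experience buffer to be received"
    for req in pending_sends:
        req.wait()
```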

irenedea changed the title to "No ray, native torch async rl prototype" on Aug 8, 2025
irenedea requested a review from rithwik-db on Aug 8, 2025